WO2023056730A1 - Video image augmentation method, network training method, electronic device and storage medium - Google Patents

Video image augmentation method, network training method, electronic device and storage medium

Info

Publication number
WO2023056730A1
Authority
WO
WIPO (PCT)
Prior art keywords
video image
short
exposure
frame
network
Prior art date
Application number
PCT/CN2022/081245
Other languages
French (fr)
Chinese (zh)
Inventor
游晶
徐科
孔德辉
艾吉松
刘欣
任聪
Original Assignee
深圳市中兴微电子技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2023056730A1 publication Critical patent/WO2023056730A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06T 5/60
    • G06T 5/70
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; Using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • Embodiments of the present disclosure relate to but are not limited to the technical field of video image processing, in particular, to a video image enhancement method, a network training method, electronic equipment, and a computer-readable storage medium.
  • Neural-network-based night-scene video image enhancement is one of the main current research directions.
  • However, existing algorithms mainly focus on a single direction such as denoising, brightness enhancement, or color restoration, and often cannot address denoising and distortion avoidance at the same time, so the enhancement effect is poor.
  • Embodiments of the present disclosure provide a video image enhancement method, a network training method, electronic equipment, and a computer-readable storage medium.
  • an embodiment of the present disclosure provides a video image enhancement method, including: using a first network to extract a first feature of an i-th frame short-exposure video image, where i is a variable that takes the values 0, 1, 2, 3, ... in sequence; using a second network to denoise the (i-N)-th to (i+N)-th frame short-exposure video images to obtain a denoised video image corresponding to the i-th frame short-exposure video image, where N is a constant greater than or equal to 1; and performing first fusion processing on the first feature of the i-th frame short-exposure video image and the denoised video image corresponding to the i-th frame short-exposure video image to obtain an enhanced video image corresponding to the i-th frame short-exposure video image.
  • an embodiment of the present disclosure provides a network training method, including: for each training sample, using any one of the above video image enhancement methods to obtain an enhanced video image corresponding to the training sample, where the training sample includes (2N+1) frames of short-exposure video images; determining a total objective function value according to the enhanced video images and long-exposure video images corresponding to the training samples, and updating training parameter values according to the total objective function value; and, for each training sample, obtaining the enhanced video image corresponding to the training sample with any one of the above video image enhancement methods according to the updated training parameter values, until the total objective function value converges.
  • an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory storing at least one program which, when executed by the at least one processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
  • an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
  • the video image enhancement method provided by the embodiments of the present disclosure performs feature extraction and denoising on the short-exposure video image through two different branches and then fuses the results of the two branches to obtain the enhanced video image; while denoising is performed, distortion is avoided through feature extraction, improving the enhancement effect.
  • the network training method provided by the embodiments of the present disclosure trains the above video image enhancement method to obtain optimal training parameter values, further improving the enhancement effect.
  • FIG. 1 is a flowchart of a video image enhancement method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of one realizable structure of the first network provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of another realizable structure of the first network provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of one realizable structure of the second network provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of one realizable structure of the codec sub-module provided by an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a network training method provided by another embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a video image enhancement method provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a video image enhancement method, including:
  • Step 100: use the first network to extract the first feature of the i-th frame short-exposure video image, where i is a variable that takes the values 0, 1, 2, 3, ... in sequence.
  • the i-th frame short-exposure video image is a video image in a sequence of Bayer-format video images obtained from a CMOS (Complementary Metal Oxide Semiconductor) sensor, and each frame of short-exposure video image corresponds to a frame of long-exposure image.
  • the first network may include: a first subnetwork, a second subnetwork, and a fusion submodule.
  • the fusion sub-module is configured to implement the second fusion process.
  • the first network may further include: a down-sampling sub-module and an up-sampling sub-module.
  • the down-sampling sub-module is configured to realize down-sampling processing
  • the up-sampling sub-module is configured to realize up-sampling processing.
  • using the first network to extract the first feature of the i-th frame short-exposure video image includes: using the first sub-network to extract the second feature of the i-th frame short-exposure video image, and using the second sub-network to extract the third feature of the i-th frame short-exposure video image; and performing the second fusion process on the second feature and the third feature of the i-th frame short-exposure video image to obtain the first feature of the i-th frame short-exposure video image.
  • the second feature may be a local feature
  • the third feature may be a global feature
  • the local feature refers to local information of the short-exposure video image, such as local maxima and local minima, which reflects the local detail of the video image.
  • the global feature refers to global information of the short-exposure video image, such as brightness histogram distribution and color information, at least one of which can be used for brightness adjustment and color correction.
  • when the first network includes, besides the first sub-network, the second sub-network, and the fusion sub-module, a down-sampling sub-module and an up-sampling sub-module, using the first network to extract the first feature of the i-th frame short-exposure video image includes: down-sampling the i-th frame short-exposure video image to obtain a down-sampled video image corresponding to the i-th frame short-exposure video image; using the first sub-network to extract the fourth feature of the down-sampled video image; using the second sub-network to extract the fifth feature of the down-sampled video image; performing the second fusion process on the fourth feature and the fifth feature to obtain the sixth feature of the down-sampled video image; and up-sampling the sixth feature to obtain the first feature of the i-th frame short-exposure video image.
  • the fourth feature may be a local feature
  • the fifth feature may be a global feature
  • the first subnetwork includes: N1 first convolutional layers; wherein, N1 is an integer greater than or equal to 3, and the first convolutional layer is configured to implement a first convolutional operation.
  • the first sub-network may be L-NET.
  • the second sub-network includes: N2 second convolutional layers and 3 fully connected (FC) layers; wherein N2 is an integer greater than or equal to 3, the second convolutional layer is configured to implement the second convolution operation, and the FC layer is configured to implement the FC operation.
  • the second sub-network can be G-NET.
  • the 3 FCs in the second sub-network are located after the N2 second convolutional layers.
  • each neuron in the FC is fully connected to all neurons in the previous layer.
  • an FC layer can learn global features using a receptive field that covers the entire short-exposure video image.
  • the down-sampling sub-module may be implemented by any operation or network that can perform down-sampling, for example a convolution with a stride of 2, a space-to-depth (S2D) operation, or a pooling operation.
  • the down-sampling sub-module includes: N3 third convolutional layers and N4 first pooling layers; wherein N3 and N4 are integers greater than or equal to 1, the third convolutional layer is configured to implement the third convolution operation, and the first pooling layer is configured to implement the first pooling operation.
  • the up-sampling sub-module may be implemented by any operation or network that can perform up-sampling, for example a deconvolution operation, a depth-to-space (D2S) operation, or an interpolation operation.
  • the downsampling multiple of the downsampling process can be set according to actual needs.
  • the upsampling process is the inverse of the downsampling process to restore the original resolution.
  • for example, suppose the i-th frame short-exposure video image has size H×W×1, where H is the height, W is the width, and 1 is the number of channels. The i-th frame short-exposure video image is down-sampled to a video image of size 256×256×1; the first sub-network then outputs a fourth feature of size 256×256×1 and the second sub-network outputs a fifth feature of size 256×256×1; the fusion sub-module performs the second fusion process on the fourth and fifth features; and the result is finally sent to the up-sampling sub-module to restore the first feature of size H×W×1.
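  • As a concrete illustration of this two-branch structure, the following minimal PyTorch sketch assumes an L-NET of three convolutional layers, a G-NET of three convolutional layers plus three fully connected layers, element-wise addition as the second fusion process, and bilinear interpolation for the down-/up-sampling sub-modules; all layer widths, kernel sizes, and the 256×256 working resolution are illustrative assumptions rather than configurations fixed by the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstNetwork(nn.Module):
    """Sketch of the first network: local branch (L-NET), global branch
    (G-NET), fusion, with down-/up-sampling to reduce computation.
    Layer counts and widths are illustrative assumptions."""
    def __init__(self, size=256):
        super().__init__()
        self.size = size
        # L-NET: N1 >= 3 convolutional layers extracting local features.
        self.l_net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
        # G-NET: N2 >= 3 convolutional layers followed by 3 fully
        # connected layers whose receptive field spans the whole image.
        self.g_convs = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.g_fcs = nn.Sequential(
            nn.Linear(8 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, size * size),
        )

    def forward(self, x):  # x: (B, 1, H, W) Bayer-domain frame i
        h, w = x.shape[-2:]
        # Down-sample to the fixed working resolution (e.g. 256x256x1).
        ds = F.interpolate(x, size=(self.size, self.size),
                           mode='bilinear', align_corners=False)
        local = self.l_net(ds)                              # fourth feature
        g = self.g_convs(ds).flatten(1)
        glob = self.g_fcs(g).view(-1, 1, self.size, self.size)  # fifth
        fused = local + glob                   # second fusion process
        # Up-sample back to the original H x W x 1 resolution.
        return F.interpolate(fused, size=(h, w),
                             mode='bilinear', align_corners=False)  # first feature
```

  Calling FirstNetwork()(x) on a (B, 1, H, W) tensor returns the H×W×1 first feature described above.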
  • the fusion sub-module may employ any operation or network that implements a fusion function, for example element-wise (corresponding-pixel) addition, concatenation, or convolution.
  • Step 101: use the second network to denoise the (i-N)-th to (i+N)-th frame short-exposure video images to obtain a denoised video image corresponding to the i-th frame short-exposure video image, where N is a constant greater than or equal to 1.
  • when i is less than N, the (i-N)-th to (-1)-th frame short-exposure video images are the same as the 0th frame short-exposure video image; when i is greater than the difference between M and N, the (M+1)-th to (i+N)-th frame short-exposure video images are the same as the last frame short-exposure video image, where M is the total number of frames of short-exposure video images.
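  • A minimal sketch of this boundary rule, assuming the video is held as a Python list of frames: window indices outside the valid range are clamped, so the window repeats the first frame at the start of the sequence and the last frame at the end.

```python
def neighbor_frames(frames, i, n):
    """Return the (2n+1)-frame window centered on frame i.

    Out-of-range indices are clamped, so the window repeats the first
    frame at the start of the video and the last frame at the end,
    matching the boundary rule described above.
    """
    last = len(frames) - 1
    return [frames[min(max(j, 0), last)] for j in range(i - n, i + n + 1)]
```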
  • the value of N may be set according to actual needs. The larger the value of N is, the higher the complexity of the alignment operation is.
  • the second network includes: an alignment submodule, a codec submodule, and an output submodule.
  • the alignment submodule is configured to implement an alignment operation
  • the codec submodule is configured to implement encoding and decoding processing
  • the output submodule is configured to implement output processing.
  • the alignment sub-module may include: N5 fourth convolutional layers and N6 first residual connection layers; wherein N5 and N6 are integers greater than or equal to 1, the fourth convolutional layer is configured to implement the fourth convolution operation, and the first residual connection layer is configured to implement the first residual connection operation.
  • the alignment sub-module may also be implemented by any other operation or network that can align multi-frame video images, such as an optical flow network, a deformable convolutional network, or a motion estimation and motion compensation (MEMC) network.
  • using the second network to denoise the (i-N)-th to (i+N)-th frame short-exposure video images to obtain the denoised video image corresponding to the i-th frame short-exposure video image includes: performing the alignment operation on the (i-N)-th to (i+N)-th frame short-exposure video images to obtain an aligned video image; performing encoding and decoding processing on the aligned video image to obtain a codec-processed video image; and performing output processing on the codec-processed video image to obtain the denoised video image corresponding to the i-th frame short-exposure video image.
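  • The following PyTorch sketch shows this align-codec-output pipeline end to end. The residual convolutional alignment, the channel widths, and the pluggable codec argument are illustrative assumptions; a UNET-style codec in the spirit of FIG. 5 can be assembled from the block sketch given further below.

```python
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    """Sketch of the denoising branch: alignment, encoder-decoder
    (codec), and output sub-modules. The alignment sub-module here is a
    plain convolutional stack with a residual connection; the codec is
    any module mapping the aligned stack to a feature map."""
    def __init__(self, n=1, codec=None):
        super().__init__()
        c = 2 * n + 1  # one channel per frame in the (2N+1)-frame window
        self.align = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1),
        )
        # With the default identity codec the output stage reads 2N+1
        # channels; a real UNET-style codec would change this width.
        self.codec = codec if codec is not None else nn.Identity()
        self.output = nn.Sequential(  # N23 >= 3 convolutional layers
            nn.Conv2d(c, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, window):  # window: (B, 2N+1, H, W)
        aligned = window + self.align(window)  # residual alignment
        decoded = self.codec(aligned)          # encoding/decoding
        return self.output(decoded)            # denoised frame i
```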
  • the codec sub-module may include: a first encoding sub-module, a second encoding sub-module, a third encoding sub-module, a fourth encoding sub-module, a second output sub-module, a first F sub-module, a second F sub-module, a third F sub-module, a fourth F sub-module, a first decoding sub-module, a second decoding sub-module, a third decoding sub-module, and a fourth decoding sub-module.
  • the codec sub-module may also be implemented by any encoder-decoder network that can perform image denoising, such as a U-shaped network (UNET) or a convolutional blind denoising network (CBDNet).
  • the first encoding submodule is configured to implement the first encoding process
  • the second encoding submodule is configured to implement the second encoding process
  • the third encoding submodule is configured to implement the third encoding process
  • the fourth encoding submodule is configured to implement the fourth encoding process.
  • the second output submodule is configured to implement a second output process
  • the first F submodule is configured to implement a first F operation
  • the second F submodule is configured to implement a second F operation
  • the third F submodule is configured to implement the third F operation
  • the fourth F submodule is configured to implement the fourth F operation
  • the first decoding submodule is configured to implement the first decoding process
  • the second decoding submodule is configured to implement the second decoding process
  • the third decoding sub-module is configured to implement the third decoding process
  • the fourth decoding sub-module is configured to implement the fourth decoding process.
  • the first encoding sub-module includes: N7 fifth convolutional layers, N8 second residual connection layers, and 1 second pooling layer; wherein N7 and N8 are integers greater than or equal to 1, the fifth convolutional layer is configured to implement the fifth convolution operation, the second residual connection layer is configured to implement the second residual connection operation, and the second pooling layer is configured to implement the second pooling operation.
  • the second encoding sub-module includes: N9 sixth convolutional layers, N10 third residual connection layers, and 1 third pooling layer; wherein N9 and N10 are integers greater than or equal to 1, the sixth convolutional layer is configured to implement the sixth convolution operation, the third residual connection layer is configured to implement the third residual connection operation, and the third pooling layer is configured to implement the third pooling operation.
  • the third encoding sub-module includes: N11 seventh convolutional layers, N12 fourth residual connection layers, and 1 fourth pooling layer; wherein N11 and N12 are integers greater than or equal to 1, the seventh convolutional layer is configured to implement the seventh convolution operation, the fourth residual connection layer is configured to implement the fourth residual connection operation, and the fourth pooling layer is configured to implement the fourth pooling operation.
  • the fourth encoding sub-module includes: N13 eighth convolutional layers, N14 fifth residual connection layers, and 1 fifth pooling layer; wherein N13 and N14 are integers greater than or equal to 1, the eighth convolutional layer is configured to implement the eighth convolution operation, the fifth residual connection layer is configured to implement the fifth residual connection operation, and the fifth pooling layer is configured to implement the fifth pooling operation.
  • the first decoding sub-module includes: N15 ninth convolutional layers, N16 sixth residual connection layers, and 1 first deconvolution layer; wherein N15 and N16 are integers greater than or equal to 1, the ninth convolutional layer is configured to implement the ninth convolution operation, the sixth residual connection layer is configured to implement the sixth residual connection operation, and the first deconvolution layer is configured to implement the first deconvolution operation. The second decoding sub-module includes: N17 tenth convolutional layers, N18 seventh residual connection layers, and 1 second deconvolution layer; wherein N17 and N18 are integers greater than or equal to 1, the tenth convolutional layer is configured to implement the tenth convolution operation, the seventh residual connection layer is configured to implement the seventh residual connection operation, and the second deconvolution layer is configured to implement the second deconvolution operation. The third decoding sub-module includes: N19 eleventh convolutional layers, N20 eighth residual connection layers, and 1 third deconvolution layer; wherein N19 and N20 are integers greater than or equal to 1, the eleventh convolutional layer is configured to implement the eleventh convolution operation, the eighth residual connection layer is configured to implement the eighth residual connection operation, and the third deconvolution layer is configured to implement the third deconvolution operation. The fourth decoding sub-module includes: N21 twelfth convolutional layers, N22 ninth residual connection layers, and 1 fourth deconvolution layer; wherein N21 and N22 are integers greater than or equal to 1, the twelfth convolutional layer is configured to implement the twelfth convolution operation, the ninth residual connection layer is configured to implement the ninth residual connection operation, and the fourth deconvolution layer is configured to implement the fourth deconvolution operation.
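  • To make the encoder/decoder pattern concrete: each encoding sub-module pairs convolutions and a residual connection with a pooling step that halves the spatial resolution, and each decoding sub-module mirrors it with a deconvolution that doubles the resolution before an F sub-module merges in the symmetric encoder feature. A minimal PyTorch sketch, with assumed channel widths and concatenation as the F operation:

```python
import torch
import torch.nn as nn

class EncodeBlock(nn.Module):
    """One encoding sub-module: convolutions plus a residual
    connection, then pooling to halve the spatial resolution."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
            nn.Conv2d(cout, cout, 3, padding=1),
        )
        self.skip = nn.Conv2d(cin, cout, 1)   # residual connection
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feat = torch.relu(self.conv(x) + self.skip(x))
        return self.pool(feat), feat          # pooled output + skip feature

class DecodeBlock(nn.Module):
    """One decoding sub-module: deconvolution to double the spatial
    resolution, then an F operation (here, concatenation with the
    symmetric encoder feature) followed by convolutions."""
    def __init__(self, cin, cout):
        super().__init__()
        self.up = nn.ConvTranspose2d(cin, cout, 2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(2 * cout, cout, 3, padding=1), nn.ReLU(),
            nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x, enc_feat):
        x = self.up(x)
        x = torch.cat([x, enc_feat], dim=1)   # F operation: concatenation
        return self.conv(x)
```

  Chaining four EncodeBlocks and four DecodeBlocks around a bottleneck reproduces the halving/doubling resolution schedule described in the surrounding bullets.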
  • the first F sub-module, the second F sub-module, the third F sub-module, and the fourth F sub-module may be implemented by any operation that can realize the interconnection, for example an addition (add) operation, a concatenation (concat) operation, or a convolution operation.
  • performing encoding and decoding processing on the aligned video image includes: performing the first encoding process on the aligned video image to obtain a first-encoded video image, wherein the first encoding process includes the fifth convolution operation, the second residual connection operation, and the second pooling operation; performing the second encoding process on the first-encoded video image to obtain a second-encoded video image, wherein the second encoding process includes the sixth convolution operation, the third residual connection operation, and the third pooling operation; performing the third encoding process on the second-encoded video image to obtain a third-encoded video image, wherein the third encoding process includes the seventh convolution operation, the fourth residual connection operation, and the fourth pooling operation; and performing the fourth encoding process on the third-encoded video image to obtain a fourth-encoded video image, wherein the fourth encoding process includes the eighth convolution operation, the fifth residual connection operation, and the fifth pooling operation.
  • the resolution of the video image after the first encoding process is half the resolution of the video image after the alignment operation; the resolution after the second encoding process is half the resolution after the first encoding process; the resolution after the third encoding process is half the resolution after the second encoding process; and the resolution after the fourth encoding process is half the resolution after the third encoding process.
  • the resolution of the video image after the first decoding process is twice the resolution of the video image after the fourth encoding process; the resolution after the second decoding process is twice the resolution after the first decoding process; the resolution after the third decoding process is twice the resolution after the second decoding process; and the resolution after the fourth decoding process is twice the resolution after the third decoding process.
  • the size of the video image after the alignment operation is H×W×(2N+1), where H is the height, W is the width, and 2N+1 is the number of channels. The size of the video image before the second pooling operation is H×W×32, and the size after the first encoding process is H/2×W/2×128; the size before the third pooling operation is H/2×W/2×64, and the size after the second encoding process is H/4×W/4×256; the size before the fourth pooling operation is H/4×W/4×128, and the size after the third encoding process is H/8×W/8×512; the size before the fifth pooling operation is H/8×W/8×256, and the size after the fourth encoding process is H/16×W/16×1024.
  • the size of the video image before the first deconvolution operation is H/16×W/16×512, the size after the first decoding process is H/8×W/8×256, and the size after the first F operation is H/8×W/8×512; the size before the second deconvolution operation is H/8×W/8×256, the size after the second decoding process is H/4×W/4×128, and the size after the second F operation is H/4×W/4×256; the size before the third deconvolution operation is H/4×W/4×128, the size after the third decoding process is H/2×W/2×64, and the size after the third F operation is H/2×W/2×128; the size before the fourth deconvolution operation is H/2×W/2×64, the size after the fourth decoding process is H×W×32, and the size after the fourth F operation is H×W×64.
  • the output sub-module includes: N23 thirteenth convolutional layers; wherein N23 is an integer greater than or equal to 3, and the thirteenth convolutional layer is configured to implement the thirteenth convolution operation.
  • Step 102: perform the first fusion processing on the first feature of the i-th frame short-exposure video image and the denoised video image corresponding to the i-th frame short-exposure video image to obtain an enhanced video image corresponding to the i-th frame short-exposure video image.
  • the first fusion processing may multiply the first feature of the i-th frame short-exposure video image with the pixel values at the same positions in the denoised video image corresponding to the i-th frame short-exposure video image, to obtain the enhanced video image corresponding to the i-th frame short-exposure video image.
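  • A one-line sketch of this pixel-wise variant of the first fusion processing, assuming both tensors share a (B, 1, H, W) shape:

```python
def first_fusion(first_feature, denoised):
    """Pixel-wise product of the first feature and the denoised frame,
    yielding the enhanced frame. Assumes matching (B, 1, H, W) shapes."""
    return first_feature * denoised
```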
  • the first fusion processing may also be implemented by any operation or network that realizes a fusion function, for example element-wise (corresponding-pixel) addition, concatenation, or convolution.
  • the enhanced video image corresponding to the i-th frame short-exposure video image is sent to a subsequent image signal processing (ISP) module and other data processing modules for corresponding processing.
  • the video image enhancement method provided by the embodiments of the present disclosure performs feature extraction and denoising on the short-exposure video image through two different branches and then fuses the results of the two branches to obtain the enhanced video image; while denoising is performed, distortion is avoided through feature extraction, improving the enhancement effect.
  • FIG. 6 is a flowchart of a network training method provided by another embodiment of the present disclosure.
  • referring to FIG. 6, another embodiment of the present disclosure provides a network training method, including:
  • Step 600: for each training sample, use any one of the above video image enhancement methods to obtain an enhanced video image corresponding to the training sample; wherein the training sample includes (2N+1) frames of short-exposure video images.
  • each frame of short-exposure video image corresponds to a frame of long-exposure video image.
  • the i-th training sample includes: (2N+1) frames of short-exposure video images centered on the i-th frame short-exposure video image.
  • Step 601: determine the total objective function value according to the enhanced video images and the long-exposure video images corresponding to the training samples, and update the training parameter values according to the total objective function value.
  • determining the total objective function value according to the enhanced video images and long-exposure video images corresponding to the training samples includes: for each training sample, determining the objective function value corresponding to the training sample according to the enhanced video image and the long-exposure video image corresponding to the training sample; and determining the total objective function value according to the objective function values corresponding to the training samples.
  • when calculating the objective function value corresponding to each training sample, it should be calculated from the enhanced video image corresponding to that training sample and the long-exposure video image corresponding to that training sample.
  • in the objective function, I_out(i, j) is the pixel value at row i, column j of the enhanced video image corresponding to the training sample, I_GT(i, j) is the pixel value at row i, column j of the long-exposure video image corresponding to the training sample, m is the total number of rows, and n is the total number of columns.
  • the total objective function value is an average value of the objective function values corresponding to all training samples.
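  • The per-sample objective formula itself is not preserved in this text. As a hedged illustration only, the sketch below assumes a mean absolute error over the m×n pixels and averages it across training samples to form the total objective function value:

```python
import torch

def sample_objective(i_out, i_gt):
    """Assumed per-sample objective: mean absolute difference between
    the enhanced image I_out and the long-exposure image I_GT over all
    m x n pixels. The exact formula in the source is not preserved."""
    return (i_out - i_gt).abs().mean()

def total_objective(outs, gts):
    """Total objective: average of the per-sample objective values."""
    vals = [sample_objective(o, g) for o, g in zip(outs, gts)]
    return torch.stack(vals).mean()
```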
  • the training parameter value is updated according to the total objective function value; or, the training parameter value is updated according to the total objective function value and a preset index.
  • the preset index may be at least one of peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and the like.
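  • For reference, one common way to compute PSNR, which is one possible realization of the preset index (the peak value and data range here are assumptions):

```python
import torch

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming pixel values in
    [0, peak]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(peak ** 2 / mse)
```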
  • the training parameters may be parameters that need to be trained in the first sub-network, the second sub-network, and the fusion sub-module.
  • Step 602: for each training sample, according to the updated training parameter values, use any one of the above video image enhancement methods to obtain the enhanced video image corresponding to the training sample, until the total objective function value converges.
  • according to the updated training parameter values, any one of the above video image enhancement methods is used to obtain the enhanced video images corresponding to the training samples until the total objective function value converges; or, according to the updated training parameter values, any one of the above video image enhancement methods is used to obtain the enhanced video images corresponding to the training samples until the total objective function value converges and the preset index reaches its optimum.
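  • Putting the pieces together, a minimal training loop under the assumptions above; it reuses total_objective from the earlier sketch, and the optimizer, learning rate, and convergence test are all illustrative:

```python
import torch

def train(first_net, second_net, samples, gts,
          steps=1000, lr=1e-4, tol=1e-6):
    """Sketch of the training procedure: enhance each (2N+1)-frame
    sample, compute the total objective against the long-exposure
    ground truths, and update the training parameters until the
    objective value stops improving. All hyperparameters are assumed."""
    params = list(first_net.parameters()) + list(second_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    prev = float('inf')
    for _ in range(steps):
        outs = []
        for window in samples:             # window: (B, 2N+1, H, W)
            k = window.shape[1] // 2
            center = window[:, k:k + 1]     # frame i of the window
            feat = first_net(center)        # first feature of frame i
            den = second_net(window)        # denoised frame i
            outs.append(feat * den)         # first fusion processing
        loss = total_objective(outs, gts)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:   # crude convergence test
            break
        prev = loss.item()
    return loss.item()
```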
  • the network training method provided by the embodiments of the present disclosure trains the above video image enhancement method to obtain optimal training parameter values, further improving the enhancement effect.
  • another embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory storing at least one program which, when executed by the at least one processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
  • the processor is a device with data processing capability, including but not limited to a central processing unit (CPU) and the like.
  • the memory is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
  • the processor and the memory are connected to each other through a bus, and further connected to other components of the computing device.
  • another embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the above video image enhancement methods or any one of the above network training methods is implemented.
  • the functional modules/units in the system and the device may be implemented as software, firmware, hardware, or an appropriate combination thereof.
  • the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed by several physical components in cooperation.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit.
  • Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure provide a video image augmentation method and device, a network training method, an electronic device and a computer-readable storage medium. The video image enhancement method comprises: using a first network to extract a first feature of the i-th frame of a short-exposure video image, where i is a variable sequentially selected from 0, 1, 2, 3...; using a second network to denoise the (i-N)th frame of the short-exposure video image to the (i+N)th frame of the short-exposure video image and obtain a denoised video image corresponding to the i-th frame of the short-exposure video image, where N is a constant greater than or equal to 1; and performing first fusion processing on the first feature of the i-th frame of the short-exposure video image and the denoised video image corresponding to the i-th frame of the short-exposure video image to obtain an augmented video image corresponding to the i-th frame of the short-exposure video image. FIG. 1

Description

Video image enhancement method, network training method, electronic device, storage medium
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 202111174652.3, filed with the State Intellectual Property Office on October 9, 2021 and entitled "Video image enhancement method, network training method, electronic device, storage medium", the entire contents of which are incorporated into this disclosure by reference.
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the technical field of video image processing, and in particular to a video image enhancement method, a network training method, an electronic device, and a computer-readable storage medium.
Background
With the popularization and development of terminal devices such as video surveillance systems and mobile phones, the effect of ambient light on device imaging remains one of the main outstanding problems. A video imaging system can provide clear images with high color fidelity during the day. At night, however, underexposure of the image sensor greatly degrades imaging quality, causing heavy image noise, low brightness, loss of detail, and color distortion. These low-light deficiencies seriously affect subsequent tasks such as object recognition and segmentation. It is therefore necessary to study video image enhancement algorithms for night scenes.
Neural-network-based night-scene video image enhancement is one of the main current research directions. However, existing algorithms mainly focus on a single direction such as denoising, brightness enhancement, or color restoration, and often cannot address denoising and distortion avoidance at the same time, so the enhancement effect is poor.
Summary
Embodiments of the present disclosure provide a video image enhancement method, a network training method, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a video image enhancement method, including: using a first network to extract a first feature of an i-th frame short-exposure video image, where i is a variable that takes the values 0, 1, 2, 3, ... in sequence; using a second network to denoise the (i-N)-th to (i+N)-th frame short-exposure video images to obtain a denoised video image corresponding to the i-th frame short-exposure video image, where N is a constant greater than or equal to 1; and performing first fusion processing on the first feature of the i-th frame short-exposure video image and the denoised video image corresponding to the i-th frame short-exposure video image to obtain an enhanced video image corresponding to the i-th frame short-exposure video image.
In a second aspect, an embodiment of the present disclosure provides a network training method, including: for each training sample, using any one of the above video image enhancement methods to obtain an enhanced video image corresponding to the training sample, where the training sample includes (2N+1) frames of short-exposure video images; determining a total objective function value according to the enhanced video images and long-exposure video images corresponding to the training samples, and updating training parameter values according to the total objective function value; and, for each training sample, obtaining the enhanced video image corresponding to the training sample with any one of the above video image enhancement methods according to the updated training parameter values, until the total objective function value converges.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory storing at least one program which, when executed by the at least one processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
The video image enhancement method provided by the embodiments of the present disclosure performs feature extraction and denoising on the short-exposure video image through two different branches and then fuses the results of the two branches to obtain the enhanced video image; while denoising is performed, distortion is avoided through feature extraction, improving the enhancement effect.
The network training method provided by the embodiments of the present disclosure trains the above video image enhancement method to obtain optimal training parameter values, further improving the enhancement effect.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video image enhancement method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of one realizable structure of the first network provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another realizable structure of the first network provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of one realizable structure of the second network provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of one realizable structure of the codec sub-module provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of a network training method provided by another embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to better understand the technical solutions of the present disclosure, the video image enhancement method, network training method, electronic device, and computer-readable storage medium provided by the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
As used herein, the term "and/or" includes any and all combinations of at least one of the associated listed items.
The terminology used herein is for describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprise" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of at least one other feature, integer, step, operation, element, component, and/or group thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art and the present disclosure, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
图1为本公开一个实施例提供的视频图像增强方法的流程图。Fig. 1 is a flowchart of a video image enhancement method provided by an embodiment of the present disclosure.
在第一方面中,参照图1,本公开的一个实施例提供一种视频图像增强方法,包括:In the first aspect, referring to FIG. 1 , an embodiment of the present disclosure provides a video image enhancement method, including:
步骤100、采用第一网络提取第i帧短曝光视频图像的第一特征;其中,i为依次取自0,1,2,3,……的变量。 Step 100, using the first network to extract the first feature of the i-th frame of the short-exposure video image; wherein, i is a variable sequentially selected from 0, 1, 2, 3, . . . .
在一些示例性实施例中,第i帧短曝光视频图像为从CMOS(Complementary Metal Oxide Semiconductor)传感器获得的Bayer格式的视频图像序列中的视频图像,每一帧短曝光视频图像都对应有一帧长曝光图像。In some exemplary embodiments, the i-th frame of short-exposure video image is a video image in a sequence of video images in Bayer format obtained from a CMOS (Complementary Metal Oxide Semiconductor) sensor, and each frame of short-exposure video image corresponds to a frame length Expose the image.
在一些示例性实施例中,如图2所示,第一网络可以包括:第一子网络、第二子网络和融合子模块。其中,融合子模块被配置成实现第二融合处理。In some exemplary embodiments, as shown in FIG. 2 , the first network may include: a first subnetwork, a second subnetwork, and a fusion submodule. Wherein, the fusion sub-module is configured to implement the second fusion process.
在另一些示例性实施例中,如图3所示,为了减少计算量,除了第一子网络、第二子网络和融合子模块以外,第一网络还可以包括:下采样子模块和上采样子模块。其中,下采样子模块被配置成实现下采样处理,上采样子模块被配置成实现上采样处理。In some other exemplary embodiments, as shown in FIG. 3 , in order to reduce the amount of calculation, in addition to the first sub-network, the second sub-network and the fusion sub-module, the first network may further include: a down-sampling sub-module and an up-sampling sub-module submodule. Wherein, the down-sampling sub-module is configured to realize down-sampling processing, and the up-sampling sub-module is configured to realize up-sampling processing.
在一些示例性实施例中,在第一网络包括:第一子网络、第二子网络、融合子模块的情况下,采用第一网络提取第i帧短曝光视频图像的第一特征包括:采用第一子网络提取第i帧短曝光视频图像的第二特征,采用第二子网络提取第i帧短曝光视频图像的第三特征;将第i帧短曝光视频图像的第二特征和第i帧短曝光视频图像的第三特征进行第二融合处理得到第i帧短曝光视频图像的第一特征。In some exemplary embodiments, when the first network includes: a first sub-network, a second sub-network, and a fusion sub-module, using the first network to extract the first feature of the i-th frame of short-exposure video image includes: using The first sub-network extracts the second feature of the i-th frame short-exposure video image, and uses the second sub-network to extract the third feature of the i-th frame short-exposure video image; the second feature of the i-th frame short-exposure video image and the ith The second fusion process is performed on the third feature of the frame of short-exposure video image to obtain the first feature of the i-th frame of short-exposure video image.
在一些示例性实施例中,第二特征可以是局部特征,第三特征可以是全局特征。In some exemplary embodiments, the second feature may be a local feature, and the third feature may be a global feature.
在一些示例性实施例中,局部特征是指短曝光视频图像的局部信息,如局部极大值、局部极小值等,可以体现视频图像的局部细节的信息。In some exemplary embodiments, the local feature refers to local information of the short-exposure video image, such as local maximum value, local minimum value, etc., which can reflect information of local details of the video image.
在一些示例性实施例中,全局特征是指短曝光视频图像的全局信息,如亮度直方图分布信息、颜色信息等中的至少一个可以进行亮度调节和颜色校正的信息。In some exemplary embodiments, the global feature refers to global information of the short-exposure video image, such as brightness histogram distribution information, color information, etc., at least one of which can perform brightness adjustment and color correction.
在一些示例性实施例中,在第一网络除了第一子网络、第二子网络和融合子模块以外还包括:下采样子模块和上采样子模块的情况下,采用第一网络提取第i帧短曝光视频图像的第一特征包括:对第i帧短曝光视频图像进行下采样处理得到第i帧短曝光视频图像对应的下采样的视频图像;采用第一子网络提取第i帧短曝光视频图像对应的下采样的视频图像的第四特征;采用第二子网络提取第i帧短曝光视频图像对应的下采样的视频图像的第五特征;将第i帧短曝光视频图像对应的下采样的视频图像的第四特征和第五特征进行第二融合处理得到第i帧短曝光视频图像对应的下采样的视频图像的第六特征;对第i帧短曝光视频图像对应的下采样的视频图像的第六特征进行上采样处理得到第i帧短曝光视频图像的第一特征。In some exemplary embodiments, when the first network includes, besides the first sub-network, the second sub-network and the fusion sub-module: a down-sampling sub-module and an up-sampling sub-module, the first network is used to extract the i-th The first feature of the frame short-exposure video image includes: performing down-sampling processing on the i-th frame short-exposure video image to obtain a down-sampled video image corresponding to the i-th frame short-exposure video image; using the first sub-network to extract the i-th frame short-exposure video image The fourth feature of the down-sampled video image corresponding to the video image; the second sub-network is used to extract the fifth feature of the down-sampled video image corresponding to the i-th frame short-exposure video image; The fourth feature and the fifth feature of the sampled video image are subjected to the second fusion process to obtain the sixth feature of the down-sampled video image corresponding to the i-th frame short-exposure video image; The sixth feature of the video image is up-sampled to obtain the first feature of the short-exposure video image of the i-th frame.
在一些示例性实施例中,第四特征可以是局部特征,第五特征可以是全局特征。In some exemplary embodiments, the fourth feature may be a local feature, and the fifth feature may be a global feature.
在一些示例性实施例中,第一子网络包括:N1个第一卷积层;其中,N1为大于或等于3的整数,第一卷积层被配置成实现第一卷积操作。例如,第一子网络可以是L-NET。In some exemplary embodiments, the first subnetwork includes: N1 first convolutional layers; wherein, N1 is an integer greater than or equal to 3, and the first convolutional layer is configured to implement a first convolutional operation. For example, the first sub-network may be L-NET.
在一些示例性实施例中,第二子网络包括:N2个第二卷积层和3个全连接层(FC,Fully Connected layers);其中,N2为大于或等于3的整数,第二卷积层被配置成实现第二卷积操作,FC被配置成实现FC操作。例如,第二子网络可以是G-NET。In some exemplary embodiments, the second sub-network includes: N2 second convolutional layers and 3 fully connected layers (FC, Fully Connected layers); wherein, N2 is an integer greater than or equal to 3, and the second convolutional layer The layer is configured to implement the second convolution operation, and the FC is configured to implement the FC operation. For example, the second sub-network can be G-NET.
在一些示例性实施例中,第二子网络中的3个FC位于N2个第二卷积层后面。In some exemplary embodiments, the 3 FCs in the second sub-network are located after the N2 second convolutional layers.
在一些示例性实施例中,FC中的每个神经元与前一层的所有神经元进行全连接。In some exemplary embodiments, each neuron in the FC is fully connected to all neurons in the previous layer.
在一些示例性实施例中,FC可以采用覆盖整个短曝光视频图像的感受野学习全局特征。In some exemplary embodiments, FC can use the receptive field covering the entire short-exposure video image to learn global features.
在一些示例性实施例中,下采样子模块可以采用任何可以实现下采样功能的操作或网络实现。例如步长为2的卷积操作、空间维度到深度维度的转换(S2D,Space to Depth)操作、池化操作等中的任意一个操作。又如,下采样子模块包括:N3个第三卷积层和N4个第一池化层;其中,N3,N4为大于或等于1的整数,第三卷积层被配置成实现第三卷积操作,第一池化层被配置成实现第一池化操作。In some exemplary embodiments, the down-sampling sub-module may be implemented by any operation or network that can realize the down-sampling function. For example, any one of the convolution operation with a step size of 2, the conversion (S2D, Space to Depth) operation from the space dimension to the depth dimension, and the pooling operation. As another example, the downsampling submodule includes: N3 third convolutional layers and N4 first pooling layers; wherein, N3 and N4 are integers greater than or equal to 1, and the third convolutional layer is configured to implement the third convolutional layer product operation, the first pooling layer is configured to implement the first pooling operation.
在一些示例性实施例中,上采样子模块可以采用任何可以实现上采样功能的操作或网络实现。例如反卷积操作,深度维度到空间维度的转换(D2S,Depth to Space)操作,插值操作等中的任意一个操作。In some exemplary embodiments, the upsampling submodule can be implemented by any operation or network that can realize the upsampling function. For example, any one of deconvolution operations, depth-to-space conversion (D2S, Depth to Space) operations, interpolation operations, etc.
在一些示例性实施例中,下采样处理的下采样倍数可以根据实际需要自行设定。In some exemplary embodiments, the downsampling multiple of the downsampling process can be set according to actual needs.
In some exemplary embodiments, the up-sampling process is the inverse of the down-sampling process and restores the original resolution. For example, assuming the i-th frame of the short-exposure video image has size H×W×1, where H is the height, W is the width and 1 is the number of channels, the frame is down-sampled to a 256×256×1 video image; the first sub-network then outputs a fourth feature of size 256×256×1 and the second sub-network outputs a fifth feature of size 256×256×1; the fusion sub-module performs the second fusion process on the fourth feature and the fifth feature; and the result is finally sent to the up-sampling sub-module, which restores the first feature to size H×W×1.
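A minimal sketch of this down-sample/extract/fuse/up-sample pipeline follows, assuming placeholder modules l_net, g_net and fuse for the first sub-network, the second sub-network and the fusion sub-module, and using bilinear interpolation as one admissible down-/up-sampling choice rather than the only one.

```python
import torch.nn.functional as F

def first_network(x, l_net, g_net, fuse, size=256):
    """Sketch of the first network with down-/up-sampling sub-modules.
    x: (B, C, H, W) short-exposure frame i."""
    h, w = x.shape[2], x.shape[3]
    xd = F.interpolate(x, size=(size, size), mode='bilinear',
                       align_corners=False)      # down-sample to 256x256
    f4 = l_net(xd)                               # fourth feature (local)
    f5 = g_net(xd)                               # fifth feature (global)
    f6 = fuse(f4, f5)                            # second fusion -> sixth feature
    # up-sample back to the original H x W -> first feature
    return F.interpolate(f6, size=(h, w), mode='bilinear', align_corners=False)
```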
In some exemplary embodiments, the fusion sub-module may be implemented by any operation or network that realizes a fusion function, for example a pixel-wise addition operation, a concatenation operation, or a convolution operation.
Step 101: use the second network to denoise the (i-N)-th to (i+N)-th frames of the short-exposure video images, obtaining a denoised video image corresponding to the i-th frame of the short-exposure video image, where N is a constant greater than or equal to 1.
In some exemplary embodiments, when i is less than N, the (i-N)-th to (-1)-th frames of the short-exposure video images are taken to be the same as the 0-th frame; when i is greater than the difference between M and N, the (M+1)-th to (i+N)-th frames are taken to be the same as the last frame, where M is the total number of frames of the short-exposure video images. In one embodiment, when i = 0, the (-N)-th to (-1)-th frames are the same as the 0-th frame; assuming the total number of frames is 50, then when i = 50 the 51st to (50+N)-th frames are the same as the 50th frame.
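This boundary handling amounts to clamping frame indices, as the following hypothetical helper illustrates; the name window_indices and its arguments are introduced here only for illustration, with m treated as the index of the last available frame.

```python
def window_indices(i, n, m):
    """Indices of the 2N+1 frames fed to the second network for frame i.
    Out-of-range indices are clamped: frames before frame 0 reuse frame 0,
    and frames past the last frame reuse the last frame."""
    return [min(max(j, 0), m) for j in range(i - n, i + n + 1)]

# e.g. with N = 2 and a last frame index of 50:
assert window_indices(0, 2, 50) == [0, 0, 0, 1, 2]
assert window_indices(50, 2, 50) == [48, 49, 50, 50, 50]
```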
In some exemplary embodiments, the value of N can be set according to actual needs; the larger N is, the higher the complexity of the alignment operation.
In some exemplary embodiments, as shown in FIG. 4, the second network includes an alignment sub-module, a codec sub-module and an output sub-module, where the alignment sub-module is configured to implement an alignment operation, the codec sub-module is configured to implement encoding and decoding processing, and the output sub-module is configured to implement output processing.
The embodiments of the present disclosure do not limit the specific implementation of the alignment sub-module. For example, the alignment sub-module may include N5 fourth convolutional layers and N6 first residual connection layers, where N5 and N6 are integers greater than or equal to 1, each fourth convolutional layer is configured to implement a fourth convolution operation, and each first residual connection layer is configured to implement a first residual connection operation. As another example, the alignment sub-module may be implemented by any other operation or network capable of aligning multiple frames of video images, such as an optical flow network, a deformable convolution network, or a motion estimation and motion compensation (MEMC) network.
In some exemplary embodiments, using the second network to denoise the (i-N)-th to (i+N)-th frames of the short-exposure video images to obtain the denoised video image corresponding to the i-th frame includes: performing an alignment operation on the (i-N)-th to (i+N)-th frames to obtain an aligned video image; performing encoding and decoding processing on the aligned video image to obtain a codec-processed video image; and performing output processing on the codec-processed video image to obtain the denoised video image corresponding to the i-th frame of the short-exposure video image.
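A non-limiting top-level sketch of this three-stage denoising branch follows; the three sub-modules are passed in as interchangeable components, since any of the admissible implementations above may be substituted.

```python
import torch.nn as nn

class SecondNetwork(nn.Module):
    """Align the 2N+1 input frames, run the encoder-decoder,
    then produce the denoised image for frame i."""
    def __init__(self, align, codec, output):
        super().__init__()
        self.align, self.codec, self.output = align, codec, output

    def forward(self, frames):            # frames: (B, 2N+1, H, W)
        aligned = self.align(frames)      # alignment operation
        decoded = self.codec(aligned)     # encoding/decoding processing
        return self.output(decoded)      # output processing -> denoised image
```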
The embodiments of the present disclosure do not limit the specific implementation of the codec sub-module. In one embodiment, as shown in FIG. 5, the codec sub-module may include: a first encoding sub-module, a second encoding sub-module, a third encoding sub-module, a fourth encoding sub-module, a second output sub-module, a first F sub-module, a second F sub-module, a third F sub-module, a fourth F sub-module, a first decoding sub-module, a second decoding sub-module, a third decoding sub-module and a fourth decoding sub-module. As another example, the codec sub-module may be implemented by any encoder-decoder network capable of image denoising, such as a U-shaped network (UNET) or a convolutional blind denoising network (CBDNet).
Here, the first to fourth encoding sub-modules are configured to implement the first to fourth encoding processing respectively, the second output sub-module is configured to implement second output processing, the first to fourth F sub-modules are configured to implement the first to fourth F operations respectively, and the first to fourth decoding sub-modules are configured to implement the first to fourth decoding processing respectively.
The first encoding sub-module includes N7 fifth convolutional layers, N8 second residual connection layers and one second pooling layer; the second encoding sub-module includes N9 sixth convolutional layers, N10 third residual connection layers and one third pooling layer; the third encoding sub-module includes N11 seventh convolutional layers, N12 fourth residual connection layers and one fourth pooling layer; and the fourth encoding sub-module includes N13 eighth convolutional layers, N14 fifth residual connection layers and one fifth pooling layer. N7 through N14 are integers greater than or equal to 1; the fifth to eighth convolutional layers are configured to implement the fifth to eighth convolution operations respectively, the second to fifth residual connection layers are configured to implement the second to fifth residual connection operations respectively, and the second to fifth pooling layers are configured to implement the second to fifth pooling operations respectively.
The first decoding sub-module includes N15 ninth convolutional layers, N16 sixth residual connection layers and one first deconvolution layer; the second decoding sub-module includes N17 tenth convolutional layers, N18 seventh residual connection layers and one second deconvolution layer; the third decoding sub-module includes N19 eleventh convolutional layers, N20 eighth residual connection layers and one third deconvolution layer; and the fourth decoding sub-module includes N21 twelfth convolutional layers, N22 ninth residual connection layers and one fourth deconvolution layer. N15 through N22 are integers greater than or equal to 1; the ninth to twelfth convolutional layers are configured to implement the ninth to twelfth convolution operations respectively, the sixth to ninth residual connection layers are configured to implement the sixth to ninth residual connection operations respectively, and the first to fourth deconvolution layers are configured to implement the first to fourth deconvolution operations respectively.
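For illustration, the helpers below sketch one encoding sub-module and one decoding sub-module with the stated ingredients. Taking the layer counts N7 through N22 as 1 and the specific channel widths are simplifying assumptions of this sketch.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """A residual connection layer as used throughout this codec sketch."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

def enc_block(in_ch, out_ch):
    """One encoding sub-module: convolution + residual connection +
    pooling, which halves the spatial resolution."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
                         ResBlock(out_ch), nn.MaxPool2d(2))

def dec_block(in_ch, out_ch):
    """One decoding sub-module: convolution + residual connection +
    transposed convolution, which doubles the spatial resolution."""
    return nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
                         ResBlock(in_ch),
                         nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2))
```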
The first F sub-module, the second F sub-module, the third F sub-module and the fourth F sub-module may be implemented by any operation that can interconnect feature maps, for example any one of an addition (add) operation, a concatenation (concat) operation, or a convolution operation.
In some exemplary embodiments, performing encoding and decoding processing on the aligned video image to obtain the codec-processed video image includes: performing the first encoding processing on the aligned video image to obtain a first-encoded video image, where the first encoding processing includes the fifth convolution operation, the second residual connection operation and the second pooling operation; performing the second encoding processing on the first-encoded video image to obtain a second-encoded video image, where the second encoding processing includes the sixth convolution operation, the third residual connection operation and the third pooling operation; performing the third encoding processing on the second-encoded video image to obtain a third-encoded video image, where the third encoding processing includes the seventh convolution operation, the fourth residual connection operation and the fourth pooling operation; performing the fourth encoding processing on the third-encoded video image to obtain a fourth-encoded video image, where the fourth encoding processing includes the eighth convolution operation, the fifth residual connection operation and the fifth pooling operation; performing the first decoding processing on the fourth-encoded video image to obtain a first-decoded video image; performing the first F operation on the video image before the fifth pooling operation and the first-decoded video image, and performing the second decoding processing on the result to obtain a second-decoded video image; performing the second F operation on the video image before the fourth pooling operation and the second-decoded video image, and performing the third decoding processing on the result to obtain a third-decoded video image; performing the third F operation on the video image before the third pooling operation and the third-decoded video image, and performing the fourth decoding processing on the result to obtain a fourth-decoded video image; and performing the fourth F operation on the video image before the second pooling operation and the fourth-decoded video image to obtain the codec-processed video image.
In some exemplary embodiments, the resolution of the first-encoded video image is half that of the aligned video image; the resolution of the second-encoded video image is half that of the first-encoded video image; the resolution of the third-encoded video image is half that of the second-encoded video image; and the resolution of the fourth-encoded video image is half that of the third-encoded video image.
In some exemplary embodiments, the resolution of the first-decoded video image is twice that of the fourth-encoded video image; the resolution of the second-decoded video image is twice that of the first-decoded video image; the resolution of the third-decoded video image is twice that of the second-decoded video image; and the resolution of the fourth-decoded video image is twice that of the third-decoded video image.
For example, if the aligned video image has size H×W×(2N+1), where H is the height, W is the width and 2N+1 is the number of channels, then the video image before the second pooling operation has size H×W×32 and the first-encoded video image has size H/2×W/2×128; the video image before the third pooling operation has size H/2×W/2×64 and the second-encoded video image has size H/4×W/4×256; the video image before the fourth pooling operation has size H/4×W/4×128 and the third-encoded video image has size H/8×W/8×512; and the video image before the fifth pooling operation has size H/8×W/8×256 and the fourth-encoded video image has size H/16×W/16×1024.
The video image before the first deconvolution operation has size H/16×W/16×512, the first-decoded video image has size H/8×W/8×256, and the video image after the first F operation has size H/8×W/8×512; the video image before the second deconvolution operation has size H/8×W/8×256, the second-decoded video image has size H/4×W/4×128, and the video image after the second F operation has size H/4×W/4×256; the video image before the third deconvolution operation has size H/4×W/4×128, the third-decoded video image has size H/2×W/2×64, and the video image after the third F operation has size H/2×W/2×128; the video image before the fourth deconvolution operation has size H/2×W/2×64, the fourth-decoded video image has size H×W×32, and the video image after the fourth F operation has size H×W×64.
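Putting the pieces together, the following sketch wires four such encoders and decoders into a U shape, reusing the enc_block/dec_block helpers above and choosing channel-wise concatenation as the F operation (addition or convolution would do equally well). The skip connections here reuse the encoder outputs at matching resolutions as stand-ins for the "image before the k-th pooling operation", and the channel widths are illustrative rather than those of the worked example above.

```python
import torch
import torch.nn as nn

class CodecSketch(nn.Module):
    """U-shaped wiring of four encoders, four decoders and four F
    sub-modules (concatenation), as one possible codec sub-module."""
    def __init__(self, in_ch, w=32):
        super().__init__()
        self.e1 = enc_block(in_ch, w)          # first encoding   -> H/2
        self.e2 = enc_block(w, 2 * w)          # second encoding  -> H/4
        self.e3 = enc_block(2 * w, 4 * w)      # third encoding   -> H/8
        self.e4 = enc_block(4 * w, 8 * w)      # fourth encoding  -> H/16
        self.d1 = dec_block(8 * w, 4 * w)      # first decoding   -> H/8
        self.d2 = dec_block(8 * w, 2 * w)      # second decoding  -> H/4
        self.d3 = dec_block(4 * w, w)          # third decoding   -> H/2
        self.d4 = dec_block(2 * w, w)          # fourth decoding  -> H

    def forward(self, x):
        s1 = self.e1(x); s2 = self.e2(s1); s3 = self.e3(s2); s4 = self.e4(s3)
        y = self.d1(s4)
        y = self.d2(torch.cat([s3, y], dim=1))   # first F operation
        y = self.d3(torch.cat([s2, y], dim=1))   # second F operation
        y = self.d4(torch.cat([s1, y], dim=1))   # third F operation
        return torch.cat([x, y], dim=1)          # fourth F operation
```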
In some exemplary embodiments, the output sub-module includes N23 thirteenth convolutional layers, where N23 is an integer greater than or equal to 3 and each thirteenth convolutional layer is configured to implement a thirteenth convolution operation.
Step 102: perform a first fusion process on the first feature of the i-th frame of the short-exposure video image and the denoised video image corresponding to the i-th frame, obtaining an enhanced video image corresponding to the i-th frame of the short-exposure video image.
In some exemplary embodiments, the first fusion process may multiply the first feature of the i-th frame of the short-exposure video image with the pixel values at the same positions in the corresponding denoised video image, obtaining the enhanced video image corresponding to the i-th frame of the short-exposure video image.
In some exemplary embodiments, the first fusion process may be implemented by any operation or network that realizes a fusion function, for example a pixel-wise addition operation, a concatenation operation, or a convolution operation.
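As a minimal illustration of the multiplication variant described above (the function name is hypothetical):

```python
def first_fusion(first_feature, denoised):
    """Pixel-wise product of the two branch outputs; both tensors share
    the same H x W shape, so this is a plain element-wise multiply."""
    return first_feature * denoised
```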
In some exemplary embodiments, the enhanced video image corresponding to the i-th frame of the short-exposure video image is sent to a subsequent image signal processing (ISP) module or other data processing module for further processing.
The video image enhancement method provided by the embodiments of the present disclosure performs feature extraction and denoising on the short-exposure video image through two separate branches and then fuses the results of the two branches to obtain the enhanced video image; while accounting for denoising, it avoids distortion through feature extraction and thereby improves the enhancement effect.
FIG. 6 is a flowchart of a network training method provided by another embodiment of the present disclosure.
In a second aspect, referring to FIG. 6, another embodiment of the present disclosure provides a network training method, including the following steps.
Step 600: for each training sample, use any one of the above video image enhancement methods to obtain an enhanced video image corresponding to the training sample, where the training sample includes (2N+1) frames of short-exposure video images.
In the embodiments of the present disclosure, each frame of short-exposure video image has a corresponding frame of long-exposure video image.
In the embodiments of the present disclosure, the i-th training sample includes the (2N+1) frames of short-exposure video images centered on the i-th frame of the short-exposure video image.
Step 601: determine a total objective function value according to the enhanced video images and the long-exposure video images corresponding to the training samples, and update the training parameter values according to the total objective function value.
In some exemplary embodiments, determining the total objective function value according to the enhanced video images and the long-exposure video images corresponding to the training samples includes: for each training sample, determining an objective function value corresponding to that training sample according to its enhanced video image and long-exposure video image; and determining the total objective function value according to the objective function values corresponding to the training samples.
In some exemplary embodiments, when calculating the objective function value corresponding to each training sample, the value should be calculated from the enhanced video image corresponding to that training sample and the long-exposure video image corresponding to that same training sample.
In some exemplary embodiments, determining the objective function value corresponding to a training sample according to its enhanced video image and long-exposure video image includes determining it according to the formula L = αL_enh + (1-α)L_color, where L is the objective function value corresponding to the training sample, α is a weight coefficient, L_enh is the L1 norm of the absolute difference between the enhanced video image and the long-exposure video image corresponding to the training sample, and L_color is a color consistency loss function.
In some exemplary embodiments,

    L_enh = Σ_{i=1..m} Σ_{j=1..n} |I_out(i,j) - I_GT(i,j)|

where I_out(i,j) is the pixel value in row i and column j of the enhanced video image corresponding to the training sample, I_GT(i,j) is the pixel value in row i and column j of the corresponding long-exposure video image, m is the total number of rows, and n is the total number of columns.
In some exemplary embodiments, L_color is defined by the color consistency loss formula that appears only as an image in the original filing (formula image PCTCN2022081245-appb-000002).
In some exemplary embodiments, the total objective function value is the average of the objective function values corresponding to all training samples.
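A sketch of this objective follows, assuming the un-normalized L1 form of L_enh given above; since L_color appears only as a formula image in the original filing, it is left here as a caller-supplied function, and alpha = 0.9 is an arbitrary placeholder rather than a value taken from the disclosure.

```python
import torch

def enh_loss(i_out, i_gt):
    # L_enh: L1 norm of the absolute difference between the enhanced
    # image I_out and its long-exposure reference I_GT
    return (i_out - i_gt).abs().sum()

def sample_objective(i_out, i_gt, color_loss, alpha=0.9):
    # L = alpha * L_enh + (1 - alpha) * L_color; color_loss stands in
    # for L_color, whose exact formula is not reproduced in this text
    return alpha * enh_loss(i_out, i_gt) + (1 - alpha) * color_loss(i_out, i_gt)

def total_objective(outs, refs, color_loss, alpha=0.9):
    # total objective function value: average over all training samples
    losses = [sample_objective(o, g, color_loss, alpha) for o, g in zip(outs, refs)]
    return torch.stack(losses).mean()
```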
In some exemplary embodiments, the training parameter values are updated according to the total objective function value; alternatively, the training parameter values are updated according to the total objective function value and a preset index.
The embodiments of the present disclosure do not limit the preset index. For example, the preset index may be at least one of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and the like.
The embodiments of the present disclosure do not limit the training parameters. For example, the training parameters may be the parameters to be trained in the first sub-network, the second sub-network and the fusion sub-module.
Step 602: for each training sample, according to the updated training parameter values, use any one of the above video image enhancement methods to obtain the enhanced video image corresponding to the training sample, until the total objective function value converges.
In some exemplary embodiments, according to the updated training parameter values, any one of the above video image enhancement methods is used to obtain the enhanced video image corresponding to each training sample until the total objective function value converges; alternatively, until the total objective function value converges and the preset index reaches its best value.
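For illustration, a hypothetical training loop tying the above together; model, samples and refs are assumed stand-ins for the enhancement pipeline (bundling the first network, the second network and the fusion sub-module), the (2N+1)-frame training samples and their long-exposure references, and total_objective is the sketch from above.

```python
import torch

def train(model, samples, refs, color_loss, alpha=0.9, lr=1e-4, tol=1e-6):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float('inf')
    while True:
        outs = [model(s) for s in samples]   # enhanced image per sample
        total = total_objective(outs, refs, color_loss, alpha)
        opt.zero_grad()
        total.backward()                     # update the training parameter values
        opt.step()
        if abs(prev - total.item()) < tol:   # stop once the total objective
            return model                     # function value has converged
        prev = total.item()
```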
The network training method provided by the embodiments of the present disclosure trains the above video image enhancement method to obtain optimal training parameter values, further improving the enhancement effect.
In a third aspect, another embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory storing at least one program which, when executed by the at least one processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
Here, the processor is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
In some embodiments, the processor and the memory are connected to each other through a bus and are in turn connected to the other components of the computing device.
In a fourth aspect, another embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the above video image enhancement methods or any one of the above network training methods.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, may be implemented as software, firmware, hardware, or appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (13)

  1. A video image enhancement method, comprising:
    using a first network to extract a first feature of an i-th frame of a short-exposure video image, wherein i is a variable that takes the values 0, 1, 2, 3, ... in sequence;
    using a second network to denoise the (i-N)-th to (i+N)-th frames of the short-exposure video images to obtain a denoised video image corresponding to the i-th frame of the short-exposure video image, wherein N is a constant greater than or equal to 1; and
    performing a first fusion process on the first feature of the i-th frame of the short-exposure video image and the denoised video image corresponding to the i-th frame to obtain an enhanced video image corresponding to the i-th frame of the short-exposure video image.
  2. The video image enhancement method according to claim 1, wherein using the first network to extract the first feature of the i-th frame of the short-exposure video image comprises:
    using a first sub-network to extract a second feature of the i-th frame of the short-exposure video image, and using a second sub-network to extract a third feature of the i-th frame of the short-exposure video image; and
    performing a second fusion process on the second feature and the third feature of the i-th frame of the short-exposure video image to obtain the first feature of the i-th frame of the short-exposure video image.
  3. The video image enhancement method according to claim 1, wherein using the first network to extract the first feature of the i-th frame of the short-exposure video image comprises:
    down-sampling the i-th frame of the short-exposure video image to obtain a down-sampled video image corresponding to the i-th frame;
    using the first sub-network to extract a fourth feature of the down-sampled video image corresponding to the i-th frame;
    using the second sub-network to extract a fifth feature of the down-sampled video image corresponding to the i-th frame;
    performing a second fusion process on the fourth feature and the fifth feature of the down-sampled video image to obtain a sixth feature of the down-sampled video image corresponding to the i-th frame; and
    up-sampling the sixth feature of the down-sampled video image to obtain the first feature of the i-th frame of the short-exposure video image.
  4. The video image enhancement method according to any one of claims 2-3, wherein the first sub-network comprises N1 first convolutional layers, wherein N1 is an integer greater than or equal to 3.
  5. The video image enhancement method according to any one of claims 2-3, wherein the second sub-network comprises N2 second convolutional layers and 3 fully connected (FC) layers, wherein N2 is an integer greater than or equal to 3.
  6. The video image enhancement method according to claim 1, wherein using the second network to denoise the (i-N)-th to (i+N)-th frames of the short-exposure video images to obtain the denoised video image corresponding to the i-th frame comprises:
    performing an alignment operation on the (i-N)-th to (i+N)-th frames of the short-exposure video images to obtain an aligned video image;
    performing encoding and decoding processing on the aligned video image to obtain a codec-processed video image; and
    performing output processing on the codec-processed video image to obtain the denoised video image corresponding to the i-th frame of the short-exposure video image.
  7. The video image enhancement method according to claim 6, wherein performing encoding and decoding processing on the aligned video image to obtain the codec-processed video image comprises:
    performing first encoding processing on the aligned video image to obtain a first-encoded video image, wherein the first encoding processing comprises a fifth convolution operation, a second residual connection operation and a second pooling operation;
    performing second encoding processing on the first-encoded video image to obtain a second-encoded video image, wherein the second encoding processing comprises a sixth convolution operation, a third residual connection operation and a third pooling operation;
    performing third encoding processing on the second-encoded video image to obtain a third-encoded video image, wherein the third encoding processing comprises a seventh convolution operation, a fourth residual connection operation and a fourth pooling operation;
    performing fourth encoding processing on the third-encoded video image to obtain a fourth-encoded video image, wherein the fourth encoding processing comprises an eighth convolution operation, a fifth residual connection operation and a fifth pooling operation;
    performing first decoding processing on the fourth-encoded video image to obtain a first-decoded video image;
    performing a first F operation on the video image before the fifth pooling operation and the first-decoded video image to obtain a video image after the first F operation, and performing second decoding processing on the video image after the first F operation to obtain a second-decoded video image;
    performing a second F operation on the video image before the fourth pooling operation and the second-decoded video image to obtain a video image after the second F operation, and performing third decoding processing on the video image after the second F operation to obtain a third-decoded video image;
    performing a third F operation on the video image before the third pooling operation and the third-decoded video image to obtain a video image after the third F operation, and performing fourth decoding processing on the video image after the third F operation to obtain a fourth-decoded video image; and
    performing a fourth F operation on the video image before the second pooling operation and the fourth-decoded video image to obtain the codec-processed video image.
  8. A network training method, comprising:
    for each training sample, using the video image enhancement method according to any one of claims 1-7 to obtain an enhanced video image corresponding to the training sample, wherein the training sample comprises (2N+1) frames of short-exposure video images;
    determining a total objective function value according to the enhanced video image and a long-exposure video image corresponding to the training sample, and updating training parameter values according to the total objective function value; and
    for each training sample, according to the updated training parameter values, using the video image enhancement method according to any one of claims 1-7 to obtain the enhanced video image corresponding to the training sample, until the total objective function value converges.
  9. The network training method according to claim 8, wherein:
    updating the training parameter values according to the total objective function value comprises updating the training parameter values according to the total objective function value and a preset index; and
    using the video image enhancement method according to any one of claims 1-7 to obtain the enhanced video image corresponding to the training sample according to the updated training parameter values until the total objective function value converges comprises: according to the updated training parameter values, using the video image enhancement method according to any one of claims 1-7 to obtain the enhanced video image corresponding to the training sample until the total objective function value converges and the preset index reaches its best value.
  10. The network training method according to any one of claims 8-9, wherein determining the total objective function value according to the enhanced video image and the long-exposure video image corresponding to the training sample comprises:
    for each training sample, determining an objective function value corresponding to the training sample according to the enhanced video image and the long-exposure video image corresponding to the training sample; and
    determining the total objective function value according to the objective function values corresponding to the training samples.
  11. The network training method according to claim 10, wherein determining the objective function value corresponding to the training sample according to the enhanced video image and the long-exposure video image corresponding to the training sample comprises:
    determining the objective function value corresponding to the training sample according to the formula L = αL_enh + (1-α)L_color,
    wherein L is the objective function value corresponding to the training sample, α is a weight coefficient, L_enh is the L1 norm of the absolute difference between the enhanced video image and the long-exposure video image corresponding to the training sample, and L_color is a color consistency loss function.
  12. An electronic device, comprising:
    at least one processor; and
    a memory storing at least one program which, when executed by the at least one processor, implements the video image enhancement method according to any one of claims 1-7 or the network training method according to any one of claims 8-11.
  13. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the video image enhancement method according to any one of claims 1-7 or the network training method according to any one of claims 8-11.
PCT/CN2022/081245 2021-10-09 2022-03-16 Video image augmentation method, network training method, electronic device and storage medium WO2023056730A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111174652.3A CN115965560A (en) 2021-10-09 2021-10-09 Video image enhancement method, network training method, electronic device and storage medium
CN202111174652.3 2021-10-09

Publications (1)

Publication Number Publication Date
WO2023056730A1 (en)

Family

ID=85803141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081245 WO2023056730A1 (en) 2021-10-09 2022-03-16 Video image augmentation method, network training method, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115965560A (en)
WO (1) WO2023056730A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140016866A1 (en) * 2012-07-10 2014-01-16 Samsung Electronics Co., Ltd. Method and apparatus for processing image
CN103150714A (en) * 2013-03-12 2013-06-12 华东师范大学 Method and device for real-time interactive enhancement of magnetic resonance image
CN104504652A (en) * 2014-10-10 2015-04-08 中国人民解放军理工大学 Image denoising method capable of quickly and effectively retaining edge and directional characteristics
CN111047532A (en) * 2019-12-06 2020-04-21 广东启迪图卫科技股份有限公司 Low-illumination video enhancement method based on 3D convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU YAN, ZHOU MIN-XIONG, XU LING, LIU WEI, YANG GUANG: "An Edge Enhancing Scheme for Non-Local Means Denoised MR Images", CHINESE JOURNAL OF MAGNETIC RESONANCE, vol. 30, no. 2, 5 June 2013 (2013-06-05), pages 183 - 193, XP093055861 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116486107B (en) * 2023-06-21 2023-09-05 南昌航空大学 Optical flow calculation method, system, equipment and medium

Also Published As

Publication number Publication date
CN115965560A (en) 2023-04-14

Similar Documents

Publication Publication Date Title
CN111311629B (en) Image processing method, image processing device and equipment
CN111028177B (en) Edge-based deep learning image motion blur removing method
CN109101975B (en) Image semantic segmentation method based on full convolution neural network
US20230074180A1 (en) Method and apparatus for generating super night scene image, and electronic device and storage medium
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
CN112150400B (en) Image enhancement method and device and electronic equipment
CN113781320A (en) Image processing method and device, terminal equipment and storage medium
WO2023056730A1 (en) Video image augmentation method, network training method, electronic device and storage medium
Lu et al. Progressive joint low-light enhancement and noise removal for raw images
CN115035011A (en) Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy
WO2021227915A1 (en) Method and apparatus for training image restoration model, and electronic device and computer-readable storage medium
López-Tapia et al. Deep learning approaches to inverse problems in imaging: Past, present and future
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
Yadav et al. Frequency-domain loss function for deep exposure correction of dark images
CN117011194B (en) Low-light image enhancement method based on multi-scale dual-channel attention network
WO2022096104A1 (en) Permutation invariant high dynamic range imaging
WO2023169582A1 (en) Image enhancement method and apparatus, device, and medium
CN116977208A (en) Low-illumination image enhancement method for double-branch fusion
CN110555805B (en) Image processing method, device, equipment and storage medium
CN116208812A (en) Video frame inserting method and system based on stereo event and intensity camera
CN111292251A (en) Image color cast correction method, device and computer storage medium
CN115841523A (en) Double-branch HDR video reconstruction algorithm based on Raw domain
US20220303557A1 (en) Processing of Chroma-Subsampled Video Using Convolutional Neural Networks
CN112767264B (en) Image deblurring method and system based on graph convolution neural network
CN112203023B (en) Billion pixel video generation method and device, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877757

Country of ref document: EP

Kind code of ref document: A1