WO2021218414A1 - Video enhancement method and device, electronic device, storage medium - Google Patents
Video enhancement method and device, electronic device, storage medium
- Publication number
- WO2021218414A1 (PCT/CN2021/079872)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- scale
- level
- frame
- images
Classifications
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof using neural networks
- G06T3/4053—Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
- G06T5/70—Denoising; Smoothing
- G06T5/73—Deblurring; Sharpening
- G06T5/90—Dynamic range modification of images or parts thereof
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06T2207/10016—Video; Image sequence
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Definitions
- the present disclosure relates to the field of image processing technology, and in particular, to a video enhancement method, a video enhancement device, an electronic device, and a non-volatile computer-readable storage medium.
- Image enhancement purposefully emphasizes the overall or local characteristics of an image, making an unclear image clear or highlighting features of interest, thereby improving image quality and enriching the information content to meet the needs of particular analyses. Image enhancement technology is therefore widely used in many fields.
- a video enhancement method including:
- the inputting M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images includes:
- N is an integer greater than 1;
- the input of the i-th level up-sampling process is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling process and the output of the (i-1)-th level up-sampling process; the magnification factor of the j-th level up-sampling process is the same as the reduction factor of the (N+1-j)-th level down-sampling process, i is an integer from 2 to N, and j is an integer from 1 to N;
- the image feature of the third scale and the image feature of the first scale are superimposed to obtain an enhanced image corresponding to the image feature of the first scale.
- the video processing model is obtained by training an original video processing model through a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-level scale losses, and the loss at each scale level is the loss of the corresponding level of up-sampling processing in the N-level up-sampling processing.
- the loss of each level of up-sampling processing is the loss between a first image and a second image
- the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing
- the second image is the target image of that level of up-sampling processing, and the first image and the second image have the same resolution.
- training the original video processing model to obtain the trained video processing model includes:
- feature extraction is performed on the set of M frames of sample images to obtain at least one sample image feature of the first scale
- N-level down-sampling processing is performed on the sample image feature of the first scale to obtain the sample image feature of the second scale;
- N levels of upsampling processing are performed on the sample image features of the second scale to obtain a predicted output image corresponding to each level of upsampling;
- the difference between the target output image corresponding to this level of up-sampling and the predicted output image corresponding to this level of up-sampling is taken as the loss of this level of up-sampling; where the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
- the sum of the losses of up-sampling at all levels is used as the target loss, and the network parameter value in the original video processing model is updated according to the target loss.
- each set of M frames of sample images corresponds to one frame of sample enhanced images
- the one frame of sample enhanced images is specifically an enhanced image corresponding to an intermediate frame sample image of the set of M frames of sample images, where M is an odd number greater than 1.
- the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
- a denoised image corresponding to the intermediate frame sample image, or a deblurred image corresponding to the intermediate frame sample image.
- the value of M is 3, 5, or 7.
- before the M frames of images are input into the pre-established video processing model, the method further includes: acquiring L frames of images in the to-be-processed video, adding images before the first frame and after the last frame to obtain L+M-1 frames of images, and dividing the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
- for each group of M frames of images, the step of inputting the M frames of images into the pre-established video processing model is performed to obtain an enhanced image of at least one frame of the M frames of images.
- the superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale includes:
- the superimposed feature is converted into image features of three channels, and an enhanced image corresponding to the image feature of the first scale is obtained.
- the superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale includes:
- the image feature of the third scale and the image feature of the first scale are superimposed, and then super-resolution processing is performed to obtain a super-resolution image corresponding to the image feature of the first scale.
- N is 4.
- a video enhancement device including:
- the image enhancement processor is configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frame images, where M is an integer greater than 1;
- the image enhancement processor is specifically configured to perform feature extraction on the M frame images to obtain at least one image feature of the first scale
- N is an integer greater than 1;
- the input of the i-th level up-sampling process is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling process and the output of the (i-1)-th level up-sampling process; the magnification factor of the j-th level up-sampling process is the same as the reduction factor of the (N+1-j)-th level down-sampling process, i is an integer from 2 to N, and j is an integer from 1 to N;
- the image feature of the third scale and the image feature of the first scale are superimposed to obtain an enhanced image corresponding to the image feature of the first scale.
- the video processing model is obtained by training an original video processing model through a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-level scale losses, and the loss at each scale level is the loss of the corresponding level of up-sampling processing in the N-level up-sampling processing.
- the loss of each level of up-sampling processing is the loss between a first image and a second image
- the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing
- the second image is the target image of that level of up-sampling processing, and the first image and the second image have the same resolution.
- the video enhancement device of the embodiment of the present disclosure further includes:
- a sample acquisition processor configured to acquire multiple sets of M frame sample images and at least one frame of sample enhanced image corresponding to each set of the M frame sample images
- the model training processor is configured to perform feature extraction on the set of M frame sample images for each set of M frame sample images to obtain at least one sample image feature of the first scale;
- N-level down-sampling processing is performed on the sample image feature of the first scale to obtain the sample image feature of the second scale;
- N levels of upsampling processing are performed on the sample image features of the second scale to obtain a predicted output image corresponding to each level of upsampling;
- the difference between the target output image corresponding to this level of up-sampling and the predicted output image corresponding to this level of up-sampling is taken as the loss of this level of up-sampling; where the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
- the sum of the losses of up-sampling at all levels is used as the target loss, and the network parameter value in the original video processing model is updated according to the target loss.
- each set of M frames of sample images corresponds to one frame of sample enhanced images
- the one frame of sample enhanced images is specifically an enhanced image corresponding to an intermediate frame sample image of the set of M frames of sample images, where M is an odd number greater than 1.
- the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
- a denoised image corresponding to the intermediate frame sample image, or a deblurred image corresponding to the intermediate frame sample image.
- the value of M is 3, 5, or 7.
- the video enhancement device of the embodiment of the present disclosure further includes:
- the to-be-processed video acquisition processor is configured to acquire L frames of images in the to-be-processed video and add images before the first frame and after the last frame to obtain L+M-1 frames of images
- a video frame dividing processor configured to divide the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
- the image enhancement processor is specifically configured to input M frames of images into a pre-established video processing model for each group of M frame images to obtain an enhanced image of at least one frame of the M frame images.
- the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale by the following steps, to obtain an enhanced image corresponding to the image feature of the first scale:
- the superimposed feature is converted into image features of three channels, and an enhanced image corresponding to the image feature of the first scale is obtained.
- the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps, to obtain an enhanced image corresponding to the image feature of the first scale:
- the image feature of the third scale and the image feature of the first scale are superimposed, and then super-resolution processing is performed to obtain a super-resolution image corresponding to the image feature of the first scale.
- N is 4.
- an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the methods described above by executing the executable instructions.
- a non-volatile computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above is implemented.
- Fig. 1 shows a schematic diagram of an exemplary system architecture of a video enhancement method that can be applied to an embodiment of the present disclosure
- Figure 2 shows a schematic structural diagram of a convolutional neural network
- Figure 3 shows a flow chart of a video enhancement method in an embodiment of the present disclosure
- FIG. 4 shows a schematic diagram of a network structure of a video processing model in an embodiment of the present disclosure
- Fig. 5 shows a flow chart of a method for training a video processing model in an embodiment of the present disclosure
- FIG. 6 shows a schematic diagram of another network structure of a video processing model in an embodiment of the present disclosure
- Fig. 7 shows a schematic structural diagram of a video enhancement device in an embodiment of the present disclosure
- FIG. 8 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.
- the image can be enhanced based on the convolutional neural network algorithm.
- since a video is composed of multiple frames of images, the amount of calculation for video enhancement is relatively large and the calculation efficiency is low; in addition, the video enhancement effect of such an algorithm is also poor.
- Fig. 1 shows a schematic diagram of an exemplary system architecture of a video enhancement method that can be applied to an embodiment of the present disclosure.
- the system architecture 100 may include one or more of terminal devices 101 and 102, a network 103 and a server 104.
- the network 103 is used to provide a medium for communication links between the terminal devices 101 and 102 and the server 104.
- the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
- the terminal devices 101 and 102 may be various electronic devices with display screens, including but not limited to portable computers, smart phones, tablet computers, and so on. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative, and any number of terminal devices, networks, and servers may be provided according to implementation needs.
- the server 104 may be a server cluster composed of multiple servers.
- the video enhancement method provided by the embodiment of the present disclosure is generally executed by the server 104, and accordingly, the video enhancement device is generally set in the server 104.
- the video enhancement method provided by the embodiments of the present disclosure can also be executed by the terminal devices 101 and 102. Accordingly, the video enhancement device can also be set in the terminal devices 101 and 102.
- the user may upload the to-be-processed video to the server 104 through the terminal devices 101 and 102, and the server 104 performs the processing of the to-be-processed video through the video enhancement method provided in the embodiment of the present disclosure.
- the obtained enhanced video may also be sent to the terminal devices 101 and 102.
- image enhancement can include: enhancement of image effects and enhancement of image morphology.
- the enhancement of image effects may include: image denoising, image deblurring, image restoration, etc.
- the enhancement of image morphology may include: image super-resolution processing, etc.
- Image enhancement can be achieved through convolutional neural networks.
- Convolutional neural network is a special structure of neural network, which can take the original image and the enhanced image of the original image as input and output respectively, and use the convolution kernel to replace the scalar weight.
- a convolutional neural network with a three-layer structure is shown in Figure 2. The network has 4 inputs, 3 outputs in the hidden layer, and 2 outputs in the output layer. The final system outputs two images.
- each module w^k_ij represents a convolution kernel, where k denotes the layer number and i and j denote the unit numbers of the input and the output, respectively.
- the bias b^k_i is a set of scalars superimposed on the output of the convolutional layer. The output of the convolutional layer with the bias superimposed can be input to the activation layer. After training, the convolution kernels and biases are fixed.
- the training process is to optimize the parameters of the convolution kernel and bias through a set of matching inputs and outputs, and an optimization algorithm.
- each convolutional layer can contain dozens or hundreds of convolution kernels, and deep neural networks often contain more than 5 convolutional layers.
- the image enhancement algorithm based on convolutional neural network has many network parameters and low computational efficiency.
- the convolutional neural network cannot extract more image features, resulting in poor image enhancement.
- the embodiments of the present disclosure provide a video enhancement method, which can improve the calculation efficiency of video enhancement and improve the effect of video enhancement.
- M frames of images may be input to a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, where M is an integer greater than 1.
- the enhanced image of at least one frame here can be the enhanced image corresponding to the middle frame of the M frames of images, or the enhanced image corresponding to a frame other than the middle frame; for example, if M is 3, the middle frame is frame 2; if M is 5, it is frame 3.
- Fig. 3 shows a flowchart of a video enhancement method in an embodiment of the present disclosure.
- the process of processing M frames of images by the video processing model may include the following steps:
- Step S310 Perform feature extraction on M frames of images to obtain at least one image feature of the first scale.
- for each image feature of the first scale, step S320 to step S340 can be performed, so that the number of finally obtained enhanced images is the same as the number of image features of the first scale.
- Step S320 For each image feature of the first scale, perform N-level down-sampling processing on the image feature of the first scale to obtain the image feature of the second scale, where N is an integer greater than 1.
- Step S330 Perform N levels of up-sampling processing on the image features of the second scale to obtain image features of the third scale; the input of the first-level up-sampling processing is the image feature of the second scale, and the input of the i-th level up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing; the magnification factor of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing; i is an integer from 2 to N, and j is an integer from 1 to N.
- Step S340 Perform superposition processing on the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
- by processing M frames of images jointly, the continuity between frames of the video can be ensured and inter-frame jitter can be avoided.
- performing N-level down-sampling processing and N-level up-sampling processing on the M frames of images, that is, multi-scale feature extraction, improves calculation efficiency and speeds up the calculation.
- through step-by-step restoration and superposition with the features from the corresponding down-sampling stages, high-level features and low-level features are merged, which improves the feature expression ability and thereby the effect of video enhancement.
- step S310 feature extraction is performed on M frames of images to obtain at least one image feature of the first scale.
- M frames of images may be continuous video frames. It should be noted that, in order to ensure inter-frame continuity and avoid inter-frame jitter, M may be a small value, for example, an integer from 2 to 7.
- the method for selecting the M frames of images in the present disclosure is not limited to this; for example, 2 frames of images or 4 frames of images can also be selected.
- when M is 4, the current frame, the previous frame of the current frame, and the next two frames of the current frame can be selected; or the current frame, the previous two frames of the current frame, and the next frame of the current frame can be selected.
- the current frame together with its previous 3 frames or its next 3 frames can also be selected, which is not limited here.
- the method for obtaining the M frames of images may be as follows. First, L frames of images in the to-be-processed video may be obtained, where L is an integer greater than M. After that, the L frames of images are grouped, and each group contains M frames of images. Since M is an integer greater than 1, grouping in this manner yields fewer than L groups, which means the first few frames and the last few frames might not be processed when the to-be-processed video is enhanced. To avoid this problem, a total of M-1 frames of images can be added before the first frame and after the last frame of the L frames of images to obtain L+M-1 frames of images, and the L+M-1 frames of images are then divided into L groups of M frames of images. The number of added images can thus be determined according to the value of M. All the images added before the first frame may be copies of the first frame, and all the images added after the last frame may be copies of the last frame.
- the step of inputting the M frame images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frame images can be performed.
- a group of M-frame images is taken as an example for description. It is understandable that, for the video to be processed, the enhanced video can be obtained after the enhancement processing is performed on the L groups of M frame images.
- the 5 original images can be divided into 5 groups in the following way: the first group P1, P1, P2; the second group P1, P2, P3; the third group P2, P3, P4; the fourth group P3, P4, P5; the fifth group P4, P5, P5.
- taking a pre-established video processing model that takes 3 frames of images as input and outputs the enhanced image corresponding to the intermediate frame as an example, inputting the above five groups of images into the model yields the enhanced image P11 corresponding to P1 (the output of the first group), the enhanced image P21 corresponding to P2 (the output of the second group), the enhanced image P31 corresponding to P3 (the output of the third group), the enhanced image P41 corresponding to P4 (the output of the fourth group), and the enhanced image P51 corresponding to P5 (the output of the fifth group).
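As an illustrative sketch of this grouping scheme (not part of the patent disclosure; the function and variable names are invented), the edge frames can be replicated and the padded sequence split into overlapping windows:

```python
def group_frames(frames, m):
    """Pad a list of L frames with edge replicas and split it into L
    overlapping groups of m frames (m assumed odd), so that every original
    frame becomes the middle frame of exactly one group."""
    half = (m - 1) // 2
    # replicate the first and last frames, giving L + m - 1 frames in total
    padded = [frames[0]] * half + list(frames) + [frames[-1]] * half
    return [padded[i:i + m] for i in range(len(frames))]

# frames P1..P5 with m = 3 yield the five groups described above:
# (P1,P1,P2), (P1,P2,P3), (P2,P3,P4), (P3,P4,P5), (P4,P5,P5)
groups = group_frames(["P1", "P2", "P3", "P4", "P5"], 3)
```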
- FIG. 4 shows a schematic diagram of a network structure of the video processing model in an embodiment of the present disclosure. It can be seen that the network structure of the video processing model may be a U-shaped network. The following describes the processing procedure of the video processing model in conjunction with FIG. 4.
- step S310 feature extraction is performed on M frames of images to obtain at least one image feature of the first scale.
- the scale of each frame of image is (H, W), that is, the resolution of the image is H×W.
- M frames of images are all RGB images
- the value of the number of channels C of the image is 3.
- M frame images are all grayscale images, then the value of C is 1.
- the M frames of images can be combined along the channel dimension C, and the M frames of images input to the video processing model can be expressed as (H, W, C*M).
- C*M represents the number of feature layers; for example, when M is 3 and C is 3, the number of feature layers is 9.
- the number of feature layers can be expanded without changing the resolution of the image. Therefore, the first scale is (H, W). For example, the number of feature layers can be expanded from C*M to F. In this way, the input M frame image changes from (H, W, C*M) to (H, W, F).
- F is a preset value, such as 64 or 128.
- the number of feature layers can be changed through a convolution operation.
- a convolution process can be performed on M frames of images to obtain image features of the first scale.
- the convolution kernel size can be 3×3, for example. Since the activation function introduces non-linear factors into the neurons, the neural network can approximate any non-linear function. Therefore, after performing convolution processing on the M frames of images, an activation operation can also be performed to obtain the image features of the first scale.
- the activation function can be a ReLU (rectified linear unit) function or a sigmoid function. It should be noted that the method of performing feature extraction on the M frames of images in the present disclosure is not limited to this.
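One possible form of this feature-extraction step is sketched below in PyTorch; the layer types and the channel counts (C*M = 9, F = 64) follow the examples in the text, but the concrete module is an assumption, not the patented implementation:

```python
import torch
import torch.nn as nn

# M = 3 RGB frames concatenated along the channel axis give C*M = 9 input
# channels; a 3x3 convolution expands the feature layers to F = 64 without
# changing the resolution, and a ReLU activation adds non-linearity.
feature_extraction = nn.Sequential(
    nn.Conv2d(in_channels=9, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

frames = torch.randn(1, 9, 256, 448)      # (B, C*M, H, W)
first_scale = feature_extraction(frames)  # (1, 64, 256, 448): scale stays (H, W)
```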
- the number of image features of the first scale may be one or more.
- each image feature of the first scale may correspond to a feature of a different image.
- the image feature of the first scale may include: the feature of the second frame of image and the feature of the third frame of image.
- for each of these image features of the first scale, step S320 to step S340 can be performed, so that the enhanced image of the second frame of image and the enhanced image of the third frame of image can be obtained.
- the number of finally obtained enhanced images is the same as the number of image features of the first scale.
- in the following, one image feature of the first scale is used as an example for description.
- Step S320 Perform N-level down-sampling processing on the image feature of the first scale to obtain the image feature of the second scale, where N is an integer greater than 1.
- N-level down-sampling refers to performing down-sampling N times. After each down-sampling, features smaller than the original image can be obtained, which is equivalent to compressing the image, so the area perceived per unit area becomes larger. In this way, after N levels of down-sampling, more contour information can be obtained.
- the down-sampling can use a step size (stride) of 2, that is, the down-sampling factor can be 2.
- for image features of the first scale (H, W), after down-sampling by a factor of 2, image features of scale ([H/2], [W/2]) can be obtained, where [·] represents the rounding operation.
- the present disclosure does not specifically limit the multiple of downsampling.
- the value of N may be 4. Referring to Figure 4, the network structure is a U-shaped network with N of 4. In this way, when the down-sampling factor is 2, the scales of the image features obtained through the 4 levels of down-sampling processing are ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]) and ([H/16], [W/16]); at this time, the image feature of the second scale is at scale ([H/16], [W/16]).
- the image features after downsampling can also be activated and convolved.
- after the down-sampling layer, the network can also include: an activation layer, a first convolutional layer, and another activation layer.
- the activation function in the activation layer can be a ReLU function, etc.
- the size of the convolution kernel in the first convolutional layer can be 3×3, etc.
- in addition to the network structure shown in FIG. 4, the layers after the down-sampling layer may also include other network structures such as a convolution layer, an activation layer, and a pooling layer.
- Step S330 Perform N-level upsampling processing on the image feature of the second scale to obtain the image feature of the third scale.
- the input of the i-th upsampling process is the image feature after the superposition of the output of the N+1-i-th down-sampling process and the output of the i-1th up-sampling process; i is an integer from 2 to N.
- the N-level upsampling corresponds to the above-mentioned N-level downsampling.
- the N-level upsampling refers to performing N upsampling
- the first-level upsampling refers to performing the first upsampling.
- the input of the upsampling process is the image feature of the second scale.
- the j-th down-sampling and N+1-j-th up-sampling are located in the same layer of the U-shaped network, and the magnification of the j-th up-sampling processing is the same as the reduction of the N+1-j-th down-sampling processing.
- the resolution of the image before the down-sampling process at the jth level is the same as the resolution of the image after the up-sampling process at the N+1-jth level.
- the resolution of the image after the down-sampling process at the jth level is the same as the resolution of the image before the up-sampling process at the N+1-jth level.
- j is an integer of 1 to N.
- the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing can be superimposed and used as the input of the i-th level up-sampling processing.
- the input of the second-level upsampling process is the superposition of the output of the third-level down-sampling process and the output of the first-level upsampling process.
- the superposition processing refers to the fusion processing of two features, which may be feature superposition or the like.
- for example, if the output of the third-level down-sampling process is (a1, a2, a3) and the output of the first-level up-sampling process is (b1, b2, b3), the superposition of the two is (a1+b1, a2+b2, a3+b3).
- in this way, the image features from each down-sampling stage are superimposed during the up-sampling process, that is, features at every level are combined, so that the accuracy of image feature extraction can be improved.
- in some cases, the scale corresponding to the output of the (N+1-i)-th level down-sampling process may not match the scale corresponding to the output of the (i-1)-th level up-sampling process
- in that case, the output of the (N+1-i)-th level down-sampling process can be cropped first, so that the cropped scale is the same as the scale corresponding to the output of the (i-1)-th level up-sampling process.
- the scale of the obtained image feature is less than or equal to the first scale. That is, the third scale may be less than or equal to the first scale.
- when no cropping is performed, the third scale is the same as the first scale, that is, (H, W);
- when cropping is performed, the third scale will be smaller than the first scale due to feature cropping.
- after the up-sampling, convolution processing and activation operations can also be performed.
- after the up-sampling layer, the network may also include: an activation layer, a second convolutional layer, and another activation layer.
- the activation function in the activation layer may be a ReLU function, etc., and the size of the convolution kernel in the second convolutional layer can be 4×4, etc.
- other network structures may also be included, which are not limited here.
- Step S340 Perform superposition processing on the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
- the image feature of the third scale and the image feature of the first scale can be directly superimposed to obtain the superimposed feature; the superimposed feature is converted into the image feature of the three channels to obtain the image feature of the first scale Corresponding enhanced image.
- after the superposition, the number of feature layers remains F. Therefore, the superimposed feature can be converted into an image feature containing three channels through convolution processing; for example, a three-channel RGB image can be output.
- each level of up-sampling can correspond to such a convolution operation, which converts the number of feature layers from F to 3, that is, outputs three-channel image features.
- the parameters in the convolution operation after each level of upsampling may be shared. For example, as shown in Figure 4, all levels of up-sampling all include the same third convolutional layer. In this way, parameter sharing can reduce the parameters in the video processing model and speed up the network training process.
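Putting steps S310 to S340 together, the following PyTorch sketch illustrates one way such a U-shaped model could be assembled. The use of strided convolutions for down-sampling and 4×4 transposed convolutions for up-sampling, and all layer sizes, are illustrative assumptions rather than the patented implementation (it also assumes H and W are divisible by 2^N):

```python
import torch
import torch.nn as nn

class UShapedEnhancer(nn.Module):
    def __init__(self, in_ch=9, feat=64, levels=4):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True))
        # N-level down-sampling: each level halves the resolution (factor 2)
        # and is followed by activation / 3x3 convolution / activation.
        self.down = nn.ModuleList([
            nn.Sequential(nn.Conv2d(feat, feat, 3, stride=2, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(feat, feat, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(levels)])
        # N-level up-sampling: each level doubles the resolution, so its
        # magnification matches the reduction of its mirror down level.
        self.up = nn.ModuleList([
            nn.Sequential(nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(feat, feat, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(levels)])
        # shared convolution converting F feature layers to 3 channels,
        # applied after every up-sampling level (parameter sharing).
        self.to_rgb = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, x):
        f0 = self.extract(x)              # first-scale feature, scale (H, W)
        skips, f = [], f0
        for d in self.down:               # N-level down-sampling
            skips.append(f)
            f = d(f)                      # ends at the second scale
        preds = []
        for i, u in enumerate(self.up):   # N-level up-sampling
            if i > 0:
                # input of up level i (i >= 2): output of down level N+1-i
                # superimposed (element-wise added) on output of up level i-1
                f = f + skips[-i]
            f = u(f)
            preds.append(self.to_rgb(f))  # per-level predicted output image
        # superpose the third-scale and first-scale features, then convert
        # the superimposed feature to three channels
        enhanced = self.to_rgb(f + f0)
        return enhanced, preds
```

On an input of shape (1, 9, 256, 448) this returns the full-resolution enhanced image together with four per-level predictions, which the multi-scale loss described below can supervise.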
- the video processing model can be obtained by training the original video processing model through the target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-level scale loss, multi-level The loss of each level of scale in the scale loss is the loss of each level of upsampling processing in the N-level upsampling process.
- the loss of each level of upsampling processing is the loss between the first image and the second image.
- the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing; that is to say, after each level of up-sampling, a first image can be output correspondingly.
- the resolution of the first image corresponding to different levels of up-sampling is different.
- each level of up-sampling corresponds to a second image.
- the second image is a target image processed by each level of up-sampling. The resolution of the first image and the second image are the same.
- sample-enhanced images corresponding to M frames of sample images can also be obtained, and the sample-enhanced images can be down-sampled at N-1 levels to obtain N-1 images with different resolutions.
- N-1 images with different resolutions and sample enhanced images can be used as N target images. For example, by performing N-1 level downsampling on the sample enhanced image, the target image of the first level upsampling process can be obtained; performing the 1 level downsampling on the sample enhanced image can obtain the target image of the N-1 level upsampling process; The sample-enhanced image can be used as the target image of the Nth-level upsampling process.
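A sketch of how the N target images can be derived from one ground-truth enhanced frame is given below; the bilinear resizing is an illustrative assumption, as the patent only requires that each target match the resolution of the corresponding predicted output:

```python
import torch.nn.functional as F

def multiscale_targets(gt, levels=4):
    """Return the target images for up-sampling levels 1..N: the sample
    enhanced image down-sampled N-1, N-2, ..., 1, 0 times respectively."""
    targets = [gt]
    for _ in range(levels - 1):
        targets.append(F.interpolate(targets[-1], scale_factor=0.5,
                                     mode="bilinear", align_corners=False))
    return targets[::-1]  # coarsest first: level-1 up-sampling is coarsest
```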
- FIG. 5 shows a flowchart of a training method of a video processing model in an embodiment of the present disclosure, which may include the following steps:
- Step S510 Obtain multiple sets of M frame sample images and at least one frame of sample enhanced image corresponding to each set of M frame sample images.
- the output is one or more frames of enhanced images.
- sample data including M frames of sample images and corresponding one or more frames of sample enhanced images can be obtained.
- the output of the video processing model is a frame of enhanced image
- one frame of sample enhanced image corresponding to each set of M frames of sample images may specifically be an enhanced image corresponding to the middle frame of the set of M frames of sample images.
- it can also be an enhanced image corresponding to a non-intermediate frame sample image.
- the multiple frames of sample enhanced images corresponding to each set of M frames of sample images may specifically be the enhanced image corresponding to the intermediate frame sample image of the set and the enhanced images corresponding to the sample images adjacent to the intermediate frame; they may of course also be the enhanced images of other sample images.
- depending on the purpose of the video processing model, the sample enhanced images used can also be different.
- the video processing model to be trained is used for video denoising
- the enhanced image corresponding to the intermediate frame sample image of each group of M frame sample images is specifically: the denoised image corresponding to the intermediate frame sample image.
- the video processing model to be trained is used for video deblurring
- the enhanced image corresponding to the intermediate frame sample image of each group of M frame sample images is specifically: the deblurred image corresponding to the intermediate frame sample image.
- the video processing model of the embodiment of the present disclosure is not limited to this.
- Step S520 for each set of M frame sample images, feature extraction is performed on the set of M frame sample images to obtain at least one sample image feature of the first scale.
- steps S530 to S560 are executed:
- Step S530 For each sample image feature of the first scale, perform N-level down-sampling processing on the sample image feature of the first scale to obtain the sample image feature of the second scale.
- the batch size of model training can be set, that is, the number of sample data items simultaneously input to the model. Assuming that the batch size is B, the size of the final input to the model is (B, H, W, C*M).
- since the processing procedure for each group of M frames of sample images in step S520 to step S530 is similar to that in step S310 to step S320 described above, details can be found in the description of step S310 to step S320 and are not repeated here.
- Step S540 Perform N-level up-sampling processing on the sample image features of the second scale to obtain a predicted output image corresponding to each level of up-sampling.
- each level of upsampling can correspond to a convolution operation.
- convolution processing can be performed on the output features of that level of upsampling, so that a multi-scale predicted output image can be obtained.
- when N is 4, five images of different scales can be output, denoted F1, F2, F3, F4 and F5; with a down-sampling factor of 2, the five scales are (H, W), ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]) and ([H/16], [W/16]).
- Step S550 For each level of up-sampling, the difference between the target output image corresponding to that level of up-sampling and the predicted output image corresponding to that level of up-sampling is taken as the loss of that level of up-sampling; the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images.
- the target output image corresponding to the i-th level of up-sampling is the image that should ideally be output at that level, and specifically may be the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images.
- for example, the target output image corresponding to the first level of up-sampling can be the input of the N-th level of down-sampling performed on the sample enhanced image, that is, the output of N-1 levels of down-sampling processing of the sample enhanced image.
- step S560 the sum of the losses of up-sampling at all levels is taken as the target loss, and the network parameter value in the original video processing model is updated according to the target loss.
- the gradient descent method can be used to continuously calculate the loss according to the principle of back propagation, and update the network parameter values according to the loss.
- when the target loss is smaller than a preset threshold, the trained video processing model can be obtained.
- the preset threshold can be set according to the actual application, which is not limited here.
- the multi-scale loss constrains the features layer by layer, starting from small scales, which helps to better restore the details of the high-definition image, thereby improving the effect of video enhancement.
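A minimal sketch of one training step with this multi-scale target loss is shown below, reusing the illustrative UShapedEnhancer and multiscale_targets sketches above; the L1 criterion and the Adam optimizer are assumptions, since the patent only specifies summing the per-level losses and updating the parameters by back-propagation and gradient descent:

```python
import torch

model = UShapedEnhancer()  # illustrative sketch from above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(frames, gt_enhanced):
    _, preds = model(frames)
    targets = multiscale_targets(gt_enhanced, levels=len(preds))
    # target loss = sum of the losses of up-sampling at all levels;
    # the last level's target is the full-resolution sample enhanced image
    loss = sum(torch.nn.functional.l1_loss(p, t)
               for p, t in zip(preds, targets))
    optimizer.zero_grad()
    loss.backward()    # back-propagation
    optimizer.step()   # gradient-descent update
    return loss.item()
```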
- FIG. 6 shows a schematic diagram of another network structure of the video processing model in an embodiment of the present disclosure. It can be seen that, compared with the network structure shown in FIG. 4, after the Nth level of upsampling processing, an upsampling layer is added. At this time, the video processing model can be used for video super-resolution processing.
- in this case, step S340 may specifically include: superimposing the image features of the third scale and the image features of the first scale and then performing super-resolution processing to obtain a super-resolution image corresponding to the image features of the first scale.
- the sample-enhanced image in the sample data used in training may be a super-resolution image of the middle frame of the corresponding M frames of sample images.
- the training process is similar to that of the network structure shown in Fig. 4.
- the parameters in the convolution operations after up-sampling at all levels can be shared, the sum of the losses of up-sampling at all levels can be used as the final loss, and the network parameter values are updated according to the final loss.
- a video processing model for super-resolution processing can be obtained.
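For the super-resolution variant, the only structural change relative to the sketch above is one extra up-sampling stage appended after the final superposition, for example (again an illustrative assumption, not the patented implementation):

```python
import torch.nn as nn

# appended after the N-th up-sampling level and the superposition with the
# first-scale feature, so the output resolution (2H, 2W) exceeds the input
# resolution (H, W)
sr_head = nn.Sequential(
    nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, 3, padding=1),
)
```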
- the video enhancement method of the embodiment of the present disclosure can increase the calculation speed and improve the calculation efficiency through the U-shaped network, and calculating the loss at multiple scales can maximize the quality of the output image.
- the present disclosure can be used for various video enhancement functions such as video denoising, video deblurring, and video super-resolution processing.
- a video enhancement device 700 is also provided, as shown in FIG. 7, including:
- the image enhancement processor 710 is configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, where M is an integer greater than 1;
- the image enhancement processor 710 is specifically configured to perform feature extraction on M frames of images to obtain at least one image feature of the first scale;
- N is an integer greater than 1;
- perform N-level up-sampling processing on the image features of the second scale to obtain the image features of the third scale; the input of the first-level up-sampling process is the image feature of the second scale, and the input of the i-th level up-sampling process is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling process and the output of the (i-1)-th level up-sampling process; the magnification factor of the j-th level up-sampling process is the same as the reduction factor of the (N+1-j)-th level down-sampling process, i is an integer from 2 to N, and j is an integer from 1 to N;
- the image feature of the third scale and the image feature of the first scale are superimposed to obtain an enhanced image corresponding to the image feature of the first scale.
- the video processing model is obtained by training the original video processing model through a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-level scale losses, and the loss at each scale level is the loss of the corresponding level of up-sampling processing in the N-level up-sampling processing.
- the loss of each level of up-sampling processing is the loss between a first image and a second image
- the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing
- the second image is the target image of each level of up-sampling processing, and the first image and the second image have the same resolution.
- the above-mentioned video enhancement device further includes:
- the sample acquisition processor is configured to acquire multiple sets of M frames of sample images and at least one frame of sample enhanced image corresponding to each set of M frames of sample images;
- the model training processor is configured to perform feature extraction on the set of M frame sample images for each set of M frame sample images to obtain at least one sample image feature of the first scale;
- N-level down-sampling processing is performed on the sample image feature of the first scale to obtain the sample image feature of the second scale;
- the difference between the target output image corresponding to this level of up-sampling and the predicted output image corresponding to this level of up-sampling is taken as the loss of this level of up-sampling; where the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
- the sum of the losses of up-sampling at all levels is taken as the target loss, and the network parameter values in the original video processing model are updated according to the target loss.
- each set of M frames of sample images corresponds to one frame of sample enhanced images
- one frame of sample enhanced images is specifically an enhanced image corresponding to the intermediate frame sample images of the set of M frames of sample images, wherein, M is an odd number greater than 1.
- the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
- a denoised image corresponding to the intermediate frame sample image, or a deblurred image corresponding to the intermediate frame sample image.
- the value of M is 3, 5, or 7.
- the above-mentioned video enhancement device further includes:
- the to-be-processed video acquisition processor is configured to acquire L frames of images in the to-be-processed video and add images before the first frame and after the last frame to obtain L+M-1 frames of images
- the video frame dividing processor is configured to divide the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
- the image enhancement processor is specifically configured to input M frames of images into a pre-established video processing model for each group of M frame images to obtain an enhanced image of at least one frame of the M frame images.
- the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale:
- the superimposed feature is converted into image features of three channels, and the enhanced image corresponding to the image feature of the first scale is obtained.
- the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale:
- the image feature of the third scale and the image feature of the first scale are superimposed, and then super-resolution processing is performed to obtain a super-resolution image corresponding to the image feature of the first scale.
- the value of N is 4.
- each processor in the above device can be a general-purpose processor, including a central processing unit, a network processor, etc.; it can also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- Each processor in the above-mentioned device may be an independent processor, or may be integrated together.
- although modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
- the features and functions of two or more modules or units described above may be embodied in one module or unit.
- the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
- an electronic device including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to execute all or part of the steps of the video enhancement method in this exemplary embodiment via execution of the executable instructions.
- Fig. 8 shows a schematic structural diagram of a computer system for implementing an electronic device of an embodiment of the present disclosure. It should be noted that the computer system 800 of the electronic device shown in FIG. 8 is only an example, and should not bring any limitation to the functions and scope of use of the embodiments of the present disclosure.
- the computer system 800 includes a central processing unit 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory 802 or a program loaded from the storage portion 808 to the random access memory 803 .
- in the random access memory 803, various programs and data required for system operation are also stored.
- the central processing unit 801, the read-only memory 802, and the random access memory 803 are connected to each other through a bus 804.
- the input/output interface 805 is also connected to the bus 804.
- the following components are connected to the input/output interface 805: an input part 806 including a keyboard, a mouse, etc.; an output part 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and speakers, etc.; a storage part 808 including a hard disk, etc. ; And a communication section 809 including a network interface card such as a local area network (LAN) card, a modem, and the like. The communication section 809 performs communication processing via a network such as the Internet.
- the driver 810 is also connected to the input/output interface 805 as needed.
- a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that the computer program read from it is installed into the storage section 808 as needed.
- an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from the network through the communication section 809, and/or installed from the removable medium 811.
- when the computer program is executed by the central processing unit 801, the various functions defined in the device of the present application are executed.
- a non-volatile computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above is implemented.
- the non-volatile computer-readable storage medium shown in the present disclosure may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of non-volatile computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a non-volatile computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than a non-volatile computer-readable storage medium.
- the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wired, optical cable, radio frequency, etc., or any suitable combination of the above.
Claims (15)
- A video enhancement method, comprising: inputting M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, M being an integer greater than 1; wherein inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one frame of the M frames of images comprises: performing feature extraction on the M frames of images to obtain at least one image feature of a first scale; and, for each image feature of the first scale, performing the following process: performing N levels of down-sampling processing on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1; performing N levels of up-sampling processing on the image feature of the second scale to obtain an image feature of a third scale, wherein the input of the first level of up-sampling processing is the image feature of the second scale, the input of the i-th level of up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level of down-sampling processing and the output of the (i-1)-th level of up-sampling processing, the magnification factor of the j-th level of up-sampling processing is the same as the reduction factor of the (N+1-j)-th level of down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N; and superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
- The method according to claim 1, wherein the pre-established video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input to the original video processing model; the target loss includes losses of multiple scales, and the loss of each scale among the multi-scale losses is the loss of the corresponding level of up-sampling processing among the N levels of up-sampling processing.
- The method according to claim 2, wherein the loss of each level of up-sampling processing is the loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing, the second image being the target image of that level of up-sampling processing, and the first image and the second image having the same resolution.
- The method according to claim 2, wherein training the original video processing model to obtain the pre-established video processing model comprises: obtaining multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images; for each group of M frames of sample images, performing feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale; for each sample image feature of the first scale, performing the following process: performing N levels of down-sampling processing on the sample image feature of the first scale to obtain a sample image feature of the second scale; performing N levels of up-sampling processing on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of up-sampling; for each level of up-sampling, taking the difference between the target output image corresponding to that level of up-sampling and the predicted output image corresponding to that level of up-sampling as the loss of that level of up-sampling, wherein the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the group of M frames of sample images; and taking the sum of the losses of all levels of up-sampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
- The method according to claim 4, wherein each group of M frames of sample images corresponds to one frame of sample enhanced image, the one frame of sample enhanced image being specifically the enhanced image corresponding to the middle-frame sample image of the group of M frames of sample images, where M is an odd number greater than 1.
- The method according to claim 1, wherein the value of M is 3, 5, or 7.
- The method according to claim 1, wherein superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale comprises: superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature; and converting the superimposed feature into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale.
- The method according to claim 1, wherein superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale comprises: performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale.
- The method according to claim 1, wherein the value of N is 4.
- A video enhancement apparatus, comprising: an image enhancement processor configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, M being an integer greater than 1; wherein the image enhancement processor is specifically configured to: perform feature extraction on the M frames of images to obtain at least one image feature of a first scale; and, for each image feature of the first scale, perform the following process: performing N levels of down-sampling processing on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1; performing N levels of up-sampling processing on the image feature of the second scale to obtain an image feature of a third scale, wherein the input of the first level of up-sampling processing is the image feature of the second scale, the input of the i-th level of up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level of down-sampling processing and the output of the (i-1)-th level of up-sampling processing, the magnification factor of the j-th level of up-sampling processing is the same as the reduction factor of the (N+1-j)-th level of down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N; and superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
- The apparatus according to claim 11, wherein the pre-established video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input to the original video processing model; the target loss includes losses of multiple scales, and the loss of each scale among the multi-scale losses is the loss of the corresponding level of up-sampling processing among the N levels of up-sampling processing.
- The apparatus according to claim 12, wherein the loss of each level of up-sampling processing is the loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing, the second image being the target image of that level of up-sampling processing, and the first image and the second image having the same resolution.
- An electronic device, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method according to any one of claims 1 to 10 by executing the executable instructions.
- A non-volatile computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 10 is implemented.
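For orientation, the N-level structure recited in the method claims above, together with the multi-scale target loss, can be sketched as follows. This is a hedged illustration, not the claimed implementation: the factor-2 sampling at every level, the 64-channel width, the single 3-channel tensor standing in for the fused M-frame input, the per-level 1×1 convolution heads, and the L1 difference used as the per-level loss are all assumptions introduced for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleEnhancer(nn.Module):
    """Sketch of the claimed N-level structure. Factor-2 sampling, channel
    width, and the per-level image heads are illustrative assumptions."""

    def __init__(self, channels: int = 64, n_levels: int = 4):
        super().__init__()
        self.n = n_levels
        self.embed = nn.Conv2d(3, channels, 3, padding=1)  # feature extraction
        self.downs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(n_levels))                      # N down-sampling levels
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
            for _ in range(n_levels))                      # N up-sampling levels
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 3, 1) for _ in range(n_levels))

    def forward(self, frames: torch.Tensor) -> list[torch.Tensor]:
        x = self.embed(frames)        # image feature of the first scale
        d = []
        for down in self.downs:
            x = down(x)
            d.append(x)               # d[k-1] = output of down-sampling level k
        preds = []
        for i, (up, head) in enumerate(zip(self.ups, self.heads), start=1):
            if i >= 2:                # superimpose output of down level N+1-i
                x = x + d[self.n - i]
            x = up(x)
            preds.append(head(x))     # predicted output image of up level i
        return preds                  # preds[-1] is at the third (original) scale


def target_loss(preds: list[torch.Tensor], gt: torch.Tensor) -> torch.Tensor:
    """Sum of the per-level up-sampling losses: each prediction is compared
    with the ground-truth enhanced frame reduced to the matching scale."""
    n = len(preds)
    total = torch.zeros(())
    for i, pred in enumerate(preds, start=1):
        if i < n:
            target = F.interpolate(gt, scale_factor=1 / 2 ** (n - i),
                                   mode="bilinear", align_corners=False)
        else:
            target = gt               # top level: full-resolution target
        total = total + F.l1_loss(pred, target)
    return total


# Usage: one 64x64 stand-in for the fused M-frame input and its ground truth.
model = MultiScaleEnhancer()
frames, gt = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
loss = target_loss(model(frames), gt)
loss.backward()
```

The superposition of the output of the (N+1-i)-th down-sampling level with the output of the (i-1)-th up-sampling level is an encoder-decoder skip connection, and requiring the magnification factor of the j-th up-sampling level to equal the reduction factor of the (N+1-j)-th down-sampling level is precisely what guarantees that the two superimposed feature maps share the same spatial size.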
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/630,784 (US20220318950A1) | 2020-04-30 | 2021-03-10 | Video enhancement method and apparatus, and electronic device and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010366748.9 | 2020-04-30 | | |
CN202010366748.9A (CN113592723B) | 2020-04-30 | 2020-04-30 | Video enhancement method and apparatus, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021218414A1 | 2021-11-04 |
Family
ID=78237621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/079872 (WO2021218414A1) | Video enhancement method and apparatus, electronic device, and storage medium | 2020-04-30 | 2021-03-10 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220318950A1 |
CN (1) | CN113592723B |
WO (1) | WO2021218414A1 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117528147A * | 2024-01-03 | 2024-02-06 | Academy of Broadcasting Science, National Radio and Television Administration | Video enhancement transmission method and system based on a cloud-edge collaborative architecture, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160213244A1 * | 2010-08-27 | 2016-07-28 | Sony Corporation | Image processing apparatus and method |
CN108648163A * | 2018-05-17 | 2018-10-12 | Xiamen Meituzhijia Technology Co., Ltd. | Face image enhancement method and computing device |
CN109087306A * | 2018-06-28 | 2018-12-25 | ZhongAn Information Technology Service Co., Ltd. | Arterial vessel image model training method, segmentation method, apparatus, and electronic device |
CN109493317A * | 2018-09-25 | 2019-03-19 | Harbin University of Science and Technology | 3D multi-vertebra segmentation method based on cascaded convolutional neural networks |
CN110458771A * | 2019-07-29 | 2019-11-15 | Shenzhen SenseTime Technology Co., Ltd. | Image processing method and apparatus, electronic device, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109345449B * | 2018-07-17 | 2020-11-10 | Xi'an Jiaotong University | Image super-resolution and non-uniform blur removal method based on a fusion network |
WO2020108009A1 * | 2018-11-26 | 2020-06-04 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method, system, and computer-readable medium for improving quality of low-light images |
US12033301B2 * | 2019-09-09 | 2024-07-09 | Nvidia Corporation | Video upsampling using one or more neural networks |
KR20190117416A * | 2019-09-26 | 2019-10-16 | LG Electronics Inc. | Method and apparatus for improving video frame resolution |
Application events:
- 2020-04-30: Chinese application CN202010366748.9A filed (granted as CN113592723B, status: active)
- 2021-03-10: US application US17/630,784 filed (published as US20220318950A1, status: pending)
- 2021-03-10: PCT application PCT/CN2021/079872 filed (published as WO2021218414A1)
Also Published As
Publication number | Publication date |
---|---|
US20220318950A1 (en) | 2022-10-06 |
CN113592723B (zh) | 2024-04-02 |
CN113592723A (zh) | 2021-11-02 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21796267; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21796267; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023) |