WO2021218414A1 - Video enhancement method and apparatus, electronic device, and storage medium

Video enhancement method and apparatus, electronic device, and storage medium

Info

Publication number: WO2021218414A1
Application number: PCT/CN2021/079872 (CN2021079872W)
Authority: WIPO (PCT)
Prior art keywords: image, scale, level, frame, images
Other languages: English (en), French (fr)
Inventors: 朱丹, 段然, 陈冠男
Original assignee: 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Application filed by 京东方科技集团股份有限公司
Priority to US17/630,784 (published as US20220318950A1)
Publication of WO2021218414A1


Classifications

    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046: Scaling of whole images or parts thereof using neural networks
    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/60: Image enhancement or restoration using machine learning, e.g. neural networks
    • G06T5/70: Denoising; Smoothing
    • G06T5/73: Deblurring; Sharpening
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T2207/10016: Video; Image sequence
    • G06T2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; Image merging

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular, to a video enhancement method, a video enhancement device, an electronic device, and a non-volatile computer-readable storage medium.
  • Image enhancement can purposefully emphasize the overall or local characteristics of an image, making an originally unclear image clear or emphasizing certain features of interest, so as to improve image quality and enrich the amount of information to meet the needs of certain special analyses. Image enhancement technology is therefore widely used in various fields.
  • a video enhancement method including:
  • the inputting M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images includes:
  • performing feature extraction on the M frames of images to obtain at least one image feature of a first scale; and performing, for each image feature of the first scale, N-level down-sampling processing on the image feature of the first scale to obtain an image feature of a second scale, where N is an integer greater than 1;
  • performing N-level up-sampling processing on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first-level up-sampling processing is the image feature of the second scale, and the input of the i-th level up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing; the magnification of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N;
  • superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  • the video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-scale losses, and the loss at each scale among the multi-scale losses is the loss of the corresponding level of up-sampling processing in the N levels of up-sampling processing.
  • the loss of each level of up-sampling processing is the loss between a first image and a second image;
  • the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing;
  • the second image is the target image of each level of up-sampling processing, and the resolution of the first image and the second image is the same.
  • training the original video processing model to obtain the trained video processing model includes:
  • feature extraction is performed on the set of M frames of sample images to obtain at least one sample image feature of the first scale
  • N-level down-sampling processing is performed on the sample image feature of the first scale to obtain the sample image feature of the second scale;
  • N levels of upsampling processing are performed on the sample image features of the second scale to obtain a predicted output image corresponding to each level of upsampling;
  • the difference between the target output image corresponding to this level of up-sampling and the predicted output image corresponding to this level of up-sampling is taken as the loss of this level of up-sampling; wherein the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
  • the sum of the losses of up-sampling at all levels is used as the target loss, and the network parameter value in the original video processing model is updated according to the target loss.
  • each set of M frames of sample images corresponds to one frame of sample enhanced images
  • the one frame of sample enhanced images is specifically an enhanced image corresponding to an intermediate frame sample image of the set of M frames of sample images, where M is an odd number greater than 1.
  • the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
  • the denoised image corresponding to the intermediate frame sample image, or the deblurred image corresponding to the intermediate frame sample image.
  • the value of M is 3, 5, or 7.
  • before the inputting of the M frames of images into the pre-established video processing model, the method further includes: acquiring L frames of images in the video to be processed, and obtaining L+M-1 frames of images by adding frames before the first frame and after the last frame of the L frames of images; and dividing the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
  • for each group of M frames of images, the step of inputting the M frames of images into the pre-established video processing model is performed to obtain an enhanced image of at least one frame of the M frames of images.
  • the superimposing of the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale includes: superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
  • the superimposed feature is converted into a three-channel image feature, and the enhanced image corresponding to the image feature of the first scale is obtained.
  • the superimposing of the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale includes:
  • superimposing the image feature of the third scale and the image feature of the first scale and then performing super-resolution processing to obtain a super-resolution image corresponding to the image feature of the first scale.
  • N is 4.
  • a video enhancement device including:
  • the image enhancement processor is configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frame images, where M is an integer greater than 1;
  • the image enhancement processor is specifically configured to perform feature extraction on the M frame images to obtain at least one image feature of the first scale
  • perform, for each image feature of the first scale, N-level down-sampling processing on the image feature of the first scale to obtain an image feature of a second scale, where N is an integer greater than 1;
  • perform N-level up-sampling processing on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first-level up-sampling processing is the image feature of the second scale, and the input of the i-th level up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing; the magnification of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N;
  • superimpose the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  • the video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-scale losses, and the loss at each scale among the multi-scale losses is the loss of the corresponding level of up-sampling processing in the N levels of up-sampling processing.
  • the loss of each level of up-sampling processing is the loss between a first image and a second image;
  • the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing;
  • the second image is the target image of each level of up-sampling processing, and the resolution of the first image and the second image is the same.
  • the video enhancement device of the embodiment of the present disclosure further includes:
  • a sample acquisition processor configured to acquire multiple sets of M frame sample images and at least one frame of sample enhanced image corresponding to each set of the M frame sample images
  • the model training processor is configured to perform feature extraction on the set of M frame sample images for each set of M frame sample images to obtain at least one sample image feature of the first scale;
  • N-level down-sampling processing is performed on the sample image feature of the first scale to obtain the sample image feature of the second scale;
  • N levels of upsampling processing are performed on the sample image features of the second scale to obtain a predicted output image corresponding to each level of upsampling;
  • the difference between the target output image corresponding to this level of up-sampling and the predicted output image corresponding to this level of up-sampling is taken as the loss of this level of up-sampling; wherein the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
  • the sum of the losses of up-sampling at all levels is used as the target loss, and the network parameter value in the original video processing model is updated according to the target loss.
  • each set of M frames of sample images corresponds to one frame of sample enhanced images
  • the one frame of sample enhanced images is specifically an enhanced image corresponding to an intermediate frame sample image of the set of M frames of sample images, where M is an odd number greater than 1.
  • the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
  • the denoised image corresponding to the intermediate frame sample image, or the deblurred image corresponding to the intermediate frame sample image.
  • the value of M is 3, 5, or 7.
  • the video enhancement device of the embodiment of the present disclosure further includes:
  • the to-be-processed video acquisition processor is configured to acquire L frames of images in the video to be processed and to obtain L+M-1 frames of images by adding frames before the first frame and after the last frame;
  • a video frame dividing processor configured to divide the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
  • the image enhancement processor is specifically configured to input M frames of images into a pre-established video processing model for each group of M frame images to obtain an enhanced image of at least one frame of the M frame images.
  • the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale: superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
  • the superimposed feature is converted into a three-channel image feature, and the enhanced image corresponding to the image feature of the first scale is obtained.
  • the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale, including: superimposing the image feature of the third scale and the image feature of the first scale and then performing super-resolution processing to obtain a super-resolution image corresponding to the image feature of the first scale.
  • N is 4.
  • an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the methods described above by executing the executable instructions.
  • a non-volatile computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above is implemented.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture of a video enhancement method that can be applied to an embodiment of the present disclosure
  • Figure 2 shows a schematic structural diagram of a convolutional neural network
  • Figure 3 shows a flow chart of a video enhancement method in an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of a network structure of a video processing model in an embodiment of the present disclosure
  • Fig. 5 shows a flow chart of a method for training a video processing model in an embodiment of the present disclosure
  • FIG. 6 shows a schematic diagram of another network structure of a video processing model in an embodiment of the present disclosure
  • Fig. 7 shows a schematic structural diagram of a video enhancement device in an embodiment of the present disclosure
  • FIG. 8 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.
  • Images can be enhanced based on a convolutional neural network algorithm.
  • However, since a video is composed of multiple frames of images, the amount of calculation for video enhancement is relatively large and the calculation efficiency is low. In addition, the video enhancement effect of this algorithm is also poor.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture of a video enhancement method that can be applied to an embodiment of the present disclosure.
  • the system architecture 100 may include one or more of terminal devices 101 and 102, a network 103 and a server 104.
  • the network 103 is used to provide a medium for communication links between the terminal devices 101 and 102 and the server 104.
  • the network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the terminal devices 101 and 102 may be various electronic devices with display screens, including but not limited to portable computers, smart phones, tablet computers, and so on. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative, and any number of terminal devices, networks, and servers may be provided according to implementation needs.
  • the server 104 may be a server cluster composed of multiple servers.
  • the video enhancement method provided by the embodiment of the present disclosure is generally executed by the server 104, and accordingly, the video enhancement device is generally set in the server 104.
  • the video enhancement method provided by the embodiments of the present disclosure can also be executed by the terminal devices 101 and 102. Accordingly, the video enhancement device can also be set in the terminal devices 101 and 102.
  • the user may upload the to-be-processed video to the server 104 through the terminal devices 101 and 102, and the server 104 performs the processing of the to-be-processed video through the video enhancement method provided in the embodiment of the present disclosure.
  • the obtained enhanced video may also be sent to the terminal devices 101 and 102.
  • image enhancement can include: enhancement of image effects and enhancement of image morphology.
  • the enhancement of image effects may include: image denoising, image deblurring, image restoration, etc.
  • the enhancement of image morphology may include: image super-resolution processing, etc.
  • Image enhancement can be achieved through convolutional neural networks.
  • A convolutional neural network is a neural network with a special structure: it can take an original image as input and the enhanced image of the original image as output, and it uses convolution kernels in place of scalar weights.
  • A convolutional neural network with a three-layer structure is shown in Figure 2. The network has 4 inputs, 3 outputs in the hidden layer, and 2 outputs in the output layer; the final system outputs two images.
  • Each module represents a convolution kernel, where k denotes the number of the input layer, and i and j denote the unit numbers of the input and the output.
  • A bias is a set of scalars superimposed on the output of a convolutional layer; the output of the convolutional layer with the bias superimposed can then be input to the activation layer. After training, the convolution kernels and biases are fixed.
  • The training process optimizes the parameters of the convolution kernels and biases through a set of matched inputs and outputs and an optimization algorithm.
  • each convolutional layer can contain dozens or hundreds of convolution kernels, and deep neural networks often contain more than 5 convolutional layers.
  • However, an image enhancement algorithm based on such a convolutional neural network has many network parameters and low computational efficiency.
  • Moreover, when the convolutional neural network cannot extract sufficiently rich image features, the image enhancement effect is poor.
  • the embodiments of the present disclosure provide a video enhancement method, which can improve the calculation efficiency of video enhancement and improve the effect of video enhancement.
  • M frames of images may be input to a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, where M is an integer greater than 1.
  • The enhanced image of at least one frame here can be the enhanced image corresponding to the middle frame of the M frames of images, or the enhanced image corresponding to a frame other than the middle frame. For example, if M is 3, the middle frame is the 2nd frame; if M is 5, the middle frame is the 3rd frame.
  • Fig. 3 shows a flowchart of a video enhancement method in an embodiment of the present disclosure.
  • the process of processing M frames of images by the video processing model may include the following steps:
  • Step S310 Perform feature extraction on M frames of images to obtain at least one image feature of the first scale.
  • For each image feature of the first scale, step S320 to step S340 can be performed, so that the number of finally obtained enhanced images is the same as the number of image features of the first scale.
  • Step S320 For each image feature of the first scale, perform N-level down-sampling processing on the image feature of the first scale to obtain the image feature of the second scale, where N is an integer greater than 1.
  • Step S330: Perform N levels of up-sampling processing on the image features of the second scale to obtain image features of the third scale; wherein the input of the first-level up-sampling processing is the image feature of the second scale, and the input of the i-th level up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing; the magnification of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N.
  • Step S340 Perform superposition processing on the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  • By processing M frames of images together, the continuity between frames of the processed video can be ensured, and inter-frame jitter can be avoided.
  • Performing N-level down-sampling processing and N-level up-sampling processing on the M frames of images, that is, multi-scale feature extraction, can improve calculation efficiency and accelerate calculation.
  • Through step-by-step restoration and superposition with the features of the corresponding down-sampling levels, high-level features and low-level features are fused, which can improve the feature expression ability and thereby improve the video enhancement effect.
  • In step S310, feature extraction is performed on the M frames of images to obtain at least one image feature of the first scale.
  • The M frames of images may be continuous video frames. It should be noted that, in order to ensure inter-frame continuity and avoid inter-frame jitter, M may take a small value, for example, an integer from 2 to 7.
  • The method for selecting the M frames of images in the present disclosure is not limited to this; 2 frames of images or 4 frames of images can also be selected.
  • For example, the current frame, the frame before the current frame, and the two frames after the current frame can be selected; or the current frame, the two frames before the current frame, and the frame after the current frame can be selected.
  • The 3 frames before the current frame or the 3 frames after the current frame can also be selected, which is not limited here.
  • The method for obtaining the M frames of images may be as follows. First, L frames of images in the video to be processed may be obtained, where L is an integer greater than M. After that, the L frames of images are grouped, and each group can contain M frames of images. Since M is an integer greater than 1, when the grouping is performed directly in this way, the final number of groups obtained is less than L, so that the first few frames and the last few frames of the video to be processed may not be processed during enhancement. To avoid this problem, frames can be added before the first frame and after the last frame of the L frames of images to obtain L+M-1 frames of images; the L+M-1 frames of images are then divided into L groups of M frames of images. It can be seen that the number of added frames can be determined according to the value of M. The frames added before the first frame may all be copies of the first frame, and the frames added after the last frame may all be copies of the last frame.
  • Then, for each group of M frames of images, the step of inputting the M frames of images into a pre-established video processing model can be performed to obtain an enhanced image of at least one frame of the M frames of images.
  • In the following, one group of M frames of images is taken as an example for description. It is understandable that, for the video to be processed, the enhanced video can be obtained after the enhancement processing is performed on all L groups of M frames of images.
  • For example, given 5 original frames P1 to P5 and M = 3, the 5 frames can be divided into 5 groups in the following way: the first group P1, P1, P2; the second group P1, P2, P3; the third group P2, P3, P4; the fourth group P3, P4, P5; and the fifth group P4, P5, P5.
  • Taking a pre-established video processing model that inputs 3 frames of images and outputs the enhanced image corresponding to the intermediate frame as an example, inputting the above five groups of images into the pre-established video processing model obtains the enhanced image P11 corresponding to P1 (the output of the first group), the enhanced image P21 corresponding to P2 (the output of the second group), the enhanced image P31 corresponding to P3 (the output of the third group), the enhanced image P41 corresponding to P4 (the output of the fourth group), and the enhanced image P51 corresponding to P5 (the output of the fifth group).
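  • As an illustration, the grouping described above can be sketched in a few lines of Python; the helper name pad_and_group and the use of plain lists are assumptions for illustration, not part of the patent.

```python
# A minimal sketch: pad the L frames by replicating the first and last
# frames, then slide a window of size M to form L groups of M frames.
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def pad_and_group(frames: Sequence[T], m: int) -> List[List[T]]:
    """Split L frames into L groups of M frames, replicating edge frames."""
    assert m > 1 and len(frames) > 0
    before = (m - 1) // 2                 # copies of the first frame to prepend
    after = m - 1 - before                # copies of the last frame to append
    padded = [frames[0]] * before + list(frames) + [frames[-1]] * after
    # len(padded) == L + M - 1, so exactly L windows of size M exist
    return [list(padded[i:i + m]) for i in range(len(frames))]

# Matches the example in the text: 5 frames, M = 3 ->
# [[P1, P1, P2], [P1, P2, P3], [P2, P3, P4], [P3, P4, P5], [P4, P5, P5]]
print(pad_and_group(["P1", "P2", "P3", "P4", "P5"], 3))
```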
  • FIG. 4 shows a schematic diagram of a network structure of the video processing model in an embodiment of the present disclosure. It can be seen that the network structure of the video processing model may be a U-shaped network. The following describes the processing procedure of the video processing model in conjunction with FIG. 4.
  • In step S310, feature extraction is performed on the M frames of images to obtain at least one image feature of the first scale.
  • the scale of each frame of image is (H, W), that is, the resolution of the image is H ⁇ W.
  • If the M frames of images are all RGB images, the number of channels C of each image is 3.
  • If the M frames of images are all grayscale images, the value of C is 1.
  • The M frames of images can be combined along the channel dimension, so that the M frames of images input to the video processing model can be expressed as (H, W, C*M).
  • Here C*M represents the number of feature layers; for example, when M is 3 and C is 3, the number of feature layers is 9.
  • the number of feature layers can be expanded without changing the resolution of the image. Therefore, the first scale is (H, W). For example, the number of feature layers can be expanded from C*M to F. In this way, the input M frame image changes from (H, W, C*M) to (H, W, F).
  • F is a preset value, such as 64 or 128.
  • the number of feature layers can be changed through a convolution operation.
  • a convolution process can be performed on M frames of images to obtain image features of the first scale.
  • The size of the convolution kernel may be 3×3, etc. Since the activation function can introduce non-linear factors to the neurons, the neural network can approximate any non-linear function. Therefore, after performing convolution processing on the M frames of images, an activation operation can also be performed to obtain the image features of the first scale.
  • The activation function can be a ReLU (rectified linear unit) function or a sigmoid function. It should be noted that the method of performing feature extraction on the M frames of images in the present disclosure is not limited to this.
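  • As a non-authoritative sketch of this feature-extraction step (PyTorch is assumed as the framework, which the patent does not name), the M frames can be concatenated along the channel axis and passed through a 3×3 convolution plus ReLU that expands the C*M feature layers to F; note that PyTorch stores the same (H, W, C*M) data in channel-first (B, C*M, H, W) order.

```python
import torch
import torch.nn as nn

C, M, F = 3, 3, 64  # RGB frames, M = 3, F = 64, as in the examples above

feature_extractor = nn.Sequential(
    nn.Conv2d(C * M, F, kernel_size=3, padding=1),  # C*M feature layers -> F
    nn.ReLU(inplace=True),                          # activation operation
)

frames = torch.randn(1, C * M, 270, 480)    # M frames merged on the channel dim
first_scale = feature_extractor(frames)     # resolution (H, W) is unchanged
print(first_scale.shape)                    # torch.Size([1, 64, 270, 480])
```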
  • the number of image features of the first scale may be one or more.
  • Each image feature of the first scale may correspond to a feature of a different image.
  • For example, the image features of the first scale may include the feature of the second frame of image and the feature of the third frame of image.
  • In this case, step S320 to step S340 can be performed for each of them, so that the enhanced image of the second frame of image and the enhanced image of the third frame of image can be obtained.
  • That is, the number of finally obtained enhanced images is the same as the number of image features of the first scale.
  • In the following, one image feature of the first scale is used as an example for description.
  • Step S320 Perform N-level down-sampling processing on the image feature of the first scale to obtain the image feature of the second scale, where N is an integer greater than 1.
  • N-level down-sampling refers to performing down-sampling N times. After each down-sampling, features smaller than the original image can be obtained, which is equivalent to compressing the image, so that the area that can be perceived per unit area becomes larger. In this way, after N levels of down-sampling, more contour information can be obtained.
  • When down-sampling is performed with a stride of 2, the down-sampling factor is 2.
  • For example, for an image feature of the first scale (H, W), after down-sampling by a factor of 2, an image feature of scale ([H/2], [W/2]) can be obtained, where [·] represents the rounding operation.
  • the present disclosure does not specifically limit the multiple of downsampling.
  • The value of N may be 4; refer to Figure 4, where the network structure is a U-shaped network with N = 4. In this way, when the down-sampling factor is 2, the scales of the image features obtained through the 4 levels of down-sampling processing are ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]), and ([H/16], [W/16]). At this time, the image feature of the second scale is ([H/16], [W/16]).
  • After each down-sampling, the down-sampled image features can also be activated and convolved.
  • For example, after the down-sampling layer, the network may further include: an activation layer, a first convolutional layer, and another activation layer.
  • the activation function in the activation layer can be a ReLU function, etc.
  • the size of the convolution kernel in the first convolutional layer can be 3 ⁇ 3 etc.
  • In addition to the network structure shown in FIG. 4, the part after the down-sampling layer may also include other network structures such as a convolutional layer, an activation layer, and a pooling layer.
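  • A minimal sketch of one down-sampling level follows; a stride-2 convolution is assumed here to implement the down-sampling layer, since the patent fixes only the factor of 2 and the activation / first convolutional layer / activation structure that comes after it.

```python
import torch.nn as nn

def down_block(channels: int) -> nn.Sequential:
    """One down-sampling level: 2x reduction, then activation/conv/activation."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),  # down-sampling layer: (H, W) -> ([H/2], [W/2])
        nn.ReLU(inplace=True),                                              # activation layer
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),            # first convolutional layer
        nn.ReLU(inplace=True),                                              # another activation layer
    )
```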
  • Step S330 Perform N-level upsampling processing on the image feature of the second scale to obtain the image feature of the third scale.
  • the input of the i-th upsampling process is the image feature after the superposition of the output of the N+1-i-th down-sampling process and the output of the i-1th up-sampling process; i is an integer from 2 to N.
  • the N-level upsampling corresponds to the above-mentioned N-level downsampling.
  • the N-level upsampling refers to performing N upsampling
  • the first-level upsampling refers to performing the first upsampling.
  • The input of the first-level up-sampling processing is the image feature of the second scale.
  • The j-th level down-sampling and the (N+1-j)-th level up-sampling are located in the same layer of the U-shaped network, and the magnification of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing.
  • the resolution of the image before the down-sampling process at the jth level is the same as the resolution of the image after the up-sampling process at the N+1-jth level.
  • the resolution of the image after the down-sampling process at the jth level is the same as the resolution of the image before the up-sampling process at the N+1-jth level.
  • j is an integer of 1 to N.
  • The output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing can be superimposed and used as the input of the i-th level up-sampling processing.
  • the input of the second-level upsampling process is the superposition of the output of the third-level down-sampling process and the output of the first-level upsampling process.
  • The superposition processing refers to the fusion of two features, which may be, for example, element-wise feature addition.
  • For example, if the output of the third-level down-sampling processing is (a1, a2, a3) and the output of the first-level up-sampling processing is (b1, b2, b3), their superposition is (a1+b1, a2+b2, a3+b3).
  • the image features of each stage of down-sampling can be superimposed during the up-sampling process, that is, in the up-sampling process, the image features of each level can be combined, so that the accuracy of image feature extraction can be improved.
  • In some cases, the scale of the output of the (N+1-i)-th level down-sampling processing may not match the scale of the output of the (i-1)-th level up-sampling processing, for example, because of the rounding in down-sampling.
  • In this case, the output of the (N+1-i)-th level down-sampling processing can be cropped first, so that the cropped scale is the same as the scale of the output of the (i-1)-th level up-sampling processing.
  • After the N levels of up-sampling processing, the scale of the obtained image feature is less than or equal to the first scale; that is, the third scale may be less than or equal to the first scale.
  • In general, the third scale is the same as the first scale, that is, (H, W).
  • In some cases, the third scale will be smaller than the first scale due to feature cropping.
  • After each up-sampling, convolution processing and activation operations can also be performed.
  • For example, after the up-sampling layer, the network may further include: an activation layer, a second convolutional layer, and another activation layer.
  • The activation function in the activation layer may be a ReLU function, etc., and the size of the convolution kernel in the second convolutional layer may be 4×4, etc.
  • other network structures may also be included, which are not limited here.
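  • The sketch below condenses the N-level down-sampling and up-sampling path with the superposition and cropping described above; the transposed 4×4 convolution for up-sampling and the constant channel width are assumptions consistent with, but not mandated by, the text (down_block is the sketch given earlier).

```python
import torch
import torch.nn as nn

def up_block(channels: int) -> nn.Sequential:
    """One up-sampling level: 2x magnification, then activation/conv/activation."""
    return nn.Sequential(
        nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # second convolutional layer
        nn.ReLU(inplace=True),
    )

class UShapedBody(nn.Module):
    def __init__(self, channels: int = 64, levels: int = 4):
        super().__init__()
        self.downs = nn.ModuleList(down_block(channels) for _ in range(levels))
        self.ups = nn.ModuleList(up_block(channels) for _ in range(levels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        down_outs = []
        for down in self.downs:
            x = down(x)
            down_outs.append(x)            # outputs of down-sampling levels 1..N
        x = self.ups[0](down_outs[-1])     # 1st up level: input is the 2nd-scale feature
        for i in range(2, len(self.ups) + 1):
            skip = down_outs[-i]           # output of the (N+1-i)-th down-sampling level
            # crop to the smaller scale if rounding made the two scales differ
            h = min(x.shape[-2], skip.shape[-2])
            w = min(x.shape[-1], skip.shape[-1])
            x = self.ups[i - 1](skip[..., :h, :w] + x[..., :h, :w])  # superpose, then up-sample
        return x                           # image feature of the third scale
```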
  • Step S340 Perform superposition processing on the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  • The image feature of the third scale and the image feature of the first scale can be directly superimposed to obtain a superimposed feature; the superimposed feature is then converted into a three-channel image feature to obtain the enhanced image corresponding to the image feature of the first scale.
  • After the superposition, the number of feature layers F remains unchanged. Therefore, the superimposed feature can be converted into a three-channel image feature through convolution processing; for example, a three-channel RGB image can be output.
  • Each level of up-sampling can correspond to a convolution operation, and the convolution operation can be used to convert the number of feature layers from F to 3, that is, to output three-channel image features.
  • The parameters in the convolution operations after each level of up-sampling may be shared. For example, as shown in Figure 4, all levels of up-sampling include the same third convolutional layer. In this way, parameter sharing can reduce the number of parameters in the video processing model and speed up the network training process.
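  • A brief sketch of the shared output convolution described above: one module maps the F feature layers to 3 channels and is reused after every up-sampling level, so all multi-scale outputs share its parameters (the 3×3 kernel size is an assumption; the patent does not specify it).

```python
import torch.nn as nn

F = 64
# The shared "third convolutional layer": F feature layers -> 3 channels.
shared_to_rgb = nn.Conv2d(F, 3, kernel_size=3, padding=1)

# At inference time only the full-resolution output is needed:
#   enhanced = shared_to_rgb(third_scale_feature + first_scale_feature)
# At training time the same module is applied to every level's up-sampled
# feature, producing the multi-scale predicted output images.
```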
  • The video processing model can be obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on the video input to it; the target loss includes multi-scale losses, and the loss at each scale is the loss of the corresponding level of up-sampling processing in the N levels of up-sampling processing.
  • the loss of each level of upsampling processing is the loss between the first image and the second image.
  • The first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing; that is to say, after each level of up-sampling, a first image can be output correspondingly.
  • the resolution of the first image corresponding to different levels of up-sampling is different.
  • each level of up-sampling corresponds to a second image.
  • the second image is a target image processed by each level of up-sampling. The resolution of the first image and the second image are the same.
  • During training, the sample enhanced image corresponding to the M frames of sample images can also be obtained, and N-1 levels of down-sampling can be performed on the sample enhanced image to obtain N-1 images with different resolutions.
  • The N-1 images with different resolutions and the sample enhanced image can be used as the N target images. For example, by performing N-1 levels of down-sampling on the sample enhanced image, the target image of the first-level up-sampling processing can be obtained; by performing 1 level of down-sampling on the sample enhanced image, the target image of the (N-1)-th level up-sampling processing can be obtained; and the sample enhanced image itself can be used as the target image of the N-th level up-sampling processing.
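  • The per-level target images can be sketched as a pyramid built by repeatedly down-sampling the sample enhanced image; bilinear interpolation is an assumption here, since the patent fixes only the scales.

```python
import torch
import torch.nn.functional as Fn

def target_pyramid(sample_enhanced: torch.Tensor, n: int) -> list:
    """Return N target images, coarsest first (target of the 1st up-sampling
    level), ending with the sample enhanced image itself (N-th level)."""
    targets = [sample_enhanced]
    for _ in range(n - 1):                       # N-1 levels of down-sampling
        targets.append(Fn.interpolate(targets[-1], scale_factor=0.5,
                                      mode="bilinear", align_corners=False))
    return targets[::-1]
```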
  • FIG. 5 shows a flowchart of a training method of a video processing model in an embodiment of the present disclosure, which may include the following steps:
  • Step S510 Obtain multiple sets of M frame sample images and at least one frame of sample enhanced image corresponding to each set of M frame sample images.
  • The input of the video processing model is M frames of images, and the output is one or more frames of enhanced images.
  • sample data including M frames of sample images and corresponding one or more frames of sample enhanced images can be obtained.
  • the output of the video processing model is a frame of enhanced image
  • one frame of sample enhanced image corresponding to each set of M frames of sample images may specifically be an enhanced image corresponding to the middle frame of the set of M frames of sample images.
  • it can also be an enhanced image corresponding to a non-intermediate frame sample image.
  • When the output of the video processing model is multiple frames of enhanced images, the multiple frames of sample enhanced images corresponding to each set of M frames of sample images can specifically be the enhanced image corresponding to the intermediate-frame sample image of the set and the enhanced images of the sample images adjacent to the intermediate frame; they can of course also be the enhanced images of other sample images.
  • the purpose of the video processing model is different, and the sample-enhanced image used can also be different.
  • When the video processing model to be trained is used for video denoising, the enhanced image corresponding to the intermediate-frame sample image of each set of M frames of sample images is specifically the denoised image corresponding to the intermediate-frame sample image.
  • When the video processing model to be trained is used for video deblurring, the enhanced image corresponding to the intermediate-frame sample image of each set of M frames of sample images is specifically the deblurred image corresponding to the intermediate-frame sample image.
  • the video processing model of the embodiment of the present disclosure is not limited to this.
  • Step S520 for each set of M frame sample images, feature extraction is performed on the set of M frame sample images to obtain at least one sample image feature of the first scale.
  • For each sample image feature of the first scale, steps S530 to S560 are executed:
  • Step S530 For each sample image feature of the first scale, perform N-level down-sampling processing on the sample image feature of the first scale to obtain the sample image feature of the second scale.
  • During training, the batch size of model training can be set, that is, the number of sample data simultaneously input to the model. Assuming that the batch size is B, the size of the final input to the model is (B, H, W, C*M).
  • Since the processing procedure for each set of M frames of sample images in step S520 to step S530 is similar to the processing procedure in step S310 to step S320 described above, details can be found in the description of step S310 to step S320 and are not repeated here.
  • Step S540 Perform N-level up-sampling processing on the sample image features of the second scale to obtain a predicted output image corresponding to each level of up-sampling.
  • As described above, each level of up-sampling can correspond to a convolution operation.
  • Through the convolution processing performed on the output features of each level of up-sampling, multi-scale predicted output images can be obtained.
  • For example, when N is 4, five images of different scales can be output, denoted F1, F2, F3, F4, and F5, where the five scales are (H, W), ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]), and ([H/16], [W/16]).
  • Step S550: For each level of up-sampling, the difference between the target output image corresponding to that level of up-sampling and the predicted output image corresponding to that level of up-sampling is taken as the loss of that level of up-sampling; wherein the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images.
  • That is, the target output image corresponding to the i-th level of up-sampling is the image that would ideally be output at that level, and it can specifically be the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images.
  • For example, the target output image corresponding to the first level of up-sampling can be the input of the N-th level of down-sampling performed on the sample enhanced image, that is, the output of N-1 levels of down-sampling processing performed on the sample enhanced image.
  • Step S560: Take the sum of the losses of all levels of up-sampling as the target loss, and update the network parameter values in the original video processing model according to the target loss.
  • the gradient descent method can be used to continuously calculate the loss according to the principle of back propagation, and update the network parameter values according to the loss.
  • When the target loss falls below a preset threshold, the trained video processing model can be obtained.
  • The preset threshold can be set according to the actual application, which is not limited here.
  • multi-scale loss is used to approximate the small-scale features layer by layer, which helps to better restore the details of the high-definition image, thereby improving the effect of video enhancement.
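  • One training step with the multi-scale target loss might look as follows; L1 is used as one plausible reading of the "difference" between the target and predicted output images, and model is assumed to return the N predicted output images ordered coarsest first (target_pyramid is the sketch given earlier).

```python
import torch

def train_step(model, optimizer, sample_frames, sample_enhanced, n=4):
    preds = model(sample_frames)                    # N predicted output images
    targets = target_pyramid(sample_enhanced, n)    # N target output images
    # target loss = sum of the per-level up-sampling losses
    target_loss = sum(torch.nn.functional.l1_loss(p, t)
                      for p, t in zip(preds, targets))
    optimizer.zero_grad()
    target_loss.backward()                          # back propagation
    optimizer.step()                                # gradient descent update
    return target_loss.item()
```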
  • FIG. 6 shows a schematic diagram of another network structure of the video processing model in an embodiment of the present disclosure. It can be seen that, compared with the network structure shown in FIG. 4, after the Nth level of upsampling processing, an upsampling layer is added. At this time, the video processing model can be used for video super-resolution processing.
  • In this case, step S350 may specifically include: superimposing the image features of the third scale and the image features of the first scale and then performing super-resolution processing to obtain a super-resolution image corresponding to the image features of the first scale.
  • the sample-enhanced image in the sample data used in training may be a super-resolution image of the middle frame of the corresponding M frames of sample images.
  • the training process is similar to that of the network structure shown in Fig. 4.
  • During training, the parameters in the convolution operations after each level of up-sampling can likewise be shared, the sum of the losses of all levels of up-sampling is used as the final loss, and the network parameter values are updated according to the final loss.
  • a video processing model for super-resolution processing can be obtained.
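  • The extra head for the super-resolution variant can be sketched as below, assuming a 2x transposed convolution for the added up-sampling layer (the patent does not fix the operator or the magnification).

```python
import torch.nn as nn

F = 64
sr_head = nn.Sequential(
    nn.ConvTranspose2d(F, F, kernel_size=4, stride=2, padding=1),  # added up-sampling layer
    nn.ReLU(inplace=True),
    nn.Conv2d(F, 3, kernel_size=3, padding=1),                     # convert to a 3-channel image
)
# Applied to the superimposed third-/first-scale feature, this outputs a
# super-resolution image at twice the input resolution.
```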
  • In summary, the video enhancement method of the embodiments of the present disclosure can increase the calculation speed and improve the calculation efficiency through the U-shaped network, and calculating the loss at multiple scales can maximize the quality of the output image.
  • the present disclosure can be used for various video enhancement functions such as video denoising, video deblurring, and video super-resolution processing.
  • a video enhancement device 700 is also provided, as shown in FIG. 7, including:
  • the image enhancement processor 710 is configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one frame of the M frames of images, where M is an integer greater than 1;
  • the image enhancement processor 710 is specifically configured to perform feature extraction on M frames of images to obtain at least one image feature of the first scale;
  • perform, for each image feature of the first scale, N-level down-sampling processing on the image feature of the first scale to obtain an image feature of a second scale, where N is an integer greater than 1;
  • perform N-level up-sampling processing on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first-level up-sampling processing is the image feature of the second scale, and the input of the i-th level up-sampling processing is the image feature obtained by superimposing the output of the (N+1-i)-th level down-sampling processing and the output of the (i-1)-th level up-sampling processing; the magnification of the j-th level up-sampling processing is the same as the reduction factor of the (N+1-j)-th level down-sampling processing, i is an integer from 2 to N, and j is an integer from 1 to N;
  • superimpose the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  • the video processing model is obtained by training the original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on the video input to the original video processing model; the target loss includes multi-scale losses, and the loss at each scale among the multi-scale losses is the loss of the corresponding level of up-sampling processing in the N levels of up-sampling processing.
  • the loss of each level of up-sampling processing is the loss between a first image and a second image;
  • the first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of up-sampling processing;
  • the second image is the target image of each level of up-sampling processing, and the resolution of the first image and the second image is the same.
  • the above-mentioned video enhancement device further includes:
  • the sample acquisition processor is configured to acquire multiple sets of M frames of sample images and at least one frame of sample enhanced image corresponding to each set of M frames of sample images;
  • the model training processor is configured to perform feature extraction on the set of M frame sample images for each set of M frame sample images to obtain at least one sample image feature of the first scale;
  • N-level down-sampling processing is performed on the sample image feature of the first scale to obtain a sample image feature of the second scale;
  • N-level up-sampling processing is performed on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of up-sampling;
  • for each level of up-sampling, the difference between the target output image corresponding to that level of up-sampling and the predicted output image corresponding to that level of up-sampling is taken as the loss of that level of up-sampling; wherein the target output image corresponding to the i-th level of up-sampling is the input of the (N+1-i)-th level of down-sampling processing performed on the sample enhanced image corresponding to the set of M frames of sample images;
  • the sum of the losses of up-sampling at all levels is taken as the target loss, and the network parameter values in the original video processing model are updated according to the target loss.
  • each set of M frames of sample images corresponds to one frame of sample enhanced images
  • one frame of sample enhanced images is specifically an enhanced image corresponding to the intermediate frame sample images of the set of M frames of sample images, wherein, M is an odd number greater than 1.
  • the enhanced image corresponding to the intermediate frame sample image of the set of M frame sample images is specifically:
  • the denoised image corresponding to the intermediate frame sample image, or the deblurred image corresponding to the intermediate frame sample image.
  • the value of M is 3, 5, or 7.
  • the above-mentioned video enhancement device further includes:
  • the to-be-processed video acquisition processor is configured to acquire L frames of images in the video to be processed and to obtain L+M-1 frames of images by adding frames before the first frame and after the last frame;
  • the video frame dividing processor is configured to divide the L+M-1 frames of images into L groups of M frames of images, where L is an integer greater than M;
  • the image enhancement processor is specifically configured to input M frames of images into a pre-established video processing model for each group of M frame images to obtain an enhanced image of at least one frame of the M frame images.
  • the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale: superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
  • the superimposed feature is converted into a three-channel image feature, and the enhanced image corresponding to the image feature of the first scale is obtained.
  • the image enhancement processor implements the superposition processing of the image feature of the third scale and the image feature of the first scale through the following steps to obtain an enhanced image corresponding to the image feature of the first scale, including: superimposing the image feature of the third scale and the image feature of the first scale and then performing super-resolution processing to obtain a super-resolution image corresponding to the image feature of the first scale.
  • the value of N is 4.
  • Each processor in the above device can be a general-purpose processor, including a central processing unit, a network processor, etc.; it can also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • Each processor in the above-mentioned device may be an independent processor, or may be integrated together.
  • Although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform all or part of the steps of the video enhancement method in this exemplary embodiment by executing the executable instructions.
  • Fig. 8 shows a schematic structural diagram of a computer system for implementing an electronic device of an embodiment of the present disclosure. It should be noted that the computer system 800 of the electronic device shown in FIG. 8 is only an example, and should not bring any limitation to the functions and scope of use of the embodiments of the present disclosure.
  • the computer system 800 includes a central processing unit 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory 802 or a program loaded from a storage section 808 into a random access memory 803.
  • The random access memory 803 also stores various programs and data required for system operation.
  • the central processing unit 801, the read-only memory 802, and the random access memory 803 are connected to each other through a bus 804.
  • the input/output interface 805 is also connected to the bus 804.
  • The following components are connected to the input/output interface 805: an input section 806 including a keyboard, a mouse, etc.; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, etc.; a storage section 808 including a hard disk, etc.; and a communication section 809 including a network interface card such as a local area network (LAN) card, a modem, and the like. The communication section 809 performs communication processing via a network such as the Internet.
  • the driver 810 is also connected to the input/output interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 810 as needed, so that the computer program read from it is installed into the storage section 808 as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication section 809, and/or installed from the removable medium 811.
  • When the computer program is executed by the central processing unit 801, various functions defined in the device of the present application are executed.
  • a non-volatile computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any one of the above is implemented.
  • The non-volatile computer-readable storage medium shown in the present disclosure may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of non-volatile computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a non-volatile computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a non-volatile computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, optical cable, radio frequency, etc., or any suitable combination of the above.

Abstract

A video enhancement method and apparatus, an electronic device, and a storage medium, relating to the technical field of image processing. The method includes: performing feature extraction on M frames of images to obtain at least one image feature of a first scale (S310); for each image feature of the first scale, performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale (S320); performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale (S330), wherein the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling, and the factor of the j-th level of upsampling is the same as the factor of the (N+1-j)-th level of downsampling; and superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale (S340). The efficiency and effect of video enhancement can be improved.

Description

Video enhancement method and apparatus, electronic device, and storage medium
Cross-reference to related application
This application claims priority to the Chinese patent application No. 202010366748.9, filed on April 30, 2020 and entitled "Video Enhancement Method and Apparatus, Electronic Device, Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of image processing, and in particular, to a video enhancement method, a video enhancement apparatus, an electronic device, and a non-volatile computer-readable storage medium.
Background
Image enhancement can purposefully emphasize the overall or local characteristics of an image, turning an originally unclear image into a clear one or emphasizing certain features of interest, so as to improve the image quality and enrich the amount of information, thereby meeting the needs of certain special analyses. Image enhancement techniques are therefore widely applied in various fields.
It should be noted that the information disclosed in the Background section above is only intended to enhance the understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary
According to a first aspect of the present disclosure, a video enhancement method is provided, including:
inputting M frames of images into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1;
wherein inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images includes:
performing feature extraction on the M frames of images to obtain at least one image feature of a first scale;
for each image feature of the first scale, performing the following process:
performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1;
performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N;
superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
Optionally, the video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss includes losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
Optionally, the loss of each level of upsampling is a loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling, the second image being the target image of that level of upsampling, and the first image and the second image having the same resolution.
Optionally, training the original video processing model to obtain the trained video processing model includes:
acquiring multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images;
for each group of M frames of sample images, performing feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale;
for each sample image feature of the first scale, performing the following process:
performing N levels of downsampling on the sample image feature of the first scale to obtain a sample image feature of the second scale;
performing N levels of upsampling on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of upsampling;
for each level of upsampling, taking the difference between the target output image corresponding to that level of upsampling and the predicted output image corresponding to that level of upsampling as the loss of that level of upsampling; wherein the target output image corresponding to the i-th level of upsampling is the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images;
taking the sum of the losses of all levels of upsampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
Optionally, each group of M frames of sample images corresponds to one frame of sample enhanced image, which is specifically the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images, where M is an odd number greater than 1.
Optionally, the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images is specifically:
a denoised image corresponding to the middle frame sample image; or
a deblurred image corresponding to the middle frame sample image.
Optionally, the value of M is 3, 5, or 7.
Optionally, before inputting the M frames of images into the pre-established video processing model, the method further includes:
acquiring L frames of images in a video to be processed;
adding (M-1)/2 frames of images before the first frame and after the last frame of the L frames of images, respectively, to obtain L+M-1 frames of images;
dividing the L+M-1 frames of images into L groups of M frames of images, L being an integer greater than M;
for each group of M frames of images, performing the step of inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images.
Optionally, superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale includes:
superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
converting the superimposed feature into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale.
Optionally, superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale includes:
performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale.
Optionally, the value of N is 4.
According to a second aspect of the present disclosure, a video enhancement apparatus is provided, including:
an image enhancement processor configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1;
the image enhancement processor being specifically configured to perform feature extraction on the M frames of images to obtain at least one image feature of a first scale;
and, for each image feature of the first scale, to perform the following process:
performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1;
performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N;
superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
Optionally, the video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss includes losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
Optionally, the loss of each level of upsampling is a loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling, the second image being the target image of each level of upsampling, and the first image and the second image having the same resolution.
Optionally, the video enhancement apparatus of the embodiments of the present disclosure further includes:
a sample acquisition processor configured to acquire multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images;
a model training processor configured to, for each group of M frames of sample images, perform feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale;
and, for each sample image feature of the first scale, to perform the following process:
performing N levels of downsampling on the sample image feature of the first scale to obtain a sample image feature of the second scale;
performing N levels of upsampling on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of upsampling;
for each level of upsampling, taking the difference between the target output image corresponding to that level of upsampling and the predicted output image corresponding to that level of upsampling as the loss of that level of upsampling; wherein the target output image corresponding to the i-th level of upsampling is the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images;
taking the sum of the losses of all levels of upsampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
Optionally, each group of M frames of sample images corresponds to one frame of sample enhanced image, which is specifically the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images, where M is an odd number greater than 1.
Optionally, the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images is specifically:
a denoised image corresponding to the middle frame sample image; or
a deblurred image corresponding to the middle frame sample image.
Optionally, the value of M is 3, 5, or 7.
Optionally, the video enhancement apparatus of the embodiments of the present disclosure further includes:
a to-be-processed video acquisition processor configured to acquire L frames of images in a video to be processed, and to add (M-1)/2 frames of images before the first frame and after the last frame of the L frames of images, respectively, to obtain L+M-1 frames of images;
a video frame division processor configured to divide the L+M-1 frames of images into L groups of M frames of images, L being an integer greater than M;
the image enhancement processor being specifically configured to, for each group of M frames of images, input the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images.
Optionally, the image enhancement processor superimposes the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale through the following steps:
superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
converting the superimposed feature into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale.
Optionally, the image enhancement processor superimposes the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale through the following step:
performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale.
Optionally, the value of N is 4.
According to a third aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the methods described above by executing the executable instructions.
According to a fourth aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing any one of the methods described above.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and are used together with the specification to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the video enhancement method of the embodiments of the present disclosure can be applied;
Fig. 2 shows a schematic structural diagram of a convolutional neural network;
Fig. 3 shows a flowchart of a video enhancement method in an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a network structure of a video processing model in an embodiment of the present disclosure;
Fig. 5 shows a flowchart of a training method of a video processing model in an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of another network structure of a video processing model in an embodiment of the present disclosure;
Fig. 7 shows a schematic structural diagram of a video enhancement apparatus in an embodiment of the present disclosure;
Fig. 8 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete, and the concepts of the example embodiments will be fully conveyed to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or other methods, components, apparatuses, steps, and the like may be adopted. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated descriptions thereof will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
In some techniques, an image can be enhanced based on a convolutional neural network algorithm. However, since a video consists of multiple frames of images, video enhancement involves a large amount of computation and has low computational efficiency. Moreover, the video enhancement effect of such an algorithm is also poor.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the video enhancement method of the embodiments of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101 and 102, a network 103, and a server 104. The network 103 is a medium for providing communication links between the terminal devices 101 and 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links, or fiber-optic cables. The terminal devices 101 and 102 may be various electronic devices with display screens, including but not limited to portable computers, smartphones, tablet computers, and the like. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers according to implementation needs. For example, the server 104 may be a server cluster composed of multiple servers.
The video enhancement method provided by the embodiments of the present disclosure is generally executed by the server 104, and accordingly, the video enhancement apparatus is generally arranged in the server 104. However, those skilled in the art will readily understand that the video enhancement method provided by the embodiments of the present disclosure may also be executed by the terminal devices 101 and 102, and accordingly, the video enhancement apparatus may also be arranged in the terminal devices 101 and 102, which is not specially limited in this exemplary embodiment. For example, in an exemplary embodiment, a user may upload a video to be processed to the server 104 through the terminal device 101 or 102; the server 104 processes the video to be processed through the video enhancement method provided by the embodiments of the present disclosure, and may further send the obtained enhanced video to the terminal device 101 or 102.
The technical solutions of the embodiments of the present disclosure are elaborated below.
At present, image enhancement may include enhancement of image effects and enhancement of image form. The enhancement of image effects may include image denoising, image deblurring, image inpainting, and the like; the enhancement of image form may include image super-resolution processing and the like.
Image enhancement can be implemented by a convolutional neural network. A convolutional neural network is a special structure of neural network, which can take an original image and an enhanced image of the original image as input and output respectively, and use convolution kernels in place of scalar weights. A convolutional neural network with a three-layer structure is shown in Fig. 2. The network has 4 inputs, the hidden layer has 3 outputs, the output layer has 2 outputs, and the system finally outputs two images. Each module $w_{ij}^k$ represents a convolution kernel, where k denotes the input layer number, and i and j denote the unit numbers of the input and the output. The biases $b_i^k$ are a set of scalars superimposed on the outputs of the convolutional layer. The outputs of the convolutional layer with the biases superimposed can then be input into an activation layer. After training, the convolution kernels and biases are fixed.
The training process tunes the parameters of the convolution kernels and biases through a set of matched inputs and outputs and an optimization algorithm. Usually, each convolutional layer may contain dozens or hundreds of convolution kernels, and a deep neural network often contains more than 5 convolutional layers. It can be seen that an image enhancement algorithm based on a convolutional neural network has many network parameters and low computational efficiency. Moreover, the convolutional neural network cannot extract more image features, resulting in a poor image enhancement effect.
In order to solve the above problems, the embodiments of the present disclosure provide a video enhancement method, which can improve the computational efficiency of video enhancement and improve the effect of video enhancement.
Specifically, M frames of images may be input into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1. The enhanced image of at least one frame here may be the enhanced image corresponding to the middle frame of the M frames of images, or the enhanced image corresponding to a frame other than the middle frame; for example, if M is 3, it may be the 2nd frame, and if M is 5, it may be the 3rd frame.
Referring to Fig. 3, which shows a flowchart of the video enhancement method in an embodiment of the present disclosure, the processing of the M frames of images by the video processing model may include the following steps:
Step S310: performing feature extraction on the M frames of images to obtain at least one image feature of a first scale.
For each image feature of the first scale, steps S320 to S340 may be performed, so that the number of finally obtained enhanced images is the same as the number of image features of the first scale.
Step S320: for each image feature of the first scale, performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1.
Step S330: performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N.
Step S340: superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
The video enhancement method of the embodiments of the present disclosure can ensure inter-frame continuity of video processing and avoid inter-frame jitter by processing M frames of images. By performing N levels of downsampling and N levels of upsampling on the M frames of images, that is, performing multi-scale feature extraction, the computational efficiency can be improved and the computation accelerated. During the upsampling process, by restoring level by level and superimposing with the features of the corresponding downsampling process, high-level features and low-level features are fused, which can improve the feature expression capability and thus improve the effect of video enhancement.
The video enhancement method of the embodiments of the present disclosure is described in more detail below.
In step S310, feature extraction is performed on the M frames of images to obtain at least one image feature of the first scale.
In the embodiments of the present disclosure, the M frames of images may be consecutive video frames. It should be noted that, in order to ensure inter-frame continuity and avoid inter-frame jitter, M may be a small value, for example, an integer from 2 to 7. When selecting the M frames of images, the current frame to be enhanced and one frame before and one frame after the current frame may be selected; or the current frame and two frames before and two frames after the current frame; or the current frame and three frames before and three frames after the current frame, and so on. That is, the value of M may be 3, 5, or 7. In this way, the current frame can be located in the middle of the M frames of images, which can avoid bias, so that more and more accurate image features can be extracted and the image enhancement effect improved. Of course, the method of selecting the M frames of images in the present disclosure is not limited to this; 2 frames or 4 frames of images may also be selected. When M is 4, the current frame, one frame before the current frame, and two frames after the current frame may be selected, or the current frame, two frames before the current frame, and one frame after the current frame may be selected, and so on. Of course, the 3 frames before the current frame or the 3 frames after the current frame may also be selected, which is not limited here.
The M frames of images may be acquired as follows. First, L frames of images in a video to be processed may be acquired, L being an integer greater than M. The L frames of images are then grouped, with each group containing M frames of images. Since M is an integer greater than 1, when grouping in the above manner, the number of groups finally obtained is less than L, so that the first few frames and the last few frames may not be processed when the video to be processed is enhanced. To avoid this problem, (M-1)/2 frames of images may be added before the first frame and after the last frame of the L frames of images, respectively, to obtain L+M-1 frames of images, and the L+M-1 frames of images are divided into L groups of M frames of images. It can be seen that the number of added images can be determined according to the value of M. The images added before the first frame may all be the first frame, and the images added after the last frame may all be the last frame.
For each group of M frames of images, the step of inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images may be performed. Since each group of M frames of images is processed in the same manner, one group of M frames of images is taken as an example here. It can be understood that, for the video to be processed, the enhanced video can be obtained after the L groups of M frames of images are enhanced. For example, for a video containing 5 frames of original images, i.e., L=5, if M=3, denote the 5 original frames as P1 to P5 (i.e., the first original frame P1 to the fifth original frame P5); the 5 original frames can then be divided into 5 groups as follows: the first group P1, P1, P2; the second group P1, P2, P3; the third group P2, P3, P4; the fourth group P3, P4, P5; and the fifth group P4, P5, P5. Taking a pre-established video processing model that receives 3 frames of images and outputs the enhanced image corresponding to the middle frame as an example, the above five groups of images are input into the pre-established video processing model respectively to obtain the enhanced image P11 corresponding to P1 (the output of the first group), the enhanced image P21 corresponding to P2 (the output of the second group), the enhanced image P31 corresponding to P3 (the output of the third group), the enhanced image P41 corresponding to P4 (the output of the fourth group), and the enhanced image P51 corresponding to P5 (the output of the fifth group). The enhanced video can be obtained from these 5 enhanced frames P11 to P51. A minimal sketch of this padding and grouping is given below.
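Purely as an illustration (not part of the original disclosure), the padding and grouping described above can be sketched in Python as follows; the function name and frame placeholders are hypothetical:

```python
# A minimal sketch of the frame padding and grouping, assuming frames are
# held in a Python list; names are illustrative only.
def group_frames(frames, m=3):
    pad = (m - 1) // 2
    # Repeat the first frame before the sequence and the last frame after it,
    # giving L + M - 1 frames in total.
    padded = [frames[0]] * pad + list(frames) + [frames[-1]] * pad
    # One sliding window of M frames per original frame: L groups in total.
    return [padded[i:i + m] for i in range(len(frames))]

# For L = 5 and M = 3 this reproduces the grouping in the example above:
# [[P1, P1, P2], [P1, P2, P3], [P2, P3, P4], [P3, P4, P5], [P4, P5, P5]]
groups = group_frames(["P1", "P2", "P3", "P4", "P5"], m=3)
```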
Referring to Fig. 4, which shows a schematic diagram of a network structure of the video processing model in an embodiment of the present disclosure, it can be seen that the network structure of the video processing model may be a U-shaped network. The processing of the video processing model is described below with reference to Fig. 4.
In step S310, feature extraction is performed on the M frames of images to obtain at least one image feature of the first scale.
Specifically, suppose the scale of each frame is (H, W), that is, the image resolution is H×W. If the M frames are all RGB images, the number of channels C of the images is 3; if the M frames are all grayscale images, C is 1. The M frames of images can be merged along the C channel, so the M frames of images input into the video processing model can be expressed as (H, W, C*M), where C*M denotes the number of feature layers; for example, when M is 3 and C is 3, the number of feature layers is 9.
When performing feature extraction on the M frames of images, the number of feature layers can be expanded without changing the resolution of the images. The first scale is therefore (H, W). For example, the number of feature layers can be expanded from C*M to F, so that the input M frames of images change from (H, W, C*M) to (H, W, F), where F is a preset value, for example, 64 or 128.
In one implementation of the present disclosure, the number of feature layers can be changed by a convolution operation. Specifically, convolution processing may be performed on the M frames of images to obtain the image feature of the first scale, and the size of the convolution kernel in the convolution processing may be 3×3 or the like. Since an activation function can introduce nonlinear factors into neurons, a neural network can approximate any nonlinear function arbitrarily well. Therefore, after the convolution processing of the M frames of images, an activation operation may also be performed to obtain the image feature of the first scale. The activation function may be a ReLU (rectified linear unit) function, a sigmoid function, or the like. It should be noted that the manner of feature extraction on the M frames of images in the present disclosure is not limited to this. A sketch of this step is given below.
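By way of illustration only, this feature-extraction step can be sketched in PyTorch, assuming M = 3 RGB frames and F = 64 (the framework and the values are assumptions for the sketch):

```python
import torch
import torch.nn as nn

# A minimal sketch: M = 3 RGB frames give C*M = 9 input channels; a 3x3
# convolution expands the channel count to F = 64 and a ReLU activation
# follows, leaving the spatial resolution (H, W) unchanged.
feature_extraction = nn.Sequential(
    nn.Conv2d(in_channels=9, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

frames = torch.randn(1, 9, 128, 128)       # (batch, C*M, H, W)
first_scale = feature_extraction(frames)   # -> (1, 64, 128, 128)
```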
It should be noted that the number of image features of the first scale may be one or more. When there are multiple image features of the first scale, each image feature of the first scale may correspond to the features of a different image. For example, when M is 5, the image features of the first scale may include the features of the 2nd frame and the features of the 3rd frame. For each image feature of the first scale, steps S320 to S340 may be performed, so that the enhanced image of the 2nd frame and the enhanced image of the 3rd frame can be obtained. The number of finally obtained enhanced images is the same as the number of image features of the first scale. In the embodiments of the present disclosure, one image feature of the first scale is taken as an example for description.
Step S320: performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1.
In the embodiments of the present disclosure, N levels of downsampling means performing downsampling N times. Each downsampling yields a feature smaller than the original image, which is equivalent to compressing the image, so that the region perceivable per unit area becomes larger. In this way, after N levels of downsampling, more contour information can be obtained. In one implementation of the present disclosure, in order to retain more detail information, the stride may be 2, that is, the downsampling factor may be 2. For example, after 2x downsampling, an image feature of the first scale (H, W) yields an image feature of scale ([H/2], [W/2]), where [] denotes the rounding operation. Of course, the present disclosure does not specifically limit the downsampling factor.
It can be understood that the larger the value of N, that is, the more downsampling operations, the larger the region perceivable per unit area and the more contour information can be obtained. However, more downsampling operations also mean more computing resources. Experiments show that when N is 4, more contour information can be obtained while using fewer computing resources. Therefore, in the embodiments of the present disclosure, the value of N may be 4. Referring to Fig. 4, the network structure therein is the U-shaped network with N equal to 4. Thus, when the downsampling factor is 2, the scales of the image features obtained by the 4 levels of downsampling are, in order: ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]), and ([H/16], [W/16]). In this case, the image feature of the second scale is ([H/16], [W/16]).
It should be noted that, after each level of downsampling, an activation operation and convolution processing may also be performed on the downsampled image feature. Referring to Fig. 4, the downsampling layer may be followed by an activation layer, a first convolutional layer, and another activation layer; the activation function in the activation layers may be a ReLU function or the like, and the size of the convolution kernel in the first convolutional layer may be 3×3 or the like. Of course, in addition to the network structure shown in Fig. 4, the downsampling layer may also be followed by other network structures such as convolutional layers, activation layers, and pooling layers. One such stage is sketched below.
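For illustration, one downsampling stage of Fig. 4 might be sketched as follows; the use of a stride-2 convolution as the downsampling layer is an assumption, since the disclosure does not fix its exact form:

```python
import torch.nn as nn

# A sketch of one downsampling stage (downsampling layer, activation,
# first convolutional layer, activation), assuming F = 64 feature layers
# and a stride-2 convolution for the 2x downsampling.
def down_stage(channels=64):
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1),  # 2x down
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # first conv layer
        nn.ReLU(inplace=True),
    )
```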
Step S330: performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale. The input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling, i being an integer from 2 to N.
In the embodiments of the present disclosure, the N levels of upsampling correspond to the above N levels of downsampling. N levels of upsampling means performing upsampling N times; the first level of upsampling means performing the first upsampling, and the input of the first level of upsampling is the image feature of the second scale. The j-th level of downsampling and the (N+1-j)-th level of upsampling are located at the same layer of the U-shaped network, and the magnification factor of the (N+1-j)-th level of upsampling is the same as the reduction factor of the j-th level of downsampling. Thus, the image before the j-th level of downsampling and the image after the (N+1-j)-th level of upsampling have the same resolution; or, the image after the j-th level of downsampling and the image before the (N+1-j)-th level of upsampling have the same resolution, j being an integer from 1 to N.
During upsampling, for the i-th level of upsampling, the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling may be superimposed and then used as the input of the i-th level of upsampling. For example, in Fig. 4, the input of the 2nd level of upsampling is the superimposition of the output of the 3rd level of downsampling and the output of the 1st level of upsampling. In the embodiments of the present disclosure, superimposition refers to fusing two features, which may be feature addition or the like. For example, if the output of the 3rd level of downsampling is (a1, a2, a3) and the output of the 1st level of upsampling is (b1, b2, b3), the result of superimposing the two is (a1+b1, a2+b2, a3+b3).
In this way, the image features of each downsampling stage can be superimposed during the upsampling process; that is, image features at all levels can be combined during upsampling, which can improve the accuracy of image feature extraction.
It should be noted that, since rounding occurs during downsampling, the scale corresponding to the output of the (N+1-i)-th level of downsampling may differ from the scale corresponding to the output of the (i-1)-th level of upsampling when superimposing. In this case, the output of the (N+1-i)-th level of downsampling may first be cropped so that the cropped scale is the same as the scale corresponding to the output of the (i-1)-th level of upsampling.
For example, suppose the resolution corresponding to the output of the 3rd level of downsampling is 10×10 and the resolution corresponding to the output of the 1st level of upsampling is 8×8; in order for the two features to be spliced smoothly, the middle part (of size 8×8) of the 10×10 feature can be cropped and then spliced, as sketched below.
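As a non-limiting illustration, the center-cropping and superimposition can be sketched as follows (PyTorch tensors of shape (B, F, H, W) are assumed):

```python
# A sketch of the superimposition with center-cropping: when rounding makes
# the downsampling output (e.g. 10x10) larger than the upsampling output
# (e.g. 8x8), the middle part of the former is cropped before the addition.
def superimpose(skip, up):
    _, _, h, w = up.shape
    _, _, sh, sw = skip.shape
    top, left = (sh - h) // 2, (sw - w) // 2
    return skip[:, :, top:top + h, left:left + w] + up
```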
It should be noted that the scale of the image feature obtained after the above N levels of downsampling and N levels of upsampling is smaller than or equal to the first scale. That is, the third scale may be smaller than or equal to the first scale. For example, when no rounding occurs during downsampling, the third scale is the same as the first scale, i.e., (H, W); when rounding occurs, the third scale will be smaller than the first scale due to the cropping of features.
In the embodiments of the present disclosure, corresponding to the downsampling process, convolution processing and an activation operation may also be performed after upsampling. Referring to Fig. 4, the upsampling layer may be followed by an activation layer, a second convolutional layer, and another activation layer; the activation function in the activation layers may be a ReLU function or the like, and the size of the convolution kernel in the second convolutional layer may be 4×4 or the like. Of course, in addition to the network structure shown in Fig. 4, the upsampling layer may also be followed by other network structures, which is not limited here. One such stage, together with the overall U-shaped forward pass, is sketched below.
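Purely as an illustration, one upsampling stage and the U-shaped forward pass can be sketched as follows. A 4×4 stride-2 transposed convolution is assumed for the exact 2x upsampling; the text mentions a 4×4 kernel for the second convolutional layer, but a size-preserving 3×3 kernel is used here (an assumption) so that the skip superimposition stays well-defined. `superimpose` is the helper sketched above:

```python
import torch.nn as nn

# A sketch of one upsampling stage (upsampling layer, activation, second
# convolutional layer, activation), assuming F = 64 feature layers.
def up_stage(channels=64):
    return nn.Sequential(
        nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

# A compact sketch of the N-level U-shaped forward pass: the input of the
# 1st upsampling level is the second-scale feature, and the input of the
# i-th level is the superimposition of the output of the (N+1-i)-th
# downsampling level and the output of the (i-1)-th upsampling level.
def u_forward(x, downs, ups):
    outs, f = [], x
    for d in downs:                 # N levels of downsampling
        f = d(f)
        outs.append(f)
    f = ups[0](outs[-1])            # 1st upsampling: second-scale feature
    for i, u in enumerate(ups[1:], start=2):
        f = u(superimpose(outs[-i], f))   # skip from level N+1-i
    return f                        # third-scale feature (step S340 follows)
```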
Step S340: superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
In the embodiments of the present disclosure, the image feature of the third scale and the image feature of the first scale may be directly superimposed to obtain a superimposed feature, and the superimposed feature may be converted into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale. Specifically, since the value of F remains unchanged throughout the above process, the superimposed feature can be converted into an image feature containing three channels by convolution processing; for example, a three-channel RGB image can be output.
It is worth mentioning that, in the embodiments of the present disclosure, each level of upsampling may be followed by a corresponding convolution operation, which can be used to convert the number of feature layers from F to 3, that is, to output a three-channel image feature. Moreover, the parameters of this convolution operation after each level of upsampling may be shared. For example, as shown in Fig. 4, each level of upsampling is followed by the same third convolutional layer. In this way, parameter sharing can reduce the parameters in the video processing model and speed up the training of the network. A sketch of this shared output convolution is given below.
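For illustration, the shared output convolution can be sketched as follows (F = 64 and a 3×3 kernel are assumptions for the sketch):

```python
import torch
import torch.nn as nn

# A sketch of the shared output convolution: a single convolution maps
# F = 64 feature layers to a 3-channel image and is reused after every
# upsampling level, so its parameters are shared across scales.
to_rgb = nn.Conv2d(64, 3, kernel_size=3, padding=1)

up_half = torch.randn(1, 64, 64, 64)     # features after the (N-1)-th level
up_full = torch.randn(1, 64, 128, 128)   # features after the N-th level
rgb_half = to_rgb(up_half)               # (1, 3, 64, 64), same weights,
rgb_full = to_rgb(up_full)               # (1, 3, 128, 128), different scale
```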
The video processing model may be obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss includes losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
Specifically, the loss of each level of upsampling is a loss between a first image and a second image. The first image is obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling; that is, each level of upsampling can correspondingly output a first image, and of course the first images corresponding to different levels of upsampling have different resolutions. In addition, each level of upsampling also corresponds to a second image, which is the target image of that level of upsampling, and the first image and the second image have the same resolution.
During training, the sample enhanced image corresponding to the M frames of sample images can also be acquired, and N-1 levels of downsampling performed on the sample enhanced image yield N-1 images of different resolutions. The N-1 images of different resolutions and the sample enhanced image can serve as the N target images. For example, performing N-1 levels of downsampling on the sample enhanced image yields the target image of the first level of upsampling; performing 1 level of downsampling on the sample enhanced image yields the target image of the (N-1)-th level of upsampling; and the sample enhanced image itself can serve as the target image of the N-th level of upsampling.
The training method of the video processing model is described in detail below.
Referring to Fig. 5, which shows a flowchart of a training method of the video processing model in an embodiment of the present disclosure, the method may include the following steps:
Step S510: acquiring multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images.
Since the input of the video processing model may be multiple frames of images and the output one or more frames of enhanced images, sample data containing M frames of sample images and the corresponding one or more frames of sample enhanced images can be acquired during training. When the output of the video processing model is one frame of enhanced image, the one frame of sample enhanced image corresponding to each group of M frames of sample images may specifically be the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images; of course, it may also be an enhanced image corresponding to a non-middle frame sample image. When the output of the video processing model is multiple frames of enhanced images, the multiple frames of sample enhanced images corresponding to each group of M frames of sample images may specifically be the enhanced image corresponding to the middle frame sample image of the group and the enhanced images of the sample images adjacent to the middle frame; of course, they may also be other sample enhanced images. The present disclosure takes the video processing model outputting one frame of enhanced image as an example for description.
It should be noted that the sample enhanced images used may differ depending on the purpose of the video processing model. For example, if the video processing model to be trained is used for video denoising, the enhanced image corresponding to the middle frame sample image of each group of M frames of sample images is specifically the denoised image corresponding to the middle frame sample image. If the video processing model to be trained is used for video deblurring, the enhanced image corresponding to the middle frame sample image of each group of M frames of sample images is specifically the deblurred image corresponding to the middle frame sample image. Of course, the video processing model of the embodiments of the present disclosure is not limited to this.
Step S520: for each group of M frames of sample images, performing feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale.
For each sample image feature of the first scale, steps S530 to S560 are performed:
Step S530: for each sample image feature of the first scale, performing N levels of downsampling on the sample image feature of the first scale to obtain a sample image feature of the second scale.
In the embodiments of the present disclosure, the batch size of model training, that is, the number of sample data items input into the model at the same time, can be set. Assuming the batch size is B, the final input size of the model is (B, H, W, C*M).
Since the processing of each group of M frames of sample images in steps S520 to S530 is similar to that of steps S310 to S320 above, reference can be made to the description of steps S310 to S320, which will not be repeated here.
Step S540: performing N levels of upsampling on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of upsampling.
As mentioned above, each level of upsampling may be followed by a corresponding convolution operation, by which convolution processing can be performed on the output feature of that level of upsampling, so that multi-scale predicted output images can be obtained. For a network with N equal to 4, during training, five images F1, F2, F3, F4, and F5 of different scales can be output from top to bottom, the five scales being: (H, W), ([H/2], [W/2]), ([H/4], [W/4]), ([H/8], [W/8]), and ([H/16], [W/16]).
Step S550: for each level of upsampling, taking the difference between the target output image corresponding to that level of upsampling and the predicted output image corresponding to that level of upsampling as the loss of that level of upsampling; wherein the target output image corresponding to the i-th level of upsampling is the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images.
In the embodiments of the present disclosure, the target output image corresponding to the i-th level of upsampling is the image that would be output in the ideal case, specifically the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images.
For example, for any M frames of sample images and the corresponding sample enhanced image, the target output image corresponding to the 1st level of upsampling may be the input of the N-th level of downsampling performed on the sample enhanced image, that is, the output of the (N-1)-th level of downsampling performed on the sample enhanced image.
During training, there is usually a deviation between the obtained predicted output image and the target output image. The larger the difference, the more inconsistent the predicted output image is with the target output image; the smaller the difference, the more consistent they are.
Step S560: taking the sum of the losses of all levels of upsampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
During training, the loss can be continuously computed by gradient descent according to the back-propagation principle, and the network parameter values updated according to the loss. After training is completed, when the loss value meets the requirement, for example, is smaller than a preset threshold, the video processing model can be obtained; the preset threshold can be set according to the actual application and is not limited here. In the embodiments of the present disclosure, using a multi-scale loss allows layer-by-layer approximation starting from small-scale features, which helps to better restore the details of high-definition images and thus improves the effect of video enhancement. A sketch of this target loss is given below.
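As a non-limiting illustration, the target loss can be sketched as follows; the L1 distance and the bilinear downsampling of the target are assumptions, since the disclosure only speaks of a "difference":

```python
import torch.nn.functional as F

# A sketch of the multi-scale target loss. `predictions` holds the predicted
# output image of each upsampling level (full scale last); `target` is the
# sample enhanced image at full resolution; 2x sampling factors are assumed.
def target_loss(predictions, target):
    total = 0.0
    for k, pred in enumerate(reversed(predictions)):  # k = 0 is full scale
        gt = target if k == 0 else F.interpolate(
            target, scale_factor=0.5 ** k, mode='bilinear', align_corners=False)
        total = total + (pred - gt).abs().mean()      # per-level loss
    return total                                      # sum over all levels
```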
Referring to Fig. 6, which shows a schematic diagram of another network structure of the video processing model in an embodiment of the present disclosure, it can be seen that, compared with the network structure shown in Fig. 4, an upsampling layer is added after the N-th level of upsampling; in this case, the video processing model can be used for super-resolution processing of videos.
On this basis, step S340 may specifically include: performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale. In this way, if M frames of images with a resolution of H×W are input, one or more frames of super-resolution images with a resolution of 2H×2W can be output. One possible form of this extra upsampling head is sketched below.
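For illustration only, the extra upsampling head of Fig. 6 can be sketched as follows; a PixelShuffle-based realization is assumed here, since the disclosure only specifies one more upsampling layer:

```python
import torch.nn as nn

# A sketch of a 2x super-resolution head applied to the superimposed
# (H, W) features: channel expansion, pixel shuffle to (2H, 2W), then the
# 3-channel output convolution; F = 64 is assumed.
sr_head = nn.Sequential(
    nn.Conv2d(64, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),                          # (B, 64, 2H, 2W)
    nn.Conv2d(64, 3, kernel_size=3, padding=1),  # 3-channel output image
)
```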
Correspondingly, the sample enhanced image in the sample data used in training may be a super-resolution image of the middle frame of the corresponding M frames of sample images. For the network structure shown in Fig. 6, the training process is similar to that of the network structure shown in Fig. 4; for example, the parameters of the convolution operations after each level of upsampling may be shared, the sum of the losses of all levels of upsampling may be taken as the final loss, and the network parameter values may be updated according to the final loss. For details, refer to the description of the embodiment of Fig. 5, which will not be repeated here. After training, a video processing model for super-resolution processing can be obtained.
The video enhancement method of the embodiments of the present disclosure can save computation and improve computational efficiency well through the U-shaped network, and computing losses at multiple scales can maximize the quality of the output image. The present disclosure can be used for various video enhancement functions such as video denoising, video deblurring, and video super-resolution processing.
It should be noted that although the steps of the method in the present disclosure are described in a specific order in the drawings, this does not require or imply that the steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps, and so on.
Further, this example embodiment also provides a video enhancement apparatus 700, as shown in Fig. 7, including:
an image enhancement processor 710 configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1;
the image enhancement processor 710 being specifically configured to perform feature extraction on the M frames of images to obtain at least one image feature of a first scale;
and, for each image feature of the first scale, to perform the following process:
performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1;
performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N;
superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
In an exemplary embodiment of the present disclosure, the video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss includes losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
In an exemplary embodiment of the present disclosure, the loss of each level of upsampling is a loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling, the second image being the target image of each level of upsampling, and the first image and the second image having the same resolution.
In an exemplary embodiment of the present disclosure, the above video enhancement apparatus further includes:
a sample acquisition processor configured to acquire multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images;
a model training processor configured to, for each group of M frames of sample images, perform feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale;
and, for each sample image feature of the first scale, to perform the following process:
performing N levels of downsampling on the sample image feature of the first scale to obtain a sample image feature of the second scale;
performing N levels of upsampling on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of upsampling;
for each level of upsampling, taking the difference between the target output image corresponding to that level of upsampling and the predicted output image corresponding to that level of upsampling as the loss of that level of upsampling; wherein the target output image corresponding to the i-th level of upsampling is the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images;
taking the sum of the losses of all levels of upsampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
In an exemplary embodiment of the present disclosure, each group of M frames of sample images corresponds to one frame of sample enhanced image, which is specifically the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images, where M is an odd number greater than 1.
In an exemplary embodiment of the present disclosure, the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images is specifically:
a denoised image corresponding to the middle frame sample image; or
a deblurred image corresponding to the middle frame sample image.
In an exemplary embodiment of the present disclosure, the value of M is 3, 5, or 7.
In an exemplary embodiment of the present disclosure, the above video enhancement apparatus further includes:
a to-be-processed video acquisition processor configured to acquire L frames of images in a video to be processed, and to add (M-1)/2 frames of images before the first frame and after the last frame of the L frames of images, respectively, to obtain L+M-1 frames of images;
a video frame division processor configured to divide the L+M-1 frames of images into L groups of M frames of images, L being an integer greater than M;
the image enhancement processor being specifically configured to, for each group of M frames of images, input the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images.
In an exemplary embodiment of the present disclosure, the image enhancement processor superimposes the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale through the following steps:
superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature;
converting the superimposed feature into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale.
In an exemplary embodiment of the present disclosure, the image enhancement processor superimposes the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale through the following step:
performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale.
It should be noted that the scale in all embodiments of the present disclosure can be understood as the resolution of the corresponding image.
In an exemplary embodiment of the present disclosure, the value of N is 4.
The specific details of each processor in the above apparatus have been described in detail in the corresponding method, and will therefore not be repeated here.
It should be noted that each processor in the above apparatus may be a general-purpose processor, including a central processing unit, a network processor, and the like; it may also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processors in the above apparatus may be independent processors or may be integrated together.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units to be embodied.
In the exemplary embodiments of the present disclosure, an electronic device is also provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute all or part of the steps of the video enhancement method in this example embodiment.
Fig. 8 shows a schematic structural diagram of a computer system for implementing the electronic device of the embodiments of the present disclosure. It should be noted that the computer system 800 of the electronic device shown in Fig. 8 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 8, the computer system 800 includes a central processing unit 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory 802 or a program loaded from a storage portion 808 into a random access memory 803. The random access memory 803 also stores various programs and data required for system operation. The central processing unit 801, the read-only memory 802, and the random access memory 803 are connected to each other through a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage portion 808 including a hard disk, etc.; and a communication portion 809 including a network interface card such as a local area network (LAN) card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 810 as needed, so that the computer program read from it is installed into the storage portion 808 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication portion 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit 801, the various functions defined in the apparatus of the present application are executed.
In the exemplary embodiments of the present disclosure, a non-volatile computer-readable storage medium is also provided, on which a computer program is stored, the computer program, when executed by a processor, implementing any one of the methods described above.
It should be noted that the non-volatile computer-readable storage medium shown in the present disclosure may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the non-volatile computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, the non-volatile computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than the non-volatile computer-readable storage medium; the computer-readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, optical cable, radio frequency, etc., or any suitable combination of the above.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptive changes of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

  1. A video enhancement method, comprising:
    inputting M frames of images into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1;
    wherein inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images comprises:
    performing feature extraction on the M frames of images to obtain at least one image feature of a first scale;
    for each image feature of the first scale, performing the following process:
    performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1;
    performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N; and
    superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  2. The method according to claim 1, wherein the pre-established video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss comprises losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
  3. The method according to claim 2, wherein the loss of each level of upsampling is a loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling, the second image being the target image of each level of upsampling, and the first image and the second image having the same resolution.
  4. The method according to claim 2, wherein training the original video processing model to obtain the pre-established video processing model comprises:
    acquiring multiple groups of M frames of sample images and at least one frame of sample enhanced image corresponding to each group of M frames of sample images;
    for each group of M frames of sample images, performing feature extraction on the group of M frames of sample images to obtain at least one sample image feature of the first scale;
    for each sample image feature of the first scale, performing the following process:
    performing N levels of downsampling on the sample image feature of the first scale to obtain a sample image feature of the second scale;
    performing N levels of upsampling on the sample image feature of the second scale to obtain a predicted output image corresponding to each level of upsampling;
    for each level of upsampling, taking the difference between the target output image corresponding to that level of upsampling and the predicted output image corresponding to that level of upsampling as the loss of that level of upsampling; wherein the target output image corresponding to the i-th level of upsampling is the input of the (N+1-i)-th level of downsampling performed on the sample enhanced image corresponding to the group of M frames of sample images; and
    taking the sum of the losses of all levels of upsampling as the target loss, and updating the network parameter values in the original video processing model according to the target loss.
  5. The method according to claim 4, wherein each group of M frames of sample images corresponds to one frame of sample enhanced image, which is specifically the enhanced image corresponding to the middle frame sample image of the group of M frames of sample images, where M is an odd number greater than 1.
  6. The method according to claim 1, wherein the value of M is 3, 5, or 7.
  7. The method according to claim 6, wherein before inputting the M frames of images into the pre-established video processing model, the method further comprises:
    acquiring L frames of images in a video to be processed;
    adding (M-1)/2 frames of images before the first frame and after the last frame of the L frames of images, respectively, to obtain L+M-1 frames of images; and
    dividing the L+M-1 frames of images into L groups of M frames of images, L being an integer greater than M;
    wherein, for each group of M frames of images, the step of inputting the M frames of images into the pre-established video processing model to obtain the enhanced image of at least one of the M frames of images is performed.
  8. The method according to claim 1, wherein superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale comprises:
    superimposing the image feature of the third scale and the image feature of the first scale to obtain a superimposed feature; and
    converting the superimposed feature into an image feature of three channels to obtain the enhanced image corresponding to the image feature of the first scale.
  9. The method according to claim 1, wherein superimposing the image feature of the third scale and the image feature of the first scale to obtain the enhanced image corresponding to the image feature of the first scale comprises:
    performing super-resolution processing after superimposing the image feature of the third scale and the image feature of the first scale, to obtain a super-resolution image corresponding to the image feature of the first scale.
  10. The method according to claim 1, wherein the value of N is 4.
  11. A video enhancement apparatus, comprising:
    an image enhancement processor configured to input M frames of images into a pre-established video processing model to obtain an enhanced image of at least one of the M frames of images, M being an integer greater than 1;
    the image enhancement processor being specifically configured to perform feature extraction on the M frames of images to obtain at least one image feature of a first scale;
    and, for each image feature of the first scale, to perform the following process:
    performing N levels of downsampling on the image feature of the first scale to obtain an image feature of a second scale, N being an integer greater than 1;
    performing N levels of upsampling on the image feature of the second scale to obtain an image feature of a third scale; wherein the input of the first level of upsampling is the image feature of the second scale, and the input of the i-th level of upsampling is the image feature obtained by superimposing the output of the (N+1-i)-th level of downsampling and the output of the (i-1)-th level of upsampling; the magnification factor of the j-th level of upsampling is the same as the reduction factor of the (N+1-j)-th level of downsampling, i being an integer from 2 to N and j being an integer from 1 to N; and
    superimposing the image feature of the third scale and the image feature of the first scale to obtain an enhanced image corresponding to the image feature of the first scale.
  12. The apparatus according to claim 11, wherein the pre-established video processing model is obtained by training an original video processing model with a target loss; the original video processing model is configured to perform video enhancement processing on a video input into the original video processing model; the target loss comprises losses at multiple scales, and the loss at each scale among the multi-scale losses is the loss of each level of upsampling among the N levels of upsampling.
  13. The apparatus according to claim 12, wherein the loss of each level of upsampling is a loss between a first image and a second image, the first image being obtained by inputting M frames of sample images into the original video processing model and performing the corresponding level of upsampling, the second image being the target image of each level of upsampling, and the first image and the second image having the same resolution.
  14. An electronic device, comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to perform the method according to any one of claims 1 to 10 by executing the executable instructions.
  15. A non-volatile computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 10.
PCT/CN2021/079872 2020-04-30 2021-03-10 Video enhancement method and apparatus, electronic device, and storage medium WO2021218414A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/630,784 US20220318950A1 (en) 2020-04-30 2021-03-10 Video enhancement method and apparatus, and electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010366748.9 2020-04-30
CN202010366748.9A CN113592723B (zh) Video enhancement method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021218414A1 true WO2021218414A1 (zh) 2021-11-04

Family

ID=78237621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079872 WO2021218414A1 (zh) 2020-04-30 2021-03-10 视频增强方法及装置、电子设备、存储介质

Country Status (3)

Country Link
US (1) US20220318950A1 (zh)
CN (1) CN113592723B (zh)
WO (1) WO2021218414A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117528147A (zh) * 2024-01-03 2024-02-06 Academy of Broadcasting Science, National Radio and Television Administration Video enhancement transmission method, system, and storage medium based on a cloud-edge collaborative architecture

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160213244A1 (en) * 2010-08-27 2016-07-28 Sony Corporation Image processing apparatus and method
CN108648163A (zh) * 2018-05-17 2018-10-12 Xiamen Home Meitu Technology Co., Ltd. A face image enhancement method and computing device
CN109087306A (zh) * 2018-06-28 2018-12-25 ZhongAn Information Technology Service Co., Ltd. Arterial vessel image model training method, segmentation method, apparatus, and electronic device
CN109493317A (zh) * 2018-09-25 2019-03-19 Harbin University of Science and Technology 3D multi-vertebra segmentation method based on cascaded convolutional neural networks
CN110458771A (zh) * 2019-07-29 2019-11-15 Shenzhen SenseTime Technology Co., Ltd. Image processing method and apparatus, electronic device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345449B (zh) * 2018-07-17 2020-11-10 Xi'an Jiaotong University An image super-resolution and non-uniform deblurring method based on a fusion network
WO2020108009A1 (en) * 2018-11-26 2020-06-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for improving quality of low-light images
US12033301B2 (en) * 2019-09-09 2024-07-09 Nvidia Corporation Video upsampling using one or more neural networks
KR20190117416A (ko) * 2019-09-26 2019-10-16 LG Electronics Inc. Method and apparatus for improving video frame resolution

Also Published As

Publication number Publication date
US20220318950A1 (en) 2022-10-06
CN113592723B (zh) 2024-04-02
CN113592723A (zh) 2021-11-02

Similar Documents

Publication Publication Date Title
CN109389556B (zh) A multi-scale dilated convolutional neural network super-resolution reconstruction method and apparatus
CN111104962B (zh) Image semantic segmentation method and apparatus, electronic device, and readable storage medium
WO2019120110A1 (zh) Image reconstruction method and device
CN113240580B (zh) A lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation
WO2021098362A1 (zh) Video classification model construction and video classification method, apparatus, device, and medium
US20210209459A1 (en) Processing method and system for convolutional neural network, and storage medium
CN110992265B (zh) Image processing method and model, model training method, and electronic device
CN106228512A (zh) Convolutional neural network image super-resolution reconstruction method based on adaptive learning rate
CN110298851B (zh) Training method and device for human body segmentation neural network
CN117499658A (zh) Generating video frames using neural networks
CN113129212B (zh) Image super-resolution reconstruction method and apparatus, terminal device, and storage medium
EP3952312A1 (en) Method and apparatus for video frame interpolation, and device and storage medium
US20230252605A1 (en) Method and system for a high-frequency attention network for efficient single image super-resolution
CN114298900A (zh) Image super-resolution method and electronic device
CN115409855B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2021057463A1 (zh) Image stylization processing method and apparatus, electronic device, and readable medium
WO2021218414A1 (zh) Video enhancement method and apparatus, electronic device, and storage medium
CN111667401A (zh) Multi-level gradual image style transfer method and system
CN113628115A (zh) Image reconstruction processing method and apparatus, electronic device, and storage medium
US20240155071A1 (en) Text to video generation
Liu et al. Cross-resolution feature attention network for image super-resolution
WO2024001653A9 (zh) Feature extraction method and apparatus, storage medium, and electronic device
WO2023040813A1 (zh) Face image processing method and apparatus, device, and medium
Ye et al. Multi-directional feature fusion super-resolution network based on nonlinear spiking neural P systems
CN109447900A (zh) An image super-resolution reconstruction method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21796267

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21796267

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21796267

Country of ref document: EP

Kind code of ref document: A1