WO2020186765A1 - Video processing method and apparatus, and computer storage medium - Google Patents

Video processing method and apparatus, and computer storage medium

Info

Publication number
WO2020186765A1
WO2020186765A1 (PCT/CN2019/114458, CN2019114458W)
Authority
WO
WIPO (PCT)
Prior art keywords
convolution kernel
sampling
frame
deformable convolution
video
Prior art date
Application number
PCT/CN2019/114458
Other languages
English (en)
French (fr)
Inventor
许翔宇
李沐辰
孙文秀
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司
Priority to SG11202108771RA priority Critical patent/SG11202108771RA/en
Priority to JP2020573289A priority patent/JP7086235B2/ja
Publication of WO2020186765A1 publication Critical patent/WO2020186765A1/zh
Priority to US17/362,883 priority patent/US20210327033A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a video processing method, device, and computer storage medium.
  • during the capture, transmission, and reception of a video, various kinds of noise are typically mixed into it, and this mixed noise reduces the visual quality of the video.
  • video obtained with a small camera aperture or in low-light scenes often contains noise; such noisy video still carries a large amount of information, but the noise in the video makes that information uncertain and seriously degrades the viewer's visual experience. Therefore, video denoising has important research significance and has become an important research topic in computer vision.
  • the embodiments of the present disclosure are to provide a video processing method, device, and computer storage medium.
  • embodiments of the present disclosure provide a video processing method, the method including:
  • obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised video frame.
  • in the above solution, before the obtaining of the convolution parameter corresponding to the frame to be processed in the video sequence, the method further includes:
  • performing deep neural network training based on a sample video sequence to obtain the deformable convolution kernel.
  • the deep neural network training based on the sample video sequence to obtain the deformable convolution kernel includes:
  • Coordinate prediction and weight prediction are respectively performed on multiple consecutive video frames in the sample video sequence based on a deep neural network to obtain the predicted coordinates and predicted weights of the deformable convolution kernel, wherein the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
  • the predicted coordinates of the deformable convolution kernel are sampled to obtain the sampling points of the deformable convolution kernel; the weights of the sampling points are obtained according to the predicted coordinates and the predicted weights of the deformable convolution kernel; and the sampling points of the deformable convolution kernel and the weights of the sampling points are used as the convolution parameters.
  • the sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel includes:
  • the predicted coordinates of the deformable convolution kernel are input into a preset sampling model to obtain sampling points of the deformable convolution kernel.
  • the method further includes:
  • pixels in the sample reference frame and the at least one adjacent frame are acquired; based on the sampling points of the deformable convolution kernel, the pixels and the predicted coordinates of the deformable convolution kernel are sampled and calculated through a preset sampling model, and the sampling values of the sampling points are determined according to the calculation results.
  • the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame includes:
  • the performing convolution processing on the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed to obtain the denoised video frame includes:
  • for each pixel in the frame to be processed, performing a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised pixel value corresponding to the pixel; and obtaining the denoised video frame according to the denoised pixel values corresponding to the pixels.
  • the performing a convolution operation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to each pixel includes:
  • performing a weighted summation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and obtaining the denoised pixel value corresponding to each pixel according to the calculation result.
  • embodiments of the present disclosure provide a video processing device, the video processing device includes an acquisition unit and a denoising unit, wherein:
  • the obtaining unit is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
  • the denoising unit is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
  • the video processing device further includes a training unit configured to perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
  • the video processing device further includes a prediction unit and a sampling unit, wherein:
  • the prediction unit is configured to perform coordinate prediction and weight prediction on consecutive multiple video frames in the sample video sequence based on a deep neural network to obtain the prediction coordinates and prediction weights of the deformable convolution kernel, wherein
  • the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
  • the sampling unit is configured to sample the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel;
  • the acquiring unit is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel, and to use the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
  • the sampling unit is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
  • the acquiring unit is further configured to acquire pixels in the sample reference frame and the at least one adjacent frame;
  • the sampling unit is further configured to perform sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through a preset sampling model based on the sampling points of the deformable convolution kernel, and to determine the sampling values of the sampling points according to the calculation result.
  • the denoising unit is specifically configured to perform convolution processing on the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed to obtain the denoised video frame.
  • the video processing device further includes a convolution unit configured to, for each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised pixel value corresponding to the pixel;
  • the denoising unit is specifically configured to obtain a denoised video frame according to the denoising pixel value corresponding to each pixel.
  • the convolution unit is specifically configured to perform a weighted summation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel according to the calculation result.
  • embodiments of the present disclosure provide a video processing device, the video processing device includes: a memory and a processor; wherein,
  • the memory is configured to store a computer program that can run on the processor
  • the processor is configured to execute the steps of the method according to any one of the implementations of the first aspect when running the computer program.
  • an embodiment of the present disclosure provides a computer storage medium storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of the implementations of the first aspect.
  • embodiments of the present disclosure provide a terminal device, wherein the terminal device at least includes the video processing apparatus according to any one of the second aspect or the third aspect.
  • an embodiment of the present disclosure provides a computer program product, wherein the computer program product stores a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of the implementations of the first aspect.
  • in the video processing method, device, and computer storage medium provided by the embodiments of the present disclosure, the convolution parameters corresponding to the frame to be processed in the video sequence are first obtained, where the convolution parameters include the sampling points of the deformable convolution kernel and the weights of the sampling points; since the convolution parameters are obtained by extracting information from consecutive frames of the video, the image blur, detail loss, and ghosting caused by motion between frames can be effectively reduced; denoising processing is then performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame; in this way, since the weights of the sampling points can vary with the positions of the sampling points, the video denoising effect is better and the imaging quality of the video is improved.
  • FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure
  • FIG. 2 is a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the disclosure
  • FIG. 3 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure.
  • FIG. 4 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic flowchart of still another video processing method provided by an embodiment of the disclosure.
  • FIG. 6 is a schematic diagram of the overall architecture of a video processing method provided by an embodiment of the disclosure.
  • FIG. 7 is a schematic flowchart of still another video processing method provided by an embodiment of the disclosure.
  • FIG. 8 is a schematic diagram of a detailed architecture of a video processing method provided by an embodiment of the disclosure.
  • FIG. 9 is a schematic diagram of the composition structure of a video processing device provided by an embodiment of the disclosure.
  • FIG. 10 is a schematic diagram of a specific hardware structure of a video processing device provided by an embodiment of the disclosure.
  • FIG. 11 is a schematic diagram of the composition structure of a terminal device provided by an embodiment of the disclosure.
  • the embodiments of the present disclosure provide a video processing method, which is applied to a video processing device; the device can be set in mobile terminal devices such as smart phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDAs), portable media players (PMPs), wearable devices, and navigation devices, or in fixed terminal devices such as digital TVs and desktop computers, which is not specifically limited in the embodiments of the present disclosure.
  • FIG. 1 shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
  • the method may include:
  • S101 Obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
  • it should be noted that a video sequence can be captured by cameras, smart phones, tablet computers, and many other terminal devices.
  • compact cameras and terminal devices such as smart phones and tablet computers are usually equipped with smaller image sensors and less ideal optics, so the denoising of video frames is particularly important for these devices.
  • high-end cameras and camcorders are usually equipped with larger image sensors and better optics; video frames captured by these devices have good imaging quality under normal lighting conditions, but video frames captured in low-light scenes still often contain a lot of noise, so denoising of the video frames is still required.
  • video sequences can be obtained through the collection of cameras, smart phones, tablet computers and many other terminal devices.
  • the video sequence contains frames to be processed for denoising processing.
  • by performing deep neural network training on consecutive frames (i.e., multiple consecutive video frames) in the video sequence, the deformable convolution kernel can be obtained; the sampling points of the deformable convolution kernel and the weights of the sampling points are then obtained and used as the convolution parameters of the frame to be processed.
  • in some embodiments, a deep convolutional neural network (Deep CNN) is a class of feedforward neural networks that include convolution operations and have a deep structure, and is one of the representative algorithms of deep learning with deep neural networks.
  • FIG. 2 shows a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the present disclosure.
  • as shown in FIG. 2, the structure of the deep convolutional neural network includes convolutional layers, pooling layers, and bilinear upsampling layers; the unfilled layers are convolutional layers, the black-filled layers are pooling layers, and the gray-filled layers are bilinear upsampling layers; the number of channels corresponding to each layer (i.e., the number of deformable convolution kernels contained in each convolutional layer) is shown in Table 1.
  • as can be seen from Table 1, the channel numbers of the first 25 layers of the coordinate prediction network (denoted the V network) and of the weight prediction network (denoted the F network) are the same, indicating that the V network and the F network can share the feature information of the first 25 layers; sharing feature information in this way reduces the amount of computation of the network.
  • the F network can be used to obtain the predicted weights of the deformable convolution kernel from a sample video sequence (i.e., multiple consecutive video frames), and the V network can be used to obtain the predicted coordinates of the deformable convolution kernel from the same sample video sequence.
  • according to the predicted coordinates of the deformable convolution kernel, the sampling points of the deformable convolution kernel can be obtained; according to the predicted weights and the predicted coordinates of the deformable convolution kernel, the weights of the sampling points can be obtained, thereby yielding the convolution parameters.
  • S102 Perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
  • it should be noted that, after the convolution parameters corresponding to the frame to be processed are obtained, a convolution operation can be performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points; the result of the convolution operation is the denoised video frame.
  • specifically, in some embodiments, for S102, the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame may include:
  • sampling points of the deformable convolution kernel and the weights of the sampling points are convolved with the frame to be processed to obtain the denoised video frame.
  • that is, the denoising processing of the frame to be processed may be achieved by convolving the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed.
  • for example, for each pixel in the frame to be processed, a weighted summation of the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points can be performed to obtain the denoised pixel value corresponding to the pixel, thereby achieving the denoising processing of the frame to be processed.
  • in the embodiments of the present disclosure, the convolution parameters corresponding to the frame to be processed in the video sequence are obtained, where the convolution parameters include the sampling points of the deformable convolution kernel and the weights of the sampling points; denoising processing is performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame.
  • since the convolution parameters are obtained by extracting information from consecutive frames of the video, the image blur, detail loss, and ghosting caused by motion between frames can be effectively reduced; moreover, the weights of the sampling points can vary with the positions of the sampling points, which makes the video denoising effect better and improves the imaging quality of the video.
  • FIG. 3 shows a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
  • the method may further include:
  • S201 Perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
  • multiple consecutive video frames are selected from the video sequence as the sample video sequence, where the sample video sequence not only includes the sample reference frame, but also includes at least one adjacent frame adjacent to the sample reference frame.
  • the at least one adjacent frame may be at least one frame that is forward-adjacent to the sample reference frame, at least one frame that is backward-adjacent to the sample reference frame, or multiple frames that are both forward-adjacent and backward-adjacent to the sample reference frame; this is not specifically limited in the embodiments of the present disclosure. The following description takes multiple adjacent frames that are forward-adjacent and backward-adjacent to the sample reference frame as the sample video sequence as an example.
  • assuming that the sample reference frame is the 0th frame in the video sequence, the at least one adjacent frame adjacent to the sample reference frame includes the forward-adjacent frames -T, -(T-1), ..., -2, -1, and the backward-adjacent frames 1, 2, ..., T.
  • in this way, the deformable convolution kernel can be obtained by performing deep neural network training on the sample video sequence, and each pixel in the frame to be processed can then be convolved with its corresponding deformable convolution kernel to achieve the denoising processing of the frame to be processed; compared with the fixed convolution kernel in the prior art, the embodiments of the present disclosure adopt a deformable convolution kernel, which enables the video processing of the frame to be processed to achieve a better denoising effect.
  • it should be noted that, since the video sequence is three-dimensional (two spatial dimensions plus time), the corresponding deformable convolution kernel is also three-dimensional; unless otherwise specified, the deformable convolution kernel in the embodiments of the present disclosure refers to a three-dimensional deformable convolution kernel.
  • it can be understood that, in order to obtain the convolution parameters, a deep neural network can be used to perform coordinate prediction and weight prediction on the multiple consecutive video frames in the sample video sequence to first obtain the predicted coordinates and predicted weights of the deformable convolution kernel; the sampling points of the deformable convolution kernel and the weights of the sampling points are then further obtained from the predicted coordinates and predicted weights.
  • FIG. 4 shows a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
  • the method may include:
  • S201a Perform coordinate prediction and weight prediction on multiple consecutive video frames in the sample video sequence based on the deep neural network, to obtain the prediction coordinates and prediction weights of the deformable convolution kernel;
  • it should be noted that the multiple consecutive video frames include a sample reference frame and at least one adjacent frame; if the at least one adjacent frame includes T forward-adjacent frames and T backward-adjacent frames, the multiple consecutive video frames comprise (2T+1) frames in total.
  • specifically, the predicted coordinates of the deformable convolution kernel are obtained through the coordinate prediction network, and the predicted weights of the deformable convolution kernel are obtained through the weight prediction network.
  • the frame to be processed may be a sample reference frame in a sample video sequence for video denoising processing.
  • assuming that the width of each frame is denoted by W and the height by H, the number of pixels contained in the frame to be processed is H×W. Since the deformable convolution kernel is three-dimensional and its size is composed of N sampling points, the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights of the deformable convolution kernel obtainable for the frame to be processed is H×W×N.
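  • as a quick numeric check of these counts, using illustrative assumed values H=4, W=6, N=9:

```python
# Quick arithmetic check of the tensor sizes stated above (values assumed).
H, W, N = 4, 6, 9
predicted_coordinates = H * W * N * 3   # H x W x N x 3
predicted_weights = H * W * N           # H x W x N
print(predicted_coordinates, predicted_weights)  # 648 216
```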
  • S201b Sampling the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel
  • after the predicted coordinates of the deformable convolution kernel are obtained, the predicted coordinates can be sampled so as to obtain the sampling points of the deformable convolution kernel.
  • the predicted coordinates of the deformable convolution kernel can be sampled through a preset sampling model.
  • FIG. 5 shows a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure.
  • the method for sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel may include:
  • S201b-1 Input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
  • the preset sampling model means a preset model for sampling the predicted coordinates of the deformable convolution kernel.
  • the preset sampling model may refer to a trilinear sampler, or may refer to other sampling models, which is not specifically limited in the embodiment of the present disclosure.
  • the method may further include:
  • S201b-2 Acquire pixels in the sample reference frame and the at least one adjacent frame
  • it should be noted that the sample reference frame and the at least one adjacent frame comprise (2T+1) frames in total; if the width of each frame is denoted by W and the height by H, the number of pixels that can be acquired is H×W×(2T+1).
  • S201b-3 Based on the sampling points of the deformable convolution kernel, perform sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through a preset sampling model, and determine the sampling values of the sampling points according to the calculation results.
  • that is, all pixels and the predicted coordinates of the deformable convolution kernel can be input into the preset sampling model, and the output of the preset sampling model is the sampling points of the deformable convolution kernel and the sampling values of the sampling points.
  • if the number of sampling points is H×W×N, the number of corresponding sampling values is also H×W×N; that is, the trilinear sampler can not only determine the sampling points of the deformable convolution kernel according to the predicted coordinates of the deformable convolution kernel, but also determine the sampling values corresponding to the sampling points.
  • specifically, the (2T+1) frames in the sample video sequence are composed of the sample reference frame, the T adjacent frames forward-adjacent to the sample reference frame, and the T adjacent frames backward-adjacent to it; the number of pixels contained in the (2T+1) frames is H×W×(2T+1), and the pixel values corresponding to these H×W×(2T+1) pixels, together with the H×W×N×3 predicted coordinates, are input to the trilinear sampler for sampling calculation.
  • the sampling calculation of the trilinear sampler is shown in equation (1); the equation image is not reproduced in this extraction, but a standard trilinear-interpolation form consistent with the surrounding definitions is:

    X̂(y,x,n) = Σ_(i,j,m) X(i,j,m) · max(0, 1 − |u(y,x,n) − j|) · max(0, 1 − |v(y,x,n) − i|) · max(0, 1 − |z(y,x,n) − m|)    (1)

  • where n is a positive integer with 1 ≤ n ≤ N; u(y,x,n), v(y,x,n), and z(y,x,n) respectively denote the predicted coordinates of the nth sampling point at pixel position (y,x) in the three dimensions (horizontal, vertical, and temporal); and X(i,j,m) denotes the pixel value at pixel position (i,j) of the mth frame in the video sequence.
  • it should be noted that the predicted coordinates of the deformable convolution kernel are variable: a relative offset variable is added to the base position coordinates (x_n, y_n, t_n) of each sampling point. Specifically, u(y,x,n), v(y,x,n), and z(y,x,n) can be expressed as equation (2):

    u(y,x,n) = x_n + V(y,x,n,1),  v(y,x,n) = y_n + V(y,x,n,2),  z(y,x,n) = t_n + V(y,x,n,3)    (2)

  • where u(y,x,n) denotes the predicted coordinate in the horizontal dimension of the nth sampling point at pixel position (y,x), and V(y,x,n,1) the corresponding horizontal offset variable; v(y,x,n) denotes the predicted coordinate in the vertical dimension, and V(y,x,n,2) the corresponding vertical offset variable; z(y,x,n) denotes the predicted coordinate in the temporal dimension, and V(y,x,n,3) the corresponding temporal offset variable.
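  • the following minimal numpy sketch illustrates the trilinear sampling of equations (1) and (2) for a single sampling point; the function name and the toy inputs are illustrative assumptions, and the exhaustive triple loop simply mirrors the summation in equation (1):

```python
# Trilinear sampling sketch: interpolate a video block X at the fractional
# predicted coordinate (u, v, z) = (horizontal, vertical, temporal).
import numpy as np

def trilinear_sample(X, u, v, z):
    """X: array of shape (frames, height, width). Returns the sum over all
    (m, i, j) of X[m, i, j] weighted by the trilinear kernel of eq. (1)."""
    s = 0.0
    for m in range(X.shape[0]):
        for i in range(X.shape[1]):
            for j in range(X.shape[2]):
                s += (X[m, i, j]
                      * max(0.0, 1.0 - abs(u - j))   # horizontal term
                      * max(0.0, 1.0 - abs(v - i))   # vertical term
                      * max(0.0, 1.0 - abs(z - m)))  # temporal term
    return s

X = np.arange(18, dtype=float).reshape(2, 3, 3)  # toy 2-frame, 3x3 video block
# eq. (2): predicted coordinate = base sampling position + learned offset
u, v, z = 1.0 + 0.25, 1.0 - 0.5, 0.0 + 0.5
print(trilinear_sample(X, u, v, z))
```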
  • in this way, the sampling points of the deformable convolution kernel can be determined on the one hand, and the sampling value of each sampling point can be obtained on the other hand; since the predicted coordinates of the deformable convolution kernel are variable, the position of each sampling point is not fixed; that is, the convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one.
  • compared with the fixed convolution kernel in the prior art, the embodiments of the present disclosure adopt a deformable convolution kernel, which enables the video processing of the frame to be processed to achieve a better denoising effect.
  • S201c Obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel;
  • S201d Use the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
  • it should be noted that the weights of the sampling points of the deformable convolution kernel can be obtained from the predicted coordinates of the deformable convolution kernel and the predicted weights of the deformable convolution kernel obtained above; the convolution parameters corresponding to the frame to be processed are thereby obtained.
  • the predicted coordinates here refer to the relative coordinates of the deformable convolution kernel. Assuming that the width of each frame in the sample video sequence is denoted by W and the height by H, and that the deformable convolution kernel is three-dimensional with its size composed of N sampling points, the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights is H×W×N; correspondingly, the number of sampling points of the deformable convolution kernel is H×W×N, and the number of weights of the sampling points is also H×W×N.
  • it should be noted that the number of channels of each convolutional layer indicates the number of deformable convolution kernels contained in that layer, and the number of sampling points contained in the deformable convolution kernel is N; generally, N can take a value of 9, but in practical applications it can also be set according to actual conditions, which is not specifically limited in the embodiments of the present disclosure. It should also be noted that, for these N sampling points, since the predicted coordinates of the deformable convolution kernel are variable, the position of each sampling point is not fixed.
  • the V network produces a relative offset for each sampling point, which further shows that the deformable convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one, so that the embodiments of the present disclosure can be applied to video processing with large motion between frames; in addition, the weight of each sampling point obtained through the F network also differs across sampling points; that is, the embodiments of the present disclosure use not only a deformable convolution kernel but also variable weights.
  • compared with a fixed convolution kernel or artificially set weights in the prior art, the video processing of the frame to be processed can achieve a better denoising effect.
  • it should also be noted that the network can adopt an encoder-decoder structure. In the encoder stage, the convolutional neural network performs 4 downsampling operations; for an input frame of size H×W (H denotes the height of the frame to be processed, W its width), each downsampling yields an output of size H/2×W/2, and this stage is mainly used to extract feature images from the frame to be processed. In the decoder stage, the convolutional neural network performs 4 upsampling operations; for an input of size H×W, each upsampling yields an output of size 2H×2W, and this stage is mainly used to restore a video frame of the original size from the feature images extracted by the encoder. The number of downsampling or upsampling operations can be set specifically according to actual conditions.
  • in addition, there can be skip connections between encoder layers and decoder layers; for example, there is a skip connection between the 6th layer and the 22nd layer, between the 9th layer and the 19th layer, and between the 12th layer and the 16th layer; this enables the decoder stage to comprehensively use low-level and high-level features, making the video denoising of the frame to be processed more effective, as illustrated by the sketch after this paragraph.
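  • the following is a minimal PyTorch sketch of this encoder-decoder idea, with a single downsampling stage, a single bilinear upsampling stage, and one skip connection; the depths, channel widths, and layer pairing are illustrative assumptions rather than the patent's exact 26-layer network:

```python
# Tiny encoder-decoder sketch: pool to extract features, bilinearly upsample
# to restore resolution, and fuse encoder features via a skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoderDecoder(nn.Module):
    def __init__(self, ch_in=5, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(ch_in, ch, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(ch * 2, ch, 3, padding=1)

    def forward(self, x):                       # x: (B, 2T+1, H, W)
        f1 = self.enc1(x)                       # H x W features (kept for the skip)
        f2 = self.enc2(F.max_pool2d(f1, 2))     # H/2 x W/2 after pooling
        up = F.interpolate(self.dec1(f2), scale_factor=2,
                           mode="bilinear", align_corners=False)  # back to H x W
        # skip connection: fuse low-level encoder features with decoder features
        return self.out(torch.cat([up, f1], dim=1))

y = TinyEncoderDecoder()(torch.randn(1, 5, 64, 64))
print(y.shape)  # torch.Size([1, 32, 64, 64])
```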
  • in the overall architecture shown in FIG. 6, X represents the input terminal for inputting the sample video sequence; the sample video sequence is selected from the video sequence and is composed of 5 consecutive frames (for example, the sample reference frame, the two adjacent frames forward-adjacent to the sample reference frame, and the two adjacent frames backward-adjacent to the sample reference frame). Coordinate prediction and weight prediction are then performed on the consecutive frames input at X: for coordinate prediction, a coordinate prediction network (denoted the V network) can be established, through which the predicted coordinates of the deformable convolution kernel are obtained; for weight prediction, a weight prediction network (denoted the F network) can be established, through which the predicted weights of the deformable convolution kernel are obtained. The consecutive frames input at X and the predicted coordinates of the deformable convolution kernel are then both input into the preset sampling model to obtain the sampling points of the deformable convolution kernel.
  • the output result is the denoised video frame (denoted by Y); through the consecutive-frame information in the video sequence, not only is the denoising processing of the frame to be processed realized, but also, because the sampling point positions of the deformable convolution kernel are variable (that is, a deformable convolution kernel is used) and the weight of each sampling point is also variable, the video denoising effect is better.
  • in this way, the sampling points of the deformable convolution kernel and the weights of the sampling points can be obtained, and denoising processing is performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, so as to obtain the denoised video frame.
  • the denoised video frame may be obtained by convolution processing the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed.
  • FIG. 7 shows a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure.
  • the method for performing convolution processing on the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed to obtain the denoised video frame may include:
  • S102a For each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to the pixel;
  • the denoising pixel value corresponding to each pixel may be calculated by performing a weighted summation calculation for each pixel, the sampling point of the deformable convolution kernel, and the weight of the sampling point.
  • S102a may include:
  • S102a-1 Perform a weighted sum calculation on each pixel, the sampling point of the deformable convolution kernel and the weight of the sampling point;
  • S102a-2 Obtain the denoising pixel value corresponding to each pixel according to the calculation result.
  • the denoising pixel value corresponding to each pixel can be obtained by performing a weighted sum calculation of the sampling points of the deformable convolution kernel and the weight values of the sampling points for each pixel.
  • specifically, for each pixel in the frame to be processed, the deformable convolution kernel convolved with that pixel contains N sampling points; the sampling value of each of the N sampling points is weighted by the weight of that sampling point, and the N weighted values are summed; the final result is the denoised pixel value corresponding to the pixel in the frame to be processed. This is shown in equation (3); the equation image is not reproduced in this extraction, but a weighted-sum form consistent with the surrounding definitions is:

    Y(y,x) = Σ_(n=1..N) X̂(y,x,n) · F(y,x,n)    (3)

  • where X̂(y,x,n) denotes the sampling value of the nth sampling point at pixel position (y,x) and F(y,x,n) denotes the weight of that sampling point; through this calculation, the denoised pixel value corresponding to each pixel in the frame to be processed can be obtained, as in the sketch below.
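  • a minimal numpy sketch of this per-pixel weighted summation follows; the function and array names and the example weights are illustrative assumptions:

```python
# Denoised pixel value as the weighted sum of the N sampled values, eq. (3).
import numpy as np

def denoise_pixel(sample_values, sample_weights):
    """sample_values, sample_weights: shape-(N,) arrays holding the sampled
    values and predicted weights of one pixel's N sampling points."""
    return float(np.sum(sample_values * sample_weights))

N = 9
rng = np.random.default_rng(0)
values = rng.normal(size=N)           # stand-in sampled values
weights = rng.dirichlet(np.ones(N))   # stand-in weights (sum to 1 here)
print(denoise_pixel(values, weights))
```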
  • in addition, since the position of each sampling point is not fixed and the weight of each sampling point also differs, the denoising processing of the embodiments of the present disclosure uses not only a deformable convolution kernel but also variable weights; compared with a fixed convolution kernel or artificially set weights in the prior art, the video processing of the frame to be processed can achieve a better denoising effect.
  • S102b Obtain a denoised video frame according to the denoised pixel value corresponding to each pixel.
  • in this way, each pixel in the frame to be processed can be convolved with its corresponding deformable convolution kernel, that is, each pixel in the frame to be processed is convolved with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to the pixel; the denoising processing of the frame to be processed is thus realized.
  • FIG. 8 shows a detailed architectural schematic diagram of a video processing method provided by an embodiment of the present disclosure.
  • in FIG. 8, the sample video sequence 801 is composed of multiple consecutive video frames (for example, a sample reference frame, two adjacent frames forward-adjacent to the sample reference frame, and two adjacent frames backward-adjacent to the sample reference frame). Coordinate prediction and weight prediction are performed on the input sample video sequence 801 based on the deep neural network; specifically, a coordinate prediction network 802 and a weight prediction network 803 can be established: coordinate prediction is performed through the coordinate prediction network 802 to obtain the predicted coordinates 804 of the deformable convolution kernel, and weight prediction is performed through the weight prediction network 803 to obtain the predicted weights 805 of the deformable convolution kernel. The input sample video sequence 801 and the predicted coordinates 804 of the deformable convolution kernel are jointly input into the trilinear sampler 806, which performs sampling processing; the output of the trilinear sampler 806 is the sampling points 807 of the deformable convolution kernel. Finally, the sampling points 807 of the deformable convolution kernel and the predicted weights 805 of the deformable convolution kernel are subjected to a convolution operation 808 with the frame to be processed, and the denoised video frame 809 is output.
  • it should be noted that the weights of the sampling points of the deformable convolution kernel can be obtained from the predicted coordinates 804 and the predicted weights 805 of the deformable convolution kernel; in this way, in the convolution operation 808, the sampling points of the deformable convolution kernel and the weights of the sampling points are convolved with the frame to be processed, thereby realizing the denoising processing of the frame to be processed.
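  • to make this data flow concrete, the following self-contained numpy sketch runs the FIG. 8 pipeline end to end, with random stand-ins in place of the prediction networks 802 and 803; all names, shapes, and the choice of the reference frame index as the temporal base are illustrative assumptions:

```python
# End-to-end pipeline sketch: predict -> trilinear sample -> weighted sum.
import numpy as np

T, H, W, N = 2, 8, 8, 9
rng = np.random.default_rng(1)
frames = rng.normal(size=(2 * T + 1, H, W))         # sample video sequence (801)
offsets = rng.normal(scale=0.5, size=(H, W, N, 3))  # stand-in for V network (804)
weights = rng.dirichlet(np.ones(N), size=(H, W))    # stand-in for F network (805)

def trilinear(frames, u, v, z):
    """Interpolate frames at fractional (horizontal u, vertical v, temporal z),
    visiting only the 8 corners of the enclosing unit cube (806)."""
    j0, i0, m0 = int(np.floor(u)), int(np.floor(v)), int(np.floor(z))
    s = 0.0
    for dm in (0, 1):
        for di in (0, 1):
            for dj in (0, 1):
                m, i, j = m0 + dm, i0 + di, j0 + dj
                if (0 <= m < frames.shape[0] and 0 <= i < frames.shape[1]
                        and 0 <= j < frames.shape[2]):
                    w = (1 - abs(u - j)) * (1 - abs(v - i)) * (1 - abs(z - m))
                    s += frames[m, i, j] * max(w, 0.0)
    return s

denoised = np.zeros((H, W))
for y in range(H):
    for x in range(W):
        acc = 0.0
        for n in range(N):
            # eq. (2): predicted coordinate = base position + learned offset;
            # the temporal base is taken to be the reference frame index T
            u = x + offsets[y, x, n, 0]
            v = y + offsets[y, x, n, 1]
            z = T + offsets[y, x, n, 2]
            acc += trilinear(frames, u, v, z) * weights[y, x, n]  # eq. (3), 808
        denoised[y, x] = acc
print(denoised.shape)  # (8, 8): the denoised reference frame (809)
```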
  • in this way, performing deep neural network training on the sample video sequence yields the deformable convolution kernel, i.e., the predicted coordinates and predicted weights of the deformable convolution kernel; since the predicted coordinates are variable, the position of each sampling point is variable, which further shows that the convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one, so that the embodiments of the present disclosure can be applied to video processing with large motion between frames; in addition, the weight of each sampling point can also differ across sampling points; that is, the embodiments of the present disclosure use not only a deformable convolution kernel but also variable predicted weights, which enables the video processing of the frame to be processed to achieve a better denoising effect.
  • in addition, the embodiments of the present disclosure adaptively allocate different sampling points based on pixel-level information to track the movement of the same position across consecutive frames of the video; using multi-frame information can better compensate for the lack of single-frame information, and also enables the method of the embodiments of the present disclosure to be applied to video restoration scenarios.
  • the deformable convolution kernel can also be regarded as an efficient extractor of temporal optical flow, making full use of the multi-frame information in consecutive frames of the video, so that the method of the embodiments of the present disclosure can also be applied to other video processing scenarios involving pixel-level information; moreover, the method of the embodiments of the present disclosure can achieve high-quality video imaging under limited hardware quality or low-light conditions.
  • the foregoing embodiments provide a video processing method: the convolution parameters corresponding to the frame to be processed in the video sequence are obtained, where the convolution parameters include the sampling points of the deformable convolution kernel and the weights of the sampling points; denoising processing is performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame; in this way, since the convolution parameters are obtained by extracting information from consecutive frames of the video, the image blur, detail loss, and ghosting caused by motion between frames can be effectively reduced; and since the weights of the sampling points can vary with the positions of the sampling points, the video denoising effect is better and the imaging quality of the video is improved.
  • FIG. 9 shows the composition of a video processing device 90 provided by an embodiment of the present disclosure.
  • the video processing device 90 may include an acquisition unit 901 and a denoising unit 902, wherein:
  • the obtaining unit 901 is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
  • the denoising unit 902 is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
  • the video processing device 90 further includes a training unit 903 configured to perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
  • the video processing device 90 further includes a prediction unit 904 and a sampling unit 905, where:
  • the prediction unit 904 is configured to perform coordinate prediction and weight prediction on consecutive multiple video frames in the sample video sequence based on a deep neural network to obtain the prediction coordinates and prediction weights of the deformable convolution kernel, wherein,
  • the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
  • the sampling unit 905 is configured to sample the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel;
  • the acquiring unit 901 is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel, and to use the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
  • the sampling unit 905 is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
  • the acquiring unit 901 is further configured to acquire pixels in the sample reference frame and the at least one adjacent frame;
  • the sampling unit 905 is further configured to perform sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through the preset sampling model based on the sampling points of the deformable convolution kernel, and to determine the sampling values of the sampling points according to the calculation results.
  • the denoising unit 902 is specifically configured to perform convolution processing on the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed to obtain the denoised video frame.
  • the video processing device 90 further includes a convolution unit 906 configured to, for each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to the pixel;
  • the denoising unit 902 is specifically configured to obtain a denoised video frame according to the denoising pixel value corresponding to each pixel.
  • the convolution unit 906 is specifically configured to perform a weighted summation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel according to the calculation result.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program, or software, etc., of course, may also be a module, or may be non-modular.
  • the various components in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software function module.
  • the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
  • this embodiment provides a computer storage medium that stores a video processing program that implements the steps of the method in the foregoing embodiment when the video processing program is executed by at least one processor.
  • FIG. 10 shows the specific hardware structure of the video processing device 90 provided by an embodiment of the present disclosure, which may include: a network interface 1001, a memory 1002, and a processor 1003;
  • the various components are coupled together through the bus system 1004.
  • the bus system 1004 is used to implement connection and communication between these components.
  • the bus system 1004 also includes a power bus, a control bus, and a status signal bus.
  • various buses are marked as the bus system 1004 in FIG. 10.
  • the network interface 1001 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
  • the memory 1002 is configured to store a computer program that can run on the processor 1003;
  • the processor 1003 is configured to, when running the computer program, execute: obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised video frame.
  • An embodiment of the present application provides a computer program product, wherein the computer program product stores a video processing program, and when the video processing program is executed by at least one processor, the steps of the method described in the foregoing embodiments are implemented.
  • the memory 1002 in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
  • by way of exemplary but not restrictive description, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
  • the processor 1003 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the above method can be completed by hardware integrated logic circuits in the processor 1003 or instructions in the form of software.
  • the above-mentioned processor 1003 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1002, and the processor 1003 reads the information in the memory 1002, and completes the steps of the foregoing method in combination with its hardware.
  • the embodiments described herein can be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
  • for hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this disclosure, or combinations thereof.
  • the technology described herein can be implemented through modules (such as procedures, functions, etc.) that perform the functions described herein.
  • the software codes can be stored in the memory and executed by the processor.
  • the memory can be implemented in the processor or external to the processor.
  • the processor 1003 is further configured to execute the steps of the method in the foregoing embodiment when the computer program is running.
  • FIG. 11 shows a schematic diagram of the composition structure of a terminal device 110 provided by an embodiment of the present disclosure; wherein, the terminal device 110 at least includes any video processing apparatus 90 involved in the foregoing embodiments.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to enable a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Transforming Electric Information Into Light Information (AREA)
  • Image Analysis (AREA)
  • Picture Signal Circuits (AREA)

Abstract

The embodiments of the present disclosure disclose a video processing method and apparatus, and a computer storage medium. The method includes: obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised video frame.

Description

Video processing method and apparatus, and computer storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is filed on the basis of Chinese patent application No. 201910210075.5 filed on March 19, 2019, and claims priority to that Chinese patent application, the entire contents of which are hereby incorporated into this application by reference.
TECHNICAL FIELD
The present disclosure relates to the field of computer vision technology, and in particular to a video processing method and apparatus, and a computer storage medium.
BACKGROUND
During the capture, transmission, and reception of a video, various kinds of noise are typically mixed into it, and this mixed noise reduces the visual quality of the video. For example, video obtained with a small camera aperture or in low-light scenes often contains noise; such noisy video also carries a large amount of information, but the noise in the video makes that information uncertain and seriously affects the viewer's visual experience. Therefore, video denoising has important research significance and has become an important research topic in computer vision.
However, current solutions still have shortcomings. In particular, when there is motion between consecutive frames of a video or the camera itself shakes, not only can the noise not be removed cleanly, but loss of image detail or blurring and ghosting at image edges are also easily caused.
SUMMARY
The embodiments of the present disclosure provide a video processing method and apparatus, and a computer storage medium.
The technical solutions of the present disclosure are implemented as follows:
In a first aspect, the embodiments of the present disclosure provide a video processing method, the method including:
obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points;
performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised video frame.
In the above solution, before the obtaining of the convolution parameter corresponding to the frame to be processed in the video sequence, the method further includes:
performing deep neural network training based on a sample video sequence to obtain the deformable convolution kernel.
In the above solution, the performing deep neural network training based on the sample video sequence to obtain the deformable convolution kernel includes:
performing coordinate prediction and weight prediction respectively on multiple consecutive video frames in the sample video sequence based on a deep neural network to obtain predicted coordinates and predicted weights of the deformable convolution kernel, wherein the multiple consecutive video frames include a sample reference frame and at least one adjacent frame thereof;
sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel;
obtaining the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel;
using the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
In the above solution, the sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel includes:
inputting the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
In the above solution, after the obtaining of the sampling points of the deformable convolution kernel, the method further includes:
acquiring pixels in the sample reference frame and the at least one adjacent frame;
based on the sampling points of the deformable convolution kernel, performing sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and determining the sampling values of the sampling points according to the calculation results.
In the above solution, the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame includes:
performing convolution processing on the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed to obtain the denoised video frame.
In the above solution, the performing convolution processing on the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed to obtain the denoised video frame includes:
for each pixel in the frame to be processed, performing a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain a denoised pixel value corresponding to the pixel;
obtaining the denoised video frame according to the denoised pixel values corresponding to the pixels.
In the above solution, the performing a convolution operation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to each pixel includes:
performing a weighted summation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points;
obtaining the denoised pixel value corresponding to each pixel according to the calculation result.
In a second aspect, embodiments of the present disclosure provide a video processing apparatus, the apparatus including an obtaining unit and a denoising unit, where
the obtaining unit is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and
the denoising unit is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
In the above solution, the video processing apparatus further includes a training unit configured to train a deep neural network based on a sample video sequence to obtain the deformable convolution kernel.
In the above solution, the video processing apparatus further includes a prediction unit and a sampling unit, where
the prediction unit is configured to perform, based on the deep neural network, coordinate prediction and weight prediction respectively on multiple consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, where the multiple consecutive video frames include a sample reference frame and at least one frame adjacent to it;
the sampling unit is configured to sample the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel; and
the obtaining unit is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and predicted weights of the deformable convolution kernel, and to take the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
In the above solution, the sampling unit is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
In the above solution, the obtaining unit is further configured to obtain pixel points in the sample reference frame and the at least one adjacent frame; and
the sampling unit is further configured to perform, based on the sampling points of the deformable convolution kernel, a sampling calculation on the pixel points and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and to determine sampled values of the sampling points according to the result of the calculation.
In the above solution, the denoising unit is specifically configured to perform convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
In the above solution, the video processing apparatus further includes a convolution unit configured to, for each pixel point in the frame to be processed, perform a convolution operation on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to each pixel point; and
the denoising unit is specifically configured to obtain the denoised video frame according to the denoised pixel value corresponding to each pixel point.
In the above solution, the convolution unit is specifically configured to perform a weighted-sum calculation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel point according to the result of the calculation.
In a third aspect, embodiments of the present disclosure provide a video processing apparatus, the apparatus including a memory and a processor, where
the memory is configured to store a computer program executable on the processor; and
the processor is configured to execute the steps of the method according to any one of the first aspect when running the computer program.
In a fourth aspect, embodiments of the present disclosure provide a computer storage medium storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a terminal device, where the terminal device includes at least the video processing apparatus according to any one of the second aspect or the third aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer program product storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of the first aspect.
Embodiments of the present disclosure provide a video processing method, an apparatus, and a computer storage medium. A convolution parameter corresponding to a frame to be processed in a video sequence is first obtained, where the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; because this convolution parameter is derived from information extracted from consecutive video frames, it can effectively reduce the image blurring, loss of detail, and ghosting caused by inter-frame motion in the video. Denoising processing is then performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame. Because the weights of the sampling points can vary with the sampling points' positions, a better video denoising effect is achieved and the imaging quality of the video is improved.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of yet another video processing method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the overall architecture of a video processing method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of a further video processing method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the detailed architecture of a video processing method provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the composition structure of a video processing apparatus provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a specific hardware structure of a video processing apparatus provided by an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of the composition structure of a terminal device provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings.
Embodiments of the present disclosure provide a video processing method applied to a video processing apparatus. The apparatus may be deployed in a mobile terminal device such as a smartphone, a tablet computer, a laptop computer, a palmtop computer, a personal digital assistant (PDA), a portable media player (PMP), a wearable device, or a navigation device, or in a fixed terminal device such as a digital TV or a desktop computer; the embodiments of the present disclosure impose no specific limitation.
Referring to FIG. 1, which shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure, the method may include:
S101: obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points.
It should be noted that video sequences are captured by cameras, smartphones, tablet computers, and many other terminal devices. Small cameras and terminal devices such as smartphones and tablets are usually equipped with smaller image sensors and less ideal optics, so denoising of video frames is especially important for these devices. High-end cameras and camcorders are usually equipped with larger image sensors and better optics, and video frames captured with them have good imaging quality under normal lighting; in low-light scenes, however, the captured frames still tend to contain a large amount of noise, and denoising is still required.
In this way, a video sequence can be acquired by cameras, smartphones, tablet computers, and many other terminal devices, and the sequence contains the frame on which denoising is to be performed, i.e., the frame to be processed. By training a deep neural network on consecutive frames of the video sequence (i.e., multiple consecutive video frames), a deformable convolution kernel can be obtained; the sampling points of the deformable convolution kernel and the weights of the sampling points are then obtained and taken as the convolution parameter of the frame to be processed.
In some embodiments, a deep convolutional neural network (Deep CNN) is a class of feed-forward neural networks that involve convolution operations and have a deep structure; it is one of the representative algorithms of deep learning with deep neural networks.
Referring to FIG. 2, which shows a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the present disclosure: the structure contains convolutional layers, pooling layers, and bilinear upsampling layers, where the unfilled layers are convolutional layers, the black-filled layers are pooling layers, and the gray-filled layers are bilinear upsampling layers. The number of channels of each layer (i.e., the number of deformable convolution kernels contained in each convolutional layer) is shown in Table 1. As can be seen from Table 1, the channel counts of the first 25 layers of the coordinate prediction network (denoted the V network) and the weight prediction network (denoted the F network) are identical, indicating that the V network and the F network can share the feature information of the first 25 layers; sharing feature information in this way reduces the computation of the network. The F network can be used to obtain the predicted weights of the deformable convolution kernel from the sample video sequence (i.e., multiple consecutive video frames), and the V network can be used to obtain the predicted coordinates of the deformable convolution kernel from the sample video sequence. From the predicted coordinates of the deformable convolution kernel, the sampling points of the deformable convolution kernel can be obtained; from the predicted weights and predicted coordinates, the weights of the sampling points can be obtained, thereby yielding the convolution parameter.
Table 1
[Table 1 lists the per-layer channel counts of the network in FIG. 2; the table image is not reproduced in this text extraction.]
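As a rough illustration of such a two-headed prediction network — not the patent's exact architecture, whose layer counts and channel widths follow Table 1 and FIG. 2 — the following PyTorch-style sketch shows a shared trunk feeding a coordinate head (the V network, 3 offsets per sampling point) and a weight head (the F network, 1 weight per sampling point); all names and sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class KernelPredictionNet(nn.Module):
    """Minimal sketch: a shared encoder trunk (the layers common to the
    V and F networks) followed by a coordinate head and a weight head."""

    def __init__(self, in_frames=5, n_samples=9, feat=64):
        super().__init__()
        # Shared trunk: consumes the (2T+1) stacked input frames.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_frames, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # V head: 3 predicted coordinate offsets per sampling point.
        self.v_head = nn.Conv2d(feat, n_samples * 3, 3, padding=1)
        # F head: 1 predicted weight per sampling point.
        self.f_head = nn.Conv2d(feat, n_samples, 3, padding=1)

    def forward(self, x):                  # x: (B, 2T+1, H, W)
        shared = self.trunk(x)
        coords = self.v_head(shared)       # (B, N*3, H, W)
        weights = self.f_head(shared)      # (B, N, H, W)
        return coords, weights
```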
S102: performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
It should be noted that, after the convolution parameter corresponding to the frame to be processed is obtained, a convolution operation can be performed on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points; the result of the convolution operation is the denoised video frame.
Specifically, in some embodiments, for S102, the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame, may include:
performing convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
That is, the denoising of the frame to be processed can be achieved by convolving the frame with the sampling points of the deformable convolution kernel and the weights of the sampling points. For example, for each pixel point in the frame to be processed, a weighted sum of the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points yields the denoised pixel value corresponding to that pixel point, thereby accomplishing the denoising of the frame.
In the embodiments of the present disclosure, the video sequence contains the frame to be processed. The convolution parameter corresponding to the frame to be processed is obtained, the convolution parameter including the sampling points of the deformable convolution kernel and the weights of the sampling points; denoising processing is then performed on the frame according to the sampling points and their weights, obtaining a denoised video frame. Because the convolution parameter is derived from information extracted from consecutive video frames, the image blurring, loss of detail, and ghosting caused by inter-frame motion can be effectively reduced; moreover, the weights of the sampling points can vary with the sampling points' positions, which yields a better denoising effect and improves the imaging quality of the video.
To obtain the deformable convolution kernel, in some embodiments, referring to FIG. 3, which shows a schematic flowchart of another video processing method provided by an embodiment of the present disclosure: as shown in FIG. 3, before the obtaining of the convolution parameter corresponding to the frame to be processed in the video sequence, i.e., before S101, the method may further include:
S201: training a deep neural network based on a sample video sequence to obtain the deformable convolution kernel.
It should be noted that multiple consecutive video frames are selected from the video sequence as the sample video sequence, which contains not only a sample reference frame but also at least one frame adjacent to the sample reference frame. Here, the at least one adjacent frame may be at least one frame forward-adjacent to the sample reference frame, at least one frame backward-adjacent to it, or multiple frames both forward- and backward-adjacent to it; the embodiments of the present disclosure impose no specific limitation. The description below takes multiple forward- and backward-adjacent frames as the sample video sequence as an example. For instance, suppose the sample reference frame is frame 0 of the video sequence, and the at least one adjacent frame includes the forward-adjacent frames -T, -(T-1), ..., -2, -1 and the backward-adjacent frames 1, 2, ..., (T-1), T; the sample video sequence then contains (2T+1) frames in total, and these frames are consecutive.
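As a minimal sketch, such a (2T+1)-frame window around a reference index could be assembled as follows; the clamping at sequence boundaries is an assumption for illustration, not something the disclosure specifies:

```python
def frame_window(frames, ref_idx, T):
    """Return the (2T+1) consecutive frames centered on frames[ref_idx],
    clamping indices at the sequence boundaries."""
    last = len(frames) - 1
    return [frames[min(max(ref_idx + d, 0), last)] for d in range(-T, T + 1)]

window = frame_window(list(range(100)), ref_idx=10, T=2)  # frames 8..12
```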
In the embodiments of the present disclosure, a deformable convolution kernel can be obtained by training the deep neural network on the sample video sequence, and each pixel point of the frame to be processed can be convolved with its corresponding deformable convolution kernel to accomplish the denoising of the frame. Compared with the fixed convolution kernels of the prior art, the deformable convolution kernel adopted in the embodiments of the present disclosure enables the video processing of the frame to be processed to achieve a better denoising effect. In addition, because the embodiments of the present disclosure perform a three-dimensional convolution operation, the corresponding deformable convolution kernel is also three-dimensional; unless otherwise specified, the deformable convolution kernels in the embodiments of the present disclosure all refer to three-dimensional deformable convolution kernels.
In some embodiments, for the sampling points of the deformable convolution kernel and the weights of the sampling points, coordinate prediction and weight prediction can be performed on the multiple consecutive video frames of the sample video sequence through the deep neural network: the predicted coordinates and predicted weights of the deformable convolution kernel are obtained first, and the sampling points of the deformable convolution kernel and their weights are then further derived from the predicted coordinates and predicted weights.
In some embodiments, referring to FIG. 4, which shows a schematic flowchart of yet another video processing method provided by an embodiment of the present disclosure: as shown in FIG. 4, for S201, the training a deep neural network based on a sample video sequence to obtain the deformable convolution kernel may include:
S201a: performing, based on the deep neural network, coordinate prediction and weight prediction respectively on the multiple consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel.
It should be noted that the multiple consecutive video frames include the sample reference frame and at least one frame adjacent to it. If the at least one adjacent frame includes T forward-adjacent frames and T backward-adjacent frames, the multiple consecutive video frames total (2T+1) frames. Deep learning is performed on these consecutive frames (for example, (2T+1) frames in total) through the deep neural network, and a coordinate prediction network and a weight prediction network are built according to the learning result; coordinate prediction by the coordinate prediction network then yields the predicted coordinates of the deformable convolution kernel, and weight prediction by the weight prediction network yields its predicted weights. Here, the frame to be processed may be the sample reference frame of the sample video sequence, on which the video denoising is performed.
Illustratively, suppose the width of each frame in the sample video sequence is denoted W and the height H; the frame to be processed then contains H×W pixel points. Because the deformable convolution kernel is three-dimensional and its size consists of N sampling points, the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights is H×W×N.
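For a concrete check of these counts, with illustrative values for H, W, and N:

```python
H, W, N = 720, 1280, 9             # frame height/width, sampling points per kernel
coords_per_frame  = H * W * N * 3  # one (u, v, z) triple per sampling point
weights_per_frame = H * W * N
print(coords_per_frame, weights_per_frame)  # 24883200 8294400
```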
S201b: sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel.
It should be noted that, after the predicted coordinates and predicted weights of the deformable convolution kernel are obtained, the predicted coordinates can be sampled to obtain the sampling points of the deformable convolution kernel.
Specifically, the predicted coordinates of the deformable convolution kernel can be sampled through a preset sampling model. In some embodiments, referring to FIG. 5, which shows a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure: as shown in FIG. 5, for S201b, the sampling of the predicted coordinates of the deformable convolution kernel to obtain the sampling points may include:
S201b-1: inputting the predicted coordinates of the deformable convolution kernel into the preset sampling model to obtain the sampling points of the deformable convolution kernel.
It should be noted that the preset sampling model is a preconfigured model for sampling the predicted coordinates of the deformable convolution kernel. In the embodiments of the present disclosure, the preset sampling model may be a trilinear sampler or some other sampling model; the embodiments impose no specific limitation.
Based on the preset sampling model, after the obtaining of the sampling points of the deformable convolution kernel, the method may further include:
S201b-2: obtaining the pixel points in the sample reference frame and the at least one adjacent frame.
It should be noted that, if the sample reference frame and the at least one adjacent frame total (2T+1) frames, and the width of each frame is denoted W and the height H, then the number of obtainable pixel points is H×W×(2T+1).
S201b-3: performing, based on the sampling points of the deformable convolution kernel, a sampling calculation on the pixel points and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and determining the sampled values of the sampling points according to the result of the calculation.
It should be noted that, based on the preset sampling model, all the pixel points and the predicted coordinates of the deformable convolution kernel can be input into the preset sampling model, whose output is the sampling points of the deformable convolution kernel and the sampled values of the sampling points. Thus, if H×W×N sampling points are obtained, the corresponding number of sampled values is also H×W×N.
Illustratively, taking a trilinear sampler as an example: the trilinear sampler can determine not only the sampling points of the deformable convolution kernel from its predicted coordinates, but also the sampled values corresponding to those sampling points. Taking the (2T+1) frames of the sample video sequence as an example — the (2T+1) frames consisting of the sample reference frame, the T frames forward-adjacent to it, and the T frames backward-adjacent to it — these frames contain H×W×(2T+1) pixel points; the pixel values corresponding to these H×W×(2T+1) pixel points, together with the H×W×N×3 predicted coordinates, are input into the trilinear sampler for the sampling calculation. For example, the sampling calculation of the trilinear sampler is shown in Equation (1), reconstructed here as standard trilinear interpolation since the original formula image is not reproduced in this extraction:

$$\hat{X}^{(y,x,n)}=\sum_{m}\sum_{i}\sum_{j} X(i,j,m)\,\max\!\big(0,1-\lvert u^{(y,x,n)}-j\rvert\big)\,\max\!\big(0,1-\lvert v^{(y,x,n)}-i\rvert\big)\,\max\!\big(0,1-\lvert z^{(y,x,n)}-m\rvert\big) \tag{1}$$

where $\hat{X}^{(y,x,n)}$ denotes the sampled value of the n-th sampling point at pixel position (y,x), n is a positive integer with 1 ≤ n ≤ N, $u^{(y,x,n)}$, $v^{(y,x,n)}$, and $z^{(y,x,n)}$ denote the predicted coordinates of the n-th sampling point at pixel position (y,x) in the three dimensions (horizontal, vertical, and temporal), respectively, and X(i,j,m) denotes the pixel value at pixel position (i,j) of the m-th frame of the video sequence.
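The following NumPy sketch implements this trilinear sampling for a single sampling point. It is a minimal illustration of Equation (1) as reconstructed above, assuming out-of-range voxels contribute nothing; it is not the patent's implementation:

```python
import numpy as np

def trilinear_sample(X, u, v, z):
    """Sample video volume X of shape (frames, H, W) at one fractional
    location (v=row, u=col, z=frame): a weighted sum of the 8 nearest
    voxels with weights max(0, 1 - |distance|) per dimension."""
    F, H, W = X.shape
    val = 0.0
    for m in (int(np.floor(z)), int(np.floor(z)) + 1):
        for i in (int(np.floor(v)), int(np.floor(v)) + 1):
            for j in (int(np.floor(u)), int(np.floor(u)) + 1):
                if 0 <= m < F and 0 <= i < H and 0 <= j < W:
                    w = (max(0.0, 1 - abs(u - j))
                         * max(0.0, 1 - abs(v - i))
                         * max(0.0, 1 - abs(z - m)))
                    val += w * X[m, i, j]
    return val
```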
In addition, for the deformable convolution kernel, the predicted coordinates are variable: a relative offset variable is added at the position coordinates $(x_n, y_n, t_n)$ of every sampling point. Specifically, $u^{(y,x,n)}$, $v^{(y,x,n)}$, and $z^{(y,x,n)}$ can be expressed as

$$u^{(y,x,n)}=x_n+V(y,x,n,1),\qquad v^{(y,x,n)}=y_n+V(y,x,n,2),\qquad z^{(y,x,n)}=t_n+V(y,x,n,3) \tag{2}$$

where $u^{(y,x,n)}$ denotes the predicted coordinate in the horizontal dimension of the n-th sampling point at pixel position (y,x) and V(y,x,n,1) the corresponding horizontal offset variable; $v^{(y,x,n)}$ denotes the predicted coordinate in the vertical dimension and V(y,x,n,2) the corresponding vertical offset variable; and $z^{(y,x,n)}$ denotes the predicted coordinate in the temporal dimension and V(y,x,n,3) the corresponding temporal offset variable.
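As a minimal sketch of Equation (2) — assuming the base kernel positions are stored per sampling point and the offsets V are predicted per pixel, both names being illustrative — the absolute sampling coordinates can be formed by broadcasting:

```python
import numpy as np

H, W, N = 4, 4, 9
V = np.zeros((H, W, N, 3))   # offsets predicted by the V network (zeros here)
base = np.zeros((N, 3))      # default positions (x_n, y_n, t_n) of the N points

u = base[:, 0] + V[..., 0]   # horizontal coordinates, shape (H, W, N)
v = base[:, 1] + V[..., 1]   # vertical coordinates
z = base[:, 2] + V[..., 2]   # temporal coordinates
```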
In the embodiments of the present disclosure, on the one hand the sampling points of the deformable convolution kernel can be determined, and on the other hand the sampled value of each sampling point can also be obtained. Because the predicted coordinates of the deformable convolution kernel are variable, the position of each sampling point is not fixed; that is, the convolution kernel in the embodiments of the present disclosure is not a fixed kernel but a deformable one. Compared with the fixed convolution kernels of the prior art, the deformable convolution kernel adopted in the embodiments of the present disclosure enables the video processing of the frame to be processed to achieve a better denoising effect.
S201c: obtaining the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and predicted weights of the deformable convolution kernel.
S201d: taking the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
It should be noted that, after the sampling points of the deformable convolution kernel are obtained, the weights of the sampling points can further be obtained from the previously obtained predicted coordinates and predicted weights of the deformable convolution kernel; the convolution parameter corresponding to the frame to be processed is thereby obtained. Note that the predicted coordinates here refer to the relative coordinate values of the deformable convolution kernel.
It should also be noted that, in the embodiments of the present disclosure, assuming the width of each frame in the sample video sequence is denoted W and the height H: because the deformable convolution kernel is three-dimensional and its size consists of N sampling points, the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights is H×W×N. In some embodiments, the number of sampling points of the deformable convolution kernel is H×W×N, and the number of weights of the sampling points is also H×W×N.
Illustratively, still taking the deep convolutional neural network shown in FIG. 2 as an example, assume the deformable convolution kernels contained in each convolutional layer are of the same size, for example N sampling points per kernel. Typically N may take the value 9, but in practical applications it can be set according to the actual situation; the embodiments of the present disclosure impose no specific limitation. Note also that, for these N sampling points, because the predicted coordinates of the deformable convolution kernel are variable, the position of each sampling point is not fixed: according to the V network, there is a relative offset for every sampling point. This shows that the convolution kernel in the embodiments of the present disclosure is not a fixed kernel but a deformable one, which makes the embodiments applicable to video processing with large inter-frame motion. Moreover, depending on the sampling point, the weight of each sampling point obtained with the F network also differs; that is, the embodiments of the present disclosure adopt not only a deformable convolution kernel but also variable weights, which, compared with the fixed convolution kernels or manually set weights of the prior art, enables the video processing of the frame to be processed to achieve a better denoising effect.
Based on the deep convolutional neural network shown in FIG. 2, the network can further adopt an encoder-decoder design. In the encoder stage, the convolutional neural network performs 4 downsamplings; each downsampling maps an input of size H×W (H being the height and W the width of the frame to be processed) to an output of size H/2×W/2, and the encoder is mainly used to extract feature maps from the frame to be processed. In the decoder stage, the convolutional neural network performs 4 upsamplings; each upsampling maps an input of size H×W to an output of size 2H×2W, and the decoder is mainly used to restore, from the features extracted by the encoder, a video frame of the original size. The number of downsamplings or upsamplings can be set according to the actual situation; the embodiments of the present disclosure impose no specific limitation. In addition, as can be seen from FIG. 2, the outputs and inputs of some convolutional layers are connected, i.e., there are skip connections: for example between layer 6 and layer 22, between layer 9 and layer 19, and between layer 12 and layer 16. This allows the decoder stage to jointly exploit low-level and high-level features, so that the video denoising of the frame to be processed is better.
Referring to FIG. 6, which shows a schematic diagram of the overall architecture of a video processing method provided by an embodiment of the present disclosure: as shown in FIG. 6, X denotes the input, which receives the sample video sequence. The sample video sequence is selected from the video sequence and consists of 5 consecutive frames (for example, a sample reference frame, the 2 frames forward-adjacent to it, and the 2 frames backward-adjacent to it). Coordinate prediction and weight prediction are then performed on the consecutive frames input at X. For coordinate prediction, a coordinate prediction network (denoted the V network) can be built, through which the predicted coordinates of the deformable convolution kernel are obtained; for weight prediction, a weight prediction network (denoted the F network) can be built, through which the predicted weights of the deformable convolution kernel are obtained. The consecutive frames input at X and the predicted coordinates of the deformable convolution kernel are then all input into the preset sampling model, which outputs the sampling points of the deformable convolution kernel (denoted $\hat{X}$). From the sampling points of the deformable convolution kernel and its predicted weights, the weights of the sampling points can be obtained. Finally, for each pixel point in the frame to be processed, a convolution operation is performed on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, yielding the denoised pixel value corresponding to each pixel point; the output is the denoised video frame (denoted Y). Through the information of consecutive frames in the video sequence, not only is the denoising of the frame to be processed achieved, but also, because the positions of the sampling points of the deformable convolution kernel are variable (i.e., a deformable convolution kernel is used) and the weight of each sampling point is also variable, a better video denoising effect can be obtained.
After S101, the sampling points of the deformable convolution kernel and the weights of the sampling points have been obtained; denoising processing can then be performed on the frame to be processed according to them, so that the denoised video frame is obtained.
Specifically, the denoised video frame can be obtained by convolving the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points. In some embodiments, referring to FIG. 7, which shows a schematic flowchart of a further video processing method provided by an embodiment of the present disclosure: as shown in FIG. 7, the performing convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame, may include:
S102a: for each pixel point in the frame to be processed, performing a convolution operation on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to each pixel point.
It should be noted that the denoised pixel value corresponding to each pixel point can be obtained by a weighted-sum calculation of the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points. Specifically, in some embodiments, S102a may include:
S102a-1: performing a weighted-sum calculation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points; and
S102a-2: obtaining the denoised pixel value corresponding to each pixel point according to the result of the calculation.
It should be noted that the denoised pixel value corresponding to each pixel point can be obtained by the weighted-sum calculation of the sampling points of the deformable convolution kernel and their weight values for that pixel point. Specifically, for each pixel point in the frame to be processed, the deformable convolution kernel convolved with that pixel point contains N sampling points; the sampled value of each sampling point is first weighted by that point's weight, and the N weighted terms are then summed; the final result is the denoised pixel value corresponding to that pixel point. Specifically, as shown in Equation (3), reconstructed here since the original formula image is not reproduced in this extraction:

$$Y(y,x)=\sum_{n=1}^{N}\hat{X}^{(y,x,n)}\,F(y,x,n) \tag{3}$$

where $\hat{X}^{(y,x,n)}$ denotes the sampled value of the n-th sampling point at pixel position (y,x), F(y,x,n) denotes the weight value of the n-th sampling point at pixel position (y,x), and n = 1, 2, ..., N.
In this way, using Equation (3) above, the denoised pixel value corresponding to each pixel point in the frame to be processed can be computed. In the embodiments of the present disclosure, the position of each sampling point is not fixed, and the weight of each sampling point also differs; that is, the denoising processing of the embodiments of the present disclosure employs not only a deformable convolution kernel but also variable weights. Compared with the fixed convolution kernels or manually set weights of the prior art, this achieves a better denoising effect in the video processing of the frame to be processed.
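A minimal sketch of Equation (3), assuming the sampled values and weights have already been computed per pixel:

```python
import numpy as np

def denoise_frame(samples, weights):
    """Eq. (3) over a whole frame: samples and weights have shape (H, W, N);
    the result Y has shape (H, W), one weighted sum per pixel."""
    return np.sum(samples * weights, axis=-1)
```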
S102b: obtaining the denoised video frame according to the denoised pixel value corresponding to each pixel point.
It should be noted that each pixel point in the frame to be processed can be convolved with its corresponding deformable convolution kernel; that is, each pixel point can be convolved with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to each pixel point, thereby accomplishing the denoising of the frame to be processed.
Illustratively, assuming the preset sampling model is a trilinear sampler, FIG. 8 shows a schematic diagram of the detailed architecture of a video processing method provided by an embodiment of the present disclosure. As shown in FIG. 8, a sample video sequence 801 is first input; the sample video sequence 801 consists of multiple consecutive video frames (for example, a sample reference frame, the 2 frames forward-adjacent to it, and the 2 frames backward-adjacent to it). Coordinate prediction and weight prediction are then performed on the input sample video sequence 801 based on a deep neural network; for example, a coordinate prediction network 802 and a weight prediction network 803 can be built. Coordinate prediction with the coordinate prediction network 802 yields the predicted coordinates 804 of the deformable convolution kernel, and weight prediction with the weight prediction network 803 yields the predicted weights 805 of the deformable convolution kernel. The input sample video sequence 801 and the predicted coordinates 804 are jointly input into a trilinear sampler 806, which performs the sampling processing; the output of the trilinear sampler 806 is the sampling points 807 of the deformable convolution kernel. The sampling points 807 and the predicted weights 805 are then convolved 808 with the frame to be processed, and the denoised video frame 809 is finally output. Note that, before the convolution operation 808, the weights of the sampling points of the deformable convolution kernel can also be obtained from the predicted coordinates 804 and the predicted weights 805; thus, for the convolution operation 808, the sampling points of the deformable convolution kernel and the weights of the sampling points can be convolved with the frame to be processed, to accomplish the denoising of the frame.
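Tying the pieces together, the following illustrative NumPy loop mirrors the FIG. 8 pipeline with random stand-ins for the V/F network outputs; it reuses the trilinear_sample helper sketched above, and every name and value here is an assumption for illustration only:

```python
import numpy as np

T, H, W, N = 2, 8, 8, 9
X = np.random.rand(2 * T + 1, H, W)           # (2T+1) noisy input frames
coords = np.random.rand(H, W, N, 3) * [W - 1, H - 1, 2 * T]  # (u, v, z) per point
weights = np.random.rand(H, W, N)
weights /= weights.sum(axis=-1, keepdims=True)  # normalize per pixel

Y = np.empty((H, W))                          # denoised reference frame
for y in range(H):
    for x in range(W):
        s = [trilinear_sample(X, *coords[y, x, n]) for n in range(N)]
        Y[y, x] = np.dot(s, weights[y, x])    # Eq. (3): weighted sum over N
```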
Based on the detailed architecture shown in FIG. 8, the deformable convolution kernel can be obtained by training the deep neural network on the sample video sequence. In addition, for the predicted coordinates and predicted weights of the deformable convolution kernel: because the predicted coordinates are variable, the position of every sampling point is variable, and hence the convolution kernel in the embodiments of the present disclosure is not a fixed kernel but a deformable one, which makes the embodiments applicable to video processing with large inter-frame motion. Moreover, the weight of each sampling point can also vary from point to point; that is, the embodiments adopt not only a deformable convolution kernel but also variable predicted weights, so that the video processing of the frame to be processed achieves a better denoising effect.
In the embodiments of the present disclosure, adopting a deformable convolution kernel not only avoids the image blurring, loss of detail, and ghosting caused by inter-frame motion across consecutive video frames, but also adaptively assigns different sampling points, based on pixel-level information, to track the movement of the same location across consecutive frames; by exploiting multi-frame information, the deficiency of single-frame information can be better compensated, which also allows the method of the embodiments to be applied to video restoration scenarios. In addition, the deformable convolution kernel can be regarded as an efficient extractor of temporal optical flow that fully exploits the multi-frame information of consecutive video frames, so the method of the embodiments can also be applied to other video processing scenarios that depend on pixel-level information. Furthermore, with limited hardware quality or under low-light conditions, the method of the embodiments can still achieve high-quality video imaging.
The above embodiments provide a video processing method: the convolution parameter corresponding to the frame to be processed in the video sequence is obtained, the convolution parameter including the sampling points of the deformable convolution kernel and the weights of the sampling points; denoising processing is performed on the frame according to the sampling points and their weights, obtaining a denoised video frame. Because the convolution parameter is derived from information extracted from consecutive video frames, it can effectively reduce the image blurring, loss of detail, and ghosting caused by inter-frame motion; moreover, the weights of the sampling points can vary with their positions, which yields a better denoising effect and improves the imaging quality of the video.
Based on the same inventive concept as the foregoing embodiments, referring to FIG. 9, which shows the composition of a video processing apparatus 90 provided by an embodiment of the present disclosure: the video processing apparatus 90 may include an obtaining unit 901 and a denoising unit 902, where
the obtaining unit 901 is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and
the denoising unit 902 is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
In the above solution, referring to FIG. 9, the video processing apparatus 90 further includes a training unit 903 configured to train a deep neural network based on a sample video sequence to obtain the deformable convolution kernel.
In the above solution, referring to FIG. 9, the video processing apparatus 90 further includes a prediction unit 904 and a sampling unit 905, where
the prediction unit 904 is configured to perform, based on the deep neural network, coordinate prediction and weight prediction respectively on multiple consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, where the multiple consecutive video frames include a sample reference frame and at least one frame adjacent to it;
the sampling unit 905 is configured to sample the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel; and
the obtaining unit 901 is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and predicted weights of the deformable convolution kernel, and to take the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
In the above solution, the sampling unit 905 is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
In the above solution, the obtaining unit 901 is further configured to obtain pixel points in the sample reference frame and the at least one adjacent frame; and
the sampling unit 905 is further configured to perform, based on the sampling points of the deformable convolution kernel, a sampling calculation on the pixel points and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and to determine sampled values of the sampling points according to the result of the calculation.
In the above solution, the denoising unit 902 is specifically configured to perform convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
In the above solution, referring to FIG. 9, the video processing apparatus 90 further includes a convolution unit 906 configured to, for each pixel point in the frame to be processed, perform a convolution operation on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to each pixel point; and
the denoising unit 902 is specifically configured to obtain the denoised video frame according to the denoised pixel value corresponding to each pixel point.
In the above solution, the convolution unit 906 is specifically configured to perform a weighted-sum calculation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel point according to the result of the calculation.
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a module, or it may be non-modular. The components in this embodiment may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the method described in this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Accordingly, this embodiment provides a computer storage medium storing a video processing program which, when executed by at least one processor, implements the steps of the method described in the foregoing embodiments.
Based on the composition of the above video processing apparatus 90 and the computer storage medium, referring to FIG. 10, which shows a specific hardware structure of the video processing apparatus 90 provided by an embodiment of the present disclosure, the apparatus may include: a network interface 1001, a memory 1002, and a processor 1003, with the components coupled together by a bus system 1004. It can be understood that the bus system 1004 is used to implement connection and communication between these components; besides a data bus, the bus system 1004 also includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 1004 in FIG. 10. The network interface 1001 is used to receive and send signals in the process of exchanging information with other external network elements;
the memory 1002 is configured to store a computer program executable on the processor 1003; and
the processor 1003 is configured to, when running the computer program, execute:
obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and
performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
An embodiment of the present application provides a computer program product storing a video processing program which, when executed by at least one processor, implements the steps of the method described in the foregoing embodiments.
It can be understood that the memory 1002 in the embodiments of the present disclosure may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 1002 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
The processor 1003 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 1003 or by instructions in the form of software. The above processor 1003 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1002, and the processor 1003 reads the information in the memory 1002 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present disclosure, or a combination thereof.
For a software implementation, the techniques described herein may be implemented through modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 1003 is further configured to execute the steps of the method in the foregoing embodiments when running the computer program.
Referring to FIG. 11, which shows a schematic diagram of the composition structure of a terminal device 110 provided by an embodiment of the present disclosure: the terminal device 110 includes at least any one of the video processing apparatuses 90 involved in the foregoing embodiments.
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
The serial numbers of the above embodiments of the present disclosure are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, may be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present disclosure.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above specific implementations, which are merely illustrative rather than restrictive. Under the inspiration of the present disclosure, those of ordinary skill in the art can make many other forms without departing from the spirit of the present disclosure and the scope protected by the claims, all of which fall within the protection of the present disclosure.

Claims (20)

  1. A video processing method, the method comprising:
    obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter comprises sampling points of a deformable convolution kernel and weights of the sampling points; and
    performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
  2. The method according to claim 1, wherein, before the obtaining of the convolution parameter corresponding to the frame to be processed in the video sequence, the method further comprises:
    training a deep neural network based on a sample video sequence to obtain the deformable convolution kernel.
  3. The method according to claim 2, wherein the training a deep neural network based on a sample video sequence to obtain the deformable convolution kernel comprises:
    performing, based on the deep neural network, coordinate prediction and weight prediction respectively on multiple consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, wherein the multiple consecutive video frames comprise a sample reference frame and at least one frame adjacent to it;
    sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel;
    obtaining the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and predicted weights of the deformable convolution kernel; and
    taking the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
  4. The method according to claim 3, wherein the sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel comprises:
    inputting the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
  5. The method according to claim 4, wherein, after the obtaining of the sampling points of the deformable convolution kernel, the method further comprises:
    obtaining pixel points in the sample reference frame and the at least one adjacent frame; and
    performing, based on the sampling points of the deformable convolution kernel, a sampling calculation on the pixel points and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and determining sampled values of the sampling points according to the result of the calculation.
  6. The method according to any one of claims 1 to 5, wherein the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame, comprises:
    performing convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
  7. The method according to claim 6, wherein the performing convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame, comprises:
    for each pixel point in the frame to be processed, performing a convolution operation on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to each pixel point; and
    obtaining the denoised video frame according to the denoised pixel value corresponding to each pixel point.
  8. The method according to claim 7, wherein the performing a convolution operation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to each pixel point, comprises:
    performing a weighted-sum calculation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points; and
    obtaining the denoised pixel value corresponding to each pixel point according to the result of the calculation.
  9. A video processing apparatus, comprising an obtaining unit and a denoising unit, wherein
    the obtaining unit is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter comprises sampling points of a deformable convolution kernel and weights of the sampling points; and
    the denoising unit is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
  10. The video processing apparatus according to claim 9, further comprising a training unit configured to train a deep neural network based on a sample video sequence to obtain the deformable convolution kernel.
  11. The video processing apparatus according to claim 10, further comprising a prediction unit and a sampling unit, wherein
    the prediction unit is configured to perform, based on the deep neural network, coordinate prediction and weight prediction respectively on multiple consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, wherein the multiple consecutive video frames comprise a sample reference frame and at least one frame adjacent to it;
    the sampling unit is configured to sample the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel; and
    the obtaining unit is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and predicted weights of the deformable convolution kernel, and to take the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameter.
  12. The video processing apparatus according to claim 11, wherein the sampling unit is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
  13. The video processing apparatus according to claim 12, wherein the obtaining unit is further configured to obtain pixel points in the sample reference frame and the at least one adjacent frame; and
    the sampling unit is further configured to perform, based on the sampling points of the deformable convolution kernel, a sampling calculation on the pixel points and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and to determine sampled values of the sampling points according to the result of the calculation.
  14. The video processing apparatus according to any one of claims 9 to 13, wherein the denoising unit is specifically configured to perform convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
  15. The video processing apparatus according to claim 14, further comprising a convolution unit configured to, for each pixel point in the frame to be processed, perform a convolution operation on the pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to each pixel point; and
    the denoising unit is specifically configured to obtain the denoised video frame according to the denoised pixel value corresponding to each pixel point.
  16. The video processing apparatus according to claim 15, wherein the convolution unit is specifically configured to perform a weighted-sum calculation on each pixel point with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel point according to the result of the calculation.
  17. A video processing apparatus, comprising a memory and a processor, wherein
    the memory is configured to store a computer program executable on the processor; and
    the processor is configured to execute the steps of the method according to any one of claims 1 to 8 when running the computer program.
  18. A computer storage medium storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 8.
  19. A terminal device, comprising at least the video processing apparatus according to any one of claims 9 to 17.
  20. A computer program product storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2019/114458 2019-03-19 2019-10-30 Video processing method and apparatus, and computer storage medium WO2020186765A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11202108771RA SG11202108771RA (en) 2019-03-19 2019-10-30 Video processing method and apparatus, and computer storage medium
JP2020573289A JP7086235B2 (ja) 2019-03-19 2019-10-30 ビデオ処理方法、装置及びコンピュータ記憶媒体
US17/362,883 US20210327033A1 (en) 2019-03-19 2021-06-29 Video processing method and apparatus, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910210075.5A CN109862208B (zh) 2019-03-19 2019-03-19 Video processing method and apparatus, computer storage medium, and terminal device
CN201910210075.5 2019-03-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/362,883 Continuation US20210327033A1 (en) 2019-03-19 2021-06-29 Video processing method and apparatus, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2020186765A1 true WO2020186765A1 (zh) 2020-09-24

Family

ID=66901319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114458 WO2020186765A1 (zh) 2019-03-19 2019-10-30 Video processing method and apparatus, and computer storage medium

Country Status (6)

Country Link
US (1) US20210327033A1 (zh)
JP (1) JP7086235B2 (zh)
CN (1) CN109862208B (zh)
SG (1) SG11202108771RA (zh)
TW (1) TWI714397B (zh)
WO (1) WO2020186765A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744156A (zh) * 2021-09-06 2021-12-03 中南大学 Image denoising method based on a deformable convolutional neural network

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862208B (zh) * 2019-03-19 2021-07-02 深圳市商汤科技有限公司 Video processing method and apparatus, computer storage medium, and terminal device
CN112580675A (zh) * 2019-09-29 2021-03-30 北京地平线机器人技术研发有限公司 Image processing method and apparatus, and computer-readable storage medium
CN113727141B (zh) * 2020-05-20 2023-05-12 富士通株式会社 Video frame interpolation apparatus and method
CN113936163A (zh) * 2020-07-14 2022-01-14 武汉Tcl集团工业研究院有限公司 Image processing method, terminal, and storage medium
US11689713B2 (en) * 2020-07-15 2023-06-27 Tencent America LLC Predicted frame generation by deformable convolution for video coding
CN114640796B (zh) * 2022-03-24 2024-02-09 北京字跳网络技术有限公司 Video processing method and apparatus, electronic device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358069A1 (en) * 2015-06-03 2016-12-08 Samsung Electronics Co., Ltd. Neural network suppression
CN106408522A (zh) * 2016-06-27 2017-02-15 深圳市未来媒体技术研究院 Image denoising method based on a convolutional pair neural network
CN107292319A (zh) * 2017-08-04 2017-10-24 广东工业大学 Method and apparatus for feature image extraction based on deformable convolutional layers
CN107516304A (zh) * 2017-09-07 2017-12-26 广东工业大学 Image denoising method and apparatus
CN107609638A (zh) * 2017-10-12 2018-01-19 湖北工业大学 Method for optimizing a convolutional neural network based on a linear decoder and interpolation sampling
CN107609519A (zh) * 2017-09-15 2018-01-19 维沃移动通信有限公司 Method and apparatus for locating facial feature points
CN107909113A (zh) * 2017-11-29 2018-04-13 北京小米移动软件有限公司 Traffic accident image processing method, apparatus, and storage medium
CN109862208A (zh) * 2019-03-19 2019-06-07 深圳市商汤科技有限公司 Video processing method and apparatus, and computer storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9786036B2 (en) * 2015-04-28 2017-10-10 Qualcomm Incorporated Reducing image resolution in deep convolutional networks
US10043243B2 (en) * 2016-01-22 2018-08-07 Siemens Healthcare Gmbh Deep unfolding algorithm for efficient image denoising under varying noise conditions
CN106296692A (zh) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial networks
CN107103590B (zh) * 2017-03-22 2019-10-18 华南理工大学 Image reflection removal method based on a deep convolutional generative adversarial network
US10409888B2 (en) * 2017-06-02 2019-09-10 Mitsubishi Electric Research Laboratories, Inc. Online convolutional dictionary learning
CN107495959A (zh) * 2017-07-27 2017-12-22 大连大学 ECG signal classification method based on a one-dimensional convolutional neural network
WO2019019199A1 (en) * 2017-07-28 2019-01-31 Shenzhen United Imaging Healthcare Co., Ltd. SYSTEM AND METHOD FOR IMAGE CONVERSION
CN107689034B (zh) * 2017-08-16 2020-12-01 清华-伯克利深圳学院筹备办公室 Denoising method and apparatus
CN109074633B (zh) * 2017-10-18 2020-05-12 深圳市大疆创新科技有限公司 Video processing method and device, unmanned aerial vehicle, and computer-readable storage medium
CN107886162A (zh) * 2017-11-14 2018-04-06 华南理工大学 Deformable convolution kernel method based on a WGAN model
CN108197580B (zh) * 2018-01-09 2019-07-23 吉林大学 Gesture recognition method based on a 3D convolutional neural network
CN108805265B (zh) * 2018-05-21 2021-03-30 Oppo广东移动通信有限公司 Neural network model processing method and apparatus, image processing method, and mobile terminal



Also Published As

Publication number Publication date
TWI714397B (zh) 2020-12-21
JP7086235B2 (ja) 2022-06-17
CN109862208A (zh) 2019-06-07
TW202037145A (zh) 2020-10-01
JP2021530770A (ja) 2021-11-11
US20210327033A1 (en) 2021-10-21
CN109862208B (zh) 2021-07-02
SG11202108771RA (en) 2021-09-29

Similar Documents

Publication Publication Date Title
WO2020186765A1 (zh) Video processing method and apparatus, and computer storage medium
CN108629743B (zh) Image processing method and apparatus, storage medium, and electronic apparatus
CN111275626B (zh) Blur-degree-based video deblurring method, apparatus, and device
US9615039B2 (en) Systems and methods for reducing noise in video streams
US20210352212A1 (en) Video image processing method and apparatus
US9007402B2 (en) Image processing for introducing blurring effects to an image
EP2164040B1 (en) System and method for high quality image and video upscaling
WO2021238500A1 (zh) Panoramic video frame interpolation method and apparatus, and corresponding storage medium
WO2021189733A1 (zh) Image processing method and apparatus, electronic device, and storage medium
CN106780336B (zh) Image downscaling method and apparatus
CN112602088B (zh) Method, system, and computer-readable medium for improving the quality of low-light images
CN111935425B (zh) Video noise reduction method and apparatus, electronic device, and computer-readable medium
CN114073071A (zh) Video frame interpolation method and apparatus, and computer-readable storage medium
CN110958363B (zh) Image processing method and apparatus, computer-readable medium, and electronic device
CN109544490B (zh) Image enhancement method, apparatus, and computer-readable storage medium
CN114390188B (zh) Image processing method and electronic device
CN113409188A (zh) Image background replacement method, system, electronic device, and storage medium
CN113596576A (zh) Video super-resolution method and apparatus
WO2020187042A1 (zh) Image processing method, apparatus, device, and computer-readable medium
US20230098437A1 (en) Reference-Based Super-Resolution for Image and Video Enhancement
US20220321830A1 (en) Optimization of adaptive convolutions for video frame interpolation
US11195247B1 (en) Camera motion aware local tone mapping
CN114463213A (zh) Video processing method, video processing apparatus, terminal, and storage medium
CN111738958B (zh) Picture restoration method and apparatus, electronic device, and computer-readable medium
WO2024130715A1 (zh) Video processing method, video processing apparatus, and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19920051

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020573289

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/01/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19920051

Country of ref document: EP

Kind code of ref document: A1