WO2020186765A1 - Video processing method, apparatus, and computer storage medium - Google Patents
Video processing method, apparatus, and computer storage medium
- Publication number
- WO2020186765A1 (PCT/CN2019/114458)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- convolution kernel
- sampling
- frame
- deformable convolution
- video
- Prior art date
Classifications
- G06T5/70—Denoising; Smoothing (under G—PHYSICS / G06—COMPUTING; CALCULATING OR COUNTING / G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL / G06T5/00—Image enhancement or restoration)
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks (under G06T5/00—Image enhancement or restoration)
- G06T2207/10016—Video; Image sequence (under G06T2207/00—Indexing scheme for image analysis or image enhancement / G06T2207/10—Image acquisition modality)
- G06T2207/20081—Training; Learning (under G06T2207/00—Indexing scheme for image analysis or image enhancement / G06T2207/20—Special algorithmic details)
- G06T2207/20084—Artificial neural networks [ANN] (under G06T2207/00—Indexing scheme for image analysis or image enhancement / G06T2207/20—Special algorithmic details)
Definitions
- the present disclosure relates to the field of computer vision technology, and in particular to a video processing method, device, and computer storage medium.
- mixed noise reduces the visual quality of a video. Video captured with a small camera aperture or in low-light scenes often contains noise, yet such video also carries a lot of information. The noise makes this information uncertain and seriously degrades the viewer's visual experience. Therefore, video denoising has important research significance and has become an important topic in computer vision.
- the embodiments of the present disclosure are to provide a video processing method, device, and computer storage medium.
- embodiments of the present disclosure provide a video processing method, the method including:
- the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point
- before the obtaining of the convolution parameter corresponding to the frame to be processed in the video sequence, the method further includes:
- deep neural network training is performed to obtain a deformable convolution kernel.
- the deep neural network training based on the sample video sequence to obtain the deformable convolution kernel includes:
- coordinate prediction and weight prediction are respectively performed on multiple consecutive video frames in the sample video sequence based on a deep neural network, to obtain the predicted coordinates and predicted weights of the deformable convolution kernel, wherein the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
- sampling points of the deformable convolution kernel and the weights of the sampling points are used as the convolution parameters.
- the sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel includes:
- the predicted coordinates of the deformable convolution kernel are input into a preset sampling model to obtain sampling points of the deformable convolution kernel.
- the method further includes:
- the pixel points and the predicted coordinates of the deformable convolution kernel are sampled and calculated through a preset sampling model, and the sampling value of the sampling point is determined according to the calculation result.
- the performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame includes:
- the convolution processing of the sampling points of the deformable convolution kernel and the weight of the sampling points with the frame to be processed to obtain the denoised video frame includes:
- the denoised video frame is obtained.
- the convolution operation of each pixel with the sampling point of the deformable convolution kernel and the weight of the sampling point to obtain the denoising pixel value corresponding to each pixel includes:
- the denoising pixel value corresponding to each pixel is obtained.
- embodiments of the present disclosure provide a video processing device, the video processing device includes an acquisition unit and a denoising unit, wherein:
- the obtaining unit is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
- the denoising unit is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
- the video processing device further includes a training unit configured to perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
- the video processing device further includes a prediction unit and a sampling unit, wherein:
- the prediction unit is configured to perform coordinate prediction and weight prediction on consecutive multiple video frames in the sample video sequence based on a deep neural network to obtain the prediction coordinates and prediction weights of the deformable convolution kernel, wherein
- the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
- the sampling unit is configured to sample the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel;
- the acquiring unit is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel, and to use the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
- the sampling unit is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
- the acquiring unit is further configured to acquire pixels in the sample reference frame and the at least one adjacent frame;
- the sampling unit is further configured to, based on the sampling points of the deformable convolution kernel, perform sampling calculations on the pixel points and the predicted coordinates of the deformable convolution kernel through a preset sampling model, and determine the sampling values of the sampling points according to the calculation results.
- the denoising unit is specifically configured to perform convolution processing on the sampling points of the deformable convolution kernel and the weights of the sampling points with the frame to be processed, to obtain the denoised video frame.
- the video processing device further includes a convolution unit configured to, for each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to each pixel;
- the denoising unit is specifically configured to obtain a denoised video frame according to the denoising pixel value corresponding to each pixel.
- the convolution unit is specifically configured to perform a weighted summation calculation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel according to the calculation result.
- embodiments of the present disclosure provide a video processing device, the video processing device includes: a memory and a processor; wherein,
- the memory is configured to store a computer program that can run on the processor
- the processor is configured to execute the steps of the method according to any one of the first aspects when running the computer program.
- an embodiment of the present disclosure provides a computer storage medium, the computer storage medium stores a video processing program, and when the video processing program is executed by at least one processor, the steps of the method according to any one of the first aspect are implemented.
- embodiments of the present disclosure provide a terminal device, wherein the terminal device at least includes the video processing apparatus according to any one of the second aspect or the third aspect.
- a computer program product according to an embodiment of the present disclosure stores a video processing program, and when the video processing program is executed by at least one processor, the steps of the method according to any one of the first aspect are implemented.
- the convolution parameters corresponding to the frame to be processed in the video sequence are first obtained, where the convolution parameters include the sampling points of the deformable convolution kernel and the weights of the sampling points. Since the convolution parameters are obtained by extracting information from consecutive frames of the video, they can effectively reduce the image blur, detail loss, and ghosting caused by inter-frame motion in the video. Denoising is then performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame. Because the weight of each sampling point can change with the position of the sampling point, the video denoising effect is better and the image quality of the video is improved.
- FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure
- FIG. 2 is a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the disclosure
- FIG. 3 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure.
- FIG. 4 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure.
- FIG. 5 is a schematic flowchart of still another video processing method provided by an embodiment of the disclosure.
- FIG. 6 is a schematic diagram of the overall architecture of a video processing method provided by an embodiment of the disclosure.
- FIG. 7 is a schematic flowchart of still another video processing method provided by an embodiment of the disclosure.
- FIG. 8 is a schematic diagram of a detailed architecture of a video processing method provided by an embodiment of the disclosure.
- FIG. 9 is a schematic diagram of the composition structure of a video processing device provided by an embodiment of the disclosure.
- FIG. 10 is a schematic diagram of a specific hardware structure of a video processing device provided by an embodiment of the disclosure.
- FIG. 11 is a schematic diagram of the composition structure of a terminal device provided by an embodiment of the disclosure.
- the embodiments of the present disclosure provide a video processing method, which is applied to a video processing device. The device may be set in mobile terminal equipment such as a smart phone, a tablet computer, a notebook computer, a handheld computer, a personal digital assistant (PDA), a portable media player (PMP), a wearable device, or a navigation device, or in fixed terminal equipment such as a digital TV or a desktop computer.
- FIG. 1 shows a schematic flowchart of a video processing method provided by an embodiment of the present disclosure.
- the method may include:
- S101 Obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
- the video sequence is captured by cameras, smart phones, tablets, and many other terminal devices.
- small cameras and terminal devices such as smart phones and tablet computers are usually equipped with smaller-sized image sensors and less than ideal optical devices.
- the denoising processing of video frames is particularly important for these devices.
- High-end cameras and camcorders are usually equipped with larger image sensors and better optics.
- the video frames captured by these devices have good imaging quality under normal lighting conditions; however, video frames captured in low-light scenes still often contain a lot of noise, and denoising of the video frames is still required.
- video sequences can be obtained through the collection of cameras, smart phones, tablet computers and many other terminal devices.
- the video sequence contains frames to be processed for denoising processing.
- the deformable convolution kernel can be obtained; then the sampling points of the deformable convolution kernel and the weights of the sampling points are obtained and used as the convolution parameters of the frame to be processed.
- a deep convolutional neural network (Deep CNN) is a type of feedforward neural network that includes convolution operations and has a deep structure; it is one of the representative algorithms of deep learning.
- FIG. 2 shows a schematic structural diagram of a deep convolutional neural network provided by an embodiment of the present disclosure.
- the structure of the deep convolutional neural network includes convolutional layers, pooling layers, and bilinear upsampling layers; among them, the unfilled layers are convolutional layers, the black-filled layers are pooling layers, and the gray-filled layers are bilinear upsampling layers. The number of channels corresponding to each layer (i.e., the number of deformable convolution kernels included in each convolutional layer) is shown in Table 1.
- the numbers of channels of the first 25 layers of the coordinate prediction network (represented by the V network) and of the weight prediction network (represented by the F network) are the same, indicating that the V network and the F network can share the feature information of the first 25 layers; through this sharing of feature information, the amount of network computation can be reduced.
- the F network can be used to obtain the predicted weights of the deformable convolution kernel from a sample video sequence (i.e., multiple consecutive video frames), and the V network can be used to obtain the predicted coordinates of the deformable convolution kernel.
- the sampling points of the deformable convolution kernel can be obtained; and according to the predicted weights of the deformable convolution kernel and the predicted coordinates of the deformable convolution kernel, the weights of the sampling points of the deformable convolution kernel can be obtained. The convolution parameters are thereby obtained.
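- as an illustration of the feature sharing between the V network and the F network, the following is a minimal PyTorch sketch with a shared trunk and two prediction heads; the trunk depth, channel widths, and the class name `CoordWeightPredictor` are assumptions of this sketch, not the 25 shared layers of Table 1:

```python
import torch
import torch.nn as nn

class CoordWeightPredictor(nn.Module):
    """Shared-trunk predictor: a V head for coordinates, an F head for weights."""
    def __init__(self, in_ch=5, N=9):
        super().__init__()
        self.trunk = nn.Sequential(          # stands in for the shared layers
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.v_head = nn.Conv2d(64, N * 3, 3, padding=1)  # V network output
        self.f_head = nn.Conv2d(64, N, 3, padding=1)      # F network output

    def forward(self, x):
        feat = self.trunk(x)      # trunk features computed once, used by both heads
        return self.v_head(feat), self.f_head(feat)

coords, weights = CoordWeightPredictor()(torch.randn(1, 5, 64, 64))
# coords: (1, 27, 64, 64), reshapeable to (64, 64, 9, 3); weights: (1, 9, 64, 64)
```

Because the trunk runs once per input, the coordinate and weight predictions cost little more than a single network pass, which is the computational saving the shared 25 layers provide.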
- S102 Perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
- the convolution operation can be performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points, and the result of the convolution operation is Video frame after denoising.
- performing the denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame may include:
- sampling points of the deformable convolution kernel and the weights of the sampling points are convolved with the frame to be processed to obtain the denoised video frame.
- the denoising processing for the frame to be processed may be obtained by convolution processing the sampling points of the deformable convolution kernel and the weight of the sampling points with the frame to be processed.
- specifically, a weighted summation can be performed on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to each pixel, thereby achieving the denoising processing of the frame to be processed.
- the video sequence contains frames to be processed for denoising processing.
- the convolution parameters include the sampling points of the deformable convolution kernel and the weight of the sampling points; according to the sampling points of the deformable convolution kernel and the The weight of the sampling point performs denoising processing on the frame to be processed to obtain a denoised video frame.
- since the convolution parameters are obtained by extracting information from consecutive frames of the video, they can effectively reduce the image blur, detail loss, and ghosting caused by inter-frame motion in the video; and since the weight of each sampling point can change with the position of the sampling point, the video denoising effect is better and the imaging quality of the video is improved.
- FIG. 3 shows a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- the method may further include:
- S201 Perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
- multiple consecutive video frames are selected from the video sequence as the sample video sequence, where the sample video sequence not only includes the sample reference frame, but also includes at least one adjacent frame adjacent to the sample reference frame.
- the at least one adjacent frame may be at least one frame forward-adjacent to the sample reference frame, at least one frame backward-adjacent to the sample reference frame, or multiple frames both forward-adjacent and backward-adjacent to the sample reference frame; this is not specifically limited in the embodiments of the present disclosure. The following description takes multiple frames forward-adjacent and backward-adjacent to the sample reference frame as the sample video sequence as an example.
- the sample reference frame is the 0th frame in the video sequence, and the at least one adjacent frame adjacent to the sample reference frame includes the forward-adjacent frames -T, -(T-1), ..., -2, -1, and the backward-adjacent frames 1, 2, ..., T-1, T.
- the deformable convolution kernel can be obtained by performing deep neural network training on the sample video sequence, and each pixel in the frame to be processed can be subjected to convolution processing with the corresponding deformable convolution kernel, so that the denoising processing of the frame to be processed is achieved; compared with the fixed convolution kernel in the prior art, the embodiments of the present disclosure adopt a deformable convolution kernel, which can make the video processing of the frame to be processed achieve a better denoising effect.
- the corresponding deformable convolution kernel is also three-dimensional; unless otherwise specified, the deformable convolution kernel in the embodiments of the present disclosure refers to a three-dimensional deformable convolution kernel.
- a deep neural network can be used to perform coordinate prediction and weight prediction on the multiple consecutive video frames in the sample video sequence, first obtaining the predicted coordinates and predicted weights of the deformable convolution kernel; then, according to the coordinate prediction and weight prediction, the sampling points of the deformable convolution kernel and the weights of the sampling points are further obtained.
- FIG. 4 shows a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- the method may include:
- S201a Perform coordinate prediction and weight prediction on multiple consecutive video frames in the sample video sequence based on the deep neural network, to obtain the prediction coordinates and prediction weights of the deformable convolution kernel;
- the consecutive multiple video frames include a sample reference frame and at least one adjacent frame thereof. If the at least one adjacent frame includes T frames forward-adjacent and T frames backward-adjacent, the consecutive video frames number (2T+1) frames in total.
- coordinate prediction is performed through the coordinate prediction network to obtain the predicted coordinates of the deformable convolution kernel, and weight prediction is performed through the weight prediction network to obtain the predicted weights of the deformable convolution kernel.
- the frame to be processed may be a sample reference frame in a sample video sequence for video denoising processing.
- the frame to be processed contains H×W pixels. Since the deformable convolution kernel is three-dimensional and its size is composed of N sampling points, the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights of the deformable convolution kernel obtainable for the frame to be processed is H×W×N.
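- to make these tensor shapes concrete, the following is a minimal NumPy sketch (an illustration only; the array names `pred_coords` and `pred_weights` are assumptions, not identifiers from the patent):

```python
import numpy as np

H, W, N, T = 64, 64, 9, 2        # frame height/width, sampling points, temporal radius

# V network output: 3 predicted coordinates (horizontal, vertical, time)
# per pixel and per sampling point -> H x W x N x 3 values.
pred_coords = np.zeros((H, W, N, 3))

# F network output: one predicted weight per pixel and per sampling point
# -> H x W x N values.
pred_weights = np.ones((H, W, N)) / N

# The (2T+1) consecutive input frames hold H x W x (2T+1) pixels in total.
frames = np.random.rand(2 * T + 1, H, W)
print(pred_coords.shape, pred_weights.shape, frames.shape)
```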
- S201b Sampling the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel
- the predicted coordinates of the deformable convolution kernel can be sampled, so as to obtain the sampling points of the deformable convolution kernel.
- the predicted coordinates of the deformable convolution kernel can be sampled through a preset sampling model.
- FIG. 5 shows a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure.
- the method for sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel may include:
- S201b-1 Input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
- the preset sampling model means a preset model for sampling the predicted coordinates of the deformable convolution kernel.
- the preset sampling model may refer to a trilinear sampler, or may refer to other sampling models, which is not specifically limited in the embodiment of the present disclosure.
- the method may further include:
- S201b-2 Acquire pixels in the sample reference frame and the at least one adjacent frame
- the sample reference frame and the at least one adjacent frame comprise a total of (2T+1) frames; if the width of each frame is denoted by W and the height by H, then the number of pixels that can be obtained is H×W×(2T+1).
- S201b-3 Based on the sampling points of the deformable convolution kernel, perform sampling calculations on the pixel points and the predicted coordinates of the deformable convolution kernel through a preset sampling model, and determine the sampling points according to the calculation results The sampled value.
- all pixels and the predicted coordinates of the deformable convolution kernel can be input into the preset sampling model, and the output of the preset sampling model is the sampling point of the deformable convolution kernel And the sampling value of the sampling point.
- the number of sampling points is H ⁇ W ⁇ N
- the number of corresponding sampling values is also H ⁇ W ⁇ N.
- the trilinear sampler can not only determine the sampling points of the deformable convolution kernel according to the predicted coordinates of the deformable convolution kernel, but also determine the sampling values corresponding to the sampling points.
- the (2T+1) frames in the sample video sequence consist of the sample reference frame, the T adjacent frames forward-adjacent to it, and the T adjacent frames backward-adjacent to it; these (2T+1) frames contain H×W×(2T+1) pixels. The pixel value of each of these H×W×(2T+1) pixels and the H×W×N×3 predicted coordinates are input to the trilinear sampler for sampling calculation.
- the sampling calculation of the trilinear sampler is shown in equation (1):

$$\tilde{X}_{(y,x,n)} = \sum_{m=-T}^{T}\sum_{i=1}^{H}\sum_{j=1}^{W} X(i,j,m)\,\max\bigl(0,\,1-\lvert y+v_{(y,x,n)}-i\rvert\bigr)\,\max\bigl(0,\,1-\lvert x+u_{(y,x,n)}-j\rvert\bigr)\,\max\bigl(0,\,1-\lvert z_{(y,x,n)}-m\rvert\bigr) \tag{1}$$

- where n is a positive integer greater than or equal to 1 and less than or equal to N; u_{(y,x,n)}, v_{(y,x,n)}, z_{(y,x,n)} respectively represent the predicted coordinates of the n-th sampling point at pixel position (y,x) in the three dimensions (horizontal, vertical, and time); and X(i,j,m) represents the pixel value at pixel point (i,j) of the m-th frame in the video sequence.
- compared with a fixed convolution kernel, the predicted coordinates of the deformable convolution kernel are changeable: a relative offset variable is added at the position coordinates (x_n, y_n, t_n) of each sampling point.
- u_{(y,x,n)}, v_{(y,x,n)}, z_{(y,x,n)} can be expressed by the following formulas:

$$u_{(y,x,n)} = x_n + V(y,x,n,1),\qquad v_{(y,x,n)} = y_n + V(y,x,n,2),\qquad z_{(y,x,n)} = t_n + V(y,x,n,3) \tag{2}$$

- where u_{(y,x,n)} represents the predicted coordinate in the horizontal dimension of the n-th sampling point at pixel position (y,x), and V(y,x,n,1) the corresponding offset variable in the horizontal dimension; v_{(y,x,n)} represents the predicted coordinate in the vertical dimension, and V(y,x,n,2) the corresponding offset variable in the vertical dimension; z_{(y,x,n)} represents the predicted coordinate in the time dimension, and V(y,x,n,3) the corresponding offset variable in the time dimension.
- through the trilinear sampler, the sampling points of the deformable convolution kernel can be determined on the one hand, and the sampling value of each sampling point can be obtained on the other hand. Since the predicted coordinates of the deformable convolution kernel are changeable, the position of each sampling point is not fixed; that is, the convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one. By adopting a deformable convolution kernel, the embodiments of the present disclosure can make the video processing of the frame to be processed achieve a better denoising effect.
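- as an illustration, the following is a minimal NumPy sketch of the trilinear sampling in equation (1), assuming single-channel frames and per-pixel sampling positions already converted to absolute (v, u, z) form; the function name `trilinear_sample` and the border clamping are assumptions of this sketch, not the patent's actual implementation:

```python
import numpy as np

def trilinear_sample(frames, coords):
    """Trilinear sampling in the form of equation (1) (a sketch).

    frames: (2T+1, H, W) array of pixel values X(i, j, m); index 0 is frame -T.
    coords: (H, W, N, 3) array of absolute sampling positions (v, u, z) per
            pixel and per sampling point, with z measured relative to the
            reference frame (z = 0 is the reference frame).
    Returns: (H, W, N) array of sampled values X~(y, x, n).
    """
    F, H, W = frames.shape
    T = (F - 1) // 2
    v = coords[..., 0]            # vertical position
    u = coords[..., 1]            # horizontal position
    z = coords[..., 2] + T        # shift time so frame -T maps to array index 0

    out = np.zeros(v.shape)
    # The max(0, 1-|.|) kernel is zero beyond distance 1, so only the 8
    # grid points surrounding each predicted coordinate contribute.
    for fv in (np.floor(v), np.floor(v) + 1):
        for fu in (np.floor(u), np.floor(u) + 1):
            for fz in (np.floor(z), np.floor(z) + 1):
                wt = (np.maximum(0.0, 1 - np.abs(v - fv))
                      * np.maximum(0.0, 1 - np.abs(u - fu))
                      * np.maximum(0.0, 1 - np.abs(z - fz)))
                i = np.clip(fv.astype(int), 0, H - 1)   # clamp at frame borders
                j = np.clip(fu.astype(int), 0, W - 1)
                m = np.clip(fz.astype(int), 0, F - 1)
                out += wt * frames[m, i, j]
    return out
```

Because the max(0, 1-|.|) kernel vanishes beyond a distance of 1, the triple sum in equation (1) reduces to the 8 neighbouring grid points that the loop enumerates.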
- S201d Use the sampling points of the deformable convolution kernel and the weight of the sampling points as the convolution parameters.
- the weights of the sampling points of the deformable convolution kernel can be obtained according to the obtained predicted coordinates of the deformable convolution kernel and the predicted weights of the deformable convolution kernel; the convolution parameters corresponding to the frame to be processed are thereby obtained.
- the predicted coordinates here refer to the relative coordinates of the deformable convolution kernel.
- the width of each frame in the sample video sequence is represented by W and the height is represented by H.
- the deformable convolution kernel is three-dimensional, and its size is composed of N sampling points; then the number of predicted coordinates of the deformable convolution kernel obtainable for the frame to be processed is H×W×N×3, and the number of predicted weights of the deformable convolution kernel obtainable for the frame to be processed is H×W×N.
- the number of sampling points of the deformable convolution kernel is H ⁇ W ⁇ N, and the number of weights of the sampling points is also H ⁇ W ⁇ N.
- it should be noted that N here denotes the number of sampling points contained in the deformable convolution kernel, which is distinct from the number of deformable convolution kernels contained in each convolutional layer. Generally speaking, N can take a value of 9, but in practical applications it can also be set according to actual conditions; the embodiments of the present disclosure do not specifically limit it. It should also be noted that, for these N sampling points, since the predicted coordinates of the deformable convolution kernel are changeable, the position of each sampling point is not fixed.
- the V network provides a relative offset for each sampling point, which further shows that the deformable convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one, so that the embodiments of the present disclosure can be applied to video processing with large motion between frames. In addition, depending on the sampling point, the weight of each sampling point obtained through the F network is also different; that is, the embodiments of the present disclosure use not only a deformable convolution kernel but also variable weights, which can make the video processing of the frame to be processed achieve a better denoising effect.
- the network can also adopt an encoder-decoder design structure. In the working stage of the encoder, the convolutional neural network performs 4 downsampling operations; each downsampling maps an input frame of size H×W (H represents the height of the frame to be processed, W represents its width) to an output of size H/2×W/2, and is mainly used to extract feature images from the frame to be processed. In the working stage of the decoder, the convolutional neural network performs 4 upsampling operations; each upsampling maps an H×W input to a 2H×2W output, and is mainly used to restore a video frame of the original size from the feature images extracted by the encoder. The number of downsampling or upsampling operations can be specifically set according to actual conditions. In addition, some layers of the network have a connection relationship, that is, a skip connection; for example, there is a skip connection between the 6th layer and the 22nd layer, between the 9th layer and the 19th layer, and between the 12th layer and the 16th layer. This enables the decoder stage to comprehensively use low-order and high-order features, making the video denoising effect on the frame to be processed better; a sketch of such an encoder-decoder is given below.
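- the following is a compact PyTorch sketch of such an encoder-decoder with 4 downsampling stages, 4 bilinear upsampling stages, and symmetric skip connections; the channel widths, layer count, and class name `EncoderDecoder` are assumptions of this sketch, not the exact network of FIG. 2:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderDecoder(nn.Module):
    """Toy encoder-decoder with symmetric skip connections (illustrative only)."""
    def __init__(self, in_ch=5, out_ch=64):
        super().__init__()
        chs = [32, 64, 128, 256]
        self.enc = nn.ModuleList()
        prev = in_ch
        for c in chs:                       # 4 downsampling stages
            self.enc.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, padding=1), nn.ReLU(inplace=True)))
            prev = c
        self.pool = nn.MaxPool2d(2)         # H x W -> H/2 x W/2
        self.dec = nn.ModuleList()
        for c in reversed(chs):             # 4 upsampling stages
            self.dec.append(nn.Sequential(
                nn.Conv2d(prev + c, c, 3, padding=1), nn.ReLU(inplace=True)))
            prev = c
        self.head = nn.Conv2d(prev, out_ch, 3, padding=1)

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)                 # saved for the skip connection
            x = self.pool(x)
        for dec, skip in zip(self.dec, reversed(skips)):
            x = F.interpolate(x, scale_factor=2, mode='bilinear',
                              align_corners=False)  # H x W -> 2H x 2W
            x = dec(torch.cat([x, skip], dim=1))    # fuse low/high-order features
        return self.head(x)

# Five stacked grayscale frames (2T+1 = 5) as input channels:
y = EncoderDecoder()(torch.randn(1, 5, 64, 64))     # -> (1, 64, 64, 64)
```

The skip connections concatenate each encoder feature map with the decoder feature map of matching resolution, which is how the decoder stage can use both low-order and high-order features.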
- X represents an input terminal for inputting a sample video sequence. The sample video sequence is selected from the video sequence and is composed of 5 consecutive frames (for example, the sample reference frame, the two frames forward-adjacent to it, and the two frames backward-adjacent to it). Coordinate prediction and weight prediction are then performed on the consecutive frames input at X. For coordinate prediction, a coordinate prediction network (represented by the V network) can be established, and the predicted coordinates of the deformable convolution kernel are obtained through the V network; for weight prediction, a weight prediction network (represented by the F network) can be established, and the predicted weights of the deformable convolution kernel are obtained through the F network. The consecutive frames input at X and the predicted coordinates of the deformable convolution kernel are then input together into the preset sampling model to obtain the sampling points of the deformable convolution kernel. Finally, the sampling points of the deformable convolution kernel and the weights of the sampling points are convolved with the frame to be processed, and the output result is the denoised video frame (represented by Y). Through the information of consecutive frames in the video sequence, not only is the denoising of the frame to be processed realized, but also, because the sampling point positions of the deformable convolution kernel are changeable (that is, a deformable convolution kernel is used) and the weight of each sampling point is also changeable, the video denoising effect is better.
- the sampling points of the deformable convolution kernel and the weights of the sampling points can be obtained; in this way, the frame to be processed is denoised according to the sampling points of the deformable convolution kernel and the weights of the sampling points, so as to obtain the denoised video frame.
- the denoised video frame may be obtained by convolution processing the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed.
- FIG. 7 shows a schematic flowchart of still another video processing method provided by an embodiment of the present disclosure.
- the method for performing convolution processing on the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed to obtain the denoised video frame may include:
- S102a For each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to the pixel;
- the denoising pixel value corresponding to each pixel may be calculated by performing a weighted summation calculation for each pixel, the sampling point of the deformable convolution kernel, and the weight of the sampling point.
- S102a may include:
- S102a-1 Perform a weighted sum calculation on each pixel, the sampling point of the deformable convolution kernel and the weight of the sampling point;
- S102a-2 Obtain the denoising pixel value corresponding to each pixel according to the calculation result.
- the denoising pixel value corresponding to each pixel can be obtained by performing a weighted sum calculation of the sampling points of the deformable convolution kernel and the weight values of the sampling points for each pixel.
- for each pixel in the frame to be processed, the deformable convolution kernel convolved with that pixel contains N sampling points. The sampling value of each sampling point is multiplied by the weight of that sampling point, and the products over the N sampling points are summed; the final result is the denoised pixel value corresponding to each pixel in the frame to be processed. Specifically, see equation (3):

$$\tilde{Y}(y,x) = \sum_{n=1}^{N} \tilde{X}_{(y,x,n)}\, F(y,x,n) \tag{3}$$

- where \tilde{Y}(y,x) represents the denoised pixel value at pixel position (y,x), \tilde{X}_{(y,x,n)} is the sampling value of the n-th sampling point obtained by equation (1), and F(y,x,n) is the weight of the n-th sampling point. Through this calculation, the denoised pixel value corresponding to each pixel in the frame to be processed can be obtained.
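- a one-line NumPy sketch of equation (3), assuming `sampled` holds the trilinear-sampled values and `weights` the per-sampling-point weights (both names are assumptions of this sketch):

```python
import numpy as np

def denoise_frame(sampled, weights):
    """Equation (3): per-pixel weighted sum over the N sampling points.

    sampled: (H, W, N) sampling values X~(y, x, n) from the trilinear sampler.
    weights: (H, W, N) sampling-point weights F(y, x, n) from the F network.
    Returns: (H, W) denoised frame Y~.
    """
    return (sampled * weights).sum(axis=-1)
```

Chained with the `trilinear_sample` sketch above, `denoise_frame(trilinear_sample(frames, coords), weights)` yields the denoised reference frame.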
- the position of each sampling point is not fixed, and the weight of each sampling point is also different; that is, the denoising processing of the embodiments of the present disclosure uses not only a deformable convolution kernel but also variable weights. Compared with a fixed convolution kernel or artificially set weights in the prior art, the video processing of the frame to be processed can achieve a better denoising effect.
- S102b Obtain a denoised video frame according to the denoised pixel value corresponding to each pixel.
- each pixel in the frame to be processed can be convolved with the corresponding deformable convolution kernel; that is, each pixel in the frame to be processed is subjected to a convolution operation with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to each pixel. In this way, the denoising processing of the frame to be processed is realized.
- FIG. 8 shows a detailed architectural schematic diagram of a video processing method provided by an embodiment of the present disclosure.
- the sample video sequence 801 is composed of multiple consecutive video frames (for example, a sample reference frame, the two frames forward-adjacent to it, and the two frames backward-adjacent to it). Coordinate prediction and weight prediction are then performed on the input sample video sequence 801 based on the deep neural network. Specifically, a coordinate prediction network 802 and a weight prediction network 803 can be established; coordinate prediction is performed by the coordinate prediction network 802 to obtain the predicted coordinates 804 of the deformable convolution kernel, and weight prediction is performed by the weight prediction network 803 to obtain the predicted weights 805 of the deformable convolution kernel. The input sample video sequence 801 and the predicted coordinates 804 of the deformable convolution kernel are jointly input into the trilinear sampler 806 for sampling processing, and the output of the trilinear sampler 806 is the sampling points 807 of the deformable convolution kernel. Finally, the sampling points 807 of the deformable convolution kernel and the predicted weights 805 of the deformable convolution kernel are subjected to a convolution operation 808 with the frame to be processed, and the denoised video frame 809 is output.
- the weight of the sampling points of the deformable convolution kernel can be obtained according to the predicted coordinates 804 of the deformable convolution kernel and the predicted weight 805 of the deformable convolution kernel; in this way, for the convolution operation 808, the sampling points of the deformable convolution kernel and the weights of the sampling points may be convolved with the frame to be processed, so as to realize the denoising processing of the frame to be processed.
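- putting the pieces together, the following sketch chains the earlier snippets into the flow of FIG. 8; `predict_coords` and `predict_weights` are hypothetical stand-ins for the trained V network (802) and F network (803), not functions defined by the patent, and `trilinear_sample` and `denoise_frame` are the sketches given earlier:

```python
def video_denoise(frames, predict_coords, predict_weights):
    """Overall flow of FIG. 8 (a sketch chaining the earlier snippets).

    frames: (2T+1, H, W) consecutive frames with the reference frame at index T.
    predict_coords / predict_weights: hypothetical stand-ins for the trained
        V network (802) and F network (803); predict_coords is assumed to
        return absolute (v, u, z) positions of shape (H, W, N, 3), and
        predict_weights to return weights of shape (H, W, N).
    Returns: (H, W) denoised reference frame.
    """
    coords = predict_coords(frames)              # predicted coordinates (804)
    weights = predict_weights(frames)            # predicted weights (805)
    sampled = trilinear_sample(frames, coords)   # trilinear sampler (806) -> sampling values (807)
    return denoise_frame(sampled, weights)       # weighted-sum convolution (808) -> output (809)
```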
- by performing deep neural network training on the sample video sequence, the deformable convolution kernel, i.e., the predicted coordinates and predicted weights of the deformable convolution kernel, can be obtained. Since the predicted coordinates are changeable, the position of each sampling point is changeable, and hence the convolution kernel in the embodiments of the present disclosure is not a fixed convolution kernel but a deformable one, so that the embodiments of the present disclosure can be applied to video processing with large motion between frames. In addition, the weight of each sampling point can also differ from sampling point to sampling point; that is to say, the embodiments of the present disclosure use not only a deformable convolution kernel but also changeable prediction weights, which can make the video processing of the frame to be processed achieve a better denoising effect.
- based on adaptive pixel-level information, different sampling points are allocated to track the movement of the same position across consecutive frames of the video; using multi-frame information can better make up for the lack of single-frame information, and also enables the method of the embodiments of the present disclosure to be applied to video restoration scenes.
- the deformable convolution kernel can also be regarded as an efficient extractor of temporal optical flow, which makes full use of the multi-frame information in consecutive frames of the video; this allows the method of the embodiments of the present disclosure to be applied to other video processing scenes involving pixel-level information. In addition, the method of the embodiments of the present disclosure can achieve high-quality video imaging under limited hardware quality or low-light conditions.
- the foregoing embodiments provide a video processing method: the convolution parameters corresponding to the frame to be processed in the video sequence are obtained, where the convolution parameters include the sampling points of the deformable convolution kernel and the weights of the sampling points; and denoising processing is performed on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame. Since the convolution parameters are obtained by extracting information from consecutive frames of the video, they can effectively reduce the image blur, detail loss, and ghosting caused by inter-frame motion; and since the weight of each sampling point can change with the position of the sampling point, the video denoising effect is better and the image quality of the video is improved.
- FIG. 9 shows the composition of a video processing device 90 provided by an embodiment of the present disclosure.
- the video processing device 90 may include: an acquisition unit 901 and a denoising unit 902, among them,
- the obtaining unit 901 is configured to obtain a convolution parameter corresponding to a frame to be processed in a video sequence, where the convolution parameter includes a sampling point of a deformable convolution kernel and a weight of the sampling point;
- the denoising unit 902 is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weight of the sampling points to obtain a denoised video frame.
- the video processing device 90 further includes a training unit 903 configured to perform deep neural network training based on the sample video sequence to obtain a deformable convolution kernel.
- the video processing device 90 further includes a prediction unit 904 and a sampling unit 905, where:
- the prediction unit 904 is configured to perform coordinate prediction and weight prediction on consecutive multiple video frames in the sample video sequence based on a deep neural network to obtain the prediction coordinates and prediction weights of the deformable convolution kernel, wherein,
- the multiple consecutive video frames include a sample reference frame and at least one adjacent frame;
- the sampling unit 905 is configured to sample the predicted coordinates of the deformable convolution kernel to obtain sampling points of the deformable convolution kernel;
- the acquiring unit 901 is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel, and to use the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
- the sampling unit 905 is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain sampling points of the deformable convolution kernel.
- the acquiring unit 901 is further configured to acquire pixels in the sample reference frame and the at least one adjacent frame;
- the sampling unit 905 is further configured to, based on the sampling points of the deformable convolution kernel, perform sampling calculations on the pixel points and the predicted coordinates of the deformable convolution kernel through a preset sampling model, and determine the sampling values of the sampling points according to the calculation results.
- the denoising unit 902 is specifically configured to perform convolution processing on the sample points of the deformable convolution kernel and the weight of the sample points with the frame to be processed to obtain the denoised video frame.
- the video processing device 90 further includes a convolution unit 906 configured to, for each pixel in the frame to be processed, perform a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised pixel value corresponding to each pixel;
- the denoising unit 902 is specifically configured to obtain a denoised video frame according to the denoising pixel value corresponding to each pixel.
- the convolution unit 906 is specifically configured to perform a weighted summation calculation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel according to the calculation result.
- a "unit" may be a part of a circuit, a part of a processor, a part of a program, or software, etc., of course, may also be a module, or may be non-modular.
- the various components in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or software function module.
- the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer readable storage medium.
- the technical solution of this embodiment, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
- the aforementioned storage media include: a USB flash drive (U disk), a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program codes.
- this embodiment provides a computer storage medium that stores a video processing program that implements the steps of the method in the foregoing embodiment when the video processing program is executed by at least one processor.
- FIG. 10 shows the specific hardware structure of the video processing device 90 provided by an embodiment of the present disclosure, which may include: a network interface 1001, a memory 1002, and a processor 1003;
- the various components are coupled together through the bus system 1004.
- the bus system 1004 is used to implement connection and communication between these components.
- the bus system 1004 also includes a power bus, a control bus, and a status signal bus.
- various buses are marked as the bus system 1004 in FIG. 10.
- the network interface 1001 is used for receiving and sending signals in the process of sending and receiving information with other external network elements;
- the memory 1002 is configured to store a computer program that can run on the processor 1003;
- the processor 1003 is configured to, when running the computer program, execute: obtaining a convolution parameter corresponding to a frame to be processed in a video sequence, wherein the convolution parameter includes sampling points of a deformable convolution kernel and weights of the sampling points; and performing denoising processing on the frame to be processed according to the sampling points and the weights, to obtain a denoised video frame.
- An embodiment of the present application provides a computer program product, wherein the computer program product stores a video processing program, and when the video processing program is executed by at least one processor, the steps of the method described in the foregoing embodiments are implemented.
- the memory 1002 in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache.
- by way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
- the processor 1003 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the above method can be completed by hardware integrated logic circuits in the processor 1003 or instructions in the form of software.
- the above-mentioned processor 1003 may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure can be implemented or executed.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present disclosure may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
- the storage medium is located in the memory 1002, and the processor 1003 reads the information in the memory 1002, and completes the steps of the foregoing method in combination with its hardware.
- the embodiments described herein can be implemented by hardware, software, firmware, middleware, microcode, or a combination thereof.
- the processing unit can be implemented in one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, or other electronic units for performing the functions described in this disclosure, or a combination thereof.
- the technology described herein can be implemented through modules (such as procedures, functions, etc.) that perform the functions described herein.
- the software codes can be stored in the memory and executed by the processor.
- the memory can be implemented in the processor or external to the processor.
- the processor 1003 is further configured to execute the steps of the method in the foregoing embodiment when the computer program is running.
- FIG. 11 shows a schematic diagram of the composition structure of a terminal device 110 provided by an embodiment of the present disclosure; wherein, the terminal device 110 at least includes any video processing apparatus 90 involved in the foregoing embodiments.
- the technical solution of the present disclosure, in essence, or the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to make a terminal (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present disclosure.
Claims (20)
- A video processing method, the method comprising: acquiring convolution parameters corresponding to a frame to be processed in a video sequence, wherein the convolution parameters include sampling points of a deformable convolution kernel and weights of the sampling points; and performing denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
- The method according to claim 1, wherein before the acquiring of the convolution parameters corresponding to the frame to be processed in the video sequence, the method further comprises: performing deep neural network training based on a sample video sequence to obtain the deformable convolution kernel.
- The method according to claim 2, wherein the performing of the deep neural network training based on the sample video sequence to obtain the deformable convolution kernel comprises: performing, based on a deep neural network, coordinate prediction and weight prediction respectively on a plurality of consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, wherein the plurality of consecutive video frames include a sample reference frame and at least one neighboring frame thereof; sampling the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel; obtaining the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel; and taking the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
- The method according to claim 3, wherein the sampling of the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel comprises: inputting the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
- The method according to claim 4, wherein after the obtaining of the sampling points of the deformable convolution kernel, the method further comprises: acquiring pixels in the sample reference frame and the at least one neighboring frame; and performing, based on the sampling points of the deformable convolution kernel, sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and determining sampling values of the sampling points according to a result of the calculation.
- The method according to any one of claims 1 to 5, wherein the performing of the denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame comprises: performing convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
- The method according to claim 6, wherein the performing of the convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised video frame comprises: performing, for each pixel in the frame to be processed, a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to the pixel; and obtaining the denoised video frame according to the denoised pixel value corresponding to each pixel.
- The method according to claim 7, wherein the performing of the convolution operation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points to obtain the denoised pixel value corresponding to each pixel comprises: performing weighted-sum calculation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points; and obtaining the denoised pixel value corresponding to each pixel according to a result of the calculation.
- A video processing apparatus, comprising an acquisition unit and a denoising unit, wherein the acquisition unit is configured to acquire convolution parameters corresponding to a frame to be processed in a video sequence, wherein the convolution parameters include sampling points of a deformable convolution kernel and weights of the sampling points; and the denoising unit is configured to perform denoising processing on the frame to be processed according to the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised video frame.
- The video processing apparatus according to claim 9, further comprising a training unit configured to perform deep neural network training based on a sample video sequence to obtain the deformable convolution kernel.
- The video processing apparatus according to claim 10, further comprising a prediction unit and a sampling unit, wherein the prediction unit is configured to perform, based on a deep neural network, coordinate prediction and weight prediction respectively on a plurality of consecutive video frames in the sample video sequence, to obtain predicted coordinates and predicted weights of the deformable convolution kernel, wherein the plurality of consecutive video frames include a sample reference frame and at least one neighboring frame thereof; the sampling unit is configured to sample the predicted coordinates of the deformable convolution kernel to obtain the sampling points of the deformable convolution kernel; and the acquisition unit is further configured to obtain the weights of the sampling points of the deformable convolution kernel according to the predicted coordinates and the predicted weights of the deformable convolution kernel, and to take the sampling points of the deformable convolution kernel and the weights of the sampling points as the convolution parameters.
- The video processing apparatus according to claim 11, wherein the sampling unit is specifically configured to input the predicted coordinates of the deformable convolution kernel into a preset sampling model to obtain the sampling points of the deformable convolution kernel.
- The video processing apparatus according to claim 12, wherein the acquisition unit is further configured to acquire pixels in the sample reference frame and the at least one neighboring frame; and the sampling unit is further configured to perform, based on the sampling points of the deformable convolution kernel, sampling calculation on the pixels and the predicted coordinates of the deformable convolution kernel through the preset sampling model, and to determine sampling values of the sampling points according to a result of the calculation.
- The video processing apparatus according to any one of claims 9 to 13, wherein the denoising unit is specifically configured to perform convolution processing on the frame to be processed with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain the denoised video frame.
- The video processing apparatus according to claim 14, further comprising a convolution unit configured to perform, for each pixel in the frame to be processed, a convolution operation on the pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, to obtain a denoised pixel value corresponding to the pixel; wherein the denoising unit is specifically configured to obtain the denoised video frame according to the denoised pixel value corresponding to each pixel.
- The video processing apparatus according to claim 15, wherein the convolution unit is specifically configured to perform weighted-sum calculation on each pixel with the sampling points of the deformable convolution kernel and the weights of the sampling points, and to obtain the denoised pixel value corresponding to each pixel according to a result of the calculation.
- A video processing apparatus, comprising a memory and a processor, wherein the memory is configured to store a computer program executable on the processor, and the processor is configured to execute the steps of the method according to any one of claims 1 to 8 when running the computer program.
- A computer storage medium, storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 8.
- A terminal device, comprising at least the video processing apparatus according to any one of claims 9 to 17.
- A computer program product, storing a video processing program which, when executed by at least one processor, implements the steps of the method according to any one of claims 1 to 8.
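For orientation only (the patent publishes no reference code), here is a minimal PyTorch sketch of the prediction step recited in claims 2 and 3: a deep neural network takes a stack of consecutive frames (a sample reference frame plus its neighboring frames) and predicts, for every pixel, coordinates and weights of the deformable convolution kernel. The class name `KernelPredictionHead`, the layer widths, and the softmax weight normalization are assumptions, not details disclosed in the patent.

```python
# Hypothetical sketch (not the patent's network): map T consecutive
# frames to per-pixel deformable-kernel coordinates and weights.
import torch
import torch.nn as nn

class KernelPredictionHead(nn.Module):
    def __init__(self, num_frames: int = 5, num_points: int = 45):
        super().__init__()
        self.num_points = num_points
        self.features = nn.Sequential(
            nn.Conv2d(num_frames, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Coordinate branch: 3 values (t, y, x) per sampling point (claim 3).
        self.coord_head = nn.Conv2d(64, num_points * 3, 3, padding=1)
        # Weight branch: one weight per sampling point, softmax-normalized.
        self.weight_head = nn.Conv2d(64, num_points, 3, padding=1)

    def forward(self, frames: torch.Tensor):
        # frames: (B, T, H, W) grayscale reference + neighboring frames
        b, _, h, w = frames.shape
        feat = self.features(frames)
        coords = self.coord_head(feat).view(b, self.num_points, 3, h, w)
        weights = self.weight_head(feat).softmax(dim=1)  # (B, N, H, W)
        return coords, weights

# Example: 5 frames of 64x64 -> coords (1, 45, 3, 64, 64), weights (1, 45, 64, 64)
coords, weights = KernelPredictionHead()(torch.randn(1, 5, 64, 64))
```

In the patent's terms, `coords` corresponds to the predicted coordinates; claim 3 derives the weights of the sampling points from both the predicted coordinates and the predicted weights, which the single weight branch above simplifies.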
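Continuing the sketch in framework-agnostic NumPy, the code below illustrates the sampling and denoising steps of claims 4 to 8: fractional predicted coordinates are sampled from the frame stack, here assuming bilinear interpolation as the "preset sampling model" of claims 4 and 5, and each output pixel is the weighted sum over its sampling points, as in claims 7 and 8. The names `bilinear_sample` and `denoise_frame` and the array layouts are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(frame: np.ndarray, y: float, x: float) -> float:
    """Sample a frame at a fractional (y, x) coordinate by bilinear
    interpolation over the four neighboring integer pixels."""
    h, w = frame.shape
    y = min(max(float(y), 0.0), h - 1.0)
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * frame[y0, x0]
            + (1 - dy) * dx * frame[y0, x1]
            + dy * (1 - dx) * frame[y1, x0]
            + dy * dx * frame[y1, x1])

def denoise_frame(frames: np.ndarray, coords: np.ndarray,
                  weights: np.ndarray) -> np.ndarray:
    """Weighted sum over deformable-kernel sampling points per pixel.

    frames  -- (T, H, W) reference frame plus neighboring frames
    coords  -- (H, W, N, 3) predicted (t, y, x) sampling coordinates
    weights -- (H, W, N) weights of the sampling points
    """
    t_count, h, w = frames.shape
    out = np.zeros((h, w), dtype=frames.dtype)
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k in range(coords.shape[2]):
                t, y, x = coords[i, j, k]
                ti = int(np.clip(np.rint(t), 0, t_count - 1))
                acc += weights[i, j, k] * bilinear_sample(frames[ti], y, x)
            out[i, j] = acc
    return out

# Toy run: 5 noisy 32x32 frames, 9 sampling points per pixel.
rng = np.random.default_rng(0)
frames = rng.random((5, 32, 32)).astype(np.float32)
ys, xs = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
coords = np.zeros((32, 32, 9, 3), dtype=np.float32)
coords[..., 0] = rng.uniform(0, 4, (32, 32, 9))                   # frame index t
coords[..., 1] = ys[..., None] + rng.uniform(-1, 1, (32, 32, 9))  # y offset
coords[..., 2] = xs[..., None] + rng.uniform(-1, 1, (32, 32, 9))  # x offset
weights = rng.random((32, 32, 9)).astype(np.float32)
weights /= weights.sum(axis=-1, keepdims=True)
print(denoise_frame(frames, coords, weights).shape)  # (32, 32)
```

A production implementation would vectorize these loops or fuse them into a single deformable-convolution operator; the nested-loop form above only mirrors the per-pixel wording of claims 7 and 8.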
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11202108771RA SG11202108771RA (en) | 2019-03-19 | 2019-10-30 | Video processing method and apparatus, and computer storage medium |
JP2020573289A JP7086235B2 (ja) | 2019-03-19 | 2019-10-30 | Video processing method, apparatus, and computer storage medium |
US17/362,883 US20210327033A1 (en) | 2019-03-19 | 2021-06-29 | Video processing method and apparatus, and computer storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910210075.5A CN109862208B (zh) | 2019-03-19 | 2019-03-19 | Video processing method and apparatus, computer storage medium, and terminal device |
CN201910210075.5 | 2019-03-19 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/362,883 Continuation US20210327033A1 (en) | 2019-03-19 | 2021-06-29 | Video processing method and apparatus, and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020186765A1 (zh) | 2020-09-24 |
Family
ID=66901319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/114458 WO2020186765A1 (zh) | 2019-03-19 | 2019-10-30 | Video processing method and apparatus, and computer storage medium |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210327033A1 (zh) |
JP (1) | JP7086235B2 (zh) |
CN (1) | CN109862208B (zh) |
SG (1) | SG11202108771RA (zh) |
TW (1) | TWI714397B (zh) |
WO (1) | WO2020186765A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113744156A (zh) * | 2021-09-06 | 2021-12-03 | Central South University | Image denoising method based on a deformable convolutional neural network |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109862208B (zh) * | 2019-03-19 | 2021-07-02 | Shenzhen SenseTime Technology Co., Ltd. | Video processing method and apparatus, computer storage medium, and terminal device |
CN112580675A (zh) * | 2019-09-29 | 2021-03-30 | Beijing Horizon Robotics Technology R&D Co., Ltd. | Image processing method and apparatus, and computer-readable storage medium |
CN113727141B (zh) * | 2020-05-20 | 2023-05-12 | Fujitsu Limited | Video frame interpolation apparatus and method |
CN113936163A (zh) * | 2020-07-14 | 2022-01-14 | Wuhan TCL Group Industrial Research Institute Co., Ltd. | Image processing method, terminal, and storage medium |
US11689713B2 (en) * | 2020-07-15 | 2023-06-27 | Tencent America LLC | Predicted frame generation by deformable convolution for video coding |
CN114640796B (zh) * | 2022-03-24 | 2024-02-09 | Beijing Zitiao Network Technology Co., Ltd. | Video processing method and apparatus, electronic device, and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160358069A1 (en) * | 2015-06-03 | 2016-12-08 | Samsung Electronics Co., Ltd. | Neural network suppression |
CN106408522A (zh) * | 2016-06-27 | 2017-02-15 | Shenzhen Institute of Future Media Technology | Image denoising method based on a convolutional pair neural network |
CN107292319A (zh) * | 2017-08-04 | 2017-10-24 | Guangdong University of Technology | Method and apparatus for feature image extraction based on a deformable convolution layer |
CN107516304A (zh) * | 2017-09-07 | 2017-12-26 | Guangdong University of Technology | Image denoising method and apparatus |
CN107609638A (zh) * | 2017-10-12 | 2018-01-19 | Hubei University of Technology | Method for optimizing convolutional neural networks based on a linear decoder and interpolation sampling |
CN107609519A (zh) * | 2017-09-15 | 2018-01-19 | Vivo Mobile Communication Co., Ltd. | Method and apparatus for locating facial feature points |
CN107909113A (zh) * | 2017-11-29 | 2018-04-13 | Beijing Xiaomi Mobile Software Co., Ltd. | Traffic accident image processing method, apparatus, and storage medium |
CN109862208A (zh) * | 2019-03-19 | 2019-06-07 | Shenzhen SenseTime Technology Co., Ltd. | Video processing method and apparatus, and computer storage medium |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9786036B2 (en) * | 2015-04-28 | 2017-10-10 | Qualcomm Incorporated | Reducing image resolution in deep convolutional networks |
US10043243B2 (en) * | 2016-01-22 | 2018-08-07 | Siemens Healthcare Gmbh | Deep unfolding algorithm for efficient image denoising under varying noise conditions |
CN106296692A (zh) * | 2016-08-11 | 2017-01-04 | Shenzhen Institute of Future Media Technology | Image saliency detection method based on adversarial networks |
CN107103590B (zh) * | 2017-03-22 | 2019-10-18 | South China University of Technology | Image reflection removal method based on a deep convolutional generative adversarial network |
US10409888B2 (en) * | 2017-06-02 | 2019-09-10 | Mitsubishi Electric Research Laboratories, Inc. | Online convolutional dictionary learning |
CN107495959A (zh) * | 2017-07-27 | 2017-12-22 | Dalian University | Electrocardiogram signal classification method based on a one-dimensional convolutional neural network |
WO2019019199A1 (en) * | 2017-07-28 | 2019-01-31 | Shenzhen United Imaging Healthcare Co., Ltd. | SYSTEM AND METHOD FOR IMAGE CONVERSION |
CN107689034B (zh) * | 2017-08-16 | 2020-12-01 | Tsinghua-Berkeley Shenzhen Institute Preparation Office | Denoising method and apparatus |
CN109074633B (zh) * | 2017-10-18 | 2020-05-12 | SZ DJI Technology Co., Ltd. | Video processing method and device, unmanned aerial vehicle, and computer-readable storage medium |
CN107886162A (zh) * | 2017-11-14 | 2018-04-06 | South China University of Technology | Deformable convolution kernel method based on a WGAN model |
CN108197580B (zh) * | 2018-01-09 | 2019-07-23 | Jilin University | Gesture recognition method based on a 3D convolutional neural network |
CN108805265B (zh) * | 2018-05-21 | 2021-03-30 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Neural network model processing method and apparatus, image processing method, and mobile terminal |
2019
- 2019-03-19 CN CN201910210075.5A patent/CN109862208B/zh active Active
- 2019-10-30 WO PCT/CN2019/114458 patent/WO2020186765A1/zh active Application Filing
- 2019-10-30 SG SG11202108771RA patent/SG11202108771RA/en unknown
- 2019-10-30 JP JP2020573289A patent/JP7086235B2/ja active Active
- 2019-12-18 TW TW108146509A patent/TWI714397B/zh active
2021
- 2021-06-29 US US17/362,883 patent/US20210327033A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
TWI714397B (zh) | 2020-12-21 |
JP7086235B2 (ja) | 2022-06-17 |
CN109862208A (zh) | 2019-06-07 |
TW202037145A (zh) | 2020-10-01 |
JP2021530770A (ja) | 2021-11-11 |
US20210327033A1 (en) | 2021-10-21 |
CN109862208B (zh) | 2021-07-02 |
SG11202108771RA (en) | 2021-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020186765A1 (zh) | Video processing method and apparatus, and computer storage medium | |
CN108629743B (zh) | Image processing method and apparatus, storage medium, and electronic apparatus | |
CN111275626B (zh) | Blur-degree-based video deblurring method, apparatus, and device | |
US9615039B2 (en) | Systems and methods for reducing noise in video streams | |
US20210352212A1 (en) | Video image processing method and apparatus | |
US9007402B2 (en) | Image processing for introducing blurring effects to an image | |
EP2164040B1 (en) | System and method for high quality image and video upscaling | |
WO2021238500A1 (zh) | Panoramic video frame interpolation method and apparatus, and corresponding storage medium | |
WO2021189733A1 (zh) | Image processing method and apparatus, electronic device, and storage medium | |
CN106780336B (zh) | Image downscaling method and apparatus | |
CN112602088B (zh) | Method, system, and computer-readable medium for improving the quality of low-light images | |
CN111935425B (zh) | Video noise reduction method and apparatus, electronic device, and computer-readable medium | |
CN114073071A (zh) | Video frame interpolation method and apparatus, and computer-readable storage medium | |
CN110958363B (zh) | Image processing method and apparatus, computer-readable medium, and electronic device | |
CN109544490B (zh) | Image enhancement method, apparatus, and computer-readable storage medium | |
CN114390188B (zh) | Image processing method and electronic device | |
CN113409188A (zh) | Image background replacement method, system, electronic device, and storage medium | |
CN113596576A (zh) | Video super-resolution method and apparatus | |
WO2020187042A1 (zh) | Image processing method, apparatus, and device, and computer-readable medium | |
US20230098437A1 (en) | Reference-Based Super-Resolution for Image and Video Enhancement | |
US20220321830A1 (en) | Optimization of adaptive convolutions for video frame interpolation | |
US11195247B1 (en) | Camera motion aware local tone mapping | |
CN114463213A (zh) | Video processing method, video processing apparatus, terminal, and storage medium |
CN111738958B (zh) | Image inpainting method and apparatus, electronic device, and computer-readable medium |
WO2024130715A1 (zh) | Video processing method, video processing apparatus, and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19920051; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2020573289; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/01/2022) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19920051; Country of ref document: EP; Kind code of ref document: A1 |