WO2023160426A1 - Video frame interpolation method, training method, apparatus and electronic device - Google Patents

Video frame interpolation method, training method, apparatus and electronic device

Info

Publication number
WO2023160426A1
WO2023160426A1 PCT/CN2023/075807 CN2023075807W
Authority
WO
WIPO (PCT)
Prior art keywords
optical flow
map
training
initial
fusion
Prior art date
Application number
PCT/CN2023/075807
Other languages
English (en)
French (fr)
Inventor
吕朋伟
Original Assignee
影石创新科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 影石创新科技股份有限公司
Publication of WO2023160426A1 publication Critical patent/WO2023160426A1/zh

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping

Definitions

  • the present application relates to the technical field of image processing, and in particular to a video frame insertion method, training method, device and electronic equipment.
  • Video frame interpolation refers to the use of algorithms to generate video intermediate frames, which are used to increase video frame rates or generate slow-motion special effects videos. However, the accuracy of frame interpolation results obtained by traditional video frame interpolation methods is low.
  • a video frame insertion method, training method, device and electronic equipment which can improve the accuracy of frame insertion results.
  • a video frame interpolation method comprising: obtaining two adjacent video frames in the video, the two video frames including a previous video frame and a subsequent video frame; calculating the optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain initial maps; correcting the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and obtaining a target interpolation frame between the two video frames according to the corrected optical flow.
  • a neural network training method for video frame interpolation comprising: obtaining a set of training data, the set of training data including three consecutive video frames, the three consecutive video frames being, in order, a first training video frame, a second training video frame and a third training video frame; obtaining a first reference reverse optical flow, the first reference reverse optical flow being the reverse optical flow from the first training video frame to the second training video frame; obtaining a second reference reverse optical flow, the second reference reverse optical flow being the reverse optical flow from the third training video frame to the second training video frame; calculating a first training reverse optical flow, the first training reverse optical flow being the reverse optical flow from the first training video frame to the third training video frame; calculating a second training reverse optical flow, the second training reverse optical flow being the reverse optical flow from the third training video frame to the first training video frame; transforming the first training reverse optical flow into a first initial training optical flow based on a preset ratio; transforming the second training reverse optical flow into a second initial training optical flow based on the preset ratio; mapping the first training video frame through the first initial training optical flow to obtain a first training map; mapping the third training video frame through the second initial training optical flow to obtain a second training map; inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map and the second training map into an optical flow correction neural network to obtain a third training reverse optical flow and a fourth training reverse optical flow output by the optical flow correction neural network;
  • the third training reverse optical flow is the corrected reverse optical flow from the first training video frame to the second training video frame;
  • the fourth training reverse optical flow is the corrected reverse optical flow from the third training video frame to the second training video frame;
  • the first training video frame is mapped through the third training reverse optical flow to obtain the third training map;
  • the third training video frame is mapped through the fourth training reverse optical flow to obtain the fourth training map;
  • the first training video frame, the third training video frame, the third training reverse optical flow, the fourth training reverse optical flow, the third training map and the fourth training map are input into the fusion neural network to obtain the fusion parameter map output by the fusion neural network; based on the fusion parameter map, a fusion calculation is performed on the third training map and the fourth training map to obtain the target interpolation frame;
  • the network parameters of the optical flow correction neural network and the fusion neural network are adjusted based on the difference between the target interpolation frame and the second training video frame, the difference between the third training reverse optical flow and the first reference reverse optical flow, and the difference between the fourth training reverse optical flow and the second reference reverse optical flow.
  • a video frame insertion device including: an acquisition module configured to acquire two adjacent video frames in the video; the acquisition module is also used to calculate the optical flow between the two video frames; the acquisition module is also used to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module is also used to map the two video frames through the initial optical flow to obtain initial maps; a correction module used to correct the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and a frame interpolation module used to obtain a target interpolation frame between the two video frames according to the corrected optical flow.
  • an electronic device including: a processor and a memory, the memory is used to store at least one instruction, and the above method is implemented when the instruction is loaded and executed by the processor.
  • a computer-readable storage medium is provided.
  • a computer program is stored in the computer-readable storage medium, and when it is run on a computer, it causes the computer to execute the above-mentioned method.
  • with the video frame interpolation method, training method, device and electronic equipment of the embodiments of the present application, the optical flow between two adjacent video frames in the video is first calculated, the optical flow is then corrected, and the interpolated frame is obtained based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains the information of the target motion and expresses the change of the image.
  • therefore, using the optical flow between two adjacent video frames in the video, the interpolated frame between the two video frames can be obtained. In addition, converting the optical flow according to a ratio yields the initial optical flow corresponding to a position between the two video frames, and mapping the video frames according to the converted initial optical flow yields initial maps corresponding to that position between the two video frames; correcting the optical flow based on the initial maps allows the optical flow to reflect the changes between the two video frames more accurately, thus improving the accuracy of the interpolation results.
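  • As a minimal illustrative sketch of the pipeline summarized above (the helper names compute_bidirectional_flow, backward_warp, correct_flow and fuse are hypothetical placeholders for the steps of the method, not a disclosed API; individual steps are sketched in more detail further below):

```python
def interpolate_pair(prev_frame, next_frame, ratio=0.5):
    """Sketch of the interpolation pipeline for one pair of adjacent frames."""
    # Step 102: classical (non-learned) optical flow in both directions.
    f_3_1, f_1_3 = compute_bidirectional_flow(prev_frame, next_frame)
    # Step 103: scale by the preset ratio to approximate flow at the target time.
    fcv_2_1, fcv_2_3 = ratio * f_3_1, ratio * f_1_3
    # Step 104: initial maps, obtained by backward-warping each frame with its initial flow.
    wf_1_2 = backward_warp(prev_frame, fcv_2_1)
    wf_3_2 = backward_warp(next_frame, fcv_2_3)
    # Step 105: neural-network correction of the initial flows.
    fcvu_2_1, fcvu_2_3 = correct_flow(prev_frame, next_frame,
                                      fcv_2_1, fcv_2_3, wf_1_2, wf_3_2)
    # Step 106: warp again with the corrected flows and fuse into the target frame.
    wm_1_2 = backward_warp(prev_frame, fcvu_2_1)
    wm_3_2 = backward_warp(next_frame, fcvu_2_3)
    return fuse(prev_frame, next_frame, fcvu_2_1, fcvu_2_3, wm_1_2, wm_3_2)
```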
  • FIG. 1 is a schematic flow diagram of a video frame insertion method in an embodiment of the present application
  • FIG. 2 is a schematic flow diagram of another video frame insertion method in the embodiment of the present application.
  • FIG. 3 is a schematic flow diagram of another video frame insertion method in the embodiment of the present application.
  • FIG. 4 is a schematic diagram of a model structure of an optical flow correction neural network in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a model structure of a fusion neural network in an embodiment of the present application.
  • FIG. 6 is a structural block diagram of a video frame insertion device in an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a neural network training device in an embodiment of the present application.
  • FIG. 8 is a structural block diagram of an electronic device in an embodiment of the present application.
  • this embodiment of the present application provides a video frame insertion method, including:
  • Step 101 obtaining two adjacent video frames in the video
  • the video is the video to be frame-inserted, and the two video frames can be any two adjacent video frames, and the two video frames include the previous video frame I 1 and the subsequent video frame I 3 .
  • Step 102 calculating the optical flow between two video frames
  • Step 103 Transform the optical flow between two video frames into an initial optical flow based on a preset ratio.
  • the optical flow between the two video frames is calculated from the two video frames; according to the preset ratio, this optical flow can be converted into the optical flow at a preset position between the two video frames, that is, the initial optical flow;
  • Step 104 Map two video frames through the initial optical flow to obtain an initial map
  • Step 105 Correct the optical flow between the two video frames based on the initial map to obtain the corrected optical flow.
  • Step 106 Obtain a target interpolation frame between two video frames according to the corrected optical flow.
  • after the target interpolation frame between these two video frames is obtained in step 106, the process of step 101 to step 106 can be repeated to obtain the target interpolation frame between another two video frames; for example, after the target interpolation frame between the first frame and the second frame of the video is obtained, the method can be looped to continue obtaining the target interpolation frame between the next two adjacent frames after a preset frame interval, and so on, to realize frame interpolation of the entire video, as sketched below.
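  • A short illustrative sketch of this loop, doubling the frame rate of a clip by inserting one interpolated frame between every pair of adjacent frames; interpolate_pair is the hypothetical helper sketched earlier:

```python
def interpolate_video(frames, ratio=0.5):
    """Insert one interpolated frame between every pair of adjacent frames."""
    out = []
    for prev_frame, next_frame in zip(frames[:-1], frames[1:]):
        out.append(prev_frame)
        out.append(interpolate_pair(prev_frame, next_frame, ratio))
    out.append(frames[-1])
    return out
```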
  • the video frame interpolation method of the embodiment of the present application first calculates the optical flow between two adjacent video frames in the video, then corrects the optical flow, and then obtains the interpolated frame based on the corrected optical flow, where optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane.
  • the optical flow contains the information of the target motion and expresses the change of the image.
  • therefore, using the optical flow between two adjacent video frames in the video, the interpolated frame between the two video frames can be obtained; in addition, converting the optical flow according to a ratio yields the initial optical flow corresponding to a position between the two video frames, and mapping the video frames according to the converted initial optical flow yields initial maps corresponding to that position between the two video frames. Correcting the optical flow based on the initial maps allows the optical flow to reflect the changes between the two video frames more accurately, thereby improving the precision of the interpolation results.
  • in a possible implementation, step 102, calculating the optical flow between two video frames, includes: calculating the optical flow between the two video frames based on a computer vision algorithm, where the computer vision algorithm refers to a traditional image processing method rather than a method based on neural network prediction;
  • step 105, correcting the optical flow between two video frames based on the initial maps, includes: correcting the optical flow between the two video frames based on a neural network with the initial maps as input.
  • in step 105, the optical flow calculated in step 102 is corrected by a pre-trained neural network. In this step, since a roughly accurate optical flow has already been obtained by the computer vision algorithm, the neural network only needs to correct this optical flow, so the computational load of the neural network is relatively small.
  • the traditional video frame insertion method calculates the optical flow through a computer vision algorithm, and then uses the calculated optical flow to perform optical flow mapping to obtain the target interpolation frame.
  • however, interpolation based on optical flow obtained in this way has low accuracy. To improve the accuracy, a neural network can be used to predict the optical flow and then obtain the target interpolation frame, but the computational load of this approach is large.
  • the video frame insertion method of the embodiment of the present application first calculates the optical flow based on a computer vision algorithm, then corrects the optical flow based on a neural network, and then interpolates frames based on the corrected optical flow. Since the optical flow is corrected with the help of the prediction of the neural network, interpolation based on the optical flow obtained in this way has higher accuracy; for example, it can reduce artifacts at the contour edges of objects and improve the user experience for slow-motion video. Moreover, because the neural network only needs to correct the optical flow that has already been obtained, the computational load of the neural network is reduced. That is, the amount of computation is reduced while the accuracy of the frame interpolation result is improved.
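  • For the classical (non-learned) optical flow step, the following sketch uses OpenCV's Farneback method purely as one example of a traditional computer vision algorithm; the disclosure does not name a specific algorithm, so the choice of Farneback and its parameters are assumptions:

```python
import cv2

def compute_bidirectional_flow(prev_frame, next_frame):
    """One possible classical estimate of the two reverse optical flows."""
    g1 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g3 = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # F_{3-1}: flow computed from I3 to I1 (the reverse flow for mapping I1 forward in time).
    f_3_1 = cv2.calcOpticalFlowFarneback(g3, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # F_{1-3}: flow computed from I1 to I3 (the reverse flow for mapping I3 backward in time).
    f_1_3 = cv2.calcOpticalFlowFarneback(g1, g3, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return f_3_1, f_1_3
```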
  • step 105 is to correct the optical flow between two video frames based on the initial map, and the process of obtaining the corrected optical flow includes: combining the two video frames, The initial optical flow and the initial map are input to the optical flow correction neural network, and the initial optical flow is corrected through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
  • step 106 obtaining the target interpolation frame between two video frames according to the corrected optical flow includes:
  • Step 1061 map the two video frames through the corrected optical flow to obtain the corrected map
  • Step 1062 input the two video frames, the corrected optical flow and the corrected map into the fusion neural network to obtain a fusion parameter map output by the fusion neural network;
  • Step 1063 based on the fusion parameter map, perform fusion calculation on the corrected map to obtain the target interpolation frame.
  • in a possible implementation, the optical flow between the two video frames includes a first reverse optical flow F 3-1 and a second reverse optical flow F 1-3 . The first reverse optical flow F 3-1 is the reverse optical flow from the previous video frame I 1 to the next video frame I 3 , and the second reverse optical flow F 1-3 is the reverse optical flow from the next video frame I 3 to the previous video frame I 1 . That is, step 101 is to acquire the adjacent previous video frame I 1 and next video frame I 3 in the video, and step 102 includes:
  • Step 1021 calculate the first reverse optical flow F 3-1 based on the computer vision algorithm, that is, the reverse optical flow from the previous video frame I 1 to the next video frame I 3 ;
  • Step 1022 Calculate the second reverse optical flow F 1-3 based on the computer vision algorithm, that is, the reverse optical flow from the next video frame I 3 to the previous video frame I 1 .
  • the reverse optical flow is also called the backward optical flow.
  • the optical flow in the embodiment of the present application can be expressed as an optical flow map; for example, for two frames A and B, the optical flow map has exactly the same resolution as A and B.
  • the optical flow map records the "offset" of each pixel of an image.
  • this "offset" has two directions: an offset x in the left-right direction and an offset y in the up-down direction.
  • the value of this offset can be simply understood as the distance to be moved (the number of pixels). "Applying the optical flow to image A", or "mapping image A through the optical flow", means that each pixel of image A is shifted according to the offset values (up-down direction + left-right direction) at the corresponding position on the optical flow map. After the optical flow mapping is completed, a new image is obtained, called the map.
  • for the optical flow calculated from image A to image B, it is the forward optical flow with respect to image A, and the reverse optical flow with respect to image B.
  • forward optical flow refers to the optical flow calculated from image A to image B
  • backward optical flow/reverse optical flow refers to the optical flow calculated from image B to image A.
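  • A common backward-warp sketch of "mapping an image through an optical flow" as described above (this is generic warping code, not code taken from the patent): each output pixel (x, y) is sampled from the input image at (x + flow_x, y + flow_y), i.e. the per-pixel offsets stored in the flow map are used as shifts.

```python
import cv2
import numpy as np

def backward_warp(image, flow):
    """Apply an optical-flow map to an image and return the resulting map."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)
```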
  • Step 103 transforming the optical flow between two video frames into an initial optical flow based on a preset ratio includes:
  • Step 1031: transform the first reverse optical flow F 3-1 into the first initial optical flow FCV 2-1 based on the preset ratio; the first initial optical flow FCV 2-1 serves as the reverse optical flow from the previous video frame I 1 to the target interpolation frame IN 2 . Since the target interpolation frame IN 2 is located between the two video frames I 1 and I 3 , the transformation based on the preset ratio can approximate the optical flow at a position between the two video frames; for example, if the preset ratio is set to 0.5, computing F 3-1 × 0.5 approximates the optical flow of the intermediate frame at the halfway point between the two video frames;
  • Step 1032 transform the second reverse optical flow F 1-3 into the second initial optical flow FCV 2-3 based on the preset ratio, and the second initial optical flow FCV 2-3 is used as the interpolation from the next video frame I 3 to the target Reverse optical flow of frame IN 2 ;
  • Step 104 the two video frames are subjected to initial optical flow mapping to obtain an initial map including:
  • Step 1041 using the first initial optical flow FCV 2-1 to map the previous video frame I 1 to obtain the first map WF 1-2 ;
  • that is, the first initial optical flow FCV 2-1 is used on the image I 1 to perform an optical flow mapping (backward warp); through this mapping, a map (WarpMask), also called an optical flow map (WarpFlow), is obtained, namely the first map WF 1-2 .
  • Step 1042: map the next video frame I 3 through the second initial optical flow FCV 2-3 to obtain the second map WF 3-2 ; that is, the initial maps in the above step 104 include the first map WF 1-2 and the second map WF 3-2 ;
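  • Steps 1031–1042 reduce to a few lines; the sketch below continues the earlier examples (i1, i3, f_3_1, f_1_3 and backward_warp are the frames, flows and helper sketched above, and the preset ratio of 0.5 is the example value from the text):

```python
t = 0.5                              # preset ratio
fcv_2_1 = t * f_3_1                  # first initial flow, toward the target frame from I1
fcv_2_3 = t * f_1_3                  # second initial flow, toward the target frame from I3
wf_1_2 = backward_warp(i1, fcv_2_1)  # first initial map WF_{1-2}
wf_3_2 = backward_warp(i3, fcv_2_3)  # second initial map WF_{3-2}
```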
  • Step 105 Based on the two video frames, the initial optical flow and the initial map, the initial optical flow is corrected through the optical flow correction neural network, and the process of obtaining the corrected optical flow includes:
  • the previous video frame I 1 , the next video frame I 3 , the first initial optical flow FCV 2-1 , the second initial optical flow FCV 2-3 , the first map WF 1-2 and the second map WF 3-2 are input to the optical flow correction neural network to obtain the third reverse optical flow FCVU 2-1 and the fourth reverse optical flow FCVU 2-3 output by the optical flow correction neural network. The third reverse optical flow FCVU 2-1 is the corrected reverse optical flow from the previous video frame I 1 to the target interpolation frame IN 2 , and the fourth reverse optical flow FCVU 2-3 is the corrected reverse optical flow from the next video frame I 3 to the target interpolation frame IN 2 ; that is, the third reverse optical flow FCVU 2-1 and the fourth reverse optical flow FCVU 2-3 belong to the corrected optical flow in step 105.
  • the neural network model structure of the optical flow correction neural network can be shown in Figure 4.
  • the neural network model can include a convolution (Conv) + activation function (ReLU) downsampling module, a convolution (Conv) + activation function (ReLU) feature extraction module, and a deconvolution (ConvTranspose) + activation function (ReLU) upsampling module.
  • the input of the neural network model is the above-mentioned I 1 , I 3 , FCV 2-1 , FCV 2-3 , WF 1-2 and WF 3-2 ;
  • the downsampling module is used to reduce the size of the input, thereby speeding up predictive inference, and is also used to extract network features;
  • the feature extraction module is used to extract and convert internal features of the network, and extracts the features after the convolutional layer operation in the convolutional network.
  • the upsampling module is used to enlarge the reduced features back to the original input size.
  • the output of the neural network model is the third reverse optical flow FCVU 2-1 and the fourth reverse optical flow FCVU 2-3 , that is, the corrected reverse optical flow from the previous video frame I 1 to the target interpolation frame IN 2 and the corrected reverse optical flow from the next video frame I 3 to the target interpolation frame IN 2 . In other words, the neural network corrects the first initial optical flow FCV 2-1 into the third reverse optical flow FCVU 2-1 , and corrects the second initial optical flow FCV 2-3 into the fourth reverse optical flow FCVU 2-3 .
  • the repeated modules in the figure indicate module reuse: the same feature extraction module is reused, which reduces the complexity of the network structure and enhances the feature extraction capability of the network.
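  • A hedged sketch of such a correction network, following the FIG. 4 description (Conv+ReLU downsampling, a reused Conv+ReLU feature extraction block, ConvTranspose+ReLU upsampling); the channel widths, depths and output head are illustrative assumptions, since the patent does not specify them:

```python
import torch
import torch.nn as nn

class FlowCorrectionNet(nn.Module):
    """Illustrative optical flow correction network (not the disclosed architecture)."""
    def __init__(self, ch=64):
        super().__init__()
        # Input: I1(3) + I3(3) + FCV_{2-1}(2) + FCV_{2-3}(2) + WF_{1-2}(3) + WF_{3-2}(3) = 16 channels.
        self.down = nn.Sequential(nn.Conv2d(16, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # One feature extraction block, applied twice (module reuse, as in the figure).
        self.feat = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Output head: two 2-channel flows, FCVU_{2-1} and FCVU_{2-3}.
        self.head = nn.Conv2d(ch, 4, 3, padding=1)

    def forward(self, i1, i3, fcv_2_1, fcv_2_3, wf_1_2, wf_3_2):
        x = torch.cat([i1, i3, fcv_2_1, fcv_2_3, wf_1_2, wf_3_2], dim=1)
        x = self.up(self.feat(self.feat(self.down(x))))
        fcvu = self.head(x)
        return fcvu[:, :2], fcvu[:, 2:]   # FCVU_{2-1}, FCVU_{2-3}
```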
  • the training process of the neural network model will be described in the following content.
  • step 1061 obtaining the target interpolation frame between two video frames according to the corrected optical flow includes:
  • Step 10611: map the previous video frame I 1 through the third reverse optical flow FCVU 2-1 to obtain the third map WM 1-2 ;
  • Step 10612 map the next video frame I 3 through the fourth reverse optical flow FCVU 2-3 to obtain the fourth map WM 3-2 ;
  • Step 1062: input the previous video frame I 1 , the next video frame I 3 , the third reverse optical flow FCVU 2-1 , the fourth reverse optical flow FCVU 2-3 , the third map WM 1-2 and the fourth map WM 3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network;
  • the neural network model structure of the fusion neural network can be shown in FIG. 5 , and the neural network model can include a convolution Conv+activation function Relu downsampling module and a deconvolution ConvTranspose+activation function Relu upsampling module.
  • the input of the neural network model is the above-mentioned I 1 , I 3 , FCVU 2-1 , FCVU 2-3 , WM 1-2 and WM 3-2 ;
  • the output of the neural network model is the fusion parameter map m, and the fusion The parameter map m is used to participate in the calculation in the subsequent process to obtain the target interpolation frame IN 2 .
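  • A hedged sketch of such a fusion network, following the FIG. 5 description (Conv+ReLU downsampling and ConvTranspose+ReLU upsampling); the final sigmoid is an assumption made so that every pixel of the fusion parameter map m lies in [0, 1] as the text requires, and the layer widths are likewise illustrative:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Illustrative fusion network producing the fusion parameter map m."""
    def __init__(self, ch=32):
        super().__init__()
        # Input: I1(3) + I3(3) + FCVU_{2-1}(2) + FCVU_{2-3}(2) + WM_{1-2}(3) + WM_{3-2}(3) = 16 channels.
        self.down = nn.Sequential(nn.Conv2d(16, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.head = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, i1, i3, fcvu_2_1, fcvu_2_3, wm_1_2, wm_3_2):
        x = torch.cat([i1, i3, fcvu_2_1, fcvu_2_3, wm_1_2, wm_3_2], dim=1)
        return torch.sigmoid(self.head(self.up(self.down(x))))   # fusion parameter map m
```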
  • the training process of the neural network model will be described in the following content.
  • Step 1063 based on the fusion parameter map m, perform fusion calculation on the third map WM 1-2 and the fourth map WM 3-2 to obtain the target interpolation frame IN 2 .
  • step 1063, based on the fusion parameter map m, performing a fusion calculation on the third map WM 1-2 and the fourth map WM 3-2 to obtain the target interpolation frame IN 2 , includes: multiplying the pixel values of the third map WM 1-2 and of the fusion parameter map m in one-to-one correspondence to obtain a first fusion map WM 1-2 × m, where the resolution of the fusion parameter map m is the same as the resolution of either video frame, the pixel values of the fusion parameter map m correspond one-to-one to the pixel values of the third map WM 1-2 , and each pixel value of the fusion parameter map m ranges from 0 to 1; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1−m); multiplying the pixel values of the fourth map WM 3-2 and of the difference fusion parameter map in one-to-one correspondence to obtain a second fusion map WM 3-2 × (1−m); and adding the first fusion map and the second fusion map pixel by pixel to obtain the target interpolation frame IN 2 .
  • the calculation formula of the target interpolation frame is therefore IN 2 = WM 1-2 × m + WM 3-2 × (1−m): the third map WM 1-2 and the fusion parameter map m are multiplied point by point to obtain one intermediate result, the fourth map WM 3-2 is multiplied point by point with (1−m), the result of subtracting the fusion parameter map m from 1 point by point, to obtain the other intermediate result, and the two intermediate results are added point by point.
  • Table 1 shows a comparison table of the target interpolation frame IN 2 , the third map WM 1-2 , the fourth map WM 3-2 and the fusion parameter map m.
  • the values in Table 1 are pixel values.
  • the pixel values of the third map WM 1-2 are all 2, the pixel values of the fourth map WM 3-2 are all 4, and the pixel values of the fusion parameter map m are all 0; after each pixel value is calculated according to the formula WM 1-2 × m + WM 3-2 × (1−m), the pixel values of the target interpolation frame IN 2 obtained are all 4.
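  • The fusion step is a simple per-pixel weighted sum; the sketch below also reproduces the Table 1 example (all values are illustrative pixel values from the text):

```python
import numpy as np

def fuse_maps(wm_1_2, wm_3_2, m):
    """Per-pixel fusion: IN2 = WM_{1-2} * m + WM_{3-2} * (1 - m)."""
    return wm_1_2 * m + wm_3_2 * (1.0 - m)

# Table 1 example: WM_{1-2} all 2, WM_{3-2} all 4, m all 0  ->  IN2 all 4.
wm_1_2 = np.full((2, 2), 2.0)
wm_3_2 = np.full((2, 2), 4.0)
m = np.zeros((2, 2))
assert np.all(fuse_maps(wm_1_2, wm_3_2, m) == 4.0)
```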
  • the optical flow of the intermediate frame at the corresponding positions of the two video frames can be obtained, so as to facilitate the determination of the target interpolation frame IN 2 based on the optical flow in the subsequent calculation process.
  • the preset ratio value t may be 0.5. If the preset ratio value t is 0.5, the optical flow obtained after transformation is the optical flow at half of the two video frames.
  • the embodiment of the present application also provides a neural network training method for video frame interpolation.
  • the neural network training method can be used to train the above-mentioned optical flow correction neural network and fusion neural network. Before performing neural network training, about 100,000 sets of data can be extracted from 1,000 video clips covering various scenes and motion types as training data, each set including three consecutive video frames; all the data is normalized to a uniform resolution by cropping or scaling, for example, the resolution is unified to 768 × 768, as sketched below.
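  • An illustrative way to normalize a training triplet to a uniform resolution such as 768 × 768; the center-crop-then-resize policy is an assumption, since the text only says "clipping or scaling":

```python
import cv2

def normalize_triplet(frames, size=768):
    """Crop each frame to a centered square and resize it to size x size."""
    out = []
    for f in frames:
        h, w = f.shape[:2]
        s = min(h, w)
        y0, x0 = (h - s) // 2, (w - s) // 2
        out.append(cv2.resize(f[y0:y0 + s, x0:x0 + s], (size, size)))
    return out
```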
  • Neural network training methods include:
  • Step 201 obtain a set of training data, a set of training data includes three consecutive video frames, the three consecutive video frames are the first training video frame i 1 , the second training video frame i 2 and the third training video frame i 3 , in this step, a set of training data can be randomly selected from the training data;
  • Step 202 obtain the first reference reverse optical flow fg 2-1 , the first reference reverse optical flow fg 2-1 is the reverse optical flow from the first training video frame i 1 to the second training video frame i 2 ;
  • Step 203 obtain the second reference reverse optical flow fg 2-3 , the second reference reverse optical flow fg 2-3 is the reverse optical flow from the third training video frame i3 to the second training video frame i2 ;
  • the first reference reverse optical flow fg 2-1 and the second reference reverse optical flow fg 2-3 can be obtained by a state-of-the-art third-party optical flow estimation method; fg 2-1 and fg 2-3 are used as reference optical flows, which facilitates subsequent comparison with the results output by the neural network and the adjustment of the network parameters.
  • Step 204 calculate the first training reverse optical flow f 3-1 , the first training reverse optical flow f 3-1 is the reverse optical flow from the first training video frame i 1 to the third training video frame i 3 ;
  • Step 205 calculating the second training reverse optical flow f 1-3 , the second training reverse optical flow f 1-3 is the reverse optical flow from the third training video frame i 3 to the first training video frame i 1 ;
  • Step 206 Transform the first training reverse optical flow f 3-1 into the first initial training optical flow fcv 2- 1 based on the preset ratio, and the first initial training optical flow fcv 2-1 is used as the first training video frame i 1 to the reverse optical flow of the second training video frame i 2 ;
  • Step 207: transform the second training reverse optical flow f 1-3 into the second initial training optical flow fcv 2-3 based on the preset ratio, and the second initial training optical flow fcv 2-3 is used as the reverse optical flow from the third training video frame i 3 to the second training video frame i 2 ;
  • Step 208 using the first initial training optical flow fcv 2-1 to map the first training video frame i 1 to obtain the first training map wf 1-2 ;
  • Step 209 map the third training video frame i 3 through the second initial training optical flow fcv 2-3 to obtain the second training map wf 3-2 ;
  • Step 210 the first training video frame i 1 , the third training video frame i 3 , the first initial training optical flow fcv 2- 1 , the second initial training optical flow fcv 2-3 , the first training map wf 1- 2 and the second training map wf 3-2 are input to the optical flow correction neural network to obtain the third training reverse optical flow fcvu 2-1 and the fourth training reverse optical flow fcvu 2-3 output by the optical flow correction neural network , the third training reverse optical flow fcvu 2-1 is the modified reverse optical flow from the first training video frame i 1 to the second training video frame i 2 , and the fourth training reverse optical flow fcvu 2-3 is The reverse optical flow from the third training video frame i 3 to the second training video frame i 2 after correction;
  • Step 211 map the first training video frame i 1 through the third training reverse optical flow fcvu 2-1 to obtain the third training map wm 1-2 ;
  • Step 212 map the third training video frame i 3 through the fourth training reverse optical flow fcvu 2-3 to obtain the fourth training map wm 3-2 ;
  • Step 213: input the first training video frame i 1 , the third training video frame i 3 , the third training reverse optical flow fcvu 2-1 , the fourth training reverse optical flow fcvu 2-3 , the third training map wm 1-2 and the fourth training map wm 3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network;
  • Step 214 based on the fusion parameter map m, perform fusion calculation on the third training map wm 1-2 and the fourth training map wm 3-2 to obtain the target interpolation frame in 2 ;
  • the resolution of the fusion parameter map m is the same as that of any video frame
  • the value range of each pixel of the fusion parameter map m is 0 ⁇ 1
  • the target interpolation frame in 2 = wm 1-2 × m + wm 3-2 × (1−m).
  • Step 215: based on the difference between the target interpolation frame in 2 and the second training video frame i 2 , the difference between the third training reverse optical flow fcvu 2-1 and the first reference reverse optical flow fg 2-1 , and the difference between the fourth training reverse optical flow fcvu 2-3 and the second reference reverse optical flow fg 2-3 , adjust the network parameters of the optical flow correction neural network and the fusion neural network.
  • the second training video frame i 2 is known, and the target interpolation frame in 2 is obtained by neural network prediction; therefore, the network parameters can be adjusted based on the difference between in 2 and i 2 so that the predictions of the neural network become more accurate. For similar reasons, the network parameters can be adjusted based on the difference between fcvu 2-1 and fg 2-1 and the difference between fcvu 2-3 and fg 2-3 .
  • step 201 to step 215 constitute one round of training, and the neural network can perform multiple rounds of training based on the training data.
  • in step 215, the L1 loss (L1loss) between in 2 and i 2 , the L1loss between fcvu 2-1 and fg 2-1 , and the L1loss between fcvu 2-3 and fg 2-3 are calculated and back-propagated iteratively until the optical flow correction neural network and the fusion neural network converge. That is, during the multi-round training process, the network parameters of the optical flow correction neural network and the fusion neural network are adjusted according to these L1loss values so that the parameters are continuously optimized; when the L1loss no longer drops, the network training is complete and the prediction effect of the neural network is at its best. After training is complete, video frame interpolation can be realized with the above-mentioned video frame interpolation method based on the trained optical flow correction neural network and fusion neural network. A single training iteration is sketched below.
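  • A hedged sketch of one such training iteration: the three L1 losses are summed and back-propagated through both networks. The networks and optimizer are passed in (for example, instances of the sketches above and an Adam optimizer over the parameters of both networks, which is an assumption), and `warp` is a hypothetical differentiable backward warp (e.g. built on torch.nn.functional.grid_sample):

```python
import torch.nn.functional as F

def train_step(flow_net, fusion_net, optimizer, warp,
               i1, i2, i3, fg_2_1, fg_2_3, fcv_2_1, fcv_2_3, wf_1_2, wf_3_2):
    """One illustrative training iteration for the two networks."""
    # Steps 210-213: correct the flows, warp, and predict the fusion parameter map.
    fcvu_2_1, fcvu_2_3 = flow_net(i1, i3, fcv_2_1, fcv_2_3, wf_1_2, wf_3_2)
    wm_1_2, wm_3_2 = warp(i1, fcvu_2_1), warp(i3, fcvu_2_3)
    m = fusion_net(i1, i3, fcvu_2_1, fcvu_2_3, wm_1_2, wm_3_2)
    # Step 214: fuse into the predicted intermediate frame.
    in_2 = wm_1_2 * m + wm_3_2 * (1.0 - m)
    # Step 215: three L1 losses against the ground-truth frame and reference flows.
    loss = (F.l1_loss(in_2, i2)
            + F.l1_loss(fcvu_2_1, fg_2_1)
            + F.l1_loss(fcvu_2_3, fg_2_3))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```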
  • in step 204, calculating the first training reverse optical flow f 3-1 includes: calculating the first training reverse optical flow f 3-1 based on a computer vision algorithm; and in step 205, calculating the second training reverse optical flow f 1-3 includes: calculating the second training reverse optical flow f 1-3 based on a computer vision algorithm.
  • in a possible implementation, step 214, based on the fusion parameter map m, performing a fusion calculation on the third training map wm 1-2 and the fourth training map wm 3-2 to obtain the target interpolation frame in 2 , includes: multiplying the pixel values of the third training map wm 1-2 and of the fusion parameter map m in one-to-one correspondence to obtain a first fusion map wm 1-2 × m, where the resolution of the fusion parameter map m is the same as that of any video frame, the pixel values of the fusion parameter map m correspond one-to-one to the pixel values of the third training map wm 1-2 , and each pixel value of the fusion parameter map m ranges from 0 to 1; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1−m); multiplying the pixel values of the fourth training map wm 3-2 and of the difference fusion parameter map in one-to-one correspondence to obtain a second fusion map wm 3-2 × (1−m); and adding the first fusion map and the second fusion map pixel by pixel to obtain the target interpolation frame in 2 .
  • the embodiment of the present application also provides a video frame insertion device 3, including: an acquisition module 31, configured to acquire two adjacent video frames in the video, the two video frames including the previous video frame I 1 and the next video frame I 3 ; the acquisition module 31 is also used to calculate the optical flow between the two video frames; the acquisition module 31 is also used to transform the optical flow between the two video frames into the initial optical flow based on the preset ratio; the acquisition module 31 is also used to map the two video frames through the initial optical flow to obtain the initial maps; the correction module 32 is used to correct the optical flow between the two video frames based on the initial maps to obtain the corrected optical flow; and the frame interpolation module 33 is configured to obtain the target interpolation frame between the two video frames according to the corrected optical flow.
  • the video frame insertion device can apply the video frame insertion method in any of the above embodiments, and the specific process and principle will not be repeated here.
  • in a possible implementation, calculating the optical flow between two video frames includes: calculating the optical flow between the two video frames based on a computer vision algorithm; and correcting the optical flow between the two video frames based on the initial maps includes: correcting the optical flow between the two video frames based on a neural network with the initial maps as input.
  • in a possible implementation, the process of correcting the optical flow between the two video frames based on the initial maps to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow and the initial maps into the optical flow correction neural network, and correcting the initial optical flow through the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
  • in a possible implementation, transforming the optical flow between two video frames into an initial optical flow based on a preset ratio includes: transforming the first reverse optical flow F 3-1 into the first initial optical flow FCV 2-1 based on the preset ratio, the first reverse optical flow F 3-1 belonging to the optical flow between the two video frames and being the reverse optical flow from the previous video frame I 1 to the next video frame I 3 ; and transforming the second reverse optical flow F 1-3 into the second initial optical flow FCV 2-3 based on the preset ratio, the second reverse optical flow F 1-3 belonging to the optical flow between the two video frames and being the reverse optical flow from the next video frame I 3 to the previous video frame I 1 .
  • mapping the two video frames through the initial optical flow to obtain the initial maps includes: mapping the previous video frame I 1 through the first initial optical flow FCV 2-1 to obtain the first map WF 1-2 , the first map WF 1-2 belonging to the initial maps; and mapping the next video frame I 3 through the second initial optical flow FCV 2-3 to obtain the second map WF 3-2 , the second map WF 3-2 belonging to the initial maps.
  • obtaining the target frame interpolation between two video frames according to the corrected optical flow includes: mapping the two video frames through the corrected optical flow to obtain corrected maps; inputting the two video frames, the corrected optical flow and the corrected maps into the fusion neural network to obtain a fusion parameter map output by the fusion neural network; and, based on the fusion parameter map, performing a fusion calculation on the third map and the fourth map to obtain the target interpolation frame.
  • in a possible implementation, obtaining the target frame interpolation between two video frames according to the corrected optical flow includes: mapping the previous video frame I 1 through the third reverse optical flow FCVU 2-1 to obtain the third map WM 1-2 ; mapping the next video frame I 3 through the fourth reverse optical flow FCVU 2-3 to obtain the fourth map WM 3-2 ; inputting the previous video frame I 1 , the next video frame I 3 , the third reverse optical flow FCVU 2-1 , the fourth reverse optical flow FCVU 2-3 , the third map WM 1-2 and the fourth map WM 3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network; and, based on the fusion parameter map m, performing a fusion calculation on the third map WM 1-2 and the fourth map WM 3-2 to obtain the target interpolation frame IN 2 .
  • the process of obtaining the target frame interpolation includes: multiplying the third map and the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third map, each pixel value of the fusion parameter map ranges from 0 to 1, and the product values obtained by multiplying the pixel values of the third map with the corresponding pixel values of the fusion parameter map are the pixel values of the first fusion map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and the difference values obtained by subtracting each pixel value of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth map and the difference fusion parameter map to obtain a second fusion map, where the product values obtained by multiplying the pixel values of the fourth map with the corresponding pixel values of the difference fusion parameter map are the pixel values of the second fusion map; and adding the first fusion map and the second fusion map to obtain the target interpolation frame, where the values obtained by adding the pixel values of the first fusion map and the corresponding pixel values of the second fusion map are the pixel values of the target interpolation frame.
  • transforming the first reverse optical flow into the first initial optical flow based on a preset ratio includes: multiplying the first reverse optical flow by a preset ratio value to obtain the first initial optical flow,
  • the preset ratio value ranges from 0.4 to 0.6;
  • converting the second reverse optical flow to the second initial optical flow based on the preset ratio includes: multiplying the second reverse optical flow by the preset ratio value to obtain the second initial optical flow.
  • the preset ratio is 0.5.
  • the embodiment of the present application also provides a neural network training device 4, including an acquisition module 41 used to: acquire a set of training data, the set of training data including three consecutive video frames, the three consecutive video frames being, in order, the first training video frame i 1 , the second training video frame i 2 and the third training video frame i 3 ; obtain the first reference reverse optical flow fg 2-1 , the first reference reverse optical flow fg 2-1 being the reference reverse optical flow from the first training video frame i 1 to the second training video frame i 2 ; obtain the second reference reverse optical flow fg 2-3 , the second reference reverse optical flow fg 2-3 being the reference reverse optical flow from the third training video frame i 3 to the second training video frame i 2 ; calculate the first training reverse optical flow f 3-1 , the first training reverse optical flow f 3-1 being the reverse optical flow from the first training video frame i 1 to the third training video frame i 3 ; and calculate the second training reverse optical flow f 1-3 , the second training reverse optical flow f 1-3 being the reverse optical flow from the third training video frame i 3 to the first training video frame i 1 .
  • the first training reverse optical flow f 3-1 is transformed into the first initial training optical flow fcv 2-1 based on the preset ratio, and the second training reverse optical flow f 1-3 is transformed into the second initial training optical flow fcv 2-3 based on the preset ratio; the first training video frame i 1 is mapped through the first initial training optical flow fcv 2-1 to obtain the first training map wf 1-2 , and the third training video frame i 3 is mapped through the second initial training optical flow fcv 2-3 to obtain the second training map wf 3-2 ; the first training video frame i 1 , the third training video frame i 3 , the first initial training optical flow fcv 2-1 , the second initial training optical flow fcv 2-3 , the first training map wf 1-2 and the second training map wf 3-2 are input to the optical flow correction neural network to obtain the third training reverse optical flow fcvu 2-1 and the fourth training reverse optical flow fcvu 2-3 output by the optical flow correction neural network.
  • the third training reverse optical flow fcvu 2-1 is the corrected reverse optical flow from the first training video frame i 1 to the second training video frame i 2 .
  • the fourth training reverse optical flow fcvu 2-3 is the corrected reverse optical flow from the third training video frame i 3 to the second training video frame i 2 .
  • the frame interpolation module 43 is used to: map the first training video frame i 1 through the third training reverse optical flow fcvu 2-1 to obtain the third training map wm 1-2 ;
  • map the third training video frame i 3 through the fourth training reverse optical flow fcvu 2-3 to obtain the fourth training map wm 3-2 ;
  • input the first training video frame i 1 , the third training video frame i 3 , the third training reverse optical flow fcvu 2-1 , the fourth training reverse optical flow fcvu 2-3 , the third training map wm 1-2 and the fourth training map wm 3-2 into the fusion neural network to obtain the fusion parameter map m output by the fusion neural network, and, based on the fusion parameter map m, perform a fusion calculation on the third training map wm 1-2 and the fourth training map wm 3-2 to obtain the target interpolation frame in 2 .
  • calculating the first training reverse optical flow f 3-1 includes: calculating the first training reverse optical flow f 3-1 based on a computer vision algorithm; and calculating the second training reverse optical flow f 1-3 includes: calculating the second training reverse optical flow f 1-3 based on a computer vision algorithm.
  • the process of obtaining the target frame interpolation includes: multiplying the third training map and the fusion parameter map to obtain a first fusion map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third training map, each pixel value of the fusion parameter map ranges from 0 to 1, and the product values obtained by multiplying the pixel values of the third training map with the corresponding pixel values of the fusion parameter map are the pixel values of the first fusion map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and the difference values obtained by subtracting each pixel value of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth training map and the difference fusion parameter map to obtain a second fusion map; and adding the first fusion map and the second fusion map to obtain the target interpolation frame.
  • transforming the first training reverse optical flow into the first initial training optical flow based on a preset ratio includes: multiplying the first training reverse optical flow by a preset ratio value to obtain the first initial training optical flow
  • the preset ratio value ranges from 0.4 to 0.6
  • converting the second training reverse optical flow to the second initial training optical flow based on the preset ratio includes: multiplying the second training reverse optical flow by the preset ratio value to obtain the second initial training optical flow.
  • the preset ratio is 0.5.
  • the above division of the video frame insertion device or the neural network training device into modules is only a division of logical functions; in actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separated.
  • these modules can be realized in the form of software calling through processing elements; they can also be realized in the form of hardware; some modules can also be realized in the form of software calling through processing elements, and some modules can be realized in the form of hardware implementation.
  • any one of the acquisition module, correction module, and frame insertion module can be a separate processing element, or can be integrated in the video frame insertion device, for example, integrated in a certain chip of the video frame insertion device.
  • each step of the above method or each module above can be completed by an integrated logic circuit of hardware in the processor element or an instruction in the form of software.
  • the above-mentioned video frame insertion device and neural network training device may be the same device or different devices.
  • the video frame inserting device or the neural network training device may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), or one or more digital signal processors (DSP), or one or more field programmable gate arrays (Field Programmable Gate Array, FPGA), etc.
  • the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or other processors that can call programs.
  • these modules can be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • the embodiment of the present application also provides an electronic device, including: a processor 51 and a memory 52, the memory 52 being used to store at least one instruction; when the instruction is loaded and executed by the processor 51, the method in any of the above-mentioned embodiments, including the video frame interpolation method or the neural network training method for video frame interpolation, is implemented.
  • the video frame interpolation method and the neural network training method for video frame interpolation are the same as those in the foregoing embodiments, and will not be repeated here.
  • the number of processors 51 may be one or more, and the processors 51 and memory 52 may be connected through a bus 53 or in other ways.
  • the memory 52 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as program instructions/modules corresponding to the data processing device in the embodiment of the present application.
  • the processor executes various functional applications and data processing by running non-transitory software programs, instructions and modules stored in the memory, that is, implements the method in any of the above method embodiments.
  • the memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store necessary data and the like.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the electronic device may be, for example, a server, a computer, a mobile phone and other electronic products.
  • An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is run on a computer, the computer is made to execute the method in any of the foregoing embodiments.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions according to the present application will be generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, DSL) or wireless (e.g., infrared, radio, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a Solid State Disk).
  • "at least one” means one or more, and “multiple” means two or more.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three kinds of relationships, for example, A and/or B may indicate that A exists alone, A and B exist simultaneously, or B exists alone. Among them, A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • “At least one of the following” and similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • At least one of a, b, and c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, and c may be single or multiple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)

Abstract

Embodiments of the present application provide a video frame interpolation method, a training method, an apparatus and an electronic device, which relate to the technical field of image processing and can improve the accuracy of frame interpolation results. The video frame interpolation method includes: obtaining two adjacent video frames in a video, the two video frames including a previous video frame and a subsequent video frame; calculating the optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain initial maps; correcting the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.

Description

Video frame interpolation method, training method, apparatus and electronic device
Technical Field
The present application relates to the technical field of image processing, and in particular to a video frame interpolation method, training method, apparatus and electronic device.
Background Art
Video frame interpolation refers to the use of algorithms to generate intermediate video frames, which are used to increase the video frame rate or to generate slow-motion special-effect videos. However, the accuracy of the interpolation results obtained by traditional video frame interpolation methods is low.
Summary of the Invention
A video frame interpolation method, training method, apparatus and electronic device are provided, which can improve the accuracy of frame interpolation results.
In a first aspect, a video frame interpolation method is provided, including: obtaining two adjacent video frames in a video, the two video frames including a previous video frame and a subsequent video frame; calculating the optical flow between the two video frames; transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio; mapping the two video frames through the initial optical flow to obtain initial maps; correcting the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
In a second aspect, a neural network training method for video frame interpolation is provided, including: obtaining a set of training data, the set of training data including three consecutive video frames, the three consecutive video frames being, in order, a first training video frame, a second training video frame and a third training video frame; obtaining a first reference reverse optical flow, the first reference reverse optical flow being the reverse optical flow from the first training video frame to the second training video frame; obtaining a second reference reverse optical flow, the second reference reverse optical flow being the reverse optical flow from the third training video frame to the second training video frame; calculating a first training reverse optical flow, the first training reverse optical flow being the reverse optical flow from the first training video frame to the third training video frame; calculating a second training reverse optical flow, the second training reverse optical flow being the reverse optical flow from the third training video frame to the first training video frame; transforming the first training reverse optical flow into a first initial training optical flow based on a preset ratio; transforming the second training reverse optical flow into a second initial training optical flow based on the preset ratio; mapping the first training video frame through the first initial training optical flow to obtain a first training map; mapping the third training video frame through the second initial training optical flow to obtain a second training map; inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training map and the second training map into an optical flow correction neural network to obtain a third training reverse optical flow and a fourth training reverse optical flow output by the optical flow correction neural network, the third training reverse optical flow being the corrected reverse optical flow from the first training video frame to the second training video frame, and the fourth training reverse optical flow being the corrected reverse optical flow from the third training video frame to the second training video frame; mapping the first training video frame through the third training reverse optical flow to obtain a third training map; mapping the third training video frame through the fourth training reverse optical flow to obtain a fourth training map; inputting the first training video frame, the third training video frame, the third training reverse optical flow, the fourth training reverse optical flow, the third training map and the fourth training map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; based on the fusion parameter map, performing a fusion calculation on the third training map and the fourth training map to obtain a target interpolated frame; and adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame and the second training video frame, the difference between the third training reverse optical flow and the first reference reverse optical flow, and the difference between the fourth training reverse optical flow and the second reference reverse optical flow.
In a third aspect, a video frame interpolation apparatus is provided, including: an acquisition module configured to obtain two adjacent video frames in a video; the acquisition module is further configured to calculate the optical flow between the two video frames; the acquisition module is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module is further configured to map the two video frames through the initial optical flow to obtain initial maps; a correction module configured to correct the optical flow between the two video frames based on the initial maps to obtain a corrected optical flow; and a frame interpolation module configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow.
In a fourth aspect, an electronic device is provided, including a processor and a memory, the memory being configured to store at least one instruction, and the above method is implemented when the instruction is loaded and executed by the processor.
In a fifth aspect, a computer-readable storage medium is provided, in which a computer program is stored, and when the computer program runs on a computer, it causes the computer to execute the above method.
With the video frame interpolation method, training method, apparatus and electronic device of the embodiments of the present application, the optical flow between two adjacent video frames in a video is first calculated, the optical flow is then corrected, and the interpolated frame is obtained based on the corrected optical flow. Optical flow refers to the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane; it contains the information of the target motion and expresses the change of the image. Therefore, using the optical flow between two adjacent video frames in a video, the interpolated frame between the two video frames can be obtained. In addition, converting the optical flow according to a ratio yields the initial optical flow corresponding to a position between the two video frames, and mapping the video frames according to the converted initial optical flow yields initial maps corresponding to that position between the two video frames. Correcting the optical flow based on the initial maps allows the optical flow to reflect the changes between the two video frames more accurately, thereby improving the accuracy of the frame interpolation results.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a video frame interpolation method in an embodiment of the present application;

FIG. 2 is a schematic flowchart of another video frame interpolation method in an embodiment of the present application;

FIG. 3 is a schematic flowchart of another video frame interpolation method in an embodiment of the present application;

FIG. 4 is a schematic diagram of the model structure of an optical flow correction neural network in an embodiment of the present application;

FIG. 5 is a schematic diagram of the model structure of a fusion neural network in an embodiment of the present application;

FIG. 6 is a structural block diagram of a video frame interpolation apparatus in an embodiment of the present application;

FIG. 7 is a structural block diagram of a neural network training apparatus in an embodiment of the present application;

FIG. 8 is a structural block diagram of an electronic device in an embodiment of the present application.
Detailed Description

The terms used in the embodiments of the present application are only intended to explain specific embodiments of the present application and are not intended to limit the present application.

It should be noted that the flowcharts shown in the drawings are merely illustrative; they do not necessarily include all contents and operations/steps, nor must they be performed in the described order. For example, some operations/steps may be further divided, while others may be combined or partially combined, so the actual order of execution may change according to the actual situation.
As shown in FIG. 1, an embodiment of the present application provides a video frame interpolation method, including:

Step 101: acquiring two adjacent video frames in a video;

The video is the video to be interpolated, and the two video frames may be any two adjacent video frames; the two video frames include a previous video frame I1 and a subsequent video frame I3.

Step 102: computing the optical flow between the two video frames;

Step 103: transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio. The optical flow between the two video frames is computed from the two video frames; according to the preset ratio, it can be transformed into the optical flow at a preset position between the two video frames, i.e., the initial optical flow;

Step 104: warping the two video frames with the initial optical flow to obtain initial warped maps;

Step 105: correcting the optical flow between the two video frames based on the initial warped maps to obtain a corrected optical flow;

Step 106: obtaining a target interpolated frame between the two video frames according to the corrected optical flow.

After the target interpolated frame between these two video frames is obtained through Step 106, the target interpolated frame between another pair of video frames can be obtained by repeating Steps 101 to 106. For example, after the target interpolated frame between the first and second frames of the video has been obtained, the method can be looped to obtain the target interpolated frame between the next pair of adjacent frames after a preset frame interval, and so on, so that the entire video is interpolated.

In the video frame interpolation method of the embodiment of the present application, the optical flow between two adjacent video frames in the video is computed first, the optical flow is then corrected, and an interpolated frame is obtained based on the corrected optical flow. Optical flow refers to the instantaneous velocity of pixel motion of a spatially moving object on the observed imaging plane; it contains the motion information of the target and expresses the change of the image, so the interpolated frame between two adjacent video frames can be obtained from the optical flow between them. In addition, by transforming the optical flow according to a ratio, an initial optical flow corresponding to a position between the two video frames can be obtained, and the video frames can be warped with the transformed initial optical flow to obtain initial warped maps corresponding to that position. Correcting the optical flow based on the initial warped maps allows the optical flow to reflect the change between the two video frames more accurately, thereby improving the accuracy of the interpolation result.
In a possible implementation, Step 102, computing the optical flow between the two video frames, includes: computing the optical flow between the two video frames based on a computer vision algorithm, where the computer vision algorithm refers to a traditional image processing method rather than a neural-network-based prediction method. Step 105, correcting the optical flow between the two video frames based on the initial warped maps, includes: correcting the optical flow between the two video frames based on a neural network, with the initial warped maps as input. In Step 105, the optical flow computed in Step 102 is corrected by a pre-trained neural network. Since a roughly accurate optical flow has already been obtained by the computer vision algorithm, the neural network only needs to correct this optical flow, so its computational cost is small.

Traditional video frame interpolation methods compute the optical flow with a computer vision algorithm and then use the computed flow for flow warping to obtain the target interpolated frame. However, interpolation based on optical flow obtained in this way has low accuracy. To improve accuracy, a neural network can instead be used to predict the optical flow and then obtain the target interpolated frame, but such a method is computationally expensive.

In the video frame interpolation method of the embodiment of the present application, the optical flow is first computed based on a computer vision algorithm, then corrected based on a neural network, and the interpolated frame is obtained based on the corrected optical flow. Because the optical flow is corrected with the help of the neural network's prediction, interpolation based on the optical flow obtained in this way has higher accuracy; for example, artifacts around object contours can be reduced, improving the user experience for slow-motion video. And because the neural network only needs to correct the optical flow that has already been obtained, its computational cost is reduced. That is, the computational cost is lowered while the accuracy of the interpolation result is improved.
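By way of illustration only, the following minimal sketch shows the first stage of this two-stage approach, namely computing rough backward optical flows with a classical computer vision algorithm. The use of OpenCV's Farneback method, the helper name backward_flow, and the parameter values are assumptions made for this sketch; the embodiments do not prescribe any particular algorithm.

```python
# Minimal sketch: OpenCV's Farneback method stands in for the unspecified
# "computer vision algorithm"; the embodiments do not mandate it.
import cv2
import numpy as np

def backward_flow(dst_gray: np.ndarray, src_gray: np.ndarray) -> np.ndarray:
    """Dense flow computed from dst_gray to src_gray, shape (H, W, 2)."""
    # Positional arguments: prev, next, flow, pyr_scale, levels, winsize,
    # iterations, poly_n, poly_sigma, flags.
    return cv2.calcOpticalFlowFarneback(dst_gray, src_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

# Under the backward-flow convention explained below, F3-1 (used to warp I1)
# is computed from I3 to I1, and F1-3 (used to warp I3) from I1 to I3:
# F_3_1 = backward_flow(I3_gray, I1_gray)
# F_1_3 = backward_flow(I1_gray, I3_gray)
```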
In a possible implementation, as shown in FIG. 2, the process of Step 105, correcting the optical flow between the two video frames based on the initial warped maps to obtain the corrected optical flow, includes: inputting the two video frames, the initial optical flow, and the initial warped maps into an optical flow correction neural network, correcting the initial optical flow by the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network.

In a possible implementation, Step 106, obtaining the target interpolated frame between the two video frames according to the corrected optical flow, includes:

Step 1061: warping the two video frames with the corrected optical flow to obtain corrected warped maps;

Step 1062: inputting the two video frames, the corrected optical flow, and the corrected warped maps into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;

Step 1063: performing a fusion computation on the corrected warped maps based on the fusion parameter map to obtain the target interpolated frame.
In a possible implementation, as shown in FIG. 3, the optical flow between the two video frames includes a first backward optical flow F3-1 and a second backward optical flow F1-3. The first backward optical flow F3-1 is the backward optical flow from the previous video frame I1 to the subsequent video frame I3, and the second backward optical flow F1-3 is the backward optical flow from the subsequent video frame I3 to the previous video frame I1; that is, Step 101 is acquiring the adjacent previous video frame I1 and subsequent video frame I3 in the video. Step 102 then includes:

Step 1021: computing the first backward optical flow F3-1, i.e., the backward optical flow from the previous video frame I1 to the subsequent video frame I3, based on a computer vision algorithm;

Step 1022: computing the second backward optical flow F1-3, i.e., the backward optical flow from the subsequent video frame I3 to the previous video frame I1, based on a computer vision algorithm.

Backward optical flow is also called reverse optical flow. The optical flow in the embodiments of the present application can be expressed as an optical flow map. For example, for two frames A and B, the optical flow map has exactly the same resolution as image A and image B, and it records the "offset" of every pixel of an image. The "offset" has two directions, a horizontal offset x and a vertical offset y, and its value can be simply understood as the distance (in pixels) to be moved. "Applying the optical flow to image A", or "warping image A with the optical flow", means that every pixel of image A is shifted according to the offset values (vertical plus horizontal) at the corresponding position of the optical flow map; after this flow warping is completed, a new image, called a warped map, is obtained. The optical flow computed from image A to image B is the forward optical flow for image A and the backward optical flow for image B. Therefore, for two frames A and B, if image B is to be obtained by warping image A with a forward or a backward flow, the forward optical flow refers to the flow computed from image A to image B, and the backward/reverse optical flow refers to the flow computed from image B to image A.
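To make the warping operation concrete, the following is a minimal sketch of such a backward warp, assuming a dense flow array of shape (H, W, 2) whose two channels are the horizontal and vertical offsets. Nearest-neighbour sampling is used only to keep the sketch short; a practical implementation would normally interpolate bilinearly.

```python
import numpy as np

def backward_warp(src: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Sample src at (x + flow_x, y + flow_y) for every target pixel.

    src:  (H, W, C) image to be warped.
    flow: (H, W, 2) backward flow; flow[..., 0] is the horizontal offset,
          flow[..., 1] the vertical offset, in pixels.
    """
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour lookup, clamped to the image border.
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(np.int64)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(np.int64)
    return src[src_y, src_x]
```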
Step 103, transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio, includes:

Step 1031: transforming the first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on a preset ratio, the first initial optical flow FCV2-1 serving as the backward optical flow from the previous video frame I1 to the target interpolated frame IN2. Since the target interpolated frame IN2 lies at a position between the two video frames I1 and I3, the transformation based on the preset ratio approximates the optical flow at a position between the two video frames; for example, with a preset ratio of 0.5, F3-1 × 0.5 approximates the optical flow of the intermediate frame at the midpoint between the two video frames;

Step 1032: transforming the second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, the second initial optical flow FCV2-3 serving as the backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2;

Step 104, warping the two video frames with the initial optical flow to obtain initial warped maps, includes:

Step 1041: warping the previous video frame I1 with the first initial optical flow FCV2-1 to obtain a first warped map WF1-2;

That is, a backward warp is performed on image I1 using the first initial optical flow FCV2-1; this warping yields a warped map (WarpMask), also called an optical flow warped map (WarpFlow), namely the first warped map WF1-2.

Step 1042: warping the subsequent video frame I3 with the second initial optical flow FCV2-3 to obtain a second warped map WF3-2; that is, the initial warped maps in Step 105 above include the first warped map WF1-2 and the second warped map WF3-2.
步骤105、基于两个视频帧、初始光流和初始映射图,通过光流修正神经网络对初始光流进行修正,得到修正后光流的过程包括:
将前一个视频帧I1、后一个视频帧I3、第一初始光流FCV2-1、第二初始光流FCV2-3、第一映射图WF1-2和第二映射图WF3-2输入至光流修正神经网络,得到光流修正神经网络输出的第三反向光流FCVU2-1和第四反向光流FCVU2-3,第三反向光流FCVU2-1为修正后的从前一个视频帧I1到目标插帧IN2的反向光流,第四反向光流FCVU2-3为修正后的从后一个视频帧I3到目标插帧IN2的反向光流,即第三反向光流FCVU2-1和第四反向光流FCVU2-3属于步骤105中修正后光流。
其中,光流修正神经网络的神经网络模型结构可以如图4所示,该神经网络模型可以包括卷积Conv+激活函数Relu下采样模块、卷积Conv+激活函数Relu特征提取模块、反卷积ConvTranspose+激活函数Relu上采样模块。其中,该神经网络模型输入的是上述的I1、I3、FCV2-1、FCV2-3、WF1-2和WF3-2;下采样模块用于把输入的尺寸缩小,从而加快预测推理的速度,同时也用于提取网络特征;特征提取模块用于网络内部特征的提取和转换,提取的是卷积网络中经过卷积层运算后的特征,这些特征有可能是帧图片中边缘、轮廓、明暗等这些特征在网络内的表示;上采样模块用于把缩小的特征再重新放大回原始的输入大小;该神经网络模型输出的是第三反向光流FCVU2-1和第四反向光流FCVU2-3,即修正后的从前一个视频帧I1到目标插帧IN2的反向光流以及修正后的从后一个视频帧I3到目标插帧IN2的反向光流。也就是说,该神经网络用于将第一初始光流FCV2-1修正为第三反向光流FCVU2-1,将第二初始光流FCV2-3修正为第四反向光流FCVU2-3。图中相关的模块是指模块的复用,例如该神经网络模型中,复用了同样的特征提取模块,降低了网络结构的复杂度,增强了网络特征提取的表征能力。该神经网络模型的训练过程会在后续内容中说明。
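By way of illustration, the following PyTorch sketch shows one possible network with this Conv+ReLU downsampling / feature extraction / ConvTranspose+ReLU upsampling layout. The channel widths, kernel sizes, number of layers, and the choice to predict the corrected flows as residuals on top of the initial flows are assumptions for this sketch and are not specified by FIG. 4.

```python
import torch
import torch.nn as nn

class FlowCorrectionNet(nn.Module):
    """Sketch of the correction network: inputs are I1, I3 (3 channels each),
    FCV2-1, FCV2-3 (2 channels each) and WF1-2, WF3-2 (3 channels each),
    16 channels in total; outputs are the corrected flows FCVU2-1, FCVU2-3."""
    def __init__(self, in_ch: int = 16, feat: int = 64):
        super().__init__()
        self.down = nn.Sequential(            # Conv + ReLU downsampling
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.features = nn.Sequential(        # Conv + ReLU feature extraction
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(              # ConvTranspose + ReLU upsampling
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 4, 3, padding=1)) # 4 channels = two 2-channel flows

    def forward(self, i1, i3, fcv21, fcv23, wf12, wf32):
        x = torch.cat([i1, i3, fcv21, fcv23, wf12, wf32], dim=1)
        delta = self.up(self.features(self.down(x)))
        # Assumption for this sketch: the corrections are predicted as residuals.
        fcvu21 = fcv21 + delta[:, 0:2]
        fcvu23 = fcv23 + delta[:, 2:4]
        return fcvu21, fcvu23
```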
In a possible implementation, as shown in FIG. 3, Step 106, obtaining the target interpolated frame between the two video frames according to the corrected optical flow, includes:

Step 10611: warping the previous video frame I1 with the third backward optical flow FCVU2-1 to obtain a third warped map WM1-2;

Step 10612: warping the subsequent video frame I3 with the fourth backward optical flow FCVU2-3 to obtain a fourth warped map WM3-2;

Step 1062: inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third warped map WM1-2, and the fourth warped map WM3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network;

The neural network model structure of the fusion neural network may be as shown in FIG. 5. The model may include a convolution (Conv) + ReLU downsampling module and a transposed convolution (ConvTranspose) + ReLU upsampling module. The inputs of the model are the above I1, I3, FCVU2-1, FCVU2-3, WM1-2, and WM3-2; the output of the model is the fusion parameter map m, which is used in the subsequent computation to obtain the target interpolated frame IN2. The training process of this model is described later.
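A corresponding minimal PyTorch sketch of the fusion neural network is shown below; the channel widths are again assumptions, and the sigmoid is used here only as one convenient way of keeping the fusion parameter map m within the 0-to-1 range described below.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Sketch: inputs I1, I3 (3 ch), FCVU2-1, FCVU2-3 (2 ch), WM1-2, WM3-2 (3 ch),
    16 channels in total; output is a single-channel fusion parameter map m."""
    def __init__(self, in_ch: int = 16, feat: int = 32):
        super().__init__()
        self.down = nn.Sequential(            # Conv + ReLU downsampling
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(              # ConvTranspose + ReLU upsampling
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1))

    def forward(self, i1, i3, fcvu21, fcvu23, wm12, wm32):
        x = torch.cat([i1, i3, fcvu21, fcvu23, wm12, wm32], dim=1)
        return torch.sigmoid(self.up(self.down(x)))   # m, shape (N, 1, H, W)
```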
Step 1063: performing a fusion computation on the third warped map WM1-2 and the fourth warped map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.

In a possible implementation, the process of Step 1063, performing the fusion computation on the third warped map WM1-2 and the fourth warped map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2, includes: multiplying the corresponding pixel values of the third warped map WM1-2 and the fusion parameter map m to obtain a first fused map WM1-2×m, where the resolution of the fusion parameter map m is the same as that of any of the video frames, the pixel values of the fusion parameter map m correspond one-to-one to the pixel values of the third warped map WM1-2, each pixel value of the fusion parameter map m is in the range 0 to 1, the pixel values of the first fused map correspond one-to-one to the pixel values of the third warped map WM1-2, and the products of the corresponding pixel values of the third warped map WM1-2 and the fusion parameter map m are the pixel values of the first fused map WM1-2×m; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map (1-m) correspond one-to-one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting each pixel value of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map (1-m); multiplying the fourth warped map WM3-2 by the difference fusion parameter map (1-m) to obtain a second fused map WM3-2×(1-m), where the products of the corresponding pixel values of the fourth warped map WM3-2 and the difference fusion parameter map (1-m) are the pixel values of the second fused map; and adding the first fused map WM1-2×m and the second fused map WM3-2×(1-m) to obtain the target interpolated frame IN2, where the sums of the corresponding pixel values of the first fused map and the second fused map are the pixel values of the target interpolated frame IN2. Expressed as a formula, the target interpolated frame IN2 = WM1-2×m + WM3-2×(1-m); thus the third warped map WM1-2 and the fourth warped map WM3-2 are fused based on the fusion parameter map m to obtain the target interpolated frame IN2. The computation formula of the target interpolated frame IN2 in fact multiplies the third warped map WM1-2 and the fusion parameter map m element-wise to obtain one intermediate result, multiplies the fourth warped map WM3-2 element-wise by the result of subtracting the fusion parameter map m from 1 to obtain another intermediate result, and then adds the two intermediate results element-wise. For example, Table 1 below illustrates the correspondence between the target interpolated frame IN2, the third warped map WM1-2, the fourth warped map WM3-2, and the fusion parameter map m.
Table 1

                             Example 1    Example 2    Example 3
Third warped map WM1-2           2            2            2
Fourth warped map WM3-2          4            4            4
Fusion parameter map m           0            1            0.5
Target interpolated frame IN2    4            2            3

(each entry is the value of every pixel of the corresponding 2×2 map)
Assume that the third warped map WM1-2, the fourth warped map WM3-2, and the fusion parameter map m are all 2×2-resolution images, and that the values in Table 1 are pixel values. In all three examples, every pixel value of the third warped map WM1-2 is 2 and every pixel value of the fourth warped map WM3-2 is 4; the difference lies in the fusion parameter map. In Example 1, every pixel value of m is 0, and computing each pixel with the formula WM1-2×m + WM3-2×(1-m) gives a target interpolated frame IN2 whose every pixel value is 4, since each pixel value of IN2 = 2×0 + 4×(1-0) = 4. In Example 2, every pixel value of m is 1, and each pixel value of the resulting IN2 is 2, since each pixel value of IN2 = 2×1 + 4×(1-1) = 2. In Example 3, every pixel value of m is 0.5, and each pixel value of the resulting IN2 is 3, since each pixel value of IN2 = 2×0.5 + 4×(1-0.5) = 3.
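The following short sketch reproduces the three examples of Table 1 with 2×2 maps, confirming the per-pixel arithmetic above:

```python
import numpy as np

def fuse(wm12: np.ndarray, wm32: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Pixel-wise fusion: IN2 = WM1-2 * m + WM3-2 * (1 - m)."""
    return wm12 * m + wm32 * (1.0 - m)

wm12 = np.full((2, 2), 2.0)   # every pixel of the third warped map is 2
wm32 = np.full((2, 2), 4.0)   # every pixel of the fourth warped map is 4
for m_val, expected in [(0.0, 4.0), (1.0, 2.0), (0.5, 3.0)]:
    in2 = fuse(wm12, wm32, np.full((2, 2), m_val))
    assert np.allclose(in2, expected)   # Examples 1 to 3 of Table 1
```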
In a possible implementation, Step 1031, transforming the first backward optical flow F3-1 into the first initial optical flow FCV2-1 based on a preset ratio, includes: multiplying the first backward optical flow F3-1 by a preset ratio value t to obtain the first initial optical flow FCV2-1, i.e., transforming F3-1 into FCV2-1 based on the formula FCV2-1 = t × F3-1, where the preset ratio value is in the range 0.4 to 0.6. Step 1032, transforming the second backward optical flow F1-3 into the second initial optical flow FCV2-3 based on the preset ratio, includes: multiplying the second backward optical flow F1-3 by the preset ratio value t to obtain the second initial optical flow FCV2-3, i.e., transforming F1-3 into FCV2-3 based on the formula FCV2-3 = t × F1-3. That is, transforming the optical flow according to the preset ratio yields the optical flow of the intermediate frame at the corresponding position between the two video frames, which facilitates determining the target interpolated frame IN2 from this optical flow in the subsequent computation. The preset ratio value t may be 0.5; if t is 0.5, the transformed optical flow is the optical flow at the midpoint between the two video frames.
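Putting the inference steps together, the following sketch strings Steps 103 to 106 into a single call on (N, C, H, W) tensors. The helper backward_warp_t defined here is a bilinear tensor version of the warping operation, and correction_net and fusion_net are assumed to be callables with the interfaces of the FlowCorrectionNet and FusionNet sketches above; none of these names come from the embodiments themselves.

```python
import torch
import torch.nn.functional as F

def backward_warp_t(src: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward warp for (N, C, H, W) tensors with an (N, 2, H, W) flow."""
    n, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=src.dtype, device=src.device),
                            torch.arange(w, dtype=src.dtype, device=src.device),
                            indexing="ij")
    x = xs + flow[:, 0]                      # (N, H, W) sampling positions
    y = ys + flow[:, 1]
    # grid_sample expects coordinates normalised to [-1, 1]
    grid = torch.stack((2 * x / (w - 1) - 1, 2 * y / (h - 1) - 1), dim=-1)
    return F.grid_sample(src, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

def interpolate_midframe(i1, i3, f31, f13, correction_net, fusion_net, t=0.5):
    """Sketch of Steps 103 to 106; f31 and f13 are the backward flows of
    Step 102 as (N, 2, H, W) tensors."""
    fcv21, fcv23 = t * f31, t * f13                                        # Step 103
    wf12, wf32 = backward_warp_t(i1, fcv21), backward_warp_t(i3, fcv23)    # Step 104
    fcvu21, fcvu23 = correction_net(i1, i3, fcv21, fcv23, wf12, wf32)      # Step 105
    wm12, wm32 = backward_warp_t(i1, fcvu21), backward_warp_t(i3, fcvu23)  # Step 106
    m = fusion_net(i1, i3, fcvu21, fcvu23, wm12, wm32)
    return wm12 * m + wm32 * (1 - m)                                       # IN2
```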
An embodiment of the present application also provides a neural network training method for video frame interpolation, which can be used to train the above optical flow correction neural network and fusion neural network. Before training, about 100,000 sets of data may be extracted in advance as training data from 1,000 video clips covering a variety of scenes and motion patterns; for example, 100 sets of data are taken from each clip, so 1,000 clips yield 100,000 sets of training data in total. Each set of training data includes three consecutive video frames, and all the data are normalized to a uniform resolution by cropping or scaling, for example to 768×768. The neural network training method includes:
Step 201: acquiring a set of training data, the set of training data including three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2, and a third training video frame i3; in this step, a set of training data may be randomly selected from the training data;

Step 202: acquiring a first reference backward optical flow fg2-1, which is the backward optical flow from the first training video frame i1 to the second training video frame i2;

Step 203: acquiring a second reference backward optical flow fg2-3, which is the backward optical flow from the third training video frame i3 to the second training video frame i2;

In Steps 202 and 203, the first reference backward optical flow fg2-1 and the second reference backward optical flow fg2-3 may be obtained by a third-party state-of-the-art optical flow method; fg2-1 and fg2-3 serve as reference optical flows so that their differences from the outputs of the neural network can later be compared and the network parameters adjusted accordingly.

Step 204: computing a first training backward optical flow f3-1, which is the backward optical flow from the first training video frame i1 to the third training video frame i3;

Step 205: computing a second training backward optical flow f1-3, which is the backward optical flow from the third training video frame i3 to the first training video frame i1;

Step 206: transforming the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio, the first initial training optical flow fcv2-1 serving as the backward optical flow from the first training video frame i1 to the second training video frame i2;

For example, Step 206, transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio, includes: transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, with t = 0.5;

Step 207: transforming the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio, the second initial training optical flow fcv2-3 serving as the backward optical flow from the third training video frame i3 to the second training video frame i2;

For example, Step 207, transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio, includes: transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, with t = 0.5.
Step 208: warping the first training video frame i1 with the first initial training optical flow fcv2-1 to obtain a first training warped map wf1-2;

Step 209: warping the third training video frame i3 with the second initial training optical flow fcv2-3 to obtain a second training warped map wf3-2;

Step 210: inputting the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training warped map wf1-2, and the second training warped map wf3-2 into the optical flow correction neural network to obtain a third training backward optical flow fcvu2-1 and a fourth training backward optical flow fcvu2-3 output by the optical flow correction neural network, the third training backward optical flow fcvu2-1 being the corrected backward optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training backward optical flow fcvu2-3 being the corrected backward optical flow from the third training video frame i3 to the second training video frame i2;

Step 211: warping the first training video frame i1 with the third training backward optical flow fcvu2-1 to obtain a third training warped map wm1-2;

Step 212: warping the third training video frame i3 with the fourth training backward optical flow fcvu2-3 to obtain a fourth training warped map wm3-2;

Step 213: inputting the first training video frame i1, the third training video frame i3, the third training backward optical flow fcvu2-1, the fourth training backward optical flow fcvu2-3, the third training warped map wm1-2, and the fourth training warped map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network;

Step 214: performing a fusion computation on the third training warped map wm1-2 and the fourth training warped map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2;

For example, the resolution of the fusion parameter map m is the same as that of any of the video frames, each pixel value of m is in the range 0 to 1, and the target interpolated frame in2 = wm1-2×m + wm3-2×(1-m).

Step 215: adjusting the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3.

During neural network training, the second training video frame i2 is known, while the target interpolated frame in2 is predicted by the neural network; therefore, the network parameters can be adjusted based on the difference between in2 and i2 so that the network's predictions become more accurate. For the same reason, the network parameters can be adjusted based on the difference between fcvu2-1 and fg2-1 and the difference between fcvu2-3 and fg2-3. Steps 201 to 215 above constitute one round of training, and the neural networks can be trained on the training data for multiple rounds. In Step 215, specifically, for example, the L1 loss between in2 and i2, the L1 loss between fcvu2-1 and fg2-1, and the L1 loss between fcvu2-3 and fg2-3 are computed and back-propagated iteratively until the optical flow correction neural network and the fusion neural network converge. That is, over the multiple rounds of training, the network parameters of the optical flow correction neural network and the fusion neural network are adjusted according to these L1 losses and continuously optimized until the L1 losses no longer decrease, which indicates that training is complete and the networks' predictions are at their best. After training is complete, video frame interpolation can be performed with the above video frame interpolation method based on the trained optical flow correction neural network and fusion neural network.
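As an illustration of one training round, the following sketch computes the three L1 losses of Step 215 and applies one optimiser update. The batch layout, the choice of optimiser, and the reuse of the backward_warp_t helper from the earlier sketch are assumptions made only for this sketch.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def training_step(batch, correction_net, fusion_net, optimizer, t=0.5):
    """One round of Steps 201 to 215; batch carries i1, i2, i3, the reference
    flows fg2-1, fg2-3 and the CV-computed flows f3-1, f1-3 as tensors."""
    i1, i2, i3 = batch["i1"], batch["i2"], batch["i3"]
    fg21, fg23 = batch["fg21"], batch["fg23"]          # Steps 202-203
    f31, f13 = batch["f31"], batch["f13"]              # Steps 204-205

    fcv21, fcv23 = t * f31, t * f13                                        # 206-207
    wf12, wf32 = backward_warp_t(i1, fcv21), backward_warp_t(i3, fcv23)    # 208-209
    fcvu21, fcvu23 = correction_net(i1, i3, fcv21, fcv23, wf12, wf32)      # 210
    wm12, wm32 = backward_warp_t(i1, fcvu21), backward_warp_t(i3, fcvu23)  # 211-212
    m = fusion_net(i1, i3, fcvu21, fcvu23, wm12, wm32)                     # 213
    in2 = wm12 * m + wm32 * (1 - m)                                        # 214

    loss = l1(in2, i2) + l1(fcvu21, fg21) + l1(fcvu23, fg23)               # 215
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```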
In a possible implementation, Step 204, computing the first training backward optical flow f3-1, includes: computing the first training backward optical flow f3-1 based on a computer vision algorithm; Step 205, computing the second training backward optical flow f1-3, includes: computing the second training backward optical flow f1-3 based on a computer vision algorithm.

In a possible implementation, the process of Step 214, performing the fusion computation on the third training warped map wm1-2 and the fourth training warped map wm3-2 based on the fusion parameter map m to obtain the target interpolated frame in2, includes: multiplying the corresponding pixel values of the third training warped map wm1-2 and the fusion parameter map m to obtain a first fused map wm1-2×m, where the resolution of the fusion parameter map m is the same as that of any of the video frames, the pixel values of the fusion parameter map m correspond one-to-one to the pixel values of the third training warped map wm1-2, each pixel value of the fusion parameter map m is in the range 0 to 1, the pixel values of the first fused map correspond one-to-one to the pixel values of the third training warped map wm1-2, and the products of the corresponding pixel values of the third training warped map wm1-2 and the fusion parameter map m are the pixel values of the first fused map; subtracting the fusion parameter map m from 1 to obtain a difference fusion parameter map (1-m), where the pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map m, and the differences obtained by subtracting each pixel value of the fusion parameter map m from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth training warped map wm3-2 by the difference fusion parameter map (1-m) to obtain a second fused map wm3-2×(1-m), where the products of the corresponding pixel values of the fourth training warped map wm3-2 and the difference fusion parameter map are the pixel values of the second fused map; and adding the first fused map wm1-2×m and the second fused map wm3-2×(1-m) to obtain the target interpolated frame in2, where the sums of the corresponding pixel values of the first fused map and the second fused map are the pixel values of the target interpolated frame in2. Expressed as a formula, the target interpolated frame in2 = wm1-2×m + wm3-2×(1-m).

In a possible implementation, Step 206, transforming the first training backward optical flow f3-1 into the first initial training optical flow fcv2-1 based on the preset ratio, includes: multiplying the first training backward optical flow f3-1 by a preset ratio value t to obtain the first initial training optical flow fcv2-1, i.e., transforming f3-1 into fcv2-1 based on the formula fcv2-1 = t × f3-1, where the preset ratio value is in the range 0.4 to 0.6;

Step 207, transforming the second training backward optical flow f1-3 into the second initial training optical flow fcv2-3 based on the preset ratio, includes: multiplying the second training backward optical flow f1-3 by the preset ratio value t to obtain the second initial training optical flow fcv2-3, i.e., transforming f1-3 into fcv2-3 based on the formula fcv2-3 = t × f1-3, where the preset ratio value t may be 0.5.
As shown in FIG. 6, an embodiment of the present application also provides a video frame interpolation apparatus 3, including: an acquisition module 31 configured to acquire two adjacent video frames in a video, the two video frames including a previous video frame I1 and a subsequent video frame I3; the acquisition module 31 is further configured to compute the optical flow between the two video frames; the acquisition module 31 is further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio; the acquisition module 31 is further configured to warp the two video frames with the initial optical flow to obtain initial warped maps; a correction module 32 configured to correct the optical flow between the two video frames based on the initial warped maps to obtain a corrected optical flow; and an interpolation module 33 configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow. The video frame interpolation apparatus can apply the video frame interpolation method of any of the above embodiments; the specific process and principle are not repeated here.

In a possible implementation, computing the optical flow between the two video frames includes: computing the optical flow between the two video frames based on a computer vision algorithm; correcting the optical flow between the two video frames based on the initial warped maps includes: correcting the optical flow between the two video frames based on a neural network, with the initial warped maps as input.

In a possible implementation, the process of correcting the optical flow between the two video frames based on the initial warped maps to obtain the corrected optical flow includes: inputting the two video frames, the initial optical flow, and the initial warped maps into an optical flow correction neural network, correcting the initial optical flow by the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network.

In a possible implementation, transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio includes: transforming a first backward optical flow F3-1 into a first initial optical flow FCV2-1 based on the preset ratio, the first backward optical flow F3-1 belonging to the optical flow between the two video frames and being the backward optical flow from the previous video frame I1 to the subsequent video frame I3; and transforming a second backward optical flow F1-3 into a second initial optical flow FCV2-3 based on the preset ratio, the second backward optical flow F1-3 belonging to the optical flow between the two video frames and being the backward optical flow from the subsequent video frame I3 to the previous video frame I1. Warping the two video frames with the initial optical flow to obtain the initial warped maps includes: warping the previous video frame I1 with the first initial optical flow FCV2-1 to obtain a first warped map WF1-2, the first warped map WF1-2 belonging to the initial warped maps; and warping the subsequent video frame I3 with the second initial optical flow FCV2-3 to obtain a second warped map WF3-2, the second warped map WF3-2 belonging to the initial warped maps. The process of inputting the two video frames, the initial optical flow, and the initial warped maps into the optical flow correction neural network, correcting the initial optical flow by the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network includes: inputting the previous video frame I1, the subsequent video frame I3, the first initial optical flow FCV2-1, the second initial optical flow FCV2-3, the first warped map WF1-2, and the second warped map WF3-2 into the optical flow correction neural network to obtain a third backward optical flow FCVU2-1 and a fourth backward optical flow FCVU2-3 output by the optical flow correction neural network, where the third backward optical flow FCVU2-1 and the fourth backward optical flow FCVU2-3 belong to the corrected optical flow, the third backward optical flow FCVU2-1 being the corrected backward optical flow from the previous video frame I1 to the target interpolated frame IN2, and the fourth backward optical flow FCVU2-3 being the corrected backward optical flow from the subsequent video frame I3 to the target interpolated frame IN2.

In a possible implementation, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: warping the two video frames with the corrected optical flow to obtain corrected warped maps; inputting the two video frames, the corrected optical flow, and the corrected warped maps into a fusion neural network to obtain a fusion parameter map output by the fusion neural network; and performing a fusion computation on the corrected warped maps based on the fusion parameter map to obtain the target interpolated frame.

In a possible implementation, obtaining the target interpolated frame between the two video frames according to the corrected optical flow includes: warping the previous video frame I1 with the third backward optical flow FCVU2-1 to obtain a third warped map WM1-2; warping the subsequent video frame I3 with the fourth backward optical flow FCVU2-3 to obtain a fourth warped map WM3-2; inputting the previous video frame I1, the subsequent video frame I3, the third backward optical flow FCVU2-1, the fourth backward optical flow FCVU2-3, the third warped map WM1-2, and the fourth warped map WM3-2 into a fusion neural network to obtain a fusion parameter map m output by the fusion neural network; and performing a fusion computation on the third warped map WM1-2 and the fourth warped map WM3-2 based on the fusion parameter map m to obtain the target interpolated frame IN2.

In a possible implementation, the process of performing the fusion computation on the third warped map and the fourth warped map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third warped map and the fusion parameter map to obtain a first fused map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third warped map, each pixel value of the fusion parameter map is in the range 0 to 1, the pixel values of the first fused map correspond one-to-one to the pixel values of the third warped map, and the products of the corresponding pixel values of the third warped map and the fusion parameter map are the pixel values of the first fused map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and the differences obtained by subtracting each pixel value of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth warped map and the difference fusion parameter map to obtain a second fused map, where the products of the corresponding pixel values of the fourth warped map and the difference fusion parameter map are the pixel values of the second fused map; and adding the first fused map and the second fused map to obtain the target interpolated frame, where the sums of the corresponding pixel values of the first fused map and the second fused map are the pixel values of the target interpolated frame.

In a possible implementation, transforming the first backward optical flow into the first initial optical flow based on the preset ratio includes: multiplying the first backward optical flow by a preset ratio value to obtain the first initial optical flow, the preset ratio value being in the range 0.4 to 0.6; transforming the second backward optical flow into the second initial optical flow based on the preset ratio includes: multiplying the second backward optical flow by the preset ratio value to obtain the second initial optical flow.

In a possible implementation, the preset ratio value is 0.5.
As shown in FIG. 7, an embodiment of the present application also provides a neural network training apparatus 4, including: an acquisition module 41 configured to: acquire a set of training data, the set of training data including three consecutive video frames, which are, in order, a first training video frame i1, a second training video frame i2, and a third training video frame i3; acquire a first reference backward optical flow fg2-1, which is the reference backward optical flow from the first training video frame i1 to the second training video frame i2; acquire a second reference backward optical flow fg2-3, which is the reference backward optical flow from the third training video frame i3 to the second training video frame i2; compute a first training backward optical flow f3-1, which is the backward optical flow from the first training video frame i1 to the third training video frame i3; compute a second training backward optical flow f1-3, which is the backward optical flow from the third training video frame i3 to the first training video frame i1; transform the first training backward optical flow f3-1 into a first initial training optical flow fcv2-1 based on a preset ratio; transform the second training backward optical flow f1-3 into a second initial training optical flow fcv2-3 based on the preset ratio; warp the first training video frame i1 with the first initial training optical flow fcv2-1 to obtain a first training warped map wf1-2; and warp the third training video frame i3 with the second initial training optical flow fcv2-3 to obtain a second training warped map wf3-2; a correction module 42 configured to: input the first training video frame i1, the third training video frame i3, the first initial training optical flow fcv2-1, the second initial training optical flow fcv2-3, the first training warped map wf1-2, and the second training warped map wf3-2 into the optical flow correction neural network to obtain a third training backward optical flow fcvu2-1 and a fourth training backward optical flow fcvu2-3 output by the optical flow correction neural network, the third training backward optical flow fcvu2-1 being the corrected backward optical flow from the first training video frame i1 to the second training video frame i2, and the fourth training backward optical flow fcvu2-3 being the corrected backward optical flow from the third training video frame i3 to the second training video frame i2; an interpolation module 43 configured to: warp the first training video frame i1 with the third training backward optical flow fcvu2-1 to obtain a third training warped map wm1-2; warp the third training video frame i3 with the fourth training backward optical flow fcvu2-3 to obtain a fourth training warped map wm3-2; and input the first training video frame i1, the third training video frame i3, the third training backward optical flow fcvu2-1, the fourth training backward optical flow fcvu2-3, the third training warped map wm1-2, and the fourth training warped map wm3-2 into the fusion neural network to obtain a fusion parameter map m output by the fusion neural network; the interpolation module 43 is further configured to perform a fusion computation on the third training warped map wm1-2 and the fourth training warped map wm3-2 based on the fusion parameter map m to obtain a target interpolated frame in2; and an adjustment module 44 configured to adjust the network parameters of the optical flow correction neural network and the fusion neural network based on the difference between the target interpolated frame in2 and the second training video frame i2, the difference between the third training backward optical flow fcvu2-1 and the first reference backward optical flow fg2-1, and the difference between the fourth training backward optical flow fcvu2-3 and the second reference backward optical flow fg2-3. The neural network training apparatus can apply the neural network training method for video frame interpolation of any of the above embodiments; the specific process and principle are the same as in the above embodiments and are not repeated here.

In a possible implementation, computing the first training backward optical flow f3-1 includes: computing the first training backward optical flow f3-1 based on a computer vision algorithm; computing the second training backward optical flow f1-3 includes: computing the second training backward optical flow f1-3 based on a computer vision algorithm.

In a possible implementation, the process of performing the fusion computation on the third training warped map and the fourth training warped map based on the fusion parameter map to obtain the target interpolated frame includes: multiplying the third training warped map and the fusion parameter map to obtain a first fused map, where the pixel values of the fusion parameter map correspond one-to-one to the pixel values of the third training warped map, each pixel value of the fusion parameter map is in the range 0 to 1, the pixel values of the first fused map correspond one-to-one to the pixel values of the third training warped map, and the products of the corresponding pixel values of the third training warped map and the fusion parameter map are the pixel values of the first fused map; subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, where the pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and the differences obtained by subtracting each pixel value of the fusion parameter map from 1 are the pixel values of the difference fusion parameter map; multiplying the fourth training warped map and the difference fusion parameter map to obtain a second fused map, where the products of the corresponding pixel values of the fourth training warped map and the difference fusion parameter map are the pixel values of the second fused map; and adding the first fused map and the second fused map to obtain the target interpolated frame, where the sums of the corresponding pixel values of the first fused map and the second fused map are the pixel values of the target interpolated frame.

In a possible implementation, transforming the first training backward optical flow into the first initial training optical flow based on the preset ratio includes: multiplying the first training backward optical flow by a preset ratio value to obtain the first initial training optical flow, the preset ratio value being in the range 0.4 to 0.6; transforming the second training backward optical flow into the second initial training optical flow based on the preset ratio includes: multiplying the second training backward optical flow by the preset ratio value to obtain the second initial training optical flow.

In a possible implementation, the preset ratio value is 0.5.
It should be understood that the above division of the video frame interpolation apparatus or the neural network training apparatus into modules is merely a division of logical functions; in an actual implementation, the modules may be fully or partially integrated into one physical entity or may be physically separated. These modules may all be implemented as software invoked by a processing element, all be implemented as hardware, or partly as software invoked by a processing element and partly as hardware. For example, any one of the acquisition module, the correction module, and the interpolation module may be a separately provided processing element, or may be integrated into the video frame interpolation apparatus, for example implemented in a chip of the video frame interpolation apparatus; it may also be stored in the memory of the video frame interpolation apparatus in the form of a program, with a processing element of the video frame interpolation apparatus invoking and executing the functions of the above modules. The other modules are implemented similarly. Moreover, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In the implementation process, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software. In addition, the above video frame interpolation apparatus and neural network training apparatus may be the same apparatus or different apparatuses.

For example, the video frame interpolation apparatus or the neural network training apparatus may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA). For another example, when one of the above modules is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking a program. For yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
As shown in FIG. 8, an embodiment of the present application also provides an electronic device, including: a processor 51 and a memory 52, the memory 52 being configured to store at least one instruction which, when loaded and executed by the processor 51, implements the method of any of the above embodiments, including the video frame interpolation method or the neural network training method for video frame interpolation. The specific processes and principles of these methods are the same as in the above embodiments and are not repeated here.

There may be one or more processors 51, and the processor 51 and the memory 52 may be connected by a bus 53 or in other ways. As a non-transitory computer-readable storage medium, the memory 52 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the data processing apparatus in the embodiments of the present application. By running the non-transitory software programs, instructions, and modules stored in the memory, the processor executes various functional applications and data processing, i.e., implements the method in any of the above method embodiments. The memory may include a program storage area and a data storage area, where the program storage area may store an operating system and the application required for at least one function, as well as necessary data. In addition, the memory may include a high-speed random access memory and may also include a non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. The electronic device may be, for example, a server, a computer, a mobile phone, or another electronic product.

An embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the method in any of the above embodiments.

In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk), etc.

In the embodiments of the present application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate that A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, and c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.

The above are only preferred embodiments of the present application and are not intended to limit the present application; for those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the scope of protection of the present application.

Claims (17)

  1. A video frame interpolation method, characterized by comprising:
    acquiring two adjacent video frames in a video, the two video frames comprising a previous video frame and a subsequent video frame;
    computing an optical flow between the two video frames;
    transforming the optical flow between the two video frames into an initial optical flow based on a preset ratio;
    warping the two video frames with the initial optical flow to obtain initial warped maps;
    correcting the optical flow between the two video frames based on the initial warped maps to obtain a corrected optical flow;
    obtaining a target interpolated frame between the two video frames according to the corrected optical flow.
  2. The method according to claim 1, characterized in that
    computing the optical flow between the two video frames comprises: computing the optical flow between the two video frames based on a computer vision algorithm;
    correcting the optical flow between the two video frames based on the initial warped maps comprises: correcting the optical flow between the two video frames based on a neural network, with the initial warped maps as input.
  3. The method according to claim 1, characterized in that the process of correcting the optical flow between the two video frames based on the initial warped maps to obtain the corrected optical flow comprises:
    inputting the two video frames, the initial optical flow, and the initial warped maps into an optical flow correction neural network, and correcting the initial optical flow by the optical flow correction neural network to obtain the corrected optical flow output by the optical flow correction neural network.
  4. The method according to claim 3, characterized in that
    transforming the optical flow between the two video frames into the initial optical flow based on the preset ratio comprises:
    transforming a first backward optical flow into a first initial optical flow based on the preset ratio, the first backward optical flow belonging to the optical flow between the two video frames, the first backward optical flow being the backward optical flow from the previous video frame to the subsequent video frame;
    transforming a second backward optical flow into a second initial optical flow based on the preset ratio, the second backward optical flow belonging to the optical flow between the two video frames, the second backward optical flow being the backward optical flow from the subsequent video frame to the previous video frame;
    warping the two video frames with the initial optical flow to obtain the initial warped maps comprises:
    warping the previous video frame with the first initial optical flow to obtain a first warped map, the first warped map belonging to the initial warped maps;
    warping the subsequent video frame with the second initial optical flow to obtain a second warped map, the second warped map belonging to the initial warped maps;
    the process of inputting the two video frames, the initial optical flow, and the initial warped maps into the optical flow correction neural network, correcting the initial optical flow by the optical flow correction neural network, and obtaining the corrected optical flow output by the optical flow correction neural network comprises:
    inputting the previous video frame, the subsequent video frame, the first initial optical flow, the second initial optical flow, the first warped map, and the second warped map into the optical flow correction neural network to obtain a third backward optical flow and a fourth backward optical flow output by the optical flow correction neural network, wherein the third backward optical flow and the fourth backward optical flow belong to the corrected optical flow, the third backward optical flow is the corrected backward optical flow from the previous video frame to the target interpolated frame, and the fourth backward optical flow is the corrected backward optical flow from the subsequent video frame to the target interpolated frame.
  5. The method according to claim 1, characterized in that
    obtaining the target interpolated frame between the two video frames according to the corrected optical flow comprises:
    warping the two video frames with the corrected optical flow to obtain corrected warped maps;
    inputting the two video frames, the corrected optical flow, and the corrected warped maps into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
    performing a fusion computation on the corrected warped maps based on the fusion parameter map to obtain the target interpolated frame.
  6. The method according to claim 4, characterized in that
    obtaining the target interpolated frame between the two video frames according to the corrected optical flow comprises:
    warping the previous video frame with the third backward optical flow to obtain a third warped map;
    warping the subsequent video frame with the fourth backward optical flow to obtain a fourth warped map;
    inputting the previous video frame, the subsequent video frame, the third backward optical flow, the fourth backward optical flow, the third warped map, and the fourth warped map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
    performing a fusion computation on the third warped map and the fourth warped map based on the fusion parameter map to obtain the target interpolated frame.
  7. The method according to claim 6, characterized in that
    the process of performing the fusion computation on the third warped map and the fourth warped map based on the fusion parameter map to obtain the target interpolated frame comprises:
    multiplying the third warped map and the fusion parameter map to obtain a first fused map, wherein pixel values of the fusion parameter map correspond one-to-one to pixel values of the third warped map, each pixel value of the fusion parameter map is in the range 0 to 1, pixel values of the first fused map correspond one-to-one to the pixel values of the third warped map, and products obtained by multiplying the corresponding pixel values of the third warped map and the fusion parameter map are respectively the pixel values of the first fused map;
    subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and differences obtained by subtracting each pixel value of the fusion parameter map from 1 are respectively the pixel values of the difference fusion parameter map;
    multiplying the fourth warped map and the difference fusion parameter map to obtain a second fused map, wherein products obtained by multiplying the corresponding pixel values of the fourth warped map and the difference fusion parameter map are respectively the pixel values of the second fused map;
    adding the first fused map and the second fused map to obtain the target interpolated frame, wherein values obtained by adding the corresponding pixel values of the first fused map and the second fused map are respectively the pixel values of the target interpolated frame.
  8. The method according to claim 4, 6 or 7, characterized in that
    transforming the first backward optical flow into the first initial optical flow based on the preset ratio comprises:
    multiplying the first backward optical flow by a preset ratio value to obtain the first initial optical flow, the preset ratio value being in the range 0.4 to 0.6;
    transforming the second backward optical flow into the second initial optical flow based on the preset ratio comprises:
    multiplying the second backward optical flow by the preset ratio value to obtain the second initial optical flow.
  9. The method according to claim 8, characterized in that
    the preset ratio value is 0.5.
  10. A neural network training method for video frame interpolation, characterized by comprising:
    acquiring a set of training data, the set of training data comprising three consecutive video frames, the three consecutive video frames being, in order, a first training video frame, a second training video frame, and a third training video frame;
    acquiring a first reference backward optical flow, the first reference backward optical flow being the backward optical flow from the first training video frame to the second training video frame;
    acquiring a second reference backward optical flow, the second reference backward optical flow being the backward optical flow from the third training video frame to the second training video frame;
    computing a first training backward optical flow, the first training backward optical flow being the backward optical flow from the first training video frame to the third training video frame;
    computing a second training backward optical flow, the second training backward optical flow being the backward optical flow from the third training video frame to the first training video frame;
    transforming the first training backward optical flow into a first initial training optical flow based on a preset ratio;
    transforming the second training backward optical flow into a second initial training optical flow based on the preset ratio;
    warping the first training video frame with the first initial training optical flow to obtain a first training warped map;
    warping the third training video frame with the second initial training optical flow to obtain a second training warped map;
    inputting the first training video frame, the third training video frame, the first initial training optical flow, the second initial training optical flow, the first training warped map, and the second training warped map into an optical flow correction neural network to obtain a third training backward optical flow and a fourth training backward optical flow output by the optical flow correction neural network, the third training backward optical flow being the corrected backward optical flow from the first training video frame to the second training video frame, and the fourth training backward optical flow being the corrected backward optical flow from the third training video frame to the second training video frame;
    warping the first training video frame with the third training backward optical flow to obtain a third training warped map;
    warping the third training video frame with the fourth training backward optical flow to obtain a fourth training warped map;
    inputting the first training video frame, the third training video frame, the third training backward optical flow, the fourth training backward optical flow, the third training warped map, and the fourth training warped map into a fusion neural network to obtain a fusion parameter map output by the fusion neural network;
    performing a fusion computation on the third training warped map and the fourth training warped map based on the fusion parameter map to obtain a target interpolated frame;
    adjusting network parameters of the optical flow correction neural network and the fusion neural network based on a difference between the target interpolated frame and the second training video frame, a difference between the third training backward optical flow and the first reference backward optical flow, and a difference between the fourth training backward optical flow and the second reference backward optical flow.
  11. The method according to claim 10, characterized in that
    computing the first training backward optical flow comprises: computing the first training backward optical flow based on a computer vision algorithm;
    computing the second training backward optical flow comprises: computing the second training backward optical flow based on a computer vision algorithm.
  12. The method according to claim 10, characterized in that
    the process of performing the fusion computation on the third training warped map and the fourth training warped map based on the fusion parameter map to obtain the target interpolated frame comprises:
    multiplying the third training warped map and the fusion parameter map to obtain a first fused map, wherein pixel values of the fusion parameter map correspond one-to-one to pixel values of the third training warped map, each pixel value of the fusion parameter map is in the range 0 to 1, pixel values of the first fused map correspond one-to-one to the pixel values of the third training warped map, and products obtained by multiplying the corresponding pixel values of the third training warped map and the fusion parameter map are respectively the pixel values of the first fused map;
    subtracting the fusion parameter map from 1 to obtain a difference fusion parameter map, wherein pixel values of the difference fusion parameter map correspond one-to-one to the pixel values of the fusion parameter map, and differences obtained by subtracting each pixel value of the fusion parameter map from 1 are respectively the pixel values of the difference fusion parameter map;
    multiplying the fourth training warped map and the difference fusion parameter map to obtain a second fused map, wherein products obtained by multiplying the corresponding pixel values of the fourth training warped map and the difference fusion parameter map are respectively the pixel values of the second fused map;
    adding the first fused map and the second fused map to obtain the target interpolated frame, wherein values obtained by adding the corresponding pixel values of the first fused map and the second fused map are respectively the pixel values of the target interpolated frame.
  13. The method according to claim 10, characterized in that
    transforming the first training backward optical flow into the first initial training optical flow based on the preset ratio comprises:
    multiplying the first training backward optical flow by a preset ratio value to obtain the first initial training optical flow, the preset ratio value being in the range 0.4 to 0.6;
    transforming the second training backward optical flow into the second initial training optical flow based on the preset ratio comprises:
    multiplying the second training backward optical flow by the preset ratio value to obtain the second initial training optical flow.
  14. The method according to claim 10, characterized in that
    the preset ratio value is 0.5.
  15. A video frame interpolation apparatus, characterized by comprising:
    an acquisition module configured to acquire two adjacent video frames in a video;
    the acquisition module being further configured to compute an optical flow between the two video frames;
    the acquisition module being further configured to transform the optical flow between the two video frames into an initial optical flow based on a preset ratio;
    the acquisition module being further configured to warp the two video frames with the initial optical flow to obtain initial warped maps;
    a correction module configured to correct the optical flow between the two video frames based on the initial warped maps to obtain a corrected optical flow;
    an interpolation module configured to obtain a target interpolated frame between the two video frames according to the corrected optical flow.
  16. An electronic device, characterized by comprising:
    a processor and a memory, the memory being configured to store at least one instruction which, when loaded and executed by the processor, implements the method according to any one of claims 1 to 14.
  17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to perform the method according to any one of claims 1 to 14.
PCT/CN2023/075807 2022-02-24 2023-02-14 视频插帧方法、训练方法、装置和电子设备 WO2023160426A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210171767.5 2022-02-24
CN202210171767.5A CN114640885B (zh) 2022-02-24 2022-02-24 视频插帧方法、训练方法、装置和电子设备

Publications (1)

Publication Number Publication Date
WO2023160426A1 true WO2023160426A1 (zh) 2023-08-31

Family

ID=81948635

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/075807 WO2023160426A1 (zh) 2022-02-24 2023-02-14 视频插帧方法、训练方法、装置和电子设备

Country Status (2)

Country Link
CN (1) CN114640885B (zh)
WO (1) WO2023160426A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640885B (zh) * 2022-02-24 2023-12-22 影石创新科技股份有限公司 视频插帧方法、训练方法、装置和电子设备
CN117115210B (zh) * 2023-10-23 2024-01-26 黑龙江省农业科学院农业遥感与信息研究所 基于物联网的智慧农业监控调节方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978728B (zh) * 2014-04-08 2017-11-14 南京理工大学 Image matching system based on optical flow method
US10531093B2 (en) * 2015-05-25 2020-01-07 Peking University Shenzhen Graduate School Method and system for video frame interpolation based on optical flow method
CN113365110B (zh) * 2021-07-14 2023-01-31 北京百度网讯科技有限公司 Model training and video frame interpolation methods, apparatus, device and storage medium
CN114007135B (zh) * 2021-10-29 2023-04-18 广州华多网络科技有限公司 Video frame interpolation method and apparatus, device, medium and product

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190138889A1 (en) * 2017-11-06 2019-05-09 Nvidia Corporation Multi-frame video interpolation using optical flow
CN109949221A (zh) * 2019-01-30 2019-06-28 深圳大学 Image processing method and electronic device
CN110191299A (zh) * 2019-04-15 2019-08-30 浙江大学 Multi-frame interpolation method based on convolutional neural network
US20210368131A1 (en) * 2020-05-20 2021-11-25 Fujitsu Limited Video frame interpolation apparatus and method
CN113949926A (zh) * 2020-07-17 2022-01-18 武汉Tcl集团工业研究院有限公司 Video frame interpolation method, storage medium and terminal device
CN112104830A (zh) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame interpolation method, model training method and corresponding apparatus
CN112995715A (zh) * 2021-04-20 2021-06-18 腾讯科技(深圳)有限公司 Video frame interpolation processing method and apparatus, electronic device and storage medium
CN114066730A (zh) * 2021-11-04 2022-02-18 西北工业大学 Video frame interpolation method based on unsupervised dual learning
CN114640885A (zh) * 2022-02-24 2022-06-17 影石创新科技股份有限公司 Video frame interpolation method, training method, apparatus and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI HAOPENG; YUAN YUAN; WANG QI: "Video Frame Interpolation Via Residue Refinement", ICASSP 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 4 May 2020 (2020-05-04), pages 2613 - 2617, XP033793646, DOI: 10.1109/ICASSP40776.2020.9053987 *
ZHANG QIAN, JIANG FENG: "Video interpolation based on deep learing", INTELLIGENT COMPUTER AND APPLICATIONS., vol. 9, no. 4, 1 July 2019 (2019-07-01), pages 252 - 257, 262, XP093069281 *

Also Published As

Publication number Publication date
CN114640885B (zh) 2023-12-22
CN114640885A (zh) 2022-06-17

Similar Documents

Publication Publication Date Title
Zeng et al. Learning image-adaptive 3d lookup tables for high performance photo enhancement in real-time
WO2023160426A1 (zh) 视频插帧方法、训练方法、装置和电子设备
WO2021208122A1 (zh) 基于深度学习的视频盲去噪方法及装置
CN108304755B (zh) 用于图像处理的神经网络模型的训练方法和装置
US9615039B2 (en) Systems and methods for reducing noise in video streams
WO2019184657A1 (zh) 图像识别方法、装置、电子设备及存储介质
CN111986084B (zh) 一种基于多任务融合的多相机低光照图像质量增强方法
CN106169173B (zh) 一种图像插值方法
CN106780336B (zh) 一种图像缩小方法及装置
CN113327318A (zh) 图像显示方法、装置、电子设备和计算机可读介质
CN113724155A (zh) 用于自监督单目深度估计的自提升学习方法、装置及设备
CN115375536A (zh) 图像处理方法及设备
CN109274949A (zh) 一种视频图像处理方法及其装置、显示设备
CN113256506A (zh) 使用先前帧统计信息经由卷积神经网络来处理视频帧
WO2020215263A1 (zh) 一种图像处理方法及装置
WO2023082685A1 (zh) 视频增强方法、装置、计算机设备和存储介质
WO2022247394A1 (zh) 图像拼接方法及装置、存储介质及电子设备
CN114339030B (zh) 一种基于自适应可分离卷积的网络直播视频稳像方法
Steffens et al. Contrast enhancement and image completion: A cnn based model to restore ill exposed images
CN115471413A (zh) 图像处理方法及装置、计算机可读存储介质和电子设备
CN115937290A (zh) 一种图像深度估计方法、装置、电子设备及存储介质
CN111093045B (zh) 一种缩放视频序列分辨率的方法及装置
CN115049558A (zh) 模型训练、人脸图像处理方法及装置、电子设备及可读存储介质
CN111784733A (zh) 图像处理方法、装置、终端和计算机可读存储介质
CN113556581B (zh) 插值帧的生成方法、装置及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23759051

Country of ref document: EP

Kind code of ref document: A1