WO2023020492A1 - 视频帧调整方法、装置、电子设备和存储介质 (Video frame adjustment method, apparatus, electronic device and storage medium) - Google Patents


Info

Publication number
WO2023020492A1
Authority
WO
WIPO (PCT)
Prior art keywords
video frame
frame
optical flow
video
pixel
Prior art date
Application number
PCT/CN2022/112783
Other languages
English (en)
French (fr)
Inventor
龚立雪
张瑞
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023020492A1 publication Critical patent/WO2023020492A1/zh

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132 - Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 - Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 - Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 - Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 - Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 - Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662 - Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 - Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the present disclosure relates to the field of video technology, and in particular to a video frame adjustment method, device, electronic equipment and storage medium.
  • Video is an important medium for information dissemination on the Internet. Factors such as video color, frame rate, and resolution will affect the playback effect of the video, which in turn will affect the viewing experience of the user. Wherein, the higher the frame rate of the video is, the smoother the playback of the video is, and the better the viewing experience of the user is.
  • Embodiments of the present disclosure provide a video frame adjustment method, apparatus, electronic device, and storage medium, which achieve frame interpolation between two adjacent video frames while ensuring the playback quality of the resulting high frame rate video.
  • an embodiment of the present disclosure provides a video frame adjustment method, the method comprising:
  • Based on a first video frame and a second video frame in an initial video, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame are determined through a quantized neural network, the first video frame and the second video frame being two adjacent initial video frames;
  • based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame are determined, where the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame; the intermediate frame is determined according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow;
  • the intermediate frame is inserted between the first video frame and the second video frame to obtain a target video.
  • the embodiment of the present disclosure also provides a video frame adjustment device, which includes:
  • The first determination module is configured to determine, based on a first video frame and a second video frame in an initial video, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame through a quantized neural network, where the first video frame and the second video frame are two adjacent initial video frames;
  • a second determination module configured to determine, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, wherein the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame;
  • a third determining module configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow;
  • a frame insertion module configured to insert the intermediate frame between the first video frame and the second video frame in the initial video to obtain a target video.
  • an embodiment of the present disclosure further provides an electronic device, and the electronic device includes:
  • one or more processors; and a storage device configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above video frame adjustment method.
  • an embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the above video frame adjustment method is implemented.
  • An embodiment of the present disclosure further provides a computer program product, where the computer program product includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the video frame adjustment method described above is implemented.
  • The video frame adjustment method provided by the embodiments of the present disclosure determines, based on the first video frame and the second video frame in the initial video, the first optical flow from the first video frame to the second video frame and the second optical flow from the second video frame to the first video frame through a quantized neural network, the first video frame and the second video frame being two adjacent initial video frames. Because the optical flows are estimated by a quantized neural network, the method can run on mobile devices and is more robust to videos with large motion scenes.
  • Based on the first optical flow and the second optical flow, the third optical flow from the intermediate frame to the first video frame and the fourth optical flow from the intermediate frame to the second video frame are determined, where the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame; the intermediate frame is determined according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow; and, in the initial video, the intermediate frame is inserted between the first video frame and the second video frame.
  • In this way, frame interpolation between two adjacent video frames is achieved while the playback quality of the resulting high frame rate video is guaranteed.
  • FIG. 1 is a flowchart of a video frame adjustment method in an embodiment of the present disclosure
  • FIG. 2 is a schematic structural diagram of a quantized neural network in an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a timing relationship between video frames in an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of the second optical flow vector of a second pixel P from the first video frame I_0 to the intermediate frame I_t, and the first optical flow vector of a first pixel Q from the intermediate frame I_t to the first video frame I_0, in an embodiment of the present disclosure;
  • FIG. 5 is a schematic flow diagram of predicting an occlusion image through a preset neural network in an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of a video frame adjustment device in an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • The term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • Fig. 1 is a flowchart of a video frame adjustment method in an embodiment of the present disclosure. The method can be executed by a video frame adjustment apparatus, which can be implemented in software and/or hardware and configured in an electronic device such as a terminal, including but not limited to smartphones, handheld computers, tablet computers, wearable devices with display screens, desktop computers, notebook computers, all-in-one computers, smart home devices with display screens, and the like.
  • the method may specifically include the following steps:
  • Step 110: based on the first video frame and the second video frame in the initial video, determine, through a quantized neural network, the first optical flow from the first video frame to the second video frame and the second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames.
  • The initial video refers to the video obtained directly by shooting or recording with a capture device; in other words, it is a video that has not undergone frame interpolation processing.
  • the video frames in the initial video are initial video frames.
  • Two adjacent video frames refer to two video frames adjacent in time. For example, if the first video frame is captured at time 0 and the second video frame is captured at time 1, then the first video frame and the second video frame are two temporally adjacent video frames.
  • the quantized neural network may refer to a neural network that uses INT8 (8-bit fixed-point integer) to store model parameters.
  • Quantization usually refers to converting a neural network's floating-point arithmetic to fixed-point arithmetic, so that the neural network can run in real time on mobile devices (typically smartphones), given their memory and performance limitations.
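As an illustrative sketch only (the patent does not specify its quantization scheme), symmetric per-tensor INT8 quantization of floating-point model parameters can be expressed as follows in NumPy; the function names and the choice of a single symmetric scale are assumptions:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: map float weights to [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    if scale == 0:
        scale = 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the INT8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)  # close to w, within one quantization step
```

Each weight is stored as an 8-bit integer plus one shared floating-point scale, which is what makes the model small enough and fast enough for real-time mobile inference.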
  • By using a quantized neural network to estimate the motion of the target object across two adjacent video frames, the video frame adjustment method provided in this embodiment produces a better frame interpolation effect for complex motion scenes and guarantees the playback quality of the video obtained after frame interpolation. Moreover, by designing an efficient quantized neural network, the method can run in real time on mobile terminals, specifically on mobile processors that support quantized computation; performing video frame adjustment in real time on such a processor increases the video frame rate and makes video playback smoother.
  • the two optical flow prediction branches 230 may be specifically divided into a first prediction branch 231 and a second prediction branch 232 .
  • The encoder module 210 includes a down-sampling unit 211 and an encoding unit 212. The down-sampling unit 211 down-samples the two input video frames 200 (i.e., the first video frame and the second video frame); the resulting down-sampled images (i.e., the down-sampled image of the first video frame and the down-sampled image of the second video frame) are input to the encoding unit 212, which performs feature extraction on them, obtains an encoding of the feature image, and sends the encoding to the decoder module 220. The decoder module 220 includes a decoding unit and an up-sampling unit: the decoding unit decodes the encoding of the feature image and feeds the decoded feature image to the up-sampling unit, which up-samples it; the resulting up-sampled images are input to the two optical flow prediction branches 230, which respectively predict, based on the up-sampled images, the first optical flow from the first video frame to the second video frame and the second optical flow from the second video frame to the first video frame.
  • The first optical flow flow01 from the first video frame to the second video frame is predicted by the first prediction branch 231, and the second optical flow flow10 from the second video frame to the first video frame is predicted by the second prediction branch 232. That is, the input of the encoder module 210 is the first video frame I_0 and the second video frame I_1; before feature extraction by the encoding unit 212, the first video frame I_0 and the second video frame I_1 are down-sampled by the down-sampling unit 211 to increase the receptive field of the neural network.
  • the advantage of this setting is that it can make the neural network more robust to motion estimation in large motion scenes, and at the same time, performing neural network inference on small-resolution images can also improve the inference speed.
  • The neural network is trained in a quantized manner so that, for example, an INT8 quantized neural network can be obtained in the end; its essence is to store the model parameters as INT8 (8-bit fixed-point integer) values.
  • Step 120 based on the first optical flow and the second optical flow, determine a third optical flow from the intermediate frame to the first video frame, and a fourth optical flow from the intermediate frame to the second video frame , wherein the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame.
  • In one embodiment, determining a third optical flow from the intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame based on the first optical flow and the second optical flow includes: determining a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame based on the first optical flow, the second optical flow, and the movement track of the target object in the first video frame and the second video frame; and determining, through an optical flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
  • Correspondingly, reference may be made to FIG. 3, a schematic diagram of the timing relationship between video frames.
  • The optical flow from the first video frame I_0 to the intermediate frame I_t is denoted as the fifth optical flow, and the optical flow from the second video frame I_1 to the intermediate frame I_t is denoted as the sixth optical flow. Under the assumption of linear motion, the fifth optical flow from the first video frame I_0 to the intermediate frame I_t is flow0t = flow01 * t, and the sixth optical flow from the second video frame I_1 to the intermediate frame I_t is flow1t = flow10 * (1 - t).
  • Here flow01 represents the first optical flow from the first video frame I_0 to the second video frame I_1, and flow10 represents the second optical flow from the second video frame I_1 to the first video frame I_0; both are obtained through the quantized neural network in step 110 above.
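Under the linear-motion assumption, the fifth and sixth optical flows are simple scalings of the predicted flows. A minimal sketch (the H x W x 2 array layout with (dx, dy) channels is an assumption for illustration):

```python
import numpy as np

def intermediate_flows(flow01, flow10, t=0.5):
    """Scale the frame-to-frame flows to the intermediate time t, assuming
    each pixel moves along a straight line between frame 0 and frame 1."""
    flow0t = flow01 * t          # fifth optical flow: I_0 toward I_t
    flow1t = flow10 * (1.0 - t)  # sixth optical flow: I_1 toward I_t
    return flow0t, flow1t

# Toy 1x1 flow fields with (dx, dy) channels.
flow01 = np.array([[[4.0, 2.0]]])
flow10 = np.array([[[-4.0, -2.0]]])
flow0t, flow1t = intermediate_flows(flow01, flow10, t=0.5)
```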
  • Then, through the optical flow reversal technique, the third optical flow flowt0 from the intermediate frame I_t to the first video frame I_0 and the fourth optical flow flowt1 from the intermediate frame I_t to the second video frame I_1 can be determined.
  • The purpose of generating the third optical flow from the intermediate frame I_t to the first video frame I_0 and the fourth optical flow from the intermediate frame I_t to the second video frame I_1 is to ensure that each pixel on the intermediate frame I_t has corresponding pixels on the first video frame I_0 and the second video frame I_1, guaranteeing the continuity of the video picture across the intermediate frame I_t, the first video frame I_0, and the second video frame I_1, and thus a better frame interpolation effect.
  • In one embodiment, determining the third optical flow based on the fifth optical flow through the optical flow reversal technique includes:
  • if a first pixel on the intermediate frame has a preset relationship with a single second pixel on the first video frame, the first optical flow vector of the first pixel from the intermediate frame to the first video frame is the inverse vector of the second optical flow vector of that second pixel from the first video frame to the intermediate frame; if the first pixel has a preset relationship with at least two second pixels on the first video frame, the first optical flow vector is the weighted average of the inverse vectors of the second optical flow vectors of the at least two second pixels from the first video frame to the intermediate frame.
  • the essence of the preset relationship between the first pixel on the intermediate frame and the at least two second pixels on the first video frame is: when at least two second pixels on the first video frame move from time 0 to time t , to reach the position of the first pixel on the intermediate frame. If there is no second pixel on the first video frame that has a preset relationship with the first pixel on the intermediate frame, the first optical flow vector is 0.
  • the essence of the fact that there is no second pixel on the first video frame that has a preset relationship with the first pixel on the intermediate frame is that when all the pixels on the first video frame move from time 0 to time t, they do not reach The position of the first pixel on the intermediate frame.
  • the first optical flow vectors of each first pixel on the intermediate frame from the intermediate frame to the first video frame form the third optical flow.
  • As shown in FIG. 4, from the second optical flow vector of a second pixel P from the first video frame I_0 to the intermediate frame I_t, the position of the corresponding first pixel Q on the intermediate frame I_t can be calculated; the optical flow vector of the first pixel Q from time t to time 0, that is, the first optical flow vector of the first pixel Q from the intermediate frame I_t to the first video frame I_0, is the inverse vector of the second optical flow vector of the second pixel P from the first video frame I_0 to the intermediate frame I_t: flowt0(Q) = -flow0t(P), where flowt0(Q) represents the first optical flow vector and flow0t(P) represents the second optical flow vector.
  • In some cases, the optical flow vectors of multiple second pixels P on the first video frame I_0 arrive at the same first pixel Q on the intermediate frame I_t; that is, the first pixel Q corresponds to at least two second pixels P, or, in other words, the first pixel Q on the intermediate frame I_t has a preset relationship with at least two second pixels P on the first video frame I_0. In this case, the first optical flow vector of Q is the average of the inverse vectors of the N second optical flow vectors ending at Q, where N represents the number of second optical flow vectors ending at the first pixel Q.
  • If no second optical flow vector ends at the first pixel Q, the first optical flow vector is 0, and the first pixel Q is marked as an optical flow hole point.
  • the first optical flow vectors of each first pixel point Q on the intermediate frame from the intermediate frame to the first video frame form the third optical flow.
  • The method of determining the fourth optical flow is similar to that of determining the third optical flow. Specifically, determining the fourth optical flow based on the sixth optical flow includes: if a third pixel on the intermediate frame has a preset relationship with a single fourth pixel on the second video frame, the third optical flow vector is the inverse vector of the fourth optical flow vector of that fourth pixel from the second video frame to the intermediate frame, where the sixth optical flow includes the fourth optical flow vector; if the third pixel has a preset relationship with at least two fourth pixels, the third optical flow vector is the weighted average of the inverse vectors of the fourth optical flow vectors of the at least two fourth pixels from the second video frame to the intermediate frame. If there is no fourth pixel on the second video frame that has a preset relationship with the third pixel on the intermediate frame, the third optical flow vector is 0. The third optical flow vectors of all third pixels on the intermediate frame, from the intermediate frame to the second video frame, form the fourth optical flow.
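The optical flow reversal described above (negate each source vector, splat it onto the pixel it lands on, average when several vectors land on the same pixel, and mark unreached pixels as hole points) can be sketched as follows. The dense nested-loop form, nearest-pixel rounding, and equal weighting of arriving vectors are simplifications for illustration:

```python
import numpy as np

def reverse_flow(flow_src2t, h, w):
    """Reverse an optical flow by splatting: each source pixel p with vector v
    lands at q = p + v in the intermediate frame; the reversed flow at q is the
    average of the negated vectors of all source pixels arriving there.
    Pixels reached by no source pixel keep a zero vector (optical flow holes)."""
    acc = np.zeros((h, w, 2), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            v = flow_src2t[y, x]                # (dx, dy) at source pixel p
            qx = int(round(x + v[0]))
            qy = int(round(y + v[1]))
            if 0 <= qx < w and 0 <= qy < h:     # vector lands inside the frame
                acc[qy, qx] += -v               # accumulate the inverse vector
                count[qy, qx] += 1
    hole = count == 0                           # optical flow hole points
    flow_rev = np.zeros_like(acc)
    reached = ~hole
    flow_rev[reached] = acc[reached] / count[reached, None]
    return flow_rev, hole
```

Applied to flow0t it yields the third optical flow flowt0; applied to flow1t it yields the fourth optical flow flowt1.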
  • Step 130 Determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow.
  • the determining the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow includes:
  • the image affine transformation is performed on the first video frame I_0, that is, the warp transformation is performed on the first video frame I_0 to obtain the first transformed frame I_t 0 of the first video frame at the acquisition time t of the intermediate frame.
  • Image affine transformation is performed on the second video frame I_1, that is, warp transformation is performed on the second video frame I_1 to obtain a second transformed frame I_t 1 of the second video frame at the acquisition time t of the intermediate frame.
  • the purpose of the image affine transformation is to estimate the video frame of the first video frame I_0 at time t, and the video frame of the second video frame I_1 at time t, so as to provide a data source for obtaining the intermediate frame I_t.
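The warp transformation can be sketched as a backward warp: each pixel of the transformed frame fetches the source pixel that its reversed flow points at. Nearest-neighbour sampling and border clipping are simplifications for illustration; a real implementation would typically use bilinear interpolation:

```python
import numpy as np

def backward_warp(frame, flow_t2src):
    """Warp a source frame to time t: every intermediate-frame pixel x samples
    the source frame at x + flow_t2src[x] (nearest neighbour, clipped to the
    frame borders)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow_t2src[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow_t2src[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]
```

Warping I_0 with flowt0 gives the first transformed frame I_t0, and warping I_1 with flowt1 gives the second transformed frame I_t1.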
  • In one embodiment, a preset neural network is designed to predict an occlusion image mask (that is, the fusion weight of the first transformed frame and the second transformed frame). Pixel values in the occlusion image mask range from 0 to 1 and indicate the probability that a pixel comes from the first video frame I_0: the closer the pixel value is to 1, the greater the probability that the pixel comes from the first video frame I_0.
  • The input of the preset neural network includes the first transformed frame I_t0, the second transformed frame I_t1, the third optical flow flowt0, the fourth optical flow flowt1, the down-sampled image corresponding to the first video frame I_0, and the down-sampled image corresponding to the second video frame I_1; the output of the preset neural network is the occlusion image mask.
  • In a nutshell, the preset neural network predicts the fusion weight of the first transformed frame and the second transformed frame based on the down-sampled image corresponding to the first video frame, the down-sampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow, and the fourth optical flow; based on this fusion weight, the pixels in the first transformed frame and the second transformed frame are fused to obtain the intermediate frame. The fusion weight indicates the probability that a pixel on the intermediate frame comes from the first video frame or from the second video frame.
  • In one embodiment, the pixels in the first transformed frame and the second transformed frame are fused to obtain the intermediate frame based on the following formula: I_t = mask * I_t0 + (1 - mask) * I_t1, where I_t represents the intermediate frame, mask represents the occlusion image, I_t0 represents the first transformed frame, I_t1 represents the second transformed frame, and the symbol "*" denotes pixel-by-pixel multiplication.
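The fusion step is a per-pixel weighted average of the two transformed frames; a minimal sketch (the mask is assumed to share the spatial shape of the frames):

```python
import numpy as np

def fuse(i_t0, i_t1, mask):
    """Fuse the two warped frames pixel by pixel:
    I_t = mask * I_t0 + (1 - mask) * I_t1,
    where mask in [0, 1] is the probability that a pixel comes from I_0."""
    return mask * i_t0 + (1.0 - mask) * i_t1
```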
  • Step 140 In the initial video, insert the intermediate frame between the first video frame and the second video frame to obtain a target video.
  • In summary, the video frame adjustment method provided in this embodiment uses a quantized neural network to estimate the motion of the target object across two adjacent video frames, so that the method produces better frame interpolation effects for complex motion scenes and guarantees the final playback quality of the video. Through the design of an efficient quantized neural network, the method can run in real time on mobile terminals; and using a mask network to predict the occlusion image makes the method more robust and the fused intermediate frame more natural and realistic.
  • In some embodiments, before inserting the intermediate frame between the first video frame and the second video frame, the method further includes: determining, based on the motion features and/or color features of the target object in the first video frame and the second video frame, whether it is suitable to insert an intermediate frame between the first video frame and the second video frame. If it is determined to be suitable, the step of inserting the intermediate frame between the first video frame and the second video frame is performed; otherwise, the insertion is not performed, so as to avoid introducing artifacts into the resulting target video, thereby improving the smoothness of video playback while ensuring the quality of the video after frame insertion.
  • Whether it is suitable to insert an intermediate frame between the first video frame and the second video frame may be determined by analyzing motion characteristics, for example, by computing statistics over color information and motion information.
  • In one embodiment, the motion features of the target object in the first video frame and the second video frame include at least one of the following: the consistency between the third optical flow and the fourth optical flow; and the number of optical flow hole points. If no pixel in the first video frame or the second video frame has a preset relationship with a specific pixel in the intermediate frame, that specific pixel is determined to be an optical flow hole point; that is, when there is no second pixel P on the first video frame I_0 that has a preset relationship with a first pixel Q on the intermediate frame I_t, the first pixel Q is marked as an optical flow hole point.
  • The color features of the target object in the first video frame and the second video frame include the grayscale difference between the first transformed frame and the second transformed frame, where the first transformed frame is obtained by performing image affine transformation on the first video frame, and the second transformed frame is obtained by performing image affine transformation on the second video frame.
  • When the motion feature of the target object in the first video frame and the second video frame is the consistency between the third optical flow and the fourth optical flow, determining whether it is suitable to insert an intermediate frame between the first video frame and the second video frame based on this consistency includes: determining a linear motion offset distance for each target pixel on the intermediate frame according to the forward motion vector by which the target pixel moves from the intermediate frame to the first video frame and the backward motion vector by which the target pixel moves from the intermediate frame to the second video frame; counting the proportion of pixels whose linear motion offset distance is greater than a first set threshold; if the proportion is less than or equal to a second set threshold, determining that it is suitable to insert an intermediate frame between the first video frame and the second video frame; and if the proportion is greater than the second set threshold, determining that it is not suitable to insert an intermediate frame between the first video frame and the second video frame.
  • The proportion of pixels whose linear motion offset distance is greater than the first set threshold is the ratio of the number of such pixels to the total number of pixels on the intermediate frame. If this proportion is greater than the second set threshold, it is determined that it is not suitable to insert an intermediate frame between the first video frame and the second video frame.
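A sketch of this consistency check, with illustrative threshold values (both thresholds are assumptions, not values from the patent): under purely linear motion the forward and backward vectors of a pixel cancel out, so the magnitude of their sum measures how far the pixel deviates from a straight-line trajectory:

```python
import numpy as np

def should_interpolate(flowt0, flowt1, dist_thresh=2.0, ratio_thresh=0.1):
    """Flow consistency check: ||flowt0 + flowt1|| per pixel is the linear
    motion offset distance; interpolate only if the fraction of pixels whose
    offset exceeds dist_thresh is at most ratio_thresh."""
    offset = np.linalg.norm(flowt0 + flowt1, axis=-1)
    ratio = float(np.mean(offset > dist_thresh))
    return ratio <= ratio_thresh
```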
  • If the grayscale difference between the first transformed frame and the second transformed frame is greater than a third set threshold, it is determined that it is not suitable to insert an intermediate frame between the first video frame and the second video frame.
  • As described above, the optical flow hole points in the intermediate frame have been marked. These hole points tend to occur in occlusion areas, so their number is counted: the larger the number, the larger the occlusion area. If the occlusion area is too large, frame interpolation errors are likely to occur; therefore, to ensure picture quality, no frame interpolation is performed between the first video frame and the second video frame in this case, which avoids introducing artifacts into the resulting target video and improves the smoothness of video playback while guaranteeing picture quality after frame interpolation.
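The hole-point check can be sketched the same way; the 5% area threshold below is an assumed value for illustration:

```python
import numpy as np

def hole_ratio_ok(hole_mask, max_ratio=0.05):
    """Count optical flow hole points (pixels reached by no source pixel);
    a large hole area usually indicates heavy occlusion, where interpolation
    errors are likely, so frame insertion is skipped when the hole ratio is
    too high. The 5% default threshold is an assumed value."""
    ratio = float(np.mean(hole_mask))
    return ratio <= max_ratio
```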
  • FIG. 6 is a schematic structural diagram of an apparatus for adjusting video frames in an embodiment of the present disclosure.
  • the apparatus for adjusting video frames provided by the embodiments of the present disclosure may be configured in a terminal.
• the apparatus for adjusting video frames specifically includes: a first determination module 610, a second determination module 620, a third determination module 630, and a frame insertion module 640.
• the first determination module 610 is configured to determine the first optical flow from the first video frame to the second video frame and the second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
• the second determination module 620 is configured to determine, based on the first optical flow and the second optical flow, a third optical flow from the intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, wherein,
  • the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame;
• the third determination module 630 is configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow;
• the frame insertion module 640 is configured to, in the initial video, insert the intermediate frame between the first video frame and the second video frame to obtain the target video.
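The four modules above can be sketched as a simple pipeline. The callables passed in (`estimate_bidirectional_flow`, `reverse_flows`, `synthesize_intermediate`) are hypothetical stand-ins for the quantized network, the flow-reversal step, and the warp-and-fuse step described elsewhere in this disclosure; only the wiring follows the text.

```python
def adjust_video_frames(frames, estimate_bidirectional_flow,
                        reverse_flows, synthesize_intermediate):
    """Insert one estimated intermediate frame between each adjacent pair."""
    out = []
    for f0, f1 in zip(frames, frames[1:]):
        out.append(f0)
        # First determination module: bidirectional optical flow.
        flow_01, flow_10 = estimate_bidirectional_flow(f0, f1)
        # Second determination module: flows from the intermediate frame.
        flow_m0, flow_m1 = reverse_flows(flow_01, flow_10)
        # Third determination module: synthesize the intermediate frame.
        out.append(synthesize_intermediate(f0, f1, flow_m0, flow_m1))
    out.append(frames[-1])
    return out  # frame insertion module: the target video
```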
  • the second determination module 620 includes:
• a first determining unit, configured to determine, based on the first optical flow, the second optical flow, and the motion trajectory of the target object in the first video frame and the second video frame, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame;
• a second determining unit, configured to determine, by an optical flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
  • the second determination unit includes:
• the first determining subunit is configured to: if a first pixel on the intermediate frame has a preset relationship with a unique second pixel on the first video frame, take the first optical flow vector of the first pixel from the intermediate frame to the first video frame as the inverse vector of the second optical flow vector of that second pixel from the first video frame to the intermediate frame, where the fifth optical flow includes the second optical flow vector; if the first pixel on the intermediate frame has a preset relationship with at least two second pixels on the first video frame, take the first optical flow vector as the weighted average of the inverse vectors of the second optical flow vectors of those second pixels from the first video frame to the intermediate frame; if no second pixel on the first video frame has a preset relationship with the first pixel on the intermediate frame, set the first optical flow vector to 0. The first optical flow vectors of all first pixels on the intermediate frame, from the intermediate frame to the first video frame, constitute the third optical flow.
  • the second determination unit further includes:
• the second determining subunit is configured to: if a third pixel on the intermediate frame has a preset relationship with a unique fourth pixel on the second video frame, take the third optical flow vector of the third pixel from the intermediate frame to the second video frame as the inverse vector of the fourth optical flow vector of that fourth pixel from the second video frame to the intermediate frame, where the sixth optical flow includes the fourth optical flow vector; if the third pixel on the intermediate frame has a preset relationship with at least two fourth pixels on the second video frame, take the third optical flow vector as the weighted average of the inverse vectors of the fourth optical flow vectors of those fourth pixels from the second video frame to the intermediate frame; if no fourth pixel on the second video frame has a preset relationship with the third pixel on the intermediate frame, set the third optical flow vector to 0. The third optical flow vectors of all third pixels on the intermediate frame, from the intermediate frame to the second video frame, constitute the fourth optical flow.
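The reversal rule described by the two subunits (unique mapping, take the inverse vector; multiple mappings, take a weighted average of inverse vectors; no mapping, zero vector and an optical-flow hole) can be sketched as follows. This is a simplified numpy version that uses nearest-neighbour rounding and uniform weights; the patent leaves the exact weighting scheme open.

```python
import numpy as np

def reverse_flow(flow_fwd):
    """Reverse an optical flow field (e.g. frame0 -> mid into mid -> frame0).

    Each source pixel 'splats' the inverse of its vector onto the target
    pixel it lands on (nearest-neighbour rounding, for simplicity).
    Targets hit by several sources get the uniformly weighted average of the
    inverse vectors; targets hit by none are optical-flow hole points and
    get a zero vector. Returns the reversed flow and a boolean hole mask.
    """
    h, w, _ = flow_fwd.shape
    acc = np.zeros((h, w, 2), dtype=np.float64)
    cnt = np.zeros((h, w), dtype=np.int64)
    ys, xs = np.mgrid[0:h, 0:w]
    # Target coordinates each source pixel maps to (dx in channel 0, dy in 1).
    tx = np.rint(xs + flow_fwd[..., 0]).astype(int)
    ty = np.rint(ys + flow_fwd[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    for y, x, j, i in zip(ys[valid], xs[valid], ty[valid], tx[valid]):
        acc[j, i] -= flow_fwd[y, x]   # inverse vector of the source
        cnt[j, i] += 1
    hole = cnt == 0
    acc[~hole] /= cnt[~hole][:, None]  # average over multiple mappings
    return acc, hole
```

Applying this to the fifth optical flow yields the third optical flow (and its hole mask); applying it to the sixth yields the fourth.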
• the quantized neural network includes a cascaded encoder module, a decoder module, and two optical flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first video frame and second video frame, and inputs the downsampled image of the first video frame and the downsampled image of the second video frame to the encoding unit, so that the encoding unit performs feature extraction based on the downsampled images to obtain the encoding of the feature image. The decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and inputs the decoded feature image to the upsampling unit, so that the upsampling unit upsamples the decoded feature image and inputs the resulting upsampled image to the two optical flow prediction branches, such that one branch predicts the first optical flow and the other predicts the second optical flow.
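A shape-level sketch of that dataflow is below. The learned (and quantized) encoder and decoder are passed in as placeholder callables, downsampling is 2x average pooling, and upsampling is nearest-neighbour repetition; only the wiring follows the description, the layer internals are assumptions.

```python
import numpy as np

def avg_pool2(img):
    """2x average pooling over a channels-last image (H, W[, C])."""
    h, w = img.shape[:2]
    crop = img[:h - h % 2, :w - w % 2]
    return crop.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def predict_bidirectional_flow(frame0, frame1, encode, decode):
    """Encoder-decoder with two flow-prediction heads, as described above.

    encode/decode are stand-ins for the quantized encoding and decoding
    units; decode is assumed to emit 4 channels (2 per flow branch).
    """
    d0, d1 = avg_pool2(frame0), avg_pool2(frame1)      # downsampling unit
    feat = encode(np.concatenate([d0, d1], axis=-1))   # encoding unit
    dec = upsample2(decode(feat))                      # decoding + upsampling
    flow_01 = dec[..., :2]   # branch 1: frame0 -> frame1
    flow_10 = dec[..., 2:4]  # branch 2: frame1 -> frame0
    return flow_01, flow_10
```

Working on downsampled images is what lets the network stay light enough for real-time mobile inference, as the disclosure notes later.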
  • the third determining module 630 includes:
• a transformation unit, configured to perform image affine transformation on the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame, and to perform image affine transformation on the second video frame to obtain a second transformed frame of the second video frame at the acquisition moment of the intermediate frame;
• a prediction unit, configured to predict, through a preset neural network, fusion weights of the first transformed frame and the second transformed frame based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow, and the fourth optical flow;
• a fusion unit, configured to fuse pixels in the first transformed frame and the second transformed frame based on the fusion weights to obtain the intermediate frame, where a fusion weight indicates the probability that a pixel on the intermediate frame comes from the first video frame or from the second video frame.
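The warp-and-fuse step can be sketched as follows. Nearest-neighbour backward warping stands in for the image affine transformation described above, and `fuse` applies the predicted per-pixel weight; both helper names are illustrative, not from the patent.

```python
import numpy as np

def backward_warp(img, flow):
    """Sample img at p + flow(p) (nearest neighbour), giving the
    transformed frame at the intermediate time.
    flow: (H, W, 2) in (dx, dy) order; out-of-range samples are clamped."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]

def fuse(warped0, warped1, weight):
    """Blend the two transformed frames; weight is the predicted
    probability that each intermediate-frame pixel comes from the
    first video frame."""
    w = weight[..., None] if warped0.ndim == 3 else weight
    return w * warped0 + (1.0 - w) * warped1
```

With the third and fourth optical flows as `flow`, `fuse(backward_warp(frame0, flow_m0), backward_warp(frame1, flow_m1), weight)` yields the intermediate frame.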
  • the video frame adjustment device also includes:
• a determination module, configured to determine, before the intermediate frame is inserted between the first video frame and the second video frame, whether it is suitable to insert the intermediate frame between the first video frame and the second video frame based on the motion features and/or color features of the target object in the two frames; if it is determined to be suitable, the step of inserting the intermediate frame between the first video frame and the second video frame is continued.
• the motion features of the target object in the first video frame and the second video frame include at least one of the following: the consistency between the third optical flow and the fourth optical flow; and the number of optical flow hole points in the intermediate frame, where a specific pixel on the intermediate frame is determined to be an optical flow hole point if no pixel in the first video frame and the second video frame has a preset relationship with it;
• the color features of the target object in the first video frame and the second video frame include: the grayscale difference between the first transformed frame and the second transformed frame, where the first transformed frame is obtained by performing image affine transformation on the first video frame, and the second transformed frame is obtained by performing image affine transformation on the second video frame.
• the determination module is specifically configured to: for linear motion, determine the linear motion offset distance from the forward motion vector of a target pixel on the intermediate frame (moving from the intermediate frame to the first video frame) and the backward motion vector of that pixel (moving from the intermediate frame to the second video frame); count the proportion of pixels whose linear motion offset distance is greater than the first set threshold; and, if the proportion is less than or equal to the second set threshold, determine that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
• the video frame adjustment device uses a quantized neural network to estimate the motion of the target object in two adjacent video frames, so the video frame adjustment method produces better frame interpolation results for complex motion scenes and guarantees the final playback quality of the video. Through the design of an efficient quantized neural network, the method can run in real time on a mobile terminal; a mask network is used to predict occluded regions, making the method more robust and the fused intermediate frame more natural and realistic.
• with an adaptive frame interpolation judgment algorithm, before interpolating frames it is first judged whether it is suitable to insert an intermediate frame between two adjacent video frames. If it is not suitable, no intermediate frame is inserted between the two video frames, which avoids introducing motion artifacts and ensures video image quality while increasing the video frame rate.
• the video frame adjustment device provided by the embodiment of the present disclosure can execute the steps in the video frame adjustment method provided by the method embodiments of the present disclosure, and has the corresponding execution steps and beneficial effects, which will not be repeated here.
• FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. Referring to FIG. 7, it shows a schematic structural diagram of an electronic device 500 suitable for implementing an embodiment of the present disclosure.
• the electronic device 500 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (e.g., car navigation terminals), and wearable electronic devices, as well as fixed terminals such as digital TVs, desktop computers, and smart home devices.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
• an electronic device 500 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503, so as to implement the methods of the embodiments described in the present disclosure.
• in the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored.
  • the processing device 501, ROM 502, and RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504 .
• the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509.
  • the communication means 509 may allow the electronic device 500 to perform wireless or wired communication with other devices to exchange data. While FIG. 7 shows electronic device 500 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
• the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium; the computer program contains program code for executing the method shown in the flowchart, thereby implementing the above video frame adjustment method.
  • the computer program may be downloaded and installed from a network via communication means 509, or from storage means 508, or from ROM 502.
• when the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
• a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
• a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to electrical wires, optical fiber cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
• the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (digital video disc, DVD)), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
• the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network).
• Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any network currently known or developed in the future.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
• the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: based on the first video frame and the second video frame in the initial video, determine, by a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames; determine, based on the first optical flow and the second optical flow, a third optical flow from the intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, where the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame; determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow; and, in the initial video, insert the intermediate frame between the first video frame and the second video frame to obtain the target video.
  • the electronic device may also perform other steps described in the above embodiments.
• Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
• the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
• each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
• each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
• the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself.
• exemplary types of hardware logic components that may be used include, without limitation: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so forth.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
• the present disclosure provides a video frame adjustment method, the method including: based on the first video frame and the second video frame in the initial video, determining, by a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames; determining, based on the first optical flow and the second optical flow, a third optical flow from the intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, where the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame; determining the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow; and, in the initial video, inserting the intermediate frame between the first video frame and the second video frame to obtain the target video.
• optionally, determining the third optical flow from the intermediate frame to the first video frame and the fourth optical flow from the intermediate frame to the second video frame includes: determining, based on the first optical flow, the second optical flow, and the motion trajectory of the target object in the first video frame and the second video frame, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and determining, by an optical flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
• determining the third optical flow based on the fifth optical flow by the optical flow reversal technique includes: if a first pixel on the intermediate frame has a preset relationship with a unique second pixel on the first video frame, the first optical flow vector of the first pixel from the intermediate frame to the first video frame is the inverse vector of the second optical flow vector of that second pixel from the first video frame to the intermediate frame, where the fifth optical flow includes the second optical flow vector; if the first pixel on the intermediate frame has a preset relationship with at least two second pixels on the first video frame, the first optical flow vector is the weighted average of the inverse vectors of the second optical flow vectors of those second pixels from the first video frame to the intermediate frame; if no second pixel on the first video frame has a preset relationship with the first pixel on the intermediate frame, the first optical flow vector is 0; the first optical flow vectors of all first pixels on the intermediate frame, from the intermediate frame to the first video frame, constitute the third optical flow.
• determining the fourth optical flow based on the sixth optical flow includes: if a third pixel on the intermediate frame has a preset relationship with a unique fourth pixel on the second video frame, the third optical flow vector of the third pixel from the intermediate frame to the second video frame is the inverse vector of the fourth optical flow vector of that fourth pixel from the second video frame to the intermediate frame, where the sixth optical flow includes the fourth optical flow vector; if the third pixel on the intermediate frame has a preset relationship with at least two fourth pixels on the second video frame, the third optical flow vector is the weighted average of the inverse vectors of the fourth optical flow vectors of those fourth pixels from the second video frame to the intermediate frame; if no fourth pixel on the second video frame has a preset relationship with the third pixel on the intermediate frame, the third optical flow vector is 0; the third optical flow vectors of all third pixels on the intermediate frame constitute the fourth optical flow.
• the quantized neural network includes a cascaded encoder module, a decoder module, and two optical flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first video frame and second video frame, and inputs the downsampled image of the first video frame and the downsampled image of the second video frame to the encoding unit, so that the encoding unit performs feature extraction based on the downsampled images to obtain the encoding of the feature image. The decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and inputs the decoded feature image to the upsampling unit, so that the upsampling unit upsamples the decoded feature image and inputs the resulting upsampled image to the two optical flow prediction branches, one of which predicts the first optical flow and the other the second optical flow.
• in the video frame adjustment method, optionally, determining the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow includes: performing image affine transformation on the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame; performing image affine transformation on the second video frame to obtain a second transformed frame of the second video frame at the acquisition moment of the intermediate frame; predicting, through a preset neural network, fusion weights of the first transformed frame and the second transformed frame based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow, and the fourth optical flow; and fusing pixels in the first transformed frame and the second transformed frame based on the fusion weights to obtain the intermediate frame, where a fusion weight represents the probability that a pixel on the intermediate frame comes from the first video frame or from the second video frame.
• in the video frame adjustment method provided in the present disclosure, optionally, before the intermediate frame is inserted between the first video frame and the second video frame, the method further includes: determining, based on the motion features and/or color features of the target object in the first video frame and the second video frame, whether it is suitable to insert the intermediate frame between the first video frame and the second video frame; and, if it is determined to be suitable, continuing to perform the step of inserting the intermediate frame between the first video frame and the second video frame.
• the motion features of the target object in the first video frame and the second video frame include at least one of the following: the consistency between the third optical flow and the fourth optical flow; and the number of optical flow hole points in the intermediate frame, where a specific pixel on the intermediate frame is determined to be an optical flow hole point if no pixel in the first video frame and the second video frame has a preset relationship with it. The color features of the target object in the first video frame and the second video frame include: the grayscale difference between the first transformed frame and the second transformed frame, where the first transformed frame is obtained by performing image affine transformation on the first video frame, and the second transformed frame is obtained by performing image affine transformation on the second video frame.
• when the motion feature of the target object in the first video frame and the second video frame is the consistency between the third optical flow and the fourth optical flow, determining whether it is suitable to insert the intermediate frame between the first video frame and the second video frame based on that consistency includes: for linear motion, determining the linear motion offset distance from the forward motion vector of a target pixel on the intermediate frame (moving from the intermediate frame to the first video frame) and the backward motion vector of that pixel (moving from the intermediate frame to the second video frame); counting the proportion of pixels whose linear motion offset distance is greater than the first set threshold; and, if the proportion is less than or equal to the second set threshold, determining that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
• the present disclosure provides an apparatus for adjusting video frames, the apparatus including: a first determination module, configured to determine, based on the first video frame and the second video frame in the initial video and by a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
• a second determination module, configured to determine, based on the first optical flow and the second optical flow, a third optical flow from the intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, where the intermediate frame is an estimated video frame to be inserted between the two initial video frames;
• a third determination module, configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow;
  • a frame interpolation module configured to, in the initial video, insert the The video image is inserted between the two initial video frames to obtain the target video.
  • the second determination module includes: a first determination unit configured to determine, based on the first optical flow, the second optical flow and the motion track of the target object in the first video frame and the second video frame, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and a second determination unit configured to determine, by an optical-flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
  • the second determination unit includes a first determination subunit configured such that: if a first pixel on the intermediate frame has a preset relationship with a unique second pixel on the first video frame, the first optical-flow vector of the first pixel from the intermediate frame to the first video frame is the inverse of the second optical-flow vector of that second pixel from the first video frame to the intermediate frame, the fifth optical flow including the second optical-flow vectors; if a first pixel on the intermediate frame has a preset relationship with at least two second pixels on the first video frame, the first optical-flow vector is the weighted average of the inverses of the second optical-flow vectors of those second pixels from the first video frame to the intermediate frame; and if no second pixel on the first video frame has a preset relationship with the first pixel on the intermediate frame, the first optical-flow vector is 0.
  • the second determination unit further includes a second determination subunit configured such that: if a third pixel on the intermediate frame has a preset relationship with a unique fourth pixel on the second video frame, the third optical-flow vector of the third pixel from the intermediate frame to the second video frame is the inverse of the fourth optical-flow vector of that fourth pixel from the second video frame to the intermediate frame, the sixth optical flow including the fourth optical-flow vectors; if a third pixel on the intermediate frame has a preset relationship with at least two fourth pixels on the second video frame, the third optical-flow vector is the weighted average of the inverses of the fourth optical-flow vectors of those fourth pixels from the second video frame to the intermediate frame; and if no fourth pixel on the second video frame has a preset relationship with the third pixel on the intermediate frame, the third optical-flow vector is 0. The third optical-flow vectors of all third pixels on the intermediate frame make up the fourth optical flow.
  • the quantized neural network includes a cascaded encoder module, a decoder module and two optical-flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first video frame and second video frame respectively and inputs the downsampled image of the first video frame
  • and the downsampled image of the second video frame to the encoding unit, so that the encoding unit performs feature extraction based on the downsampled images to obtain an encoding of a feature image.
  • the decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and inputs the decoded feature image to the upsampling unit, which upsamples it and feeds the upsampled image to each of the two optical-flow prediction branches, one branch predicting the first optical flow and the other predicting the second optical flow from the upsampled image.
  • the third determination module includes: a transformation unit configured to apply an image affine transformation to the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame, and to apply an image affine transformation to the second video frame to obtain a second transformed frame of the second video frame at the acquisition moment of the intermediate frame.
  • a prediction unit configured to predict, with a preset neural network, fusion weights of the first transformed frame and the second transformed frame based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow and the fourth optical flow.
  • a fusion unit configured to fuse pixels in the first transformed frame and the second transformed frame based on the fusion weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel on the intermediate frame comes from the first video frame or from the second video frame.
  • the video frame adjustment apparatus may optionally further include a decision module configured to determine, before the intermediate frame is inserted between the first video frame and the second video frame and based on motion features and/or color features of the target object in the first video frame and the second video frame, whether it is suitable to insert the intermediate frame between the first video frame and the second video frame, and, if it is determined to be suitable, to continue with the step of inserting the intermediate frame between the first video frame and the second video frame.
  • the motion features of the target object in the first video frame and the second video frame include at least one of: the consistency between the third optical flow and the fourth optical flow; and the number of optical-flow hole points in the intermediate frame, where a specific pixel of the intermediate frame is determined to be an optical-flow hole point if neither the first video frame nor the second video frame contains a pixel having a preset relationship with that specific pixel. The color features of the target object in the first video frame and the second video frame
  • include: the grayscale difference between the first transformed frame and the second transformed frame, the first transformed frame being obtained by applying an image affine transformation to the first video frame and the second transformed frame by applying an image affine transformation to the second video frame.
  • the decision module is specifically configured to: for linear motion, determine a linear-motion offset distance from the forward motion vector of a target pixel on the intermediate frame moving from the intermediate frame to the first video frame and the backward motion vector of the same pixel moving from the intermediate frame to the second video frame; count the proportion of pixels whose linear-motion offset distance is greater than a first set threshold; and, if that proportion is less than or equal to a second set threshold, determine that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
  • the present disclosure provides an electronic device including: one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video frame adjustment method provided in any embodiment of the present disclosure.
  • the present disclosure provides a computer-readable storage medium on which a computer program is stored, the program implementing, when executed by a processor, the video frame adjustment method of any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a computer program product including a computer program or instructions which, when executed by a processor, implement the video frame adjustment method described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present disclosure disclose a video frame adjustment method and apparatus, an electronic device and a storage medium. The method includes: determining, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame; determining, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame; determining the intermediate frame according to the two initial video frames and the third and fourth optical flows; and inserting the intermediate frame between the two initial video frames. The video frame adjustment method provided by the embodiments thus interpolates a frame between two adjacent video frames while preserving the playback quality of the resulting high-frame-rate video.

Description

Video frame adjustment method, apparatus, electronic device and storage medium
Cross-reference to related application
This application claims priority to Chinese patent application No. 202110939314.8, filed on August 16, 2021 and entitled "Video frame adjustment method, apparatus, electronic device and storage medium", the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of video technology, and in particular to a video frame adjustment method and apparatus, an electronic device and a storage medium.
Background
Video is an important medium of information dissemination on the Internet. Factors such as color, frame rate and definition all affect playback quality and hence the user's viewing experience; in particular, the higher the frame rate, the smoother the playback and the better the viewing experience.
With the development of video playback hardware, more and more playback devices support high-frame-rate video.
However, the frame-rate improvement achieved by current frame-rate enhancement techniques is unsatisfactory.
Summary
To solve, or at least partially solve, the above technical problem, embodiments of the present disclosure provide a video frame adjustment method and apparatus, an electronic device and a storage medium that interpolate a frame between two adjacent video frames while preserving the playback quality of the resulting high-frame-rate video.
In a first aspect, an embodiment of the present disclosure provides a video frame adjustment method, including:
determining, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
determining, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first video frame and the second video frame;
determining the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow;
inserting, in the initial video, the intermediate frame between the first video frame and the second video frame to obtain a target video.
In a second aspect, an embodiment of the present disclosure further provides a video frame adjustment apparatus, including:
a first determination module configured to determine, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
a second determination module configured to determine, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first video frame and the second video frame;
a third determination module configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow;
a frame-interpolation module configured to insert, in the initial video, the intermediate frame between the first video frame and the second video frame to obtain a target video.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a storage apparatus for storing one or more programs;
which, when executed by the one or more processors, cause the one or more processors to implement the video frame adjustment method described above.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, the program implementing, when executed by a processor, the video frame adjustment method described above.
In a fifth aspect, an embodiment of the present disclosure further provides a computer program product including a computer program or instructions which, when executed by a processor, implement the video frame adjustment method described above.
The technical solutions provided by embodiments of the present disclosure have at least the following advantages:
The video frame adjustment method of the embodiments determines, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the two frames being adjacent initial video frames; this allows the method to run on mobile devices and makes it more robust for videos with large motion. Based on the first and second optical flows, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame are determined, the intermediate frame being an estimated video frame to be inserted between the first and second video frames; the intermediate frame is determined according to the first video frame, the second video frame, the third optical flow and the fourth optical flow; and, in the initial video, the intermediate frame is inserted between the first video frame and the second video frame. A frame is thus interpolated between two adjacent video frames while the playback quality of the resulting high-frame-rate video is preserved.
Brief description of the drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, throughout which the same or similar reference signs denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale.
FIG. 1 is a flowchart of a video frame adjustment method in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a quantized neural network in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the temporal relationship between video frames in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram, in an embodiment of the present disclosure, of the second optical-flow vector flow0t(P) of a second pixel P from the first video frame I_0 to the intermediate frame I_t, and of the first optical-flow vector flowt0(Q) of a first pixel Q from the intermediate frame I_t to the first video frame I_0;
FIG. 5 is a schematic flowchart of predicting an occlusion image with a preset neural network in an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a video frame adjustment apparatus in an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed description of embodiments
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments are illustrative only and do not limit the scope of protection of the present disclosure.
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. Moreover, the method embodiments may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, i.e. "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; "another embodiment" means "at least one further embodiment"; "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
Note that notions such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules or units, and are not intended to limit the order or interdependence of the functions performed by them.
Note that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between apparatuses in embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flowchart of a video frame adjustment method in an embodiment of the present disclosure. The method may be executed by a video frame adjustment apparatus, which may be implemented in software and/or hardware and may be configured in an electronic device such as a terminal, including but not limited to a smartphone, handheld computer, tablet, wearable device with a display, desktop computer, laptop, all-in-one computer, or smart home device with a display.
As shown in FIG. 1, the method may include the following steps:
Step 110: based on a first video frame and a second video frame in an initial video, determine, through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames.
Here, the initial video is the original video shot or recorded by a capture device; in other words, a video that has not undergone frame interpolation. The frames in the initial video are initial video frames. Two adjacent video frames are two frames adjacent in time: for example, if the first video frame is captured at time 0 and the second video frame at time 1, the two are temporally adjacent.
A quantized neural network may refer to a neural network whose model parameters are stored as INT8 (8-bit fixed-point integers). Quantization generally means converting the floating-point arithmetic of a neural network to fixed point so that the network can run in real time on a mobile device (typically a smartphone), accommodating the memory limitations of such devices. Estimating the motion of the target object between two adjacent video frames with a quantized neural network lets the video frame adjustment method of this embodiment produce better interpolation results for complex motion scenes, preserving the playback quality of the target video obtained after interpolation. By designing an efficient quantized neural network, the method can run in real time on mobile devices, specifically on mobile processors that support quantized computation, so that the frame rate is raised on-device and playback becomes smoother.
In a specific implementation, referring to the schematic structure of a quantized neural network shown in FIG. 2, the quantized neural network includes a cascaded encoder module 210, decoder module 220 and two optical-flow prediction branches 230, which can further be divided into a first prediction branch 231 and a second prediction branch 232. The encoder module 210 includes a downsampling unit 211 and an encoding unit 212. The downsampling unit 211 downsamples the two input video frames 200 (the first video frame and the second video frame) and feeds their respective downsampled images to the encoding unit 212, which performs feature extraction on them to obtain an encoding of the feature image and sends that encoding to the decoder module 220. The decoder module 220 includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and feeds the decoded feature image to the upsampling unit, which upsamples it and feeds the upsampled image to each of the two optical-flow prediction branches 230, so that the two branches respectively predict, from the upsampled image, the first optical flow from the first video frame to the second video frame and the second optical flow from the second video frame to the first; for example, the first prediction branch 231 predicts the first optical flow flow01 and the second prediction branch 232 predicts the second optical flow flow10. In other words, the input of the encoder module 210 is the first video frame I_0 and the second video frame I_1, and before feature extraction by the encoding unit 212, the downsampling unit 211 downsamples I_0 and I_1 to enlarge the network's receptive field. The benefit of this arrangement is more robust motion estimation in large-motion scenes; inference on small-resolution images also runs faster. To run the network efficiently on mobile devices, training is performed in a quantized manner, finally yielding e.g. an INT8 quantized neural network, i.e. one whose model parameters are stored as 8-bit fixed-point integers.
Step 120: based on the first optical flow and the second optical flow, determine a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first video frame and the second video frame.
In one implementation, this determination includes: determining, based on the first optical flow, the second optical flow and the motion track of the target object in the first and second video frames, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and determining, by an optical-flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow. Specifically, let the first video frame be I_0, the second video frame be I_1, and the intermediate frame be the frame at some time t between times 0 and 1, denoted I_t; see the schematic temporal relationship between video frames in FIG. 3. Denote the optical flow from I_0 to I_t as the fifth optical flow and the optical flow from I_1 to I_t as the sixth optical flow. Taking linear motion of the target object between I_0 and I_1 as an example, the fifth optical flow is flow0t = flow01 * t and the sixth optical flow is flow1t = flow10 * (1 - t), where flow01 is the first optical flow from I_0 to I_1 and flow10 is the second optical flow from I_1 to I_0, both obtainable with the quantized neural network of step 110. Having obtained flow0t and flow1t, the optical-flow reversal technique yields the third optical flow flowt0 from I_t to I_0 and the fourth optical flow flowt1 from I_t to I_1. The purpose of generating flowt0 and flowt1 is to ensure that every pixel of the intermediate frame I_t has a corresponding pixel in both I_0 and I_1, so that the picture stays coherent across I_t, I_0 and I_1 and a good interpolation result is obtained.
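Under the linear-motion assumption above, the intermediate flows are plain per-pixel scalings of the bidirectional flows. A minimal sketch (NumPy; the array shapes and function name are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def intermediate_flows(flow01, flow10, t):
    """Scale the bidirectional flows to an intermediate time t in (0, 1)
    under the linear-motion assumption.

    flow01, flow10: (H, W, 2) optical-flow fields between frames I_0 and I_1.
    Returns flow0t (I_0 -> I_t) and flow1t (I_1 -> I_t).
    """
    flow0t = flow01 * t          # fifth optical flow:  flow0t = flow01 * t
    flow1t = flow10 * (1.0 - t)  # sixth optical flow:  flow1t = flow10 * (1 - t)
    return flow0t, flow1t
```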
In one implementation, determining the third optical flow based on the fifth optical flow by the optical-flow reversal technique includes:
If a first pixel of the intermediate frame has a preset relationship with a unique second pixel of the first video frame, the first optical-flow vector of the first pixel from the intermediate frame to the first video frame is the inverse of the second optical-flow vector of that second pixel from the first video frame to the intermediate frame, the fifth optical flow including the second optical-flow vectors. The essence of this preset relationship is that the unique second pixel of the first video frame, moving from time 0 to time t, arrives at the position of the first pixel of the intermediate frame.
If a first pixel of the intermediate frame has a preset relationship with at least two second pixels of the first video frame, the first optical-flow vector is the weighted average of the inverses of the second optical-flow vectors of those second pixels from the first video frame to the intermediate frame; the essence of this relationship is that at least two second pixels of the first video frame, moving from time 0 to time t, arrive at the position of the first pixel. If no second pixel of the first video frame has a preset relationship with a first pixel of the intermediate frame (that is, no pixel of the first video frame, moving from time 0 to time t, arrives at that position), the first optical-flow vector is 0. The first optical-flow vectors of all first pixels from the intermediate frame to the first video frame make up the third optical flow.
Specifically, for a second pixel P of the first video frame I_0, its second optical-flow vector flow0t(P) from I_0 to the intermediate frame I_t gives the position of the corresponding first pixel Q that P reaches at time t. The optical-flow vector of Q from time t to time 0 is therefore the inverse of flow0t(P); that is, the first optical-flow vector of Q from I_t to I_0 is the inverse of the second optical-flow vector of P from I_0 to I_t:
flowt0(Q) = -flow0t(P),
where flowt0(Q) denotes the first optical-flow vector and flow0t(P) the second optical-flow vector. See the schematic diagram in FIG. 4 of the second optical-flow vector flow0t(P) of P from I_0 to I_t and the first optical-flow vector flowt0(Q) of Q from I_t to I_0.
In one implementation, the optical-flow vectors of several second pixels P of the first video frame I_0 may land on the same first pixel Q of the intermediate frame I_t, i.e. Q corresponds to at least two second pixels P, or in other words Q has a preset relationship with at least two second pixels of I_0. When computing the first optical-flow vector of Q from I_t to I_0, the inverses of the second optical-flow vectors of the pixels P that reach Q must then be averaged with weights:
flowt0(Q) = -(1/N) * Σ_i flow0t(P_i),
where N is the number of second optical-flow vectors ending at the first pixel Q.
In one implementation, it may also happen that no optical-flow vector points to a first pixel Q, i.e. there is no second pixel P corresponding to Q, or in other words no second pixel of I_0 has a preset relationship with Q; the first optical-flow vector is then 0 and Q is marked as an optical-flow hole point. The first optical-flow vectors of all first pixels Q from the intermediate frame to the first video frame make up the third optical flow.
The fourth optical flow is determined in a way similar to the third. Specifically, determining the fourth optical flow based on the sixth optical flow includes:
If a third pixel of the intermediate frame has a preset relationship with a unique fourth pixel of the second video frame, the third optical-flow vector of the third pixel from the intermediate frame to the second video frame is the inverse of the fourth optical-flow vector of that fourth pixel from the second video frame to the intermediate frame, the sixth optical flow including the fourth optical-flow vectors.
If a third pixel of the intermediate frame has a preset relationship with at least two fourth pixels of the second video frame, the third optical-flow vector is the weighted average of the inverses of the fourth optical-flow vectors of those fourth pixels from the second video frame to the intermediate frame. If no fourth pixel of the second video frame has a preset relationship with a third pixel of the intermediate frame, the third optical-flow vector is 0. The third optical-flow vectors of all third pixels from the intermediate frame to the second video frame make up the fourth optical flow.
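The optical-flow reversal described above (inverse vectors, averaging when several source pixels land on the same target pixel, and hole marking when none do) can be sketched as follows. This is a simplified version with nearest-pixel landing and uniform weights; the names and discretization are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def reverse_flow(flow_0t):
    """Reverse flow_0t (frame I_0 -> intermediate frame I_t) into
    flow_t0 (I_t -> I_0) by splatting inverse vectors.

    flow_0t: (H, W, 2) array; flow_0t[y, x] = (dx, dy) motion of pixel (x, y).
    Returns flow_t0 (H, W, 2) and a boolean hole mask marking pixels of the
    intermediate frame that no source pixel reached (optical-flow hole points).
    """
    H, W, _ = flow_0t.shape
    acc = np.zeros((H, W, 2), dtype=np.float64)  # sum of inverse vectors per target pixel
    count = np.zeros((H, W), dtype=np.int64)     # number of source pixels landing there
    for y in range(H):
        for x in range(W):
            dx, dy = flow_0t[y, x]
            tx, ty = int(round(x + dx)), int(round(y + dy))  # landing position at time t
            if 0 <= tx < W and 0 <= ty < H:
                acc[ty, tx] += (-dx, -dy)        # inverse of the second optical-flow vector
                count[ty, tx] += 1
    flow_t0 = np.zeros_like(acc)
    reached = count > 0
    flow_t0[reached] = acc[reached] / count[reached][:, None]  # average of N inverse vectors
    holes = ~reached                             # no preset relationship: vector stays 0
    return flow_t0, holes
```

The same routine applied to flow1t would yield the fourth optical flow flowt1.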
Step 130: determine the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow.
In one implementation, this determination includes:
Applying an image affine transformation to the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame; applying an image affine transformation to the second video frame to obtain a second transformed frame of the second video frame at the acquisition moment of the intermediate frame; predicting, with a preset neural network and based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow and the fourth optical flow, fusion weights of the first and second transformed frames; and fusing the pixels of the first and second transformed frames based on the fusion weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel of the intermediate frame comes from the first video frame or from the second video frame.
Here, applying the image affine transformation to the first video frame I_0, i.e. warping I_0, yields the first transformed frame I_t^0 of the first video frame at the acquisition moment t of the intermediate frame. Applying the image affine transformation to the second video frame I_1, i.e. warping I_1, yields the second transformed frame I_t^1 of the second video frame at the acquisition moment t of the intermediate frame. The purpose of the affine transformation is to estimate what the first video frame I_0 and the second video frame I_1 look like at time t, providing the data sources from which the intermediate frame I_t is obtained.
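One common way to realize the warp of I_0 toward time t is backward warping with the third optical flow flowt0: each output pixel samples the source frame at its displaced position. This is an assumption for illustration (the patent only specifies an affine/warp transformation), with nearest-neighbour sampling for brevity:

```python
import numpy as np

def backward_warp(frame, flow_t0):
    """Warp frame I_0 to time t: output pixel (x, y) samples I_0 at
    (x, y) + flow_t0[y, x] (nearest neighbour; out-of-bounds pixels stay 0)."""
    H, W = frame.shape[:2]
    out = np.zeros_like(frame)
    for y in range(H):
        for x in range(W):
            dx, dy = flow_t0[y, x]
            sx, sy = int(round(x + dx)), int(round(y + dy))  # source position in I_0
            if 0 <= sx < W and 0 <= sy < H:
                out[y, x] = frame[sy, sx]
    return out
```

Warping I_1 with flowt1 in the same way yields the second transformed frame.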
To obtain the intermediate frame I_t, one must infer whether each of its pixels comes from the first video frame I_0 or from the second video frame I_1. To this end, in one implementation a preset neural network is designed to predict an occlusion image mask (i.e. the fusion weights of the first and second transformed frames). Each pixel value of the mask lies in the range 0 to 1 and represents the probability that the pixel comes from the first video frame I_0; the closer the value is to 1, the more likely the pixel comes from I_0. The inputs of the preset neural network are: the first transformed frame I_t^0, the second transformed frame I_t^1, the third optical flow flowt0, the fourth optical flow flowt1, the downsampled image corresponding to the first video frame I_0, and the downsampled image corresponding to the second video frame I_1; its output is the occlusion image mask. In summary, the preset neural network predicts the fusion weights of the first and second transformed frames from the downsampled images of the two video frames, the two transformed frames, and the third and fourth optical flows; the pixels of the two transformed frames are then fused with those weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel of the intermediate frame comes from the first video frame or from the second. Correspondingly, see the schematic flow of predicting the occlusion image with the preset neural network in FIG. 5: the downsampled images of the first and second video frames, the first and second transformed frames and the third and fourth optical flows are input to the mask network, which outputs the occlusion image, i.e. the fusion weights of the two transformed frames.
Further, fusing the pixels of the first and second transformed frames based on the fusion weights (i.e. the pixel values of the occlusion image) to obtain the intermediate frame includes computing the intermediate frame as
I_t = mask ⊙ I_t^0 + (1 - mask) ⊙ I_t^1,
where I_t denotes the intermediate frame, mask the occlusion image, I_t^0 the first transformed frame, I_t^1 the second transformed frame, and "⊙" pixel-wise multiplication.
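The fusion formula maps directly to array code. A minimal sketch (NumPy; shapes and the function name are illustrative assumptions):

```python
import numpy as np

def fuse(warped0, warped1, mask):
    """Blend the two transformed frames with the predicted occlusion mask.

    warped0, warped1: (H, W, C) transformed frames I_t^0 and I_t^1.
    mask: (H, W) values in [0, 1], the probability that a pixel of the
    intermediate frame comes from the first video frame.
    Implements I_t = mask * I_t^0 + (1 - mask) * I_t^1, pixel-wise.
    """
    m = mask[..., None]                  # broadcast over the channel axis
    return m * warped0 + (1.0 - m) * warped1
```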
Step 140: in the initial video, insert the intermediate frame between the first video frame and the second video frame to obtain the target video.
The video frame adjustment method of this embodiment estimates the motion of the target object between two adjacent video frames with a quantized neural network, producing better interpolation results for complex motion scenes and preserving the final playback quality; the efficient quantized neural network lets the method run in real time on mobile devices; and predicting the occlusion image with the mask network makes the method more robust, with more natural and realistic fused intermediate frames.
Understandably, not every pair of adjacent video frames is suitable for inserting an intermediate frame: a reasonable intermediate frame usually cannot be estimated across a shot cut or in a scene of violent motion. To preserve the playback quality of the interpolated target video, on the basis of the above implementation, in one implementation the method further includes, before inserting the intermediate frame between the first and second video frames: determining, based on motion features and/or color features of the target object in the first and second video frames, whether it is suitable to insert the intermediate frame between them; if so, proceeding with the step of inserting the intermediate frame between the first video frame and the second video frame. If it is determined that insertion is not suitable, the insertion operation is not performed, so that no artifacts are introduced into the target video; playback smoothness is thus improved while picture quality after interpolation is preserved. Concretely, the suitability can be judged by motion-feature analysis, e.g. by statistics computed from color information and motion information.
Further, in one implementation, the motion features of the target object in the first and second video frames include at least one of: the consistency between the third optical flow and the fourth optical flow; and the number of optical-flow hole points in the intermediate frame, where a specific pixel of the intermediate frame is determined to be an optical-flow hole point if neither the first nor the second video frame contains a pixel having a preset relationship with it, i.e. a first pixel Q of the intermediate frame I_t is marked as a hole point when no second pixel P of the first video frame I_0 has a preset relationship with it.
The color features of the target object in the first and second video frames include: the grayscale difference between the first transformed frame and the second transformed frame, the first transformed frame being obtained by applying an image affine transformation to the first video frame and the second transformed frame by applying an image affine transformation to the second video frame.
Specifically, if the motion feature of the target object in the first and second video frames is the consistency between the third and fourth optical flows, determining from that consistency whether it is suitable to insert the intermediate frame between the first and second video frames includes:
For linear motion, determining a linear-motion offset distance from the forward motion vector of a target pixel of the intermediate frame moving from the intermediate frame to the first video frame and the backward motion vector of the same pixel moving from the intermediate frame to the second video frame; counting the proportion of pixels whose linear-motion offset distance exceeds a first set threshold; if that proportion is less than or equal to a second set threshold, determining that insertion between the first and second video frames is suitable, and if it exceeds the second set threshold, determining that insertion is not suitable. Whether the optical flow is reliable is judged by computing the consistency of the third optical flow flowt0 (from I_t to I_0) and the fourth optical flow flowt1 (from I_t to I_1): under the linear-motion assumption, the forward motion vector of a pixel of the intermediate frame (the target pixel) moving to the first video frame and its backward motion vector moving to the second video frame should be equal in magnitude and opposite in direction. For a pixel Q of the intermediate frame, the linear-motion offset distance is computed from its forward motion vector f_t0 and backward motion vector f_t1 as distance = ||f_t0 + f_t1||_2, and the proportion of pixels whose offset distance exceeds the first set threshold, i.e. their number over the total number of pixels of the intermediate frame, is counted. If this proportion exceeds the second set threshold, it is determined that inserting an intermediate frame between the first and second video frames is not suitable.
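The consistency test can be sketched directly from the formula distance = ||f_t0 + f_t1||_2. The threshold values below are illustrative assumptions; the patent does not specify them:

```python
import numpy as np

def insertion_suitable(flow_t0, flow_t1, dist_thresh=1.0, ratio_thresh=0.1):
    """Decide whether an intermediate frame should be inserted.

    Under the linear-motion assumption the forward vector f_t0 and backward
    vector f_t1 should cancel, so ||f_t0 + f_t1||_2 measures the deviation.
    flow_t0, flow_t1: (H, W, 2) flows from I_t to I_0 and from I_t to I_1.
    """
    distance = np.linalg.norm(flow_t0 + flow_t1, axis=-1)  # per-pixel offset distance
    ratio = np.mean(distance > dist_thresh)                # fraction of inconsistent pixels
    return bool(ratio <= ratio_thresh)
```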
In one implementation, if the grayscale difference between the first transformed frame and the second transformed frame exceeds a third set threshold, it is determined that inserting an intermediate frame between the first and second video frames is not suitable.
In one implementation, the optical-flow hole points of the intermediate frame have already been marked in an implementation of step 120 above. These hole points tend to occur in occluded regions, so their number is counted: the larger the number, the larger the area of the occluded region, and when the occluded area is too large, interpolation is prone to errors. To preserve picture quality, no frame is inserted between the first and second video frames in that case, avoiding artifacts in the interpolated target video and preserving picture quality while improving playback smoothness.
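The hole-count criterion reduces to comparing the hole ratio against a threshold. A minimal sketch (the threshold value is an illustrative assumption):

```python
import numpy as np

def too_many_holes(holes, hole_ratio_thresh=0.05):
    """holes: boolean (H, W) mask of optical-flow hole points of the
    intermediate frame. Returns True when the occluded area is so large
    that frame insertion should be skipped."""
    return float(np.mean(holes)) > hole_ratio_thresh
```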
FIG. 6 is a schematic structural diagram of a video frame adjustment apparatus in an embodiment of the present disclosure. The apparatus provided by the embodiment may be configured in a terminal. As shown in FIG. 6, the apparatus specifically includes: a first determination module 610, a second determination module 620, a third determination module 630 and a frame-interpolation module 640.
The first determination module 610 is configured to determine, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first and second video frames being two adjacent initial video frames. The second determination module 620 is configured to determine, based on the first and second optical flows, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first and second video frames. The third determination module 630 is configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow. The frame-interpolation module 640 is configured to insert, in the initial video, the intermediate frame between the first and second video frames to obtain a target video.
Optionally, the second determination module 620 includes:
a first determination unit configured to determine, based on the first optical flow, the second optical flow and the motion track of the target object in the first and second video frames, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and a second determination unit configured to determine, by the optical-flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
Optionally, the second determination unit includes:
a first determination subunit configured such that: if a first pixel of the intermediate frame has a preset relationship with a unique second pixel of the first video frame, the first optical-flow vector of the first pixel from the intermediate frame to the first video frame is the inverse of the second optical-flow vector of the second pixel from the first video frame to the intermediate frame, the fifth optical flow including the second optical-flow vectors; if a first pixel of the intermediate frame has a preset relationship with at least two second pixels of the first video frame, the first optical-flow vector is the weighted average of the inverses of the second optical-flow vectors of those second pixels; and if no second pixel of the first video frame has a preset relationship with the first pixel of the intermediate frame, the first optical-flow vector is 0. The first optical-flow vectors of all first pixels of the intermediate frame from the intermediate frame to the first video frame make up the third optical flow.
Optionally, the second determination unit further includes:
a second determination subunit configured such that: if a third pixel of the intermediate frame has a preset relationship with a unique fourth pixel of the second video frame, the third optical-flow vector of the third pixel from the intermediate frame to the second video frame is the inverse of the fourth optical-flow vector of the fourth pixel from the second video frame to the intermediate frame, the sixth optical flow including the fourth optical-flow vectors; if a third pixel of the intermediate frame has a preset relationship with at least two fourth pixels of the second video frame, the third optical-flow vector is the weighted average of the inverses of the fourth optical-flow vectors of those fourth pixels; and if no fourth pixel of the second video frame has a preset relationship with the third pixel of the intermediate frame, the third optical-flow vector is 0. The third optical-flow vectors of all third pixels of the intermediate frame from the intermediate frame to the second video frame make up the fourth optical flow.
Optionally, the quantized neural network includes a cascaded encoder module, a decoder module and two optical-flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first and second video frames respectively and feeds the downsampled images of the two frames to the encoding unit, which performs feature extraction based on them to obtain an encoding of the feature image. The decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and feeds the decoded feature image to the upsampling unit, which upsamples it and feeds the upsampled image to each of the two optical-flow prediction branches, one of which predicts the first optical flow from the upsampled image while the other predicts the second optical flow.
Optionally, the third determination module 630 includes:
a transformation unit configured to apply an image affine transformation to the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame, and to apply an image affine transformation to the second video frame to obtain a second transformed frame of the second video frame at that moment; a prediction unit configured to predict, with a preset neural network, the fusion weights of the first and second transformed frames based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow and the fourth optical flow; and a fusion unit configured to fuse the pixels of the first and second transformed frames based on the fusion weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel of the intermediate frame comes from the first video frame or from the second video frame.
Optionally, the video frame adjustment apparatus further includes:
a decision module configured to determine, before the intermediate frame is inserted between the first and second video frames and based on motion features and/or color features of the target object in the first and second video frames, whether it is suitable to insert the intermediate frame between the first video frame and the second video frame, and, if so, to continue with the step of inserting the intermediate frame between the first video frame and the second video frame. Optionally, the motion features of the target object in the first and second video frames include at least one of: the consistency between the third and fourth optical flows; and the number of optical-flow hole points in the intermediate frame, a specific pixel of the intermediate frame being determined to be an optical-flow hole point if neither the first nor the second video frame contains a pixel having a preset relationship with it. The color features of the target object in the first and second video frames include: the grayscale difference between the first and second transformed frames, the first transformed frame being obtained by applying an image affine transformation to the first video frame and the second transformed frame by applying an image affine transformation to the second video frame.
Optionally, if the motion feature of the target object in the first and second video frames is the consistency between the third and fourth optical flows, the decision module is specifically configured to: for linear motion, determine a linear-motion offset distance from the forward motion vector of a target pixel of the intermediate frame moving from the intermediate frame to the first video frame and the backward motion vector of the same pixel moving from the intermediate frame to the second video frame; count the proportion of pixels whose linear-motion offset distance exceeds a first set threshold; and, if that proportion is less than or equal to a second set threshold, determine that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
The video frame adjustment apparatus provided by the embodiments of the present disclosure estimates the motion of the target object between two adjacent video frames with a quantized neural network, producing better interpolation results for complex motion scenes and preserving the final playback quality; the efficient quantized neural network allows real-time operation on mobile devices; and predicting the occlusion image with the mask network makes the method more robust, with more natural and realistic fused intermediate frames. By adding an adaptive insertion-decision algorithm that first judges whether two adjacent video frames are suitable for an intermediate frame and skips insertion when they are not, motion artifacts are avoided, raising the frame rate while preserving picture quality.
The video frame adjustment apparatus provided by the embodiments of the present disclosure can perform the steps of the video frame adjustment method provided by the method embodiments, with the same execution steps and beneficial effects, which are not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure, showing a structure suitable for implementing the electronic device 500 of the embodiments. The electronic device 500 may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle terminals (e.g. vehicle navigation terminals) and wearable electronic devices, as well as fixed terminals such as digital TVs, desktop computers and smart home devices. The electronic device shown in FIG. 7 is only an example and imposes no limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 500 may include a processing apparatus 501 (e.g. a central processing unit or graphics processor) that performs various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or loaded from a storage apparatus 508 into a random access memory (RAM) 503, so as to implement the methods of the embodiments of the present disclosure. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502 and the RAM 503 are connected to one another by a bus 504, to which an input/output (I/O) interface 505 is also connected.
Typically, the following apparatuses may be connected to the I/O interface 505: input apparatuses 506 including, for example, a touchscreen, touchpad, keyboard, mouse, camera, microphone, accelerometer or gyroscope; output apparatuses 507 including, for example, a liquid-crystal display (LCD), speaker or vibrator; storage apparatuses 508 including, for example, a magnetic tape or hard disk; and a communication apparatus 509, which may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows an electronic device 500 with various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or provided; more or fewer may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts, thereby implementing the video frame adjustment method described above. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 509, installed from the storage apparatus 508, or installed from the ROM 502; when executed by the processing apparatus 501, it performs the above functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact-disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program for use by or in connection with an instruction-execution system, apparatus or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction-execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
When implemented in software, the above may be realized wholly or partly in the form of a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions of the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wire (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wirelessly (such as infrared, radio or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media; the available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. digital video disc (DVD)), or a semiconductor medium (e.g. solid state disk (SSD)), etc.
In some implementations, the client and server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g. a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g. the Internet), a peer-to-peer network (e.g. an ad hoc peer-to-peer network), and any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
determine, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first and second video frames being two adjacent initial video frames; determine, based on the first and second optical flows, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first and second video frames; determine the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow; and, in the initial video, insert the intermediate frame between the first and second video frames to obtain a target video.
Optionally, when the one or more programs are executed by the electronic device, the electronic device may also perform the other steps described in the above embodiments.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architecture, functions and operation of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur out of the order noted in the figures: two consecutively depicted blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, may be implemented by a dedicated hardware-based system performing the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the name of a unit does not in some cases limit the unit itself.
The functions described herein above may be performed at least in part by one or more hardware logic components. For example and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction-execution system, apparatus or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium, and may include but is not limited to electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses or devices, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact-disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, there is provided a video frame adjustment method including: determining, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first and second video frames being two adjacent initial video frames; determining, based on the first and second optical flows, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the first and second video frames; determining the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow; and, in the initial video, inserting the intermediate frame between the first and second video frames to obtain a target video.
According to one or more embodiments, in the video frame adjustment method provided by the present disclosure, optionally, determining the third optical flow from the intermediate frame to the first video frame and the fourth optical flow from the intermediate frame to the second video frame based on the first and second optical flows includes: determining, based on the first optical flow, the second optical flow and the motion track of the target object in the first and second video frames, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and determining, by the optical-flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
According to one or more embodiments, in the method, optionally, determining the third optical flow based on the fifth optical flow by the optical-flow reversal technique includes: if a first pixel of the intermediate frame has a preset relationship with a unique second pixel of the first video frame, the first optical-flow vector of the first pixel from the intermediate frame to the first video frame is the inverse of the second optical-flow vector of the second pixel from the first video frame to the intermediate frame, the fifth optical flow including the second optical-flow vectors; if a first pixel of the intermediate frame has a preset relationship with at least two second pixels of the first video frame, the first optical-flow vector is the weighted average of the inverses of the second optical-flow vectors of those second pixels; if no second pixel of the first video frame has a preset relationship with the first pixel of the intermediate frame, the first optical-flow vector is 0; and the first optical-flow vectors of all first pixels of the intermediate frame from the intermediate frame to the first video frame make up the third optical flow.
According to one or more embodiments, in the method, optionally, determining the fourth optical flow based on the sixth optical flow includes: if a third pixel of the intermediate frame has a preset relationship with a unique fourth pixel of the second video frame, the third optical-flow vector of the third pixel from the intermediate frame to the second video frame is the inverse of the fourth optical-flow vector of the fourth pixel from the second video frame to the intermediate frame, the sixth optical flow including the fourth optical-flow vectors; if a third pixel of the intermediate frame has a preset relationship with at least two fourth pixels of the second video frame, the third optical-flow vector is the weighted average of the inverses of the fourth optical-flow vectors of those fourth pixels; if no fourth pixel of the second video frame has a preset relationship with the third pixel of the intermediate frame, the third optical-flow vector is 0; and the third optical-flow vectors of all third pixels of the intermediate frame from the intermediate frame to the second video frame make up the fourth optical flow.
According to one or more embodiments, in the method, optionally, the quantized neural network includes a cascaded encoder module, a decoder module and two optical-flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first and second video frames respectively and feeds the downsampled images of the two frames to the encoding unit, which performs feature extraction based on them to obtain an encoding of the feature image. The decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and feeds the decoded feature image to the upsampling unit, which upsamples it and feeds the upsampled image to each of the two optical-flow prediction branches, one of which predicts the first optical flow from the upsampled image while the other predicts the second optical flow.
According to one or more embodiments, in the method, optionally, determining the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow includes: applying an image affine transformation to the first video frame to obtain a first transformed frame of the first video frame at the acquisition moment of the intermediate frame; applying an image affine transformation to the second video frame to obtain a second transformed frame of the second video frame at that moment; predicting, with a preset neural network, the fusion weights of the first and second transformed frames based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the two transformed frames, and the third and fourth optical flows; and fusing the pixels of the two transformed frames based on the fusion weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel of the intermediate frame comes from the first video frame or from the second video frame.
According to one or more embodiments, in the method, optionally, before inserting the intermediate frame between the first and second video frames, the method further includes: determining, based on motion features and/or color features of the target object in the first and second video frames, whether it is suitable to insert the intermediate frame between the first video frame and the second video frame, and, if so, continuing with the step of inserting the intermediate frame between the first video frame and the second video frame.
According to one or more embodiments, in the method, optionally, the motion features of the target object in the first and second video frames include at least one of: the consistency between the third and fourth optical flows; and the number of optical-flow hole points in the intermediate frame, a specific pixel of the intermediate frame being determined to be an optical-flow hole point if neither the first nor the second video frame contains a pixel having a preset relationship with it. The color features of the target object in the first and second video frames include: the grayscale difference between the first and second transformed frames, the first transformed frame being obtained by applying an image affine transformation to the first video frame and the second transformed frame by applying an image affine transformation to the second video frame.
According to one or more embodiments, in the method, optionally, if the motion feature of the target object in the first and second video frames is the consistency between the third and fourth optical flows, determining from that consistency whether it is suitable to insert the intermediate frame between the first and second video frames includes: for linear motion, determining a linear-motion offset distance from the forward motion vector of a target pixel of the intermediate frame moving from the intermediate frame to the first video frame and the backward motion vector of the same pixel moving from the intermediate frame to the second video frame; counting the proportion of pixels whose linear-motion offset distance exceeds a first set threshold; and, if that proportion is less than or equal to a second set threshold, determining that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
According to one or more embodiments of the present disclosure, there is provided a video frame adjustment apparatus including: a first determination module configured to determine, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the two being adjacent initial video frames; a second determination module configured to determine, based on the first and second optical flows, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, the intermediate frame being an estimated video frame to be inserted between the two initial video frames; a third determination module configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow and the fourth optical flow; and a frame-interpolation module configured to insert, in the initial video, the intermediate frame between the two initial video frames to obtain a target video.
According to one or more embodiments, in the apparatus, optionally, the second determination module includes: a first determination unit configured to determine, based on the first optical flow, the second optical flow and the motion track of the target object in the first and second video frames, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and a second determination unit configured to determine, by the optical-flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
According to one or more embodiments, in the apparatus, optionally, the second determination unit includes a first determination subunit configured such that: if a first pixel of the intermediate frame has a preset relationship with a unique second pixel of the first video frame, the first optical-flow vector of the first pixel from the intermediate frame to the first video frame is the inverse of the second optical-flow vector of the second pixel from the first video frame to the intermediate frame, the fifth optical flow including the second optical-flow vectors; if a first pixel of the intermediate frame has a preset relationship with at least two second pixels of the first video frame, the first optical-flow vector is the weighted average of the inverses of the second optical-flow vectors of those second pixels; and if no second pixel of the first video frame has a preset relationship with the first pixel, the first optical-flow vector is 0. The first optical-flow vectors of all first pixels of the intermediate frame from the intermediate frame to the first video frame make up the third optical flow.
According to one or more embodiments, in the apparatus, optionally, the second determination unit further includes a second determination subunit configured such that: if a third pixel of the intermediate frame has a preset relationship with a unique fourth pixel of the second video frame, the third optical-flow vector of the third pixel from the intermediate frame to the second video frame is the inverse of the fourth optical-flow vector of the fourth pixel from the second video frame to the intermediate frame, the sixth optical flow including the fourth optical-flow vectors; if a third pixel of the intermediate frame has a preset relationship with at least two fourth pixels of the second video frame, the third optical-flow vector is the weighted average of the inverses of the fourth optical-flow vectors of those fourth pixels; and if no fourth pixel of the second video frame has a preset relationship with the third pixel, the third optical-flow vector is 0. The third optical-flow vectors of all third pixels of the intermediate frame from the intermediate frame to the second video frame make up the fourth optical flow.
According to one or more embodiments, in the apparatus, optionally, the quantized neural network includes a cascaded encoder module, a decoder module and two optical-flow prediction branches. The encoder module includes a downsampling unit and an encoding unit: the downsampling unit downsamples the input first and second video frames respectively and feeds the downsampled images of the two frames to the encoding unit, which performs feature extraction based on them to obtain an encoding of the feature image. The decoder module includes a decoding unit and an upsampling unit: the decoding unit decodes the encoding of the feature image and feeds the decoded feature image to the upsampling unit, which upsamples it and feeds the upsampled image to each of the two optical-flow prediction branches, one of which predicts the first optical flow from the upsampled image while the other predicts the second optical flow.
According to one or more embodiments, in the apparatus, optionally, the third determination module includes: a transformation unit configured to apply an image affine transformation to the first video frame to obtain a first transformed frame at the acquisition moment of the intermediate frame, and an image affine transformation to the second video frame to obtain a second transformed frame at that moment; a prediction unit configured to predict, with a preset neural network, the fusion weights of the two transformed frames based on the downsampled images corresponding to the two video frames, the two transformed frames, and the third and fourth optical flows; and a fusion unit configured to fuse the pixels of the two transformed frames based on the fusion weights to obtain the intermediate frame, a fusion weight indicating the probability that a pixel of the intermediate frame comes from the first video frame or from the second video frame.
According to one or more embodiments, the apparatus optionally further includes: a decision module configured to determine, before the intermediate frame is inserted between the first and second video frames and based on motion features and/or color features of the target object in the two frames, whether it is suitable to insert the intermediate frame between them, and, if so, to continue with the step of inserting the intermediate frame between the first video frame and the second video frame.
According to one or more embodiments, in the apparatus, optionally, the motion features of the target object in the first and second video frames include at least one of: the consistency between the third and fourth optical flows; and the number of optical-flow hole points in the intermediate frame, a specific pixel of the intermediate frame being determined to be an optical-flow hole point if neither video frame contains a pixel having a preset relationship with it. The color features include: the grayscale difference between the first and second transformed frames, the first obtained by applying an image affine transformation to the first video frame and the second by applying an image affine transformation to the second video frame.
According to one or more embodiments, in the apparatus, optionally, if the motion feature of the target object in the first and second video frames is the consistency between the third and fourth optical flows, the decision module is specifically configured to: for linear motion, determine a linear-motion offset distance from the forward motion vector of a target pixel of the intermediate frame moving from the intermediate frame to the first video frame and the backward motion vector of the same pixel moving from the intermediate frame to the second video frame; count the proportion of pixels whose linear-motion offset distance exceeds a first set threshold; and, if that proportion is less than or equal to a second set threshold, determine that it is suitable to insert the intermediate frame between the first video frame and the second video frame.
根据本公开的一个或多个实施例,本公开提供了一种电子设备,包括:一个或多个处理器;存储器,用于存储一个或多个程序;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如本公开任一实施例提供的所述的视频帧调整方法。
根据本公开的一个或多个实施例,本公开提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本公开实施例提供的任一所述的视频帧调整方法。
An embodiment of the present disclosure further provides a computer program product including a computer program or instructions, wherein the computer program or instructions, when executed by a processor, implement the video frame adjustment method described above.
The above description is merely an account of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by the specific combinations of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (13)

  1. A video frame adjustment method, comprising:
    determining, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
    determining, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, wherein the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame;
    determining the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow; and
    inserting, in the initial video, the intermediate frame between the first video frame and the second video frame to obtain a target video.
  2. The method according to claim 1, wherein the determining, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame comprises:
    determining, based on the first optical flow, the second optical flow, and a motion trajectory of a target object in the first video frame and the second video frame, a fifth optical flow from the first video frame to the intermediate frame and a sixth optical flow from the second video frame to the intermediate frame; and
    determining, through an optical flow reversal technique, the third optical flow based on the fifth optical flow and the fourth optical flow based on the sixth optical flow.
  3. The method according to claim 2, wherein the determining, through an optical flow reversal technique, the third optical flow based on the fifth optical flow comprises:
    if a first pixel on the intermediate frame has a preset relationship with a unique second pixel on the first video frame, a first optical flow vector of the first pixel from the intermediate frame to the first video frame is the inverse vector of a second optical flow vector of the second pixel from the first video frame to the intermediate frame, wherein the fifth optical flow comprises the second optical flow vector;
    if a first pixel on the intermediate frame has a preset relationship with at least two second pixels on the first video frame, the first optical flow vector is the weighted average of the inverse vectors of the second optical flow vectors of the at least two second pixels from the first video frame to the intermediate frame;
    if no second pixel having the preset relationship with a first pixel on the intermediate frame exists on the first video frame, the first optical flow vector is 0; and
    the first optical flow vectors of all the first pixels on the intermediate frame, from the intermediate frame to the first video frame, constitute the third optical flow.
  4. The method according to claim 2, wherein the determining the fourth optical flow based on the sixth optical flow comprises:
    if a third pixel on the intermediate frame has a preset relationship with a unique fourth pixel on the second video frame, a third optical flow vector of the third pixel from the intermediate frame to the second video frame is the inverse vector of a fourth optical flow vector of the fourth pixel from the second video frame to the intermediate frame, wherein the sixth optical flow comprises the fourth optical flow vector;
    if a third pixel on the intermediate frame has a preset relationship with at least two fourth pixels on the second video frame, the third optical flow vector is the weighted average of the inverse vectors of the fourth optical flow vectors of the at least two fourth pixels from the second video frame to the intermediate frame;
    if no fourth pixel having the preset relationship with a third pixel on the intermediate frame exists on the second video frame, the third optical flow vector is 0; and
    the third optical flow vectors of all the third pixels on the intermediate frame, from the intermediate frame to the second video frame, constitute the fourth optical flow.
  5. The method according to claim 1, wherein the quantized neural network comprises a cascaded encoder module, a decoder module, and two optical flow prediction branches;
    wherein the encoder module comprises a downsampling unit and an encoding unit, the downsampling unit being configured to downsample the input first video frame and second video frame respectively, and to input the downsampled image of the first video frame and the downsampled image of the second video frame to the encoding unit, so that the encoding unit performs feature extraction based on the downsampled images to obtain an encoding of a feature image; and
    the decoder module comprises a decoding unit and an upsampling unit, the decoding unit being configured to decode the encoding of the feature image and to input the decoded feature image to the upsampling unit, so that the upsampling unit upsamples the decoded feature image and inputs the obtained upsampled image to each of the two optical flow prediction branches, so that one of the two optical flow prediction branches predicts the first optical flow based on the upsampled image and the other of the two optical flow prediction branches predicts the second optical flow based on the upsampled image.
  6. The method according to any one of claims 1-5, wherein the determining the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow comprises:
    performing an image affine transformation on the first video frame to obtain a first transformed frame of the first video frame at the acquisition time of the intermediate frame;
    performing an image affine transformation on the second video frame to obtain a second transformed frame of the second video frame at the acquisition time of the intermediate frame;
    predicting, through a preset neural network, fusion weights of the first transformed frame and the second transformed frame based on the downsampled image corresponding to the first video frame, the downsampled image corresponding to the second video frame, the first transformed frame, the second transformed frame, the third optical flow, and the fourth optical flow; and
    fusing pixels in the first transformed frame and the second transformed frame based on the fusion weights to obtain the intermediate frame, wherein a fusion weight represents the probability that a pixel on the intermediate frame comes from the first video frame or from the second video frame.
  7. The method according to any one of claims 1-5, wherein before the inserting the intermediate frame between the first video frame and the second video frame, the method further comprises:
    determining, based on motion features and/or color features of a target object in the first video frame and the second video frame, whether it is appropriate to insert the intermediate frame between the first video frame and the second video frame, and, if it is determined that it is appropriate to insert the intermediate frame between the first video frame and the second video frame, continuing to perform the step of inserting the intermediate frame between the first video frame and the second video frame.
  8. The method according to claim 7, wherein the motion features of the target object in the first video frame and the second video frame comprise at least one of the following:
    the consistency between the third optical flow and the fourth optical flow; and
    the number of optical flow hole points in the intermediate frame, wherein a specific pixel in the intermediate frame is determined to be an optical flow hole point if neither the first video frame nor the second video frame contains a pixel having the preset relationship with the specific pixel; and
    the color features of the target object in the first video frame and the second video frame comprise:
    the grayscale difference between a first transformed frame and a second transformed frame, wherein the first transformed frame is obtained by performing an image affine transformation on the first video frame, and the second transformed frame is obtained by performing an image affine transformation on the second video frame.
  9. The method according to claim 7, wherein if the motion feature of the target object in the first video frame and the second video frame is the consistency between the third optical flow and the fourth optical flow, the determining, based on the consistency between the third optical flow and the fourth optical flow, whether it is appropriate to insert the intermediate frame between the first video frame and the second video frame comprises:
    for linear motion, determining a linear motion offset distance according to a forward motion vector of a target pixel on the intermediate frame moving from the intermediate frame to the first video frame and a backward motion vector of the target pixel moving from the intermediate frame to the second video frame;
    computing the proportion of pixels whose linear motion offset distance is greater than a first set threshold; and
    if the proportion is less than or equal to a second set threshold, determining that it is appropriate to insert the intermediate frame between the first video frame and the second video frame.
  10. A video frame adjustment apparatus, comprising:
    a first determination module configured to determine, based on a first video frame and a second video frame in an initial video and through a quantized neural network, a first optical flow from the first video frame to the second video frame and a second optical flow from the second video frame to the first video frame, the first video frame and the second video frame being two adjacent initial video frames;
    a second determination module configured to determine, based on the first optical flow and the second optical flow, a third optical flow from an intermediate frame to the first video frame and a fourth optical flow from the intermediate frame to the second video frame, wherein the intermediate frame is an estimated video frame to be inserted between the first video frame and the second video frame;
    a third determination module configured to determine the intermediate frame according to the first video frame, the second video frame, the third optical flow, and the fourth optical flow; and
    a frame insertion module configured to insert, in the initial video, the intermediate frame between the first video frame and the second video frame to obtain a target video.
  11. An electronic device, comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
  13. A computer program product comprising a computer program/instructions, wherein the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1-9.
PCT/CN2022/112783 2021-08-16 2022-08-16 Video frame adjustment method and apparatus, electronic device, and storage medium WO2023020492A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110939314.8 2021-08-16
CN202110939314.8A CN115706810A (zh) 2021-08-16 2021-08-16 Video frame adjustment method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023020492A1 true WO2023020492A1 (zh) 2023-02-23

Family

ID=85180404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/112783 WO2023020492A1 (zh) 2022-08-16 Video frame adjustment method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115706810A (zh)
WO (1) WO2023020492A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117082295B (zh) * 2023-09-21 2024-03-08 荣耀终端有限公司 Image stream processing method, device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151474A (zh) * 2018-08-23 2019-01-04 复旦大学 Method for generating new video frames
CN112104830A (zh) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame interpolation method, model training method, and corresponding apparatus
CN112712537A (zh) * 2020-12-21 2021-04-27 深圳大学 Video spatio-temporal super-resolution implementation method and apparatus
WO2021085757A1 (ko) * 2019-10-31 2021-05-06 한국과학기술원 Video frame interpolation method robust to exceptional motion, and apparatus therefor


Also Published As

Publication number Publication date
CN115706810A (zh) 2023-02-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22857803

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE