CN116418932A - Video frame interpolation method and device

Info

Publication number: CN116418932A
Application number: CN202111650327.XA
Authority: CN
Inventors: 陈加忠; 王晟; 代敏; 胡康康
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2023-07-11

Abstract

The invention discloses a video frame interpolation method and device. The method comprises: acquiring a first video frame at a first moment and a second video frame at a second moment; generating a first mask corresponding to a third moment according to a first motion vector diagram or a second motion vector diagram; performing motion compensation on the first video frame according to a third motion vector diagram and on the second video frame according to a fourth motion vector diagram, to obtain a more accurate first reference frame and second reference frame; and performing image fusion on the first reference frame and the second reference frame according to the unprotected area in the first mask, to obtain a third video frame at the third moment. Because the third video frame draws on both the first video frame and the second video frame, and the image fusion is performed according to the unprotected area in the first mask, the quality of video frame interpolation is ensured.

Description

Video frame interpolation method and device

Technical Field

The present disclosure relates to the field of image processing, and in particular, to a method and apparatus for interpolating video frames.

Background

Currently, to provide a smoother user experience, the screen refresh rates of mobile phones and smart televisions keep rising. Taking mobile phones as an example, 60 hertz (Hz) is already mainstream, and some leading manufacturers have released screens of 90 Hz and even 120 Hz. However, video frame rates have not kept pace with screen refresh rates: the common video frame rates are 24 FPS, 30 FPS, and 60 FPS, which often differ from the refresh rate of the screen. For this reason, video frame interpolation (VFI) is now commonly employed. VFI generates one or more new intermediate video frames between two consecutive original video frames, and can be used to raise the frame rate of a video to match the screen refresh rate.

Therefore, how to guarantee the quality of the video frames generated by video frame interpolation is a problem to be solved.

Disclosure of Invention

The application provides a video frame interpolation method and device, which are used for guaranteeing video frame quality of video frame interpolation.

In a first aspect, the present application provides a video frame interpolation method. The method may be performed by a video frame interpolation apparatus provided herein, which may be an electronic device provided herein, the method comprising:

acquiring a first video frame at a first moment and a second video frame at a second moment; generating a first mask corresponding to a third moment according to a first motion vector diagram or a second motion vector diagram, wherein the first mask comprises an unprotected area, the first motion vector diagram is a motion vector diagram from the first moment to the second moment, the second motion vector diagram is a motion vector diagram from the second moment to the first moment, and the third moment is between the first moment and the second moment; performing motion compensation on the first video frame according to a third motion vector diagram to obtain a first reference frame at the third moment, and performing motion compensation on the second video frame according to a fourth motion vector diagram to obtain a second reference frame at the third moment, wherein the third motion vector diagram is a motion vector diagram from the third moment to the first moment, and the fourth motion vector diagram is a motion vector diagram from the third moment to the second moment; and performing image fusion on the first reference frame and the second reference frame according to the unprotected area in the first mask to obtain a third video frame at the third moment.

In the above manner, after the first video frame and the second video frame are obtained, the first mask corresponding to the third moment is generated according to the first motion vector diagram or the second motion vector diagram. The first video frame is then motion-compensated according to the third motion vector diagram to obtain the first reference frame at the third moment, and the second video frame is motion-compensated according to the fourth motion vector diagram to obtain the second reference frame at the third moment, so that more accurate reference frames are obtained. Because both the first video frame and the second video frame are referred to, and image fusion is performed according to the unprotected area in the first mask to obtain the third video frame at the third moment, the quality of video frame interpolation is ensured.
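By way of illustration only, the motion compensation step can be sketched in Python as a backward warp: each pixel of the reference frame at the third moment follows its motion vector back into the source frame. The function name `motion_compensate`, the nested-list frame layout, the nearest-neighbour sampling, and the border clamping are assumptions made for this example; the application does not specify them.

```python
def motion_compensate(frame, mv):
    """Backward-warp `frame` with a per-pixel motion vector diagram.

    frame: 2D list of pixel values (rows of equal length).
    mv:    2D list of (dx, dy) tuples with the same shape as `frame`;
           mv[y][x] points from pixel (x, y) at the third moment to its
           source position in `frame`.
    """
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = mv[y][x]
            # Nearest-neighbour sampling with border clamping (assumption).
            sx = min(max(int(round(x + dx)), 0), width - 1)
            sy = min(max(int(round(y + dy)), 0), height - 1)
            out[y][x] = frame[sy][sx]
    return out


# Every pixel fetches the value one position to its right; the last
# pixel is clamped to the border.
reference = motion_compensate([[10, 20, 30]], [[(1, 0), (1, 0), (1, 0)]])
```

The same function would serve for both reference frames: the first video frame with the third motion vector diagram, and the second video frame with the fourth.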

In one possible design, each pixel in the unprotected area in the first mask corresponds to a first weight value and a second weight value; the image fusion of the first reference frame and the second reference frame according to the unprotected area in the first mask includes:

acquiring a second pixel at a corresponding position in the first reference frame and a third pixel at a corresponding position in the second reference frame according to the position of the first pixel in the unprotected area; performing weighted operation on the pixel value of the second pixel according to the first weight value corresponding to the first pixel to obtain a first operation result; performing weighted operation on the pixel value of the third pixel according to the second weight value corresponding to the first pixel to obtain a second operation result; and determining the sum of the first operation result and the second operation result as a pixel value of a pixel at a corresponding position in the third video frame.

In the above manner, the pixel value of the pixel at the corresponding position in the third video frame is obtained by combining the pixel value of the second pixel and the pixel value of the third pixel through the first weight value and the second weight value corresponding to the first pixel, so that any pixel value refers to the pixel values of the corresponding pixels in the first video frame and the second video frame, and image fusion is performed according to the unprotected area, thereby ensuring the quality of video frame interpolation.
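The weighted fusion described above can be sketched as follows. This is a minimal Python example under stated assumptions: the function name `fuse_unprotected`, the representation of the unprotected area as a set of (x, y) positions, and the use of `None` for pixels outside that area are illustrative choices, not part of the application.

```python
def fuse_unprotected(ref1, ref2, weights, unprotected):
    """Fuse two reference frames inside the unprotected area.

    ref1, ref2:  2D lists of pixel values (first and second reference frames).
    weights:     2D list of (w1, w2) tuples, one per mask pixel.
    unprotected: set of (x, y) positions belonging to the unprotected area.
    """
    height, width = len(ref1), len(ref1[0])
    # Pixels outside the unprotected area are left as None, because this
    # passage does not describe how they are produced (assumption).
    out = [[None] * width for _ in range(height)]
    for x, y in unprotected:
        w1, w2 = weights[y][x]
        first_result = w1 * ref1[y][x]    # weighted second pixel
        second_result = w2 * ref2[y][x]   # weighted third pixel
        out[y][x] = first_result + second_result
    return out


fused = fuse_unprotected(
    [[100, 100]],
    [[200, 200]],
    [[(0.25, 0.75), (0.5, 0.5)]],
    {(0, 0)},
)
```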

In one possible design, the generating the first mask corresponding to the third moment according to the first motion vector diagram includes:

determining the coordinates of each pixel at the third moment according to the coordinates and the motion vector value of each pixel in the first motion vector diagram, and generating a second mask corresponding to the third moment according to the coordinates of each pixel at the third moment; performing at least one iterative operation on the second mask, wherein the iterative operation comprises mean blurring and binarization of the second mask; and performing erosion and mean blurring on the second mask after the iterative operation to obtain the first mask corresponding to the third moment.

In the above manner, the second mask can be denoised through the at least one iterative operation, and the degree of pixel motion from the first video frame to the third video frame can be delineated more clearly, so that a more accurate first mask can be obtained.
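The mask pipeline described above (projection, iterated mean blurring and binarization, then erosion and a final mean blur) can be sketched in Python as below. Everything not stated in the text is an assumption for illustration: the helper names, the 3x3 window, the 0.5 threshold, marking a landed position with 1.0, and scaling the motion vector by a factor `alpha` (the relative position of the third moment between the first and second moments, which presumes linear motion).

```python
def project_mask(mv, alpha):
    """Mark where each pixel lands at the third moment (the second mask)."""
    height, width = len(mv), len(mv[0])
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = mv[y][x]
            tx = int(round(x + alpha * dx))
            ty = int(round(y + alpha * dy))
            if 0 <= tx < width and 0 <= ty < height:
                mask[ty][tx] = 1.0
    return mask


def _window(mask, x, y):
    """Values in the 3x3 neighbourhood of (x, y), cropped at the borders."""
    height, width = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(y - 1, 0), min(y + 2, height))
            for i in range(max(x - 1, 0), min(x + 2, width))]


def mean_blur(mask):
    blurred = []
    for y in range(len(mask)):
        row = []
        for x in range(len(mask[0])):
            window = _window(mask, x, y)
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def binarize(mask, threshold=0.5):
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in mask]


def erode(mask):
    """Morphological erosion: each pixel takes the minimum of its window."""
    return [[min(_window(mask, x, y)) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def build_first_mask(mv, alpha, iterations=1):
    mask = project_mask(mv, alpha)
    for _ in range(iterations):
        mask = binarize(mean_blur(mask))   # one iterative operation
    return mean_blur(erode(mask))          # erosion, then final mean blur
```

The final mean blur leaves fractional values at region boundaries, which is consistent with the mask values later being used as per-pixel weights.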

In one possible design, the generating the first mask corresponding to the third moment according to the second motion vector diagram includes:

determining the coordinates of each pixel at the third moment according to the coordinates and the motion vector value of each pixel in the second motion vector diagram, and generating a third mask corresponding to the third moment according to the coordinates of each pixel at the third moment; performing at least one iterative operation on the third mask, wherein the iterative operation comprises mean blurring and binarization of the third mask; and performing erosion and mean blurring on the third mask after the iterative operation to obtain the first mask corresponding to the third moment.

In the above manner, the third mask can be denoised through the at least one iterative operation, and the degree of pixel motion from the second video frame to the third video frame can be delineated more clearly, so that a more accurate first mask can be obtained.

In one possible design, the method further comprises:

generating the first motion vector diagram from the first moment to the second moment according to the first video frame and the second video frame; determining the coordinates of a fourth pixel at the third moment according to the coordinates and the motion vector value of the fourth pixel in the first motion vector diagram, wherein the fourth pixel is any pixel in the first motion vector diagram; acquiring at least one pixel adjacent to the coordinates of the fourth pixel at the third moment, and setting a motion vector value of the at least one pixel according to the motion displacement of the fourth pixel from the first moment to the third moment; and if the motion vector value of a fifth pixel in the third motion vector diagram is set multiple times, determining the average of the multiple set motion vector values as the motion vector value of the fifth pixel.

In the above manner, the coordinates of the fourth pixel at the third moment can be determined, and the adjacent at least one pixel is assigned according to the motion displacement, so that mutual verification can be performed according to the motion vector values of the pixels adjacent to each other, and a more accurate motion vector value can be obtained.
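One way to picture this projection step is the Python sketch below. It is an illustration under assumptions rather than the claimed procedure: `alpha` is taken as the relative position of the third moment between the first and second moments (linear motion), the adjacent pixels are taken to be the integer grid positions around the projected coordinates, and the stored value is the negated displacement, since the third motion vector diagram points from the third moment back to the first moment. Pixels that are never set are returned as `None`.

```python
import math


def project_motion_vectors(mv, alpha):
    """Build a motion vector diagram at the third moment from `mv`.

    mv:    2D list of (dx, dy) tuples from the first to the second moment.
    alpha: relative position of the third moment, 0 < alpha < 1 (assumption).
    """
    height, width = len(mv), len(mv[0])
    # Per target pixel: [sum of x components, sum of y components, count].
    acc = [[[0.0, 0.0, 0] for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = mv[y][x]
            disp_x, disp_y = alpha * dx, alpha * dy   # first -> third moment
            tx, ty = x + disp_x, y + disp_y
            neighbours = {(i, j)
                          for i in (math.floor(tx), math.ceil(tx))
                          for j in (math.floor(ty), math.ceil(ty))}
            for i, j in neighbours:
                if 0 <= i < width and 0 <= j < height:
                    cell = acc[j][i]
                    cell[0] -= disp_x   # negated: third -> first moment
                    cell[1] -= disp_y
                    cell[2] += 1
    # A pixel set several times receives the average of the set values.
    return [[(sx / n, sy / n) if n else None for sx, sy, n in row]
            for row in acc]


third_mv = project_motion_vectors([[(2, 0), (0, 0), (0, 0)]], 0.5)
```

In the usage line, the first two source pixels both land on the middle pixel, so its value is the average of their two displacements, while the leftmost pixel receives no value and remains a hole to be repaired.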

In one possible design, before the performing motion compensation on the first video frame according to the third motion vector diagram to obtain the first reference frame at the third moment, the method further includes:

generating a fourth mask corresponding to the third moment according to the third motion vector diagram; selecting pixels meeting the repair condition in the third motion vector diagram according to the fourth mask; updating the motion vector value of the pixel meeting the patching condition according to the motion vector value of at least one pixel associated with the pixel meeting the patching condition.

In the above manner, the pixels satisfying the repair condition can be screened out by means of the fourth mask, so that a more accurate fourth mask can be obtained by updating those pixels.

In one possible design, the selecting, according to the fourth mask, a pixel in the third motion vector diagram that meets a repair condition includes:

selecting a sixth pixel in the fourth mask, wherein the sixth pixel meets a first repair condition, and the first repair condition comprises that the pixel does not have an effective motion vector value;

the updating the motion vector value of the pixel meeting the patching condition according to the motion vector value of at least one pixel associated with the pixel meeting the patching condition comprises the following steps:

Searching in the fourth mask by taking the sixth pixel as a starting point to obtain a first pixel set corresponding to the sixth pixel, wherein the pixels in the first pixel set are the pixels closest to the sixth pixel in at least one direction, and motion vector values exist; acquiring a seventh pixel corresponding to the sixth pixel in the third motion vector diagram and a second pixel set corresponding to the first pixel set; and determining the motion vector value of the seventh pixel according to the distance between the pixel in the second pixel set and the seventh pixel and the motion vector value of the pixel in the second pixel set.

In the above manner, the motion vector values of the pixels nearest to the seventh pixel in at least one direction can be obtained through the second pixel set corresponding to the first pixel set, so that the motion vector value of the seventh pixel can be repaired more accurately according to the distances between the seventh pixel and those pixels and the motion vector values of those pixels.
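As a hedged illustration of this repair, the Python sketch below searches left, right, up, and down from a hole for the nearest pixel that has a motion vector value, then combines the values found using inverse-distance weights. The choice of four search directions and of inverse-distance weighting is an assumption; the text only states that the distance and the motion vector values are both used. The names `nearest_valid` and `repair_hole` are hypothetical, and `valid` plays the role of the fourth mask.

```python
DIRECTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # right, left, down, up


def nearest_valid(valid, x, y):
    """Nearest pixel with a motion vector in each direction from (x, y).

    valid: 2D list of booleans (True where a motion vector value exists).
    Returns a list of (i, j, distance) tuples.
    """
    height, width = len(valid), len(valid[0])
    found = []
    for dx, dy in DIRECTIONS:
        i, j, dist = x + dx, y + dy, 1
        while 0 <= i < width and 0 <= j < height:
            if valid[j][i]:
                found.append((i, j, dist))
                break
            i, j, dist = i + dx, j + dy, dist + 1
    return found


def repair_hole(mv, valid, x, y):
    """Inverse-distance-weighted motion vector for the hole at (x, y)."""
    neighbours = nearest_valid(valid, x, y)
    if not neighbours:
        return None  # nothing to interpolate from
    total = sum(1.0 / d for _, _, d in neighbours)
    vx = sum(mv[j][i][0] / d for i, j, d in neighbours) / total
    vy = sum(mv[j][i][1] / d for i, j, d in neighbours) / total
    return (vx, vy)


valid = [[True, False, False, True]]
mv = [[(4.0, 0.0), None, None, (1.0, 0.0)]]
repaired = repair_hole(mv, valid, 1, 0)
```

In the usage lines, the hole at x = 1 is one pixel from the left neighbour and two pixels from the right neighbour, so the closer vector receives twice the weight.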

selecting an eighth pixel in the fourth mask, wherein the eighth pixel meets a second repair condition, and the second repair condition comprises that the value of the eighth pixel indicates that a ninth pixel corresponding to the eighth pixel in the third motion vector diagram does not have a motion vector value, and that at least one pixel adjacent to the ninth pixel in a first direction has a motion vector value;

acquiring the ninth pixel corresponding to the eighth pixel in the third motion vector diagram; a motion vector value of the ninth pixel is determined from a motion vector value of at least one pixel adjacent to the ninth pixel in the first direction.

In the above manner, the motion vector value of the ninth pixel is directly determined through the motion vector value of at least one pixel adjacent to the ninth pixel in the first direction, so that the efficiency of repairing the motion vector value of the pixel is improved.
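The directional repair can be sketched even more simply. In the Python example below, the first direction is assumed to be "left" and the neighbour's motion vector value is copied unchanged; both are illustrative assumptions, since the text does not fix the direction or the exact rule. The function name `fill_from_neighbour` is hypothetical.

```python
def fill_from_neighbour(mv, valid, direction=(-1, 0)):
    """Fill holes whose adjacent pixel in `direction` has a motion vector.

    mv:    2D list of (dx, dy) tuples, or None where no value exists.
    valid: 2D list of booleans playing the role of the mask.
    Returns updated copies of `mv` and `valid`; the inputs are not modified.
    """
    height, width = len(valid), len(valid[0])
    new_mv = [row[:] for row in mv]
    new_valid = [row[:] for row in valid]
    for y in range(height):
        for x in range(width):
            if valid[y][x]:
                continue
            i, j = x + direction[0], y + direction[1]
            # Only the original validity is consulted, so one call fills
            # holes that are directly adjacent to a valid pixel.
            if 0 <= i < width and 0 <= j < height and valid[j][i]:
                new_mv[y][x] = mv[j][i]
                new_valid[y][x] = True
    return new_mv, new_valid


filled_mv, filled_valid = fill_from_neighbour(
    [[(1.0, 0.0), None, None]],
    [[True, False, False]],
)
```

Because no search or weighting is involved, this path is cheaper than the distance-based repair, which matches the efficiency benefit stated above.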

In one possible design, the method further comprises:

generating the second motion vector diagram from the second moment to the first moment according to the first video frame and the second video frame; determining the coordinates of a tenth pixel at the third moment according to the coordinates and the motion vector value of the tenth pixel in the second motion vector diagram, wherein the tenth pixel is any pixel in the second motion vector diagram; acquiring at least one pixel adjacent to the coordinates of the tenth pixel at the third moment, and setting a motion vector value of the at least one pixel according to the motion displacement of the tenth pixel from the second moment to the third moment; and if the motion vector value of an eleventh pixel in the fourth motion vector diagram is set multiple times, determining the average of the multiple set motion vector values as the motion vector value of the eleventh pixel.

In the above manner, the coordinates of the tenth pixel at the third moment may be determined, and the assignment is performed on at least one adjacent pixel according to the motion displacement, so that mutual verification may be performed according to the motion vector values of the pixels adjacent to each other, and a more accurate motion vector value may be obtained.

In one possible design, before the performing motion compensation on the second video frame according to the fourth motion vector diagram to obtain the second reference frame at the third moment, the method further includes:

generating a fifth mask corresponding to the third moment according to the fourth motion vector diagram; selecting pixels meeting the repair condition in the fourth motion vector diagram according to the fifth mask; updating the motion vector value of the pixel meeting the patching condition according to the motion vector value of at least one pixel associated with the pixel meeting the patching condition.

In the above manner, the pixels satisfying the repair condition can be screened out by means of the fifth mask, so that a more accurate fifth mask can be obtained by updating those pixels.

In one possible design, the selecting, according to the fifth mask, a pixel in the fourth motion vector diagram that meets a repair condition includes:

Selecting a twelfth pixel in the fifth mask, wherein the twelfth pixel meets a third repairing condition, and the third repairing condition comprises that no effective motion vector value exists in the pixel;

searching a third pixel set corresponding to the twelfth pixel in the fifth mask by taking the twelfth pixel as a starting point, wherein the pixels in the third pixel set are the pixels closest to the twelfth pixel in at least one direction, and motion vector values exist; acquiring a thirteenth pixel corresponding to the twelfth pixel in the fourth motion vector diagram and a fourth pixel set corresponding to the third pixel set; and determining the motion vector value of the thirteenth pixel according to the distance between the pixel in the fourth pixel set and the thirteenth pixel and the motion vector value of the pixel in the fourth pixel set.

In the above manner, the motion vector value of the pixel nearest to the twelfth pixel in at least one direction can be obtained through the fourth pixel set corresponding to the third pixel set, so that the motion vector value of the pixel can be more accurately repaired through the distance between the twelfth pixel and the adjacent pixel and the motion vector value of the adjacent pixel of the twelfth pixel.

selecting a fourteenth pixel in the fifth mask, wherein the fourteenth pixel meets a fourth patching condition, the fourth patching condition comprises that a value of the fourteenth pixel indicates that a corresponding fifteenth pixel in the fourth motion vector diagram does not have a motion vector value, and at least one pixel adjacent to the fifteenth pixel in a second direction has a motion vector value;

acquiring the fifteenth pixel corresponding to the fourteenth pixel in the fourth motion vector diagram; and determining a motion vector value of the fifteenth pixel according to a motion vector value of at least one pixel adjacent to the fifteenth pixel in the second direction.

In the above manner, the motion vector value of the fifteenth pixel is directly determined through the motion vector value of at least one pixel adjacent to the fifteenth pixel in the second direction, so that the efficiency of repairing the motion vector value of the pixel is improved.

In a second aspect, the present application provides a video frame interpolation apparatus, comprising:

the acquisition module is used for acquiring a first video frame at a first moment and a second video frame at a second moment;

the generating module is used for generating a first mask corresponding to a third moment according to a first motion vector diagram or a second motion vector diagram, wherein the first mask comprises an unprotected area, the first motion vector diagram is a motion vector diagram from the first moment to the second moment, the second motion vector diagram is a motion vector diagram from the second moment to the first moment, and the third moment is between the first moment and the second moment;

the processing module is used for performing motion compensation on the first video frame according to a third motion vector diagram to obtain a first reference frame at the third moment, and performing motion compensation on the second video frame according to a fourth motion vector diagram to obtain a second reference frame at the third moment, wherein the third motion vector diagram is a motion vector diagram from the third moment to the first moment, and the fourth motion vector diagram is a motion vector diagram from the third moment to the second moment; and

the image fusion module is used for performing image fusion on the first reference frame and the second reference frame according to the unprotected area in the first mask to obtain a third video frame at the third moment.

In one possible design, each pixel corresponds to a first weight value and a second weight value in the unprotected area in the first mask; the processing module is specifically configured to:

acquiring a second pixel at a corresponding position in the first reference frame and a third pixel at a corresponding position in the second reference frame according to the position of the first pixel in the unprotected area; the first pixel is any pixel in the unprotected area; performing weighted operation on the pixel value of the second pixel according to the first weight value corresponding to the first pixel to obtain a first operation result; performing weighted operation on the pixel value of the third pixel according to the second weight value corresponding to the first pixel to obtain a second operation result; and determining the sum of the first operation result and the second operation result as a pixel value of a pixel at a corresponding position in the third video frame.

In a possible design, if the first mask is generated according to the first motion vector diagram, the pixel value of the first pixel in the first mask is used to indicate the first weight value, and the second weight value is equal to 1 minus the first weight value; or if the first mask is generated according to the second motion vector diagram, the pixel value of the first pixel in the first mask is used for indicating the second weight value, and the first weight value is equal to 1 minus the second weight value.
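The relation between the mask value and the two weight values can be written out as a short Python sketch. The function name `weights_from_mask` and the string labels "first" and "second" (naming the motion vector diagram the first mask was generated from) are hypothetical and used only for illustration.

```python
def weights_from_mask(mask_value, mask_source):
    """Return (first_weight, second_weight) for one unprotected pixel.

    mask_value:  pixel value of the first pixel in the first mask.
    mask_source: "first" if the first mask was generated from the first
                 motion vector diagram, "second" if from the second.
    """
    if mask_source == "first":
        first_weight = mask_value
        second_weight = 1 - first_weight
    elif mask_source == "second":
        second_weight = mask_value
        first_weight = 1 - second_weight
    else:
        raise ValueError("mask_source must be 'first' or 'second'")
    return first_weight, second_weight
```

In either case the two weights sum to 1, so the fused pixel value is a convex combination of the corresponding pixels in the two reference frames.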

In one possible design, the generating module is specifically configured to:

In one possible design, the generating module is further configured to:

In one possible design, the generating module is specifically configured to:

selecting a sixth pixel in the fourth mask, wherein the sixth pixel meets a first repair condition, and the first repair condition comprises that the pixel does not have an effective motion vector value; searching in the fourth mask by taking the sixth pixel as a starting point to obtain a first pixel set corresponding to the sixth pixel, wherein the pixels in the first pixel set are the pixels closest to the sixth pixel in at least one direction, and motion vector values exist; acquiring a seventh pixel corresponding to the sixth pixel in the third motion vector diagram and a second pixel set corresponding to the first pixel set; and determining the motion vector value of the seventh pixel according to the distance between the pixel in the second pixel set and the seventh pixel and the motion vector value of the pixel in the second pixel set.

In one possible design, the generating module is specifically configured to:

selecting an eighth pixel in the fourth mask, wherein the eighth pixel meets a second repair condition, and the second repair condition comprises that the value of the eighth pixel indicates that a ninth pixel corresponding to the eighth pixel in the third motion vector diagram does not have a motion vector value, and that at least one pixel adjacent to the ninth pixel in a first direction has a motion vector value; acquiring the ninth pixel corresponding to the eighth pixel in the third motion vector diagram; and determining a motion vector value of the ninth pixel according to a motion vector value of at least one pixel adjacent to the ninth pixel in the first direction.

In one possible design, the generating module is further configured to:

In one possible design, the generating module is specifically configured to:

selecting a twelfth pixel in the fifth mask, wherein the twelfth pixel meets a third repairing condition, and the third repairing condition comprises that no effective motion vector value exists in the pixel; searching a third pixel set corresponding to the twelfth pixel in the fifth mask by taking the twelfth pixel as a starting point, wherein the pixels in the third pixel set are the pixels closest to the twelfth pixel in at least one direction, and motion vector values exist; acquiring a thirteenth pixel corresponding to the twelfth pixel in the fourth motion vector diagram and a fourth pixel set corresponding to the third pixel set; and determining the motion vector value of the thirteenth pixel according to the distance between the pixel in the fourth pixel set and the thirteenth pixel and the motion vector value of the pixel in the fourth pixel set.

In one possible design, the generating module is specifically configured to:

selecting a fourteenth pixel in the fifth mask, wherein the fourteenth pixel meets a fourth repair condition, and the fourth repair condition comprises that the value of the fourteenth pixel indicates that a corresponding fifteenth pixel in the fourth motion vector diagram does not have a motion vector value, and that at least one pixel adjacent to the fifteenth pixel in a second direction has a motion vector value; acquiring the fifteenth pixel corresponding to the fourteenth pixel in the fourth motion vector diagram; and determining a motion vector value of the fifteenth pixel according to a motion vector value of at least one pixel adjacent to the fifteenth pixel in the second direction.

In a third aspect, an electronic device is provided, the electronic device comprising: one or more processors; one or more memories; wherein the one or more memories store one or more computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of the first aspects above.

In a fourth aspect, there is provided a computer readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method of any one of the first aspects above.

For the advantages of the second aspect to the fourth aspect, refer to the advantages of the first aspect described above; they are not repeated here.

Drawings

Fig. 1 is a schematic diagram of a system architecture to which a video frame interpolation method according to an embodiment of the present application is applicable;

Fig. 2 is a schematic diagram of a video frame interpolation module in the system architecture to which the video frame interpolation method provided in the embodiment of the present application is applicable;

Fig. 3 is a flowchart of steps of a video frame display method that may be implemented by the video frame interpolation method according to an embodiment of the present application;

Fig. 4 is a flowchart illustrating steps of a video frame interpolation method according to an embodiment of the present application;

Fig. 5 is a flowchart illustrating steps for generating a first mask in a video frame interpolation method according to an embodiment of the present application;

Fig. 6 is a flowchart of steps of a video frame display method that may be implemented by the video frame interpolation method according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of a video frame interpolation device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

The terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification and the appended claims, the singular forms "a," "an," and "the" are intended to also include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

The term "plurality" in the embodiments of the present application means two or more; "plurality" may therefore also be understood as "at least two". "At least one" may be understood as one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included. For example, if at least one of A, B, and C is included, then what is included may be A, B, C, A and B, A and C, B and C, or A and B and C. Similar expressions are understood in the same way. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, and B alone. Unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.

Unless stated to the contrary, ordinal terms such as "first" and "second" in the embodiments of the present application are used to distinguish between multiple objects, and are not intended to define the sequence, timing, priority, or importance of those objects.

For ease of understanding, the terms involved in the embodiments of the present application are explained below; these explanations form part of the content of the embodiments of the present application.

Application scenario

The present application provides a video frame interpolation method that can be applied to the system architecture shown in fig. 1. It should be understood that the system architecture shown in fig. 1 is only one example, and in some other examples, may have more or fewer components than shown in fig. 1, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits. The system architecture shown in fig. 1 may be implemented by any device capable of displaying video, such as a terminal device, and is not limited herein.

The system architecture shown in fig. 1 may include a graphics processing unit (GPU) 101, a central processing unit (CPU) 102, a video processing chip 103, a graphics rendering engine 104, a video frame interpolation module 105, and a display screen 106. Specifically:

The GPU 101 is configured to run the graphics rendering engine 104 or the video frame interpolation module 105.

The CPU 102 is configured to run the graphics rendering engine 104 or the video frame interpolation module 105.

It should be noted that the system architecture shown in fig. 1 may include only one of the GPU 101 and the CPU 102, or both. For example, when both the GPU 101 and the CPU 102 are configured, the graphics rendering engine 104 and the video frame interpolation module 105 may be run in a hybrid manner: because the GPU 101 has strong parallel processing performance and can process more pixels of a video frame at the same time, the graphics rendering engine 104 may be run by the GPU 101; and because the CPU 102 has a fast operation speed, the video frame interpolation module 105 may be run by the CPU 102.

The video processing chip 103 is configured to output a video frame stream to the video frame interpolation module 105.

Graphics rendering engine 104 is configured to output a stream of video frames to video frame interpolation module 105.

It should be noted that, although the video processing chip 103 and the graphics rendering engine 104 both output video frames, they may do so in different ways. The input of the video processing chip 103 may be a video frame, which is preprocessed (for example, by noise reduction) and then output; the input of the graphics rendering engine 104 may be non-video data (for example, a rendering model), from which video frames are generated by rendering.

In some cases, the system architecture need not include both the video processing chip 103 and the graphics rendering engine 104. For example, in a video call scenario, graphics rendering is not required, and the video processing chip 103 may directly acquire a video frame captured by the camera and output the video frame to the video frame interpolation module 105, so the system architecture may not include the graphics rendering engine 104.

The video frame interpolation module 105 is configured to generate an interpolated video frame according to a video frame interpolation algorithm based on a video frame stream (including at least two video frames) output by the video processing chip 103 or the graphics rendering engine 104. For example, a third video frame at a third time may be generated from the input first video frame at the first time and the second video frame at the second time, the third time being located between the first time and the second time. The video frame interpolation module 105 may be, in particular, system software or application software.

The display screen 106 is configured to display video frames. When the video frame interpolation module 105 generates an interpolated video frame, the display screen 106 receives and displays both the original video frame stream output by the video processing chip 103 or the graphics rendering engine 104 and the newly generated interpolated video frame.

The video frame interpolation module 105 in fig. 1 may specifically include the sub-modules shown in fig. 2. The sub-modules shown in fig. 2 are only one example; in some other examples, there may be more or fewer sub-modules than shown in fig. 2. Each sub-module may be implemented by corresponding program code in the software that runs the video frame interpolation module 105, and may implement its function by invoking the program code of the corresponding function in that software. The function of each sub-module may be as follows:

The input frame detection module 201 is configured to detect whether the input first video frame and second video frame have the same content and whether their frame sizes are consistent.

If the contents of the two video frames are the same, there is no need to obtain yet another identical video frame through video frame interpolation, and either of the two video frames can be used directly. If the sizes of the two video frames are inconsistent, for example the first video frame is a landscape video frame and the second video frame is a portrait video frame, video frame interpolation cannot be performed.

The scene-switch detecting module 202 is configured to detect whether a scene switch exists between the first video frame and the second video frame.

Scene switching may be detected using prior-art techniques such as the optical flow method. When a scene switch exists between the two video frames, the content of the video frames differs too much, and video frame interpolation cannot be performed.

The motion vector diagram module 203 is configured to obtain a motion vector diagram according to the input first video frame and second video frame, and to repair the motion vector diagram. In this embodiment of the present application, a motion vector (MV) indicates the direction and distance of a pixel's motion. The motion vector diagram describes the motion vector corresponding to each pixel in the video frame to be processed. For example, the motion vector diagram may be an image of the same size as the video frame to be processed, or a matrix with one entry for each pixel of the video frame to be processed. In some cases, the pixels in part of the motion vector diagram may have no motion vector value (such an area may be called a gap or a hole); a motion vector diagram containing gaps or holes may be repaired so that the pixels in those areas also have motion vectors.

The motion compensation module 204 is configured to perform motion compensation (motion compensation, MC) on the input first video frame and second video frame according to the motion vector diagram, so as to obtain a first reference frame and a second reference frame to be interpolated.

The mask generation module 205 is configured to generate a mask (Mask) for masking all or part of the video frame to be processed; the masked area may be referred to as a protected area. The mask can control which area of the video frame is processed or how it is processed, so that parts of the image can be processed separately. It should be noted that the present application provides a fusion mask (MergeMask), which may be used to indicate whether a pixel is occluded, and may also be used, when two video frames are fused, to characterize the similarity between corresponding pixels in the two video frames, for example by using the value of a pixel in the mask as the fusion weight of the two video frames.

The two-frame fusion module 206 is configured to fuse the first reference frame and the second reference frame according to the fusion mask to obtain the interpolated third video frame.

Using video frame interpolation, the system architecture shown in fig. 1 may implement the video frame display method shown in fig. 3. The specific steps are as follows:

Step 301: The video processing chip 103 or the graphics rendering engine 104 outputs a video frame to the video frame interpolation module 105.

Step 302: The video frame interpolation module 105 receives and buffers the video frame as the target video frame, and determines whether the target video frame is the first video frame of the video frame stream. If so, step 303 is performed; otherwise, step 304 is performed.

Step 303: The video frame interpolation module 105 outputs the target video frame to the display screen 106 for display.

Step 304: The video frame interpolation module 105 obtains an interpolated video frame from the target video frame and the precursor video frame.

The precursor video frame refers to the video frame that is adjacent to and precedes the target video frame in the stream received from the video processing chip 103 or the graphics rendering engine 104.

Step 305: The video frame interpolation module 105 outputs the interpolated video frame to the display screen 106 for display.

If the number of interpolated video frames generated so far is less than the target number of frames, the process returns to step 304; otherwise, step 306 is performed. For example, when the target video frame is not the first video frame of the video frame stream, the precursor video frame corresponds to time T and the target video frame corresponds to time T+1, where the "1" in "T+1" denotes one selected unit time length, namely the interval between input video frames, such as 5 milliseconds (ms) or 10 ms. "T+X" denotes the time that is X selected unit time lengths after time T, where X is a fraction expressing a proportion of one selected unit time length.

There may be one or more interpolated video frames. For example, if there is only one interpolated video frame, it may correspond to time T+0.5; if there are three interpolated video frames, they may correspond to times T+0.25, T+0.5 and T+0.75. If the selected unit time length is 10 ms, for example, time T+0.25 is 2.5 ms after time T, time T+0.5 is 5.0 ms after time T, and time T+0.75 is 7.5 ms after time T. When three video frames are interpolated, steps 304 to 305 are repeated three times: the video frame at time T+0.25 is interpolated and displayed, then the video frame at time T+0.5, and then the video frame at time T+0.75.
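The timing arithmetic in this example can be sketched as follows. This is only an illustration: the even spacing of the interpolated frames and the function names `interpolation_offsets` and `offsets_in_ms` are assumptions made for the sketch, not something the application prescribes.

```python
from fractions import Fraction


def interpolation_offsets(num_interpolated):
    """Offsets X in (0, 1) for evenly spaced interpolated frames (assumed spacing)."""
    return [Fraction(k, num_interpolated + 1)
            for k in range(1, num_interpolated + 1)]


def offsets_in_ms(num_interpolated, unit_ms):
    """Convert the offsets into milliseconds after time T for a given unit length."""
    return [float(x * unit_ms) for x in interpolation_offsets(num_interpolated)]


print(interpolation_offsets(1))   # one frame at T+0.5
print(offsets_in_ms(3, 10))       # three frames with a 10 ms unit length
```

With three interpolated frames and a 10 ms unit length, the offsets are 2.5 ms, 5.0 ms and 7.5 ms, matching the example above.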

Step 306: The video frame interpolation module 105 outputs the target video frame to the display screen 106 for display.

Step 307: The video frame interpolation module 105 updates the precursor video frame to be the target video frame.

After step 307, the process may return to re-execution of step 301 awaiting acquisition of a new target video frame.
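The control flow of steps 301 to 307 can be sketched as a generator that yields frames in display order. The sketch makes several assumptions: frames are represented by plain numbers, the interpolated frames are evenly spaced, the first frame of the stream also becomes the precursor of the next frame, and `lerp` is a hypothetical stand-in for the actual interpolation of step 304.

```python
def display_order(frames, num_interpolated, interpolate):
    """Yield frames in the order they would be sent to the display."""
    precursor = None
    for target in frames:                        # steps 301-302: receive and buffer
        if precursor is not None:                # target is not the first frame
            for k in range(1, num_interpolated + 1):
                t = k / (num_interpolated + 1)   # assumed evenly spaced offsets
                # steps 304-305: interpolate, then display the new frame
                yield interpolate(precursor, target, t)
        yield target                             # step 303 or step 306
        precursor = target                       # step 307


def lerp(a, b, t):
    """Hypothetical stand-in for step 304: linear blend of two scalar 'frames'."""
    return a + (b - a) * t


print(list(display_order([0.0, 4.0, 8.0], 3, lerp)))
```

Here three input frames with three interpolated frames per interval produce nine displayed frames, so the displayed frame rate is roughly four times the input rate.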

It should be noted that the key step of the video frame display method shown in fig. 3 is step 304, in which video frame interpolation adapts the video frame rate to the screen refresh rate. A video frame interpolation method provided in the present application is described in detail below with reference to fig. 4.

As shown in fig. 4, the steps of a video frame interpolation method provided in the present application may be as follows:

Step 401: Acquire a first video frame at a first moment and a second video frame at a second moment.

The first moment and the second moment are sequential. Which of the two comes first does not affect the result of video frame interpolation, so the first moment may be set to precede the second moment. The first video frame and the second video frame may be the same size.

Step 402: Generate a first mask corresponding to a third moment according to a first motion vector diagram or a second motion vector diagram.

The first mask comprises a non-protection area, the first motion vector diagram is a motion vector diagram from the first moment to the second moment, the second motion vector diagram is a motion vector diagram from the second moment to the first moment, and the third moment is located between the first moment and the second moment.

Step 403: Perform motion compensation on the first video frame according to a third motion vector diagram to obtain a first reference frame at the third moment, and perform motion compensation on the second video frame according to a fourth motion vector diagram to obtain a second reference frame at the third moment.

The third motion vector diagram is a motion vector diagram from the third time to the first time, and the fourth motion vector diagram is a motion vector diagram from the third time to the second time.

Step 404: Perform image fusion on the first reference frame and the second reference frame according to the non-protection area in the first mask to obtain a third video frame at the third moment.

The first mask may consist entirely of non-protection areas, or may include both protection areas and non-protection areas. When the first mask consists entirely of non-protection areas, the pixel value of each pixel may indicate how video frame image fusion is performed. When the first mask includes a protection area and a non-protection area, the protection area may indicate the pixel area of the first video frame and the second video frame that does not participate in video frame image fusion, for example the pixel area in which the pixels of the first video frame and the second video frame do not move. Because the pixels in the protection area do not move between the first moment and the second moment, image fusion is not needed for the protection area, and the corresponding pixel values in the protection area of the first video frame or the second video frame can be copied directly. The non-protection area indicates the pixel area that changes between the first moment and the second moment, for example the pixel area in which the pixels of the first video frame and the second video frame move; for this area, video frame image fusion combining the first video frame and the second video frame is needed to estimate the corresponding pixel values in the non-protection area of the third video frame.

The following definitions can be made: the first moment is time T, and the first video frame at the first moment is F₀; the second moment is time T+1, and the second video frame at the second moment is F₁; the third moment between the first moment and the second moment is time T+t, where t ∈ (0, 1), the 1 in the interval (0, 1) represents one selected unit time length, and ∈ denotes set membership; the third video frame is F_t, for example with t=0.5. The first motion vector diagram is Flow₀₁, the second motion vector diagram is Flow₁₀, the third motion vector diagram is Flow_t0, the fourth motion vector diagram is Flow_t1, and the first mask (which may be referred to as a fusion mask) is MergeMask. In combination with the above definitions, a specific implementation of the video frame interpolation method provided in the present application is described in detail below.

In a possible implementation of step 402, the first mask corresponding to the third moment may be generated according to the first motion vector diagram, specifically as follows:

For example, the default pixel value of the pixel in the second mask is a first preset value, and after the coordinates of the pixel at the third moment are obtained for any pixel in the pixels, the pixel value of the pixel corresponding to the pixel in the second mask is set to a second preset value, which indicates that the corresponding pixel in the second mask has an effective motion vector value.

The iterative operation may be to perform mean blurring on the second mask first and then perform binarization. Performing mean blurring first makes the result of the subsequent binarization more definite.

After the repeated iterative operation, the erosion processing and the mean blurring, the first mask can represent how close each pixel at the third moment is to the corresponding pixel in the first video frame or the second video frame, and the weight value used when pixels are fused in step 404 can be set according to the pixel value of each pixel in the first mask.

The steps of the foregoing implementation manner may be shown in fig. 5, and the steps shown in fig. 5 are merely examples of the foregoing implementation manner, and the foregoing implementation manner may also be implemented through other steps, which are not limited herein.

Step 501: a second mask is generated.

The size of the second mask may be the same as that of the first video frame F₀, the second video frame F₁ and the first motion vector diagram Flow₀₁. The pixel value of each pixel of the second mask may indicate whether any pixel moves to the coordinates of that pixel at the third moment t: the first preset value (e.g. 0) indicates that no pixel moves to those coordinates, and the second preset value (e.g. 1) indicates that some pixel moves to those coordinates. The pixel value of each pixel of the second mask may be initialized to 0, indicating that no pixel has moved to the coordinates of that pixel. Then, for each pixel in Flow₀₁, if the coordinates of the pixel are (x, y) and its motion vector value is (u, v), the coordinates (xt, yt) of the position that the pixel reaches at the third moment t are calculated by the following formulas, and the pixel value of the second mask at position (xt, yt) is set to 1.

xt = round(x + u*t)

yt = round(y + v*t)

Wherein round represents rounding.
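Step 501 can be sketched in a few lines. The sketch assumes the motion vector diagram is stored as nested lists with `flow01[y][x] = (u, v)`, that rounding means round-half-up, and that pixels whose target position falls outside the frame are skipped; none of these details is fixed by the text, and the name `build_second_mask` is illustrative.

```python
import math


def build_second_mask(flow01, t):
    """Mark every position that some pixel of Flow01 reaches at time t (step 501)."""
    height, width = len(flow01), len(flow01[0])
    mask = [[0] * width for _ in range(height)]       # first preset value: 0
    for y in range(height):
        for x in range(width):
            u, v = flow01[y][x]
            xt = math.floor(x + u * t + 0.5)           # round-half-up (assumption)
            yt = math.floor(y + v * t + 0.5)
            if 0 <= xt < width and 0 <= yt < height:   # skip out-of-frame targets
                mask[yt][xt] = 1                       # second preset value: 1
    return mask


# Every pixel of a 1x4 frame moves 2 pixels to the right; at t = 0.5 each pixel
# has moved 1 pixel, so nothing lands on the leftmost position.
print(build_second_mask([[(2.0, 0.0)] * 4], 0.5))
```

Positions left at 0 are the holes that the later blurring, binarization and erosion steps smooth over.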

Step 502: Perform mean blurring on the second mask.

The parameters of the mean blur (blur) algorithm include the window size and the like; their values are not limited.

Step 503: Perform binarization on the second mask after mean blurring.

The parameters of the binarization (threshold) algorithm include the binarization threshold; its value is not limited.

Step 502 and step 503 together constitute the iterative operation.

Step 504: Determine whether the number of times the iterative operation has been executed is greater than or equal to a preset number.

If yes, go to step 505; otherwise, go to step 502.

Step 505: Perform erosion on the second mask after the iterative operation.

The parameters of the erosion (erode) algorithm include the window size and the like; their values are not limited.

Step 506: Perform mean blurring on the eroded second mask to obtain the first mask.

The window size of the mean blur algorithm may be set to be consistent with the window size of the erosion algorithm in step 505.
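Steps 502 to 506 can be sketched in pure Python as follows. The window size of 3, the threshold of 0.5, the iteration count of 2 and the handling of image borders (windows are clipped to the image) are all assumptions chosen for illustration, since the text leaves these parameters open; the function names are likewise illustrative.

```python
def window_values(img, x, y, k):
    """Values inside the k x k window centred on (x, y), clipped to the image."""
    h, w, r = len(img), len(img[0]), k // 2
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def filter2d(img, k, reduce_fn):
    """Apply reduce_fn to the window around every pixel."""
    return [[reduce_fn(window_values(img, x, y, k)) for x in range(len(img[0]))]
            for y in range(len(img))]


def mean_blur(img, k):
    return filter2d(img, k, lambda vals: sum(vals) / len(vals))


def binarize(img, threshold):
    return [[1 if p >= threshold else 0 for p in row] for row in img]


def erode(img, k):
    return filter2d(img, k, min)    # erosion of a 0/1 mask is a minimum filter


def second_to_first_mask(second_mask, iterations=2, k=3, threshold=0.5):
    mask = second_mask
    for _ in range(iterations):                   # steps 502-504
        mask = binarize(mean_blur(mask, k), threshold)
    mask = erode(mask, k)                         # step 505
    return mean_blur(mask, k)                     # step 506, same window as erosion


holey = [[1] * 5 for _ in range(5)]
holey[2][2] = 0                                   # a single isolated hole
print(second_to_first_mask(holey))
```

In this example the isolated hole is filled by the blur-and-threshold iterations, so the resulting first mask is 1.0 everywhere; larger holes would survive the iterations, be widened by the erosion, and end up with soft edges after the final blur.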

The video frame interpolation method provided by the application can run directly on the GPU and the CPU to implement video frame interpolation. In this method, the first mask can be generated according to the first motion vector diagram or the second motion vector diagram, which provides a basis for generating the interpolated video frame.

In another possible implementation manner of step 402, the first mask corresponding to the third moment may be generated according to the second motion vector diagram, and specifically may be as follows:

The implementation manner of generating the first mask according to the second motion vector diagram may refer to the description of generating the first mask according to the first motion vector diagram, and the implementation manner also includes similar implementation steps as those of steps 501 to 506, and may refer to the description of steps 501 to 506, which are not repeated herein.

In the video frame interpolation method provided by the application, the value of each pixel in the first, second, third and fourth motion vector diagrams may contain 2 components, which respectively represent the motion amplitude of the pixel in the horizontal direction and in the vertical direction. The first motion vector diagram and the second motion vector diagram can be obtained directly from the first video frame and the second video frame using prior-art techniques such as the optical flow method or the block matching method.

One implementation of the third motion vector map generation may be as follows:

For example, for a fourth pixel in the first motion vector diagram (Flow₀₁) whose coordinates are (x, y) and whose motion vector value is (u, v), the coordinates (x_t, y_t) of the fourth pixel at the third moment t can be calculated by the following formulas:

x_t = x + u*t

y_t = y + v*t

where x_t represents the horizontal coordinate of the fourth pixel at the third moment t, y_t represents the vertical coordinate of the fourth pixel at the third moment t, u and v represent the horizontal and vertical displacement of the fourth pixel given by Flow₀₁ (so that u*t and v*t are its displacements from the first moment to the third moment), and "*" represents multiplication.

It should be noted that, since motion vector values may not be integers, x_t and y_t may not be integers. The at least one pixel adjacent to the coordinates of the fourth pixel at the third moment may be the pixels at integer positions around (x_t, y_t), for example the 4 pixels adjacent to (x_t, y_t) above, below, to the left and to the right; the motion vector values of these 4 pixels are then assigned (i.e. set) to (-u*t, -v*t). Each such assignment, based on the coordinates (x_t, y_t), sets the motion vector values of the pixels at the integer positions around (x_t, y_t) once. When all the pixels in the first motion vector diagram have been traversed, the following applies to any fifth pixel in the third motion vector diagram (Flow_t0): if the motion vector value of the fifth pixel has been set multiple times, the average of the values set is determined as the motion vector value of the fifth pixel; if it has been set only once, the value set that one time is determined as the motion vector value of the fifth pixel; and if it has never been set, the fifth pixel has no valid motion vector value.
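The generation of Flow_t0 described above can be sketched as follows. The sketch takes "the integer positions around (x_t, y_t)" to mean the positions obtained by rounding each coordinate down and up, which is one possible reading; it represents pixels that were never assigned by `None`, and it skips positions outside the frame. The name `project_flow` is illustrative.

```python
import math


def project_flow(flow01, t):
    """Build Flow_t0 from Flow_01; entries that were never assigned are None."""
    h, w = len(flow01), len(flow01[0])
    sums = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow01[y][x]
            xt, yt = x + u * t, y + v * t
            # Integer positions around (xt, yt); a set avoids assigning the same
            # position twice when a coordinate is already an integer.
            for xi in {math.floor(xt), math.ceil(xt)}:
                for yi in {math.floor(yt), math.ceil(yt)}:
                    if 0 <= xi < w and 0 <= yi < h:
                        sums[yi][xi][0] += -u * t
                        sums[yi][xi][1] += -v * t
                        counts[yi][xi] += 1
    # Average the values where a pixel was assigned one or more times.
    return [[(sums[y][x][0] / counts[y][x], sums[y][x][1] / counts[y][x])
             if counts[y][x] else None
             for x in range(w)] for y in range(h)]


print(project_flow([[(2.0, 0.0)] * 4], 0.5))
```

The `None` entries correspond to the gaps or holes that a validity mask can record and that a later repair step can fill.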

The above implementation of third motion vector diagram generation is merely an example, and various other implementations are possible. For example, an approximate pixel corresponding to the coordinates of the fourth pixel at the third moment may be obtained, where the coordinates of the approximate pixel may be the rounded coordinates of the fourth pixel at the third moment, and the motion vector value of the approximate pixel may be set according to the motion displacement at a coordinate adjacent to the fourth pixel's coordinates in the first motion vector diagram, for example by using the negative of the motion displacement at the coordinate immediately to the left of the fourth pixel's coordinates as the motion vector value of the approximate pixel.

It should be noted that, before step 403, a fourth mask corresponding to the third moment may also be generated, and the third motion vector image may be repaired according to the fourth mask, where one possible implementation manner is as follows:

For example, the pixel value in the fourth mask of a pixel whose coordinates have been assigned an effective motion vector value (which may be understood as a motion vector value other than 0) is set to the second preset value, and the pixel value in the fourth mask of a pixel whose coordinates have been assigned a motion vector value may be set to the second preset value or to a third preset value. The fourth mask may use the first preset value as the default value. For example, the first preset value is 0, indicating that the pixel has no valid motion vector value, and the second preset value is 1, indicating that the pixel has a valid motion vector value. The pixels whose value in the fourth mask is the first preset value may be selected for repair, or those pixels together with their adjacent pixels may be selected for repair; this may be implemented flexibly according to the specific scene and is not limited herein.

One implementation of patching the third motion vector map according to the fourth mask may be as follows:

A sixth pixel meeting a first patching condition is selected in the fourth mask. Taking the sixth pixel as a starting point, the fourth mask is searched to obtain a first pixel set corresponding to the sixth pixel. A seventh pixel corresponding to the sixth pixel and a second pixel set corresponding to the first pixel set are then obtained in the third motion vector diagram, and the motion vector value of the seventh pixel is determined according to the distances between the pixels in the second pixel set and the seventh pixel and the motion vector values of the pixels in the second pixel set.

The first patching condition includes that the pixel has no valid motion vector value (which may be understood as a motion vector value other than 0). The pixels in the first pixel set are the pixels closest to the sixth pixel in at least one direction that have a motion vector value.

For example, the fourth mask is FlowMask₀. FlowMask₀ is traversed: if the pixel value of a pixel is 1, the corresponding pixel in the third motion vector diagram (Flow_t0) has a motion vector value, and the pixel does not need to be repaired; if the pixel value of a pixel is 0, the corresponding pixel in the third motion vector diagram (Flow_t0) has no valid motion vector value, and the pixel is a sixth pixel.

The first patching condition may further include whether the pixel needs to participate in the image fusion of the first reference frame and the second reference frame, which may be determined according to the first mask. If the weight value in the first mask for the pixel's participation in image fusion is 0, the pixel does not participate in the image fusion of the first reference frame and the second reference frame, and the pixel need not be repaired.

Then, taking the sixth pixel as the starting point, a search is performed in each of the up, down, left and right directions, stopping in each direction when a pixel whose pixel value is 1 (which may be referred to as an effective pixel) is found or when the boundary of the fourth mask is reached. The set of pixels obtained is the first pixel set.

The sixth pixel corresponds to the seventh pixel in the third motion vector diagram and the first set of pixels corresponds to the second set of pixels in the third motion vector diagram. The motion vector value for the seventh pixel can be calculated as follows:

The distance between the seventh pixel and the effective pixel to its left in the second pixel set is nLeft pixels; the variable left=1 indicates that an effective pixel was found by searching to the left, and the motion vector value of that effective pixel in Flow_t0 is (uLeft, vLeft). If no effective pixel is found by searching to the left, nLeft=1 and left=0 may be set. The distance between the seventh pixel and the effective pixel to its right in the second pixel set is nRight pixels; the variable right=1 indicates that an effective pixel was found by searching to the right, and the motion vector value of that effective pixel in Flow_t0 is (uRight, vRight). If no effective pixel is found by searching to the right, nRight=1 and right=0 may be set. The distance between the seventh pixel and the effective pixel below it in the second pixel set is nBottom pixels; the variable bottom=1 indicates that an effective pixel was found by searching downward, and the motion vector value of that effective pixel in Flow_t0 is (uBottom, vBottom). If no effective pixel is found by searching downward, nBottom=1 and bottom=0 may be set. The distance between the seventh pixel and the effective pixel above it in the second pixel set is nTop pixels; the variable top=1 indicates that an effective pixel was found by searching upward, and the motion vector value of that effective pixel in Flow_t0 is (uTop, vTop). If no effective pixel is found by searching upward, nTop=1 and top=0 may be set.

Then, the sum (uSum, vSum) of the motion vectors of the seventh pixel is calculated by the following formula:

the sum of coefficients cSum of the seventh pixel can be calculated by the following formula, and if the calculated cSum value is 0, cSum is set to a non-zero value (such as 1, 2, etc.):

the motion vector value (u, v) of the seventh pixel is calculated by the following formula:

The pixel value of the sixth pixel in FlowMask₀ is then assigned 1, indicating that a valid motion vector value now exists for the seventh pixel corresponding to the sixth pixel.
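The formulas for (uSum, vSum), cSum and (u, v) are not reproduced in this text. One reading that is consistent with the definitions above (the found/not-found flags, the distances defaulting to 1, and the guard against cSum being 0) is inverse-distance weighting, in which each found neighbour contributes with weight 1/n. The sketch below implements that reading as an assumption and should not be taken as the application's exact formulas; the function name `repair_hole` and the list-based data layout are likewise illustrative.

```python
def repair_hole(flow, valid, x, y):
    """Estimate the motion vector at hole (x, y) from the nearest valid pixel
    in each of the four directions, weighted by 1/distance (assumed weighting)."""
    h, w = len(flow), len(flow[0])
    u_sum = v_sum = c_sum = 0.0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # left, right, top, bottom
        n, i, j = 1, x + dx, y + dy
        while 0 <= i < w and 0 <= j < h and not valid[j][i]:
            n, i, j = n + 1, i + dx, j + dy             # keep searching outward
        if 0 <= i < w and 0 <= j < h:                   # effective pixel at distance n
            u, v = flow[j][i]
            u_sum += u / n
            v_sum += v / n
            c_sum += 1.0 / n
    if c_sum == 0:
        c_sum = 1.0             # no effective pixel found: avoid division by zero
    return (u_sum / c_sum, v_sum / c_sum)


flow = [[(4.0, 0.0), None, None, (1.0, 0.0)]]
valid = [[1, 0, 0, 1]]
flow[0][1] = repair_hole(flow, valid, 1, 0)   # left neighbour at 1, right at 2
valid[0][1] = 1                               # mark the pixel as repaired
print(flow[0][1])
```

Under this weighting the nearer neighbour dominates: the hole one pixel from (4, 0) and two pixels from (1, 0) receives (3.0, 0.0). Marking the pixel as valid afterwards means it can itself serve as an effective pixel when later holes are repaired.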

There may be various implementations of repairing the third motion vector image according to the fourth mask, and another implementation may be as follows:

An eighth pixel meeting a second patching condition is selected in the fourth mask, and a ninth pixel corresponding to the eighth pixel is obtained in the third motion vector diagram. The motion vector value of the ninth pixel is determined from the motion vector value of at least one pixel adjacent to the ninth pixel in a first direction.

The second patching condition includes that the value of the eighth pixel indicates that the corresponding ninth pixel in the third motion vector diagram has no motion vector value, and that at least one pixel adjacent to the ninth pixel in the first direction has a motion vector value.

For example, the fourth mask FlowMask₀ is traversed. If the pixel corresponding to a pixel of the mask in the third motion vector diagram (Flow_t0) has no motion vector value, while its adjacent pixels in the first direction (left and right, or above and below) have motion vector values, the pixel of the mask is an eighth pixel. Taking the left-right direction as an example, the average of the motion vector values of the pixels adjacent to the ninth pixel on its left and right may be used as the motion vector value of the ninth pixel, and the pixel value of the eighth pixel may be assigned 1, indicating that a valid motion vector value exists for the ninth pixel corresponding to the eighth pixel. The fourth mask reduces the workload of repairing the motion vector diagram, because only the areas that need repair are repaired.
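The left-right variant of this second repair can be sketched as follows. The sketch assumes that both the left and the right neighbour must be valid, that the repair is done in place in a single pass, and that hole entries are stored as `None`; the function name is illustrative.

```python
def repair_from_horizontal_neighbors(flow, valid):
    """Fill holes whose left and right neighbours both have motion vectors.

    Returns the number of pixels repaired; flow and valid are updated in place.
    """
    h, w = len(flow), len(flow[0])
    repaired = 0
    for y in range(h):
        for x in range(1, w - 1):
            if not valid[y][x] and valid[y][x - 1] and valid[y][x + 1]:
                (ul, vl), (ur, vr) = flow[y][x - 1], flow[y][x + 1]
                flow[y][x] = ((ul + ur) / 2, (vl + vr) / 2)   # neighbour average
                valid[y][x] = 1                               # now has a valid value
                repaired += 1
    return repaired


flow = [[(2.0, 0.0), None, (4.0, 2.0)]]
valid = [[1, 0, 1]]
print(repair_from_horizontal_neighbors(flow, valid), flow[0][1])
```

This handles only one-pixel-wide gaps; wider gaps would be left for the search-based repair or for a further pass.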

The two implementations of the repairing the third motion vector image according to the fourth mask may be alternatively performed or may be performed in combination, which is not limited herein.

Accordingly, one implementation of fourth motion vector diagram generation may be as follows:

The specific implementation manner of the fourth motion vector diagram generation may refer to the generation manner of the third motion vector diagram, and will not be repeated here.

It should be noted that, before step 403, a fifth mask corresponding to the third moment may also be generated, and the fourth motion vector image may be repaired according to the fifth mask, where one possible implementation manner is as follows:

The above implementation manner is similar to the corresponding steps of the fourth mask, and reference may be made to the generation of the fourth mask and the repair of the third motion vector image, which are not described herein.

A twelfth pixel meeting a third patching condition is selected in the fifth mask. Taking the twelfth pixel as a starting point, the fifth mask is searched to obtain a third pixel set corresponding to the twelfth pixel. A thirteenth pixel corresponding to the twelfth pixel and a fourth pixel set corresponding to the third pixel set are obtained in the fourth motion vector diagram, and the motion vector value of the thirteenth pixel is determined according to the distances between the pixels in the fourth pixel set and the thirteenth pixel and the motion vector values of the pixels in the fourth pixel set.

The third patching condition includes that the pixel has no valid motion vector value. The pixels in the third pixel set are the pixels closest to the twelfth pixel in at least one direction that have a motion vector value.

Alternatively, a fourteenth pixel meeting a fourth patching condition is selected in the fifth mask, and a fifteenth pixel corresponding to the fourteenth pixel is obtained in the fourth motion vector diagram. The motion vector value of the fifteenth pixel is determined according to the motion vector value of at least one pixel adjacent to the fifteenth pixel in a second direction.

The fourth patching condition includes that the value of the fourteenth pixel indicates that the corresponding fifteenth pixel in the fourth motion vector diagram has no motion vector value, and that at least one pixel adjacent to the fifteenth pixel in the second direction has a motion vector value.

The two manners of repairing the fourth motion vector diagram according to the fifth mask may be performed alternatively or in combination, which is not limited here. For the specific implementation, refer to the step of repairing the third motion vector diagram according to the fourth mask, which is not repeated here.
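The first repair manner can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the application: the mask is assumed to hold 1 where a motion vector value exists and 0 where it does not, the search is assumed to run in four directions (left, right, up, down), and inverse-distance weighting is assumed as the way distance and motion vector values are combined. The names `nearest_valid` and `repair_flow` are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float]
Mask = List[List[int]]
Flow = List[List[Vector]]

# Assumed search directions; the text only requires "at least one direction".
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def nearest_valid(mask: Mask, x: int, y: int,
                  dx: int, dy: int) -> Optional[Tuple[int, int, int]]:
    """Walk from (x, y) along (dx, dy) and return (nx, ny, distance) of the
    first pixel whose mask value is 1, or None if the border is reached."""
    h, w = len(mask), len(mask[0])
    nx, ny, dist = x + dx, y + dy, 1
    while 0 <= nx < w and 0 <= ny < h:
        if mask[ny][nx] == 1:
            return nx, ny, dist
        nx, ny, dist = nx + dx, ny + dy, dist + 1
    return None


def repair_flow(mask: Mask, flow: Flow) -> Flow:
    """Return a copy of flow in which pixels without a motion vector value
    (mask == 0) are filled from the nearest valid pixels, where any exist."""
    h, w = len(mask), len(mask[0])
    repaired = [row[:] for row in flow]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1:
                continue  # already has a motion vector value
            hits = [nearest_valid(mask, x, y, dx, dy) for dx, dy in DIRECTIONS]
            hits = [hit for hit in hits if hit is not None]
            if not hits:
                continue  # no valid pixel in any direction; leave unchanged
            # Assumed weighting: closer pixels contribute more (1 / distance).
            total = sum(1.0 / d for _, _, d in hits)
            u = sum(flow[ny][nx][0] / d for nx, ny, d in hits) / total
            v = sum(flow[ny][nx][1] / d for nx, ny, d in hits) / total
            repaired[y][x] = (u, v)
    return repaired
```

With a single row whose two end pixels carry horizontal motion values 2.0 and 8.0 and whose two middle pixels are holes, this weighting fills the holes with 4.0 and 6.0, that is, a linear ramp between the two valid values. The sketch reads only the original mask and motion vector values, so filled values do not propagate into later holes.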

In the video frame interpolation method provided by this application, repairing the third motion vector diagram and the fourth motion vector diagram provides a basis for motion compensation of the video frames; the motion compensation may be the process described in step 403.

The execution of step 403 may be as follows:

For example, motion compensation is performed on the first video frame F_0 according to the third motion vector diagram Flow_t0 to obtain the first reference frame Warp_0 corresponding to the third time t; motion compensation is performed on the second video frame F_1 according to the fourth motion vector diagram Flow_t1 to obtain the second reference frame Warp_1 corresponding to the third time t.

For any first pixel in the unprotected area of the first mask (MergeMask), let the coordinates of the first pixel be (x, y). The coordinates of the corresponding second pixel P_t0 in the first reference frame are then (x, y), and the coordinates of the corresponding third pixel P_t1 in the second reference frame are also (x, y). If the motion vector value corresponding to the second pixel in Flow_t0 is (u_0, v_0), the pixel value of P_t0 is equal to F_0(x + u_0, y + v_0), where F_0(x + u_0, y + v_0) represents the pixel value of F_0 at (x + u_0, y + v_0). If the motion vector value corresponding to the third pixel in Flow_t1 is (u_1, v_1), the pixel value of P_t1 is equal to F_1(x + u_1, y + v_1), where F_1(x + u_1, y + v_1) represents the pixel value of F_1 at (x + u_1, y + v_1).

It should be noted that motion vector values are not necessarily integers, so (x + u_0, y + v_0) or (x + u_1, y + v_1) may not be integer coordinates. In this case, the value of the pixel at the integer coordinate nearest to (x + u_0, y + v_0) or (x + u_1, y + v_1) may be used (the nearest neighbor method); alternatively, bilinear interpolation or bicubic interpolation may be used to calculate the value from the pixels at the integer coordinates surrounding (x + u_0, y + v_0) or (x + u_1, y + v_1) (for example, the 4 surrounding coordinates above, below, left, and right).
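The motion compensation described above, with bilinear interpolation for non-integer positions, can be sketched as follows. This is a minimal sketch for single-channel frames stored as nested lists; the names `sample_bilinear` and `warp` are hypothetical, and clamping sample positions to the image border is an assumption, since the text does not say how positions outside the frame are handled.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]


def sample_bilinear(frame: Image, x: float, y: float) -> float:
    """Sample frame at a possibly fractional position (x, y) using the four
    surrounding integer coordinates. Positions are clamped to the border
    (an assumption, not stated in the text)."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = frame[y0][x0] * (1.0 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1.0 - fx) + frame[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


def warp(frame: Image, flow: Flow) -> Image:
    """Build a reference frame: output(x, y) = frame(x + u, y + v), where
    (u, v) = flow[y][x] is the motion vector value stored at (x, y)."""
    h, w = len(frame), len(frame[0])
    return [
        [sample_bilinear(frame, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]
```

Under these assumptions, Warp_0 would correspond to `warp(F_0, Flow_t0)` and Warp_1 to `warp(F_1, Flow_t1)`. Replacing `sample_bilinear` with rounding to the nearest integer coordinate would give the nearest neighbor variant.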

After the video frame is subjected to motion compensation, the first reference frame and the second reference frame can be subjected to image fusion, and the specific process can be as follows:

In one possible implementation, each pixel in the unprotected area of the first mask corresponds to a first weight value and a second weight value; the specific steps of step 404 may be as follows:

In this way, the first reference frame and the second reference frame can be fused according to the first mask, which compensates for the limited accuracy of either reference frame taken alone.

It should be noted that the first weight value and the second weight value may be set in multiple manners. For example, they may be set in advance, or they may be set in the following manner:

If the first mask is generated according to the first motion vector diagram, the pixel value of the first pixel in the first mask indicates the first weight value, and the second weight value is equal to 1 minus the first weight value; or, if the first mask is generated according to the second motion vector diagram, the pixel value of the first pixel in the first mask indicates the second weight value, and the first weight value is equal to 1 minus the second weight value.

Take the case in which the first mask is generated according to the first motion vector diagram as an example. When the pixel value of the first pixel in the first mask is 1, the pixel value of the second pixel corresponding to the first pixel in the first reference frame may be used directly as the pixel value of the pixel at the corresponding position in the third video frame. When the pixel value of the first pixel is 0, the pixel value of the second pixel is invalid or may have a large error, and the pixel value of the corresponding third pixel in the second reference frame may be used as the pixel value of the pixel at the corresponding position in the third video frame. When the pixel value of the first pixel is greater than 0 and less than 1, fusion is performed according to the pixel value of the second pixel and its weight value and the pixel value of the third pixel and its weight value.

For example, if MergeMask is generated according to Flow_01, the pixel value of each pixel of MergeMask represents the first weight value; conversely, if MergeMask is generated according to Flow_10, the pixel value of each pixel of MergeMask represents the second weight value. Let the pixel value of the third video frame F_t at a given coordinate be P_t, the pixel value of Warp_0 at the same coordinate be P_0, the pixel value of Warp_1 at the same coordinate be P_1, and the pixel value of MergeMask at the same coordinate be a, where a ∈ [0.0, 1.0].

If MergeMask is generated according to Flow_01, P_t is calculated by the following formula:

P_t = P_0 * a + P_1 * (1.0 - a);

If MergeMask is generated according to Flow_10, P_t is calculated by the following formula:

P_t = P_0 * (1.0 - a) + P_1 * a.

P_t is thereby obtained. For pixel values having multiple color components, each color component may be calculated with reference to the above formulas.
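The two fusion formulas can be written as a small sketch. The function names `fuse_pixel` and `fuse_color` and the boolean parameter `mask_from_flow_01` are hypothetical; the parameter is assumed to be True when MergeMask is generated according to Flow_01 and False when it is generated according to Flow_10.

```python
from typing import Tuple


def fuse_pixel(p0: float, p1: float, a: float,
               mask_from_flow_01: bool = True) -> float:
    """Blend one component of Warp_0 (p0) and Warp_1 (p1) with mask value a."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("mask value a must lie in [0.0, 1.0]")
    # If the mask comes from Flow_01, a is the first weight value (for p0);
    # otherwise a is the second weight value (for p1).
    w0 = a if mask_from_flow_01 else 1.0 - a
    return p0 * w0 + p1 * (1.0 - w0)


def fuse_color(c0: Tuple[float, ...], c1: Tuple[float, ...], a: float,
               mask_from_flow_01: bool = True) -> Tuple[float, ...]:
    """Apply the same formula to every color component of a pixel."""
    return tuple(fuse_pixel(x, y, a, mask_from_flow_01)
                 for x, y in zip(c0, c1))
```

For instance, with P_0 = 100, P_1 = 200 and a = 0.25, the first formula gives 175 and the second gives 125. With a = 1 the first formula returns P_0 unchanged, and with a = 0 it returns P_1.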

In summary, when the video frame interpolation methods of steps 401 to 404 are applied to video frame display, the flow of the video frame display method can be as shown in fig. 6.

Step 601: Acquire a first video frame at the first moment, a second video frame at the second moment, and a third moment.

Step 602: Determine whether the first video frame and the second video frame meet an interpolation condition.

For example, the interpolation condition may be that the first video frame and the second video frame are consistent in size and scene but differ in content.

If yes, go to step 603; otherwise, go to step 609.

Step 603: Acquire a first motion vector diagram and a second motion vector diagram.

Step 604: Generate a first mask at the third moment.

Step 605: Generate, according to the first motion vector diagram and the second motion vector diagram, a third motion vector diagram and a fourth mask, as well as a fourth motion vector diagram and a fifth mask.

Step 606: Repair the third motion vector diagram according to the first mask and the fourth mask, and update the third motion vector diagram; repair the fourth motion vector diagram according to the first mask and the fifth mask, and update the fourth motion vector diagram.

Step 607: Perform motion compensation on the first video frame according to the third motion vector diagram to obtain a first reference frame; perform motion compensation on the second video frame according to the fourth motion vector diagram to obtain a second reference frame.

Step 608: Perform image fusion on the first reference frame and the second reference frame according to the first mask to obtain a third video frame, and output the third video frame.

Step 609: Output the first video frame or the second video frame.
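The branching in steps 602 and 609 can be sketched as follows. This is only an outline under stated assumptions: the names `meets_interpolation_condition` and `frame_to_display` are hypothetical, the scene-consistency part of the interpolation condition is omitted because the text does not specify how it is checked, steps 603 to 608 are represented by a caller-supplied `interpolate` function, and returning the first video frame in step 609 is an arbitrary choice between the two frames the text allows.

```python
from typing import Callable, List

Frame = List[List[float]]


def meets_interpolation_condition(f0: Frame, f1: Frame) -> bool:
    """Step 602, partially: same size and different content.
    The scene-consistency check is not modeled here."""
    same_size = len(f0) == len(f1) and all(
        len(r0) == len(r1) for r0, r1 in zip(f0, f1))
    return same_size and f0 != f1


def frame_to_display(f0: Frame, f1: Frame, t: float,
                     interpolate: Callable[[Frame, Frame, float], Frame]) -> Frame:
    """Return the frame to output for the third moment t."""
    if meets_interpolation_condition(f0, f1):
        # Steps 603 to 608: motion vectors, masks, repair, warp, fusion.
        return interpolate(f0, f1, t)
    # Step 609: fall back to an original frame (the first one, by assumption).
    return f0
```

Keeping the condition check separate from the interpolation itself means that identical or mismatched frames never reach the motion estimation stage.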

As shown in fig. 7, the present application provides a video frame interpolation apparatus, including:

an acquiring module 701, configured to acquire a first video frame at a first time and a second video frame at a second time;

A generating module 702, configured to generate a first mask corresponding to a third moment according to the first motion vector diagram or the second motion vector diagram, where the first mask includes a non-protection area; the first motion vector diagram is a motion vector diagram from the first time to the second time, the second motion vector diagram is a motion vector diagram from the second time to the first time, and the third time is between the first time and the second time;

a processing module 703, configured to perform motion compensation on the first video frame according to the third motion vector diagram to obtain a first reference frame at the third time, and perform motion compensation on the second video frame according to the fourth motion vector diagram to obtain a second reference frame at the third time, where the third motion vector diagram is a motion vector diagram from the third time to the first time, and the fourth motion vector diagram is a motion vector diagram from the third time to the second time; and

In one possible design, each pixel corresponds to a first weight value and a second weight value in the unprotected area in the first mask; the processing module 703 is specifically configured to:

In one possible design, the generating module 702 is specifically configured to:

In one possible design, the generating module 702 is further configured to:

The embodiment of the application also provides an electronic device, which may have a structure as shown in fig. 8, and may be a computer device or a chip system capable of supporting the computer device to implement the method.

The electronic device as shown in fig. 8 may comprise at least one processor 801, said at least one processor 801 being adapted to be coupled to a memory, read and execute instructions in said memory to implement the steps of video frame interpolation provided by embodiments of the present application. Optionally, the electronic device may further comprise a communication interface 802 for supporting the electronic device for signaling or data reception or transmission. A communication interface 802 in the electronic device may be used to enable interaction with other electronic devices. The processor 801 may be used to implement steps of an electronic device performing video frame interpolation. Optionally, the electronic device may further comprise a memory 803 in which computer instructions are stored, the memory 803 may be coupled to the processor 801 and/or the communication interface 802 for supporting the steps of the processor 801 invoking the computer instructions in the memory 803 to effect interpolation of video frames; in addition, the memory 803 may also be used to store data related to embodiments of the methods of the present application, for example, to store data, instructions necessary to support interaction by the communication interface 802, and/or to store configuration information necessary for the electronic device to perform the methods described by embodiments of the present application.

Embodiments of the present application also provide a computer readable storage medium, where computer instructions are stored, where the computer instructions, when executed by a computer, may cause the computer to perform the method involved in any one of the possible designs of the method embodiments and the method embodiments described above. In the embodiment of the present application, the computer readable storage medium is not limited, and may be, for example, RAM (random-access memory), ROM (read-only memory), or the like.

The present application also provides a chip that may include a processor and interface circuitry for performing the methods referred to in any one of the possible implementations of the method embodiments described above, wherein "coupled" means that the two components are directly or indirectly joined to each other, which may be fixed or movable.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

The steps of a method or algorithm described in the embodiments of the present application may be embodied directly in hardware, in a software element executed by a processor, or in a combination of the two. The software elements may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a terminal device. In the alternative, the processor and the storage medium may reside in different components in a terminal device.

These computer instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the invention has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations thereof can be made without departing from the scope of the invention. Accordingly, the specification and drawings are merely exemplary illustrations of the present invention as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A method of video frame interpolation, comprising:

acquiring a first video frame at a first moment and a second video frame at a second moment;

generating a first mask corresponding to a third moment according to the first motion vector diagram or the second motion vector diagram, wherein the first mask comprises a non-protection area; the first motion vector diagram is a motion vector diagram from the first time to the second time, the second motion vector diagram is a motion vector diagram from the second time to the first time, and the third time is between the first time and the second time;

Performing motion compensation on the first video frame according to a third motion vector diagram to obtain a first reference frame at a third moment, and performing motion compensation on the second video frame according to a fourth motion vector diagram to obtain a second reference frame at the third moment, wherein the third motion vector diagram is a motion vector diagram from the third moment to the first moment, and the fourth motion vector diagram is a motion vector diagram from the third moment to the second moment;

and carrying out image fusion on the first reference frame and the second reference frame according to the unprotected area in the first mask to obtain a third video frame at the third moment.

2. The method of claim 1, wherein each pixel corresponds to a first weight value and a second weight value in the unprotected area in the first mask;

the image fusion of the first reference frame and the second reference frame according to the unprotected area in the first mask includes:

acquiring a second pixel at a corresponding position in the first reference frame and a third pixel at a corresponding position in the second reference frame according to the position of the first pixel in the unprotected area; the first pixel is any pixel in the unprotected area;

Performing weighted operation on the pixel value of the second pixel according to the first weight value corresponding to the first pixel to obtain a first operation result;

performing weighted operation on the pixel value of the third pixel according to the second weight value corresponding to the first pixel to obtain a second operation result;

and determining the sum of the first operation result and the second operation result as a pixel value of a pixel at a corresponding position in the third video frame.

3. The method of claim 2, wherein:

if the first mask is generated according to the first motion vector diagram, the pixel value of the first pixel in the first mask is used for indicating the first weight value, and the second weight value is equal to 1 minus the first weight value; or

if the first mask is generated according to the second motion vector diagram, the pixel value of the first pixel in the first mask is used for indicating the second weight value, and the first weight value is equal to 1 minus the second weight value.

4. A method according to any one of claims 1-3, wherein generating a first mask corresponding to the third moment from the first motion vector map comprises:

Respectively determining the coordinates of each pixel at the third moment according to the coordinates and the motion vector values of each pixel in the first motion vector diagram, and generating a second mask corresponding to the third moment according to the coordinates of each pixel at the third moment;

performing at least one iteration on the second mask; the iterative operation comprises mean value blurring processing and binarization processing of the second mask;

and performing erosion processing and mean value blurring processing on the second mask after the iterative operation to obtain the first mask corresponding to the third moment.

5. A method according to any one of claims 1-3, wherein generating the first mask corresponding to the third moment from the second motion vector map comprises:

respectively determining the coordinates of each pixel at the third moment according to the coordinates and the motion vector values of each pixel in the second motion vector diagram, and generating a third mask corresponding to the third moment according to the coordinates of each pixel at the third moment;

performing at least one iteration on the third mask; the iterative operation comprises mean value blurring processing and binarization processing of the third mask;

and performing erosion processing and mean value blurring processing on the third mask after the iterative operation to obtain the first mask corresponding to the third moment.

6. The method of any one of claims 1-5, wherein the method further comprises:

generating the first motion vector diagram from the first moment to the second moment according to the first video frame and the second video frame;

determining the coordinate of a fourth pixel in the third moment according to the coordinate and the motion vector value of the fourth pixel in the first motion vector diagram, wherein the fourth pixel is any pixel in the first motion vector diagram;

acquiring at least one pixel adjacent to the coordinates of the fourth pixel at the third moment, and setting a motion vector value of the at least one pixel according to the motion displacement of the fourth pixel from the first moment to the third moment;

if the motion vector value of the fifth pixel in the third motion vector diagram is set multiple times, determining an average value of the multiple-time set motion vector values as the motion vector value of the fifth pixel.

7. The method according to any one of claims 1-6, wherein before performing motion compensation on the first video frame according to a third motion vector map to obtain the first reference frame at the third time instant, the method further comprises:

Generating a fourth mask corresponding to the third moment according to the third motion vector diagram;

selecting pixels meeting the repair condition in the third motion vector diagram according to the fourth mask;

updating the motion vector value of the pixel meeting the patching condition according to the motion vector value of at least one pixel associated with the pixel meeting the patching condition.

8. The method of claim 7, wherein:

the selecting of pixels meeting the repair condition in the third motion vector diagram according to the fourth mask comprises:

searching in the fourth mask by taking the sixth pixel as a starting point to obtain a first pixel set corresponding to the sixth pixel, wherein the pixels in the first pixel set are the pixels closest to the sixth pixel in at least one direction, and motion vector values exist;

Acquiring a seventh pixel corresponding to the sixth pixel in the third motion vector diagram and a second pixel set corresponding to the first pixel set;

and determining the motion vector value of the seventh pixel according to the distance between the pixel in the second pixel set and the seventh pixel and the motion vector value of the pixel in the second pixel set.

9. The method of claim 7 or 8, wherein:

acquiring the ninth pixel corresponding to the eighth pixel in the third motion vector diagram;

A motion vector value of the ninth pixel is determined from a motion vector value of at least one pixel adjacent to the ninth pixel in the first direction.

10. The method of any one of claims 1-5, wherein the method further comprises:

generating the second motion vector diagram from the second moment to the first moment according to the first video frame and the second video frame;

determining the coordinate of a tenth pixel in the third moment according to the coordinate and the motion vector value of the tenth pixel in the second motion vector diagram, wherein the tenth pixel is any pixel in the second motion vector diagram;

acquiring at least one pixel adjacent to the coordinate of the tenth pixel at the third moment, and setting a motion vector value of the at least one pixel according to the motion displacement of the tenth pixel from the first moment to the third moment;

if the motion vector value of the eleventh pixel in the fourth motion vector diagram is set a plurality of times, determining an average value of the plurality of set motion vector values as the motion vector value of the eleventh pixel.

11. The method according to any one of claims 1-5 and 10, wherein before motion compensating the second video frame according to a fourth motion vector diagram to obtain the second reference frame at the third time instant, further comprising:

Generating a fifth mask corresponding to the third moment according to the fourth motion vector diagram;

selecting pixels meeting the repair condition in the fourth motion vector diagram according to the fifth mask;

12. The method of claim 11, wherein:

the selecting of pixels meeting the repair condition in the fourth motion vector diagram according to the fifth mask comprises:

searching a third pixel set corresponding to the twelfth pixel in the fifth mask by taking the twelfth pixel as a starting point, wherein the pixels in the third pixel set are the pixels closest to the twelfth pixel in at least one direction, and motion vector values exist;

Acquiring a thirteenth pixel corresponding to the twelfth pixel in the fourth motion vector diagram and a fourth pixel set corresponding to the third pixel set;

and determining the motion vector value of the thirteenth pixel according to the distance between the pixel in the fourth pixel set and the thirteenth pixel and the motion vector value of the pixel in the fourth pixel set.

13. The method of claim 11 or 12, wherein:

Acquiring the fifteenth pixel corresponding to the fourteenth pixel in the fourth motion vector diagram;

a motion vector value of the fifteenth pixel is determined according to a motion vector value of at least one pixel adjacent to the fifteenth pixel in the second direction.

14. A video frame interpolation apparatus, comprising:

15. The apparatus of claim 14, wherein each pixel corresponds to a first weight value and a second weight value in the unprotected area in the first mask; the processing module is specifically configured to:

16. The apparatus as recited in claim 15, wherein:

17. The apparatus according to any of claims 14-16, wherein the generating module is specifically configured to:

18. The apparatus according to any of claims 14-16, wherein the generating module is specifically configured to:

19. The apparatus of any of claims 14-18, wherein the generating module is further to:

20. The apparatus of any of claims 14-19, wherein the generating module is further to:

21. The apparatus of claim 20, wherein the generating module is specifically configured to:

22. The apparatus according to claim 20 or 21, wherein the generating module is specifically configured to:

23. The apparatus of any of claims 14-18, wherein the generating module is further to:

24. The apparatus of any one of claims 14-18, 23, wherein the generating module is further to:

25. The apparatus of claim 24, wherein the generating module is specifically configured to:

26. The apparatus according to claim 24 or 25, wherein the generating module is specifically configured to:

27. An electronic device, the electronic device comprising: one or more processors; one or more memories; wherein the one or more memories store one or more computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-13.

28. A computer readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 13.