CN116980549A - Video frame processing method, device, computer equipment and storage medium - Google Patents

Video frame processing method, device, computer equipment and storage medium

Info

Publication number
CN116980549A
Authority
CN
China
Prior art keywords
video frame
frame
video
sample
quality
Prior art date
Legal status
Pending
Application number
CN202310253892.5A
Other languages
Chinese (zh)
Inventor
姜博源
罗栋豪
邰颖
汪铖杰
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310253892.5A
Publication of CN116980549A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/01Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details

Abstract

The application relates to an artificial-intelligence-based video frame processing method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: inserting an interpolated video frame between a first video frame and a second video frame in a video according to the two frames; extracting video frame features from the interpolated video frame and performing video frame quality detection on the interpolated video frame based on those features, obtaining a first quality metric corresponding to the interpolated video frame; acquiring an interpolated optical flow field corresponding to the interpolated video frame, extracting optical flow features based on the interpolated optical flow field, and performing video frame quality detection on the interpolated video frame according to the optical flow features, obtaining a second quality metric corresponding to the interpolated video frame; and fusing the first quality metric and the second quality metric to obtain the interpolated-frame quality metric corresponding to the interpolated video frame. The method improves the accuracy of quality detection for video frames generated by frame interpolation.

Description

Video frame processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a video frame processing method, apparatus, computer device, storage medium and computer program product, and to a quality detection model processing method, apparatus, computer device, storage medium and computer program product.
Background
With the development of computer technology, image quality assessment (IQA) has emerged. IQA is one of the basic techniques in image processing: it analyses the characteristics of an image and evaluates the image quality on that basis.
Conventional image quality assessment, for example no-reference quality assessment based on deep learning, mainly evaluates the global quality of an image or video. For video frame interpolation, however, the global quality of a generated intermediate frame is largely determined by the source video into which frames are inserted, and it is the local details of the intermediate frame that need to be evaluated. Current no-reference assessment algorithms are therefore poorly suited to evaluating interpolation results.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video frame processing method, apparatus, computer device, computer-readable storage medium and computer program product, and a quality detection model processing method, apparatus, computer device, computer-readable storage medium and computer program product, that can improve detection accuracy.
The application provides a video frame processing method, comprising the following steps:
inserting an interpolated video frame between a first video frame and a second video frame according to the first and second video frames in a video;
extracting video frame features from the interpolated video frame, and performing video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first quality metric corresponding to the interpolated video frame;
acquiring an interpolated optical flow field corresponding to the interpolated video frame, extracting optical flow features based on the interpolated optical flow field, and performing video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second quality metric corresponding to the interpolated video frame;
and fusing the first quality metric and the second quality metric to obtain an interpolated-frame quality metric corresponding to the interpolated video frame.
The application also provides a video frame processing apparatus, comprising:
an insertion module, configured to insert an interpolated video frame between a first video frame and a second video frame according to the first and second video frames in a video;
a first detection module, configured to extract video frame features from the interpolated video frame and perform video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first quality metric corresponding to the interpolated video frame;
a second detection module, configured to acquire an interpolated optical flow field corresponding to the interpolated video frame, extract optical flow features based on the interpolated optical flow field, and perform video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second quality metric corresponding to the interpolated video frame;
and a fusion module, configured to fuse the first quality metric and the second quality metric to obtain an interpolated-frame quality metric corresponding to the interpolated video frame.
In one embodiment, the insertion module is further configured to acquire a first video frame at a first time and a second video frame at a second time in the video; transform the first video frame from the first time to an interpolation time to obtain a first predicted video frame corresponding to the interpolation time, the interpolation time lying between the first time and the second time; transform the second video frame from the second time to the interpolation time to obtain a second predicted video frame corresponding to the interpolation time; and fuse the first predicted video frame and the second predicted video frame to obtain an interpolated video frame corresponding to the interpolation time.
In one embodiment, the insertion module is further configured to perform spatial position mapping on the first video frame to obtain a first optical flow field from the first time to the interpolation time, and to transform the first video frame, based on the first optical flow field, into the first predicted video frame corresponding to the interpolation time.
In one embodiment, the insertion module is further configured to acquire a second optical flow field from the second time to the interpolation time, obtained by performing spatial position mapping on the second video frame, and to transform the second video frame, based on the second optical flow field, into the second predicted video frame corresponding to the interpolation time.
In one embodiment, the second detection module is further configured to fuse the first optical flow field and the second optical flow field to obtain the interpolated optical flow field corresponding to the interpolated video frame.
In one embodiment, the first detection module is further configured to perform video frame quality detection in at least one dimension on the interpolated video frame based on the video frame features, obtaining a quality metric of the interpolated video frame for each such dimension, the dimensions covering at least one of the brightness, contrast, saturation, blurriness or resolution of the interpolated video frame; and to fuse the per-dimension quality metrics to obtain the first quality metric corresponding to the interpolated video frame.
In one embodiment, the first detection module is further configured to perform local quality detection on at least one region of the interpolated video frame based on the video frame features, obtaining a quality metric for each such region, and to form the first quality metric corresponding to the interpolated video frame from the per-region quality metrics.
In one embodiment, the first detection module is further configured to perform region recognition on the interpolated video frame based on the video frame features, obtaining a foreground region and a background region of the interpolated video frame; to perform local quality detection on the foreground region according to its foreground features, obtaining a quality metric corresponding to the foreground region; and to perform local quality detection on the background region according to its background features, obtaining a quality metric corresponding to the background region.
In one embodiment, the second detection module is configured to extract the optical flow features based on the interpolated optical flow field and to perform at least one type of video frame quality detection on the interpolated video frame according to those features, obtaining a quality metric for each such type, the types covering at least one of the deformation, integrity or contour sharpness of the interpolated video frame; and to fuse the per-type quality metrics to obtain the second quality metric corresponding to the interpolated video frame.
In one embodiment, the apparatus further comprises a correction module, configured to retain the interpolated video frame when its interpolated-frame quality metric satisfies the interpolation quality condition and, when the metric does not satisfy the condition, to correct the interpolated video frame according to the first video frame and the second video frame, obtaining a corrected interpolated video frame.
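For illustration only, a minimal Python sketch of the correction module's keep-or-correct decision. The threshold value and the blend-based fallback are assumptions made for the sketch; the patent specifies neither a threshold nor a particular correction method.

```python
import numpy as np

QUALITY_THRESHOLD = 0.6  # assumed acceptance threshold, not from the patent


def keep_or_correct(interp_frame, first_frame, second_frame, quality_metric):
    """Retain the interpolated frame when its quality metric satisfies
    the interpolation quality condition; otherwise derive a corrected
    frame from the first and second video frames (a plain average is
    used here only as a placeholder correction)."""
    if quality_metric >= QUALITY_THRESHOLD:
        return interp_frame
    corrected = 0.5 * first_frame.astype(np.float32) \
              + 0.5 * second_frame.astype(np.float32)
    return corrected.astype(first_frame.dtype)
```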
The application also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
inserting an interpolated video frame between a first video frame and a second video frame according to the first and second video frames in a video; extracting video frame features from the interpolated video frame, and performing video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first quality metric corresponding to the interpolated video frame; acquiring an interpolated optical flow field corresponding to the interpolated video frame, extracting optical flow features based on the interpolated optical flow field, and performing video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second quality metric corresponding to the interpolated video frame; and fusing the first quality metric and the second quality metric to obtain an interpolated-frame quality metric corresponding to the interpolated video frame.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following steps:
inserting an interpolated video frame between a first video frame and a second video frame according to the first and second video frames in a video; extracting video frame features from the interpolated video frame, and performing video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first quality metric corresponding to the interpolated video frame; acquiring an interpolated optical flow field corresponding to the interpolated video frame, extracting optical flow features based on the interpolated optical flow field, and performing video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second quality metric corresponding to the interpolated video frame; and fusing the first quality metric and the second quality metric to obtain an interpolated-frame quality metric corresponding to the interpolated video frame.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the following steps:
inserting an interpolated video frame between a first video frame and a second video frame according to the first and second video frames in a video; extracting video frame features from the interpolated video frame, and performing video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first quality metric corresponding to the interpolated video frame; acquiring an interpolated optical flow field corresponding to the interpolated video frame, extracting optical flow features based on the interpolated optical flow field, and performing video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second quality metric corresponding to the interpolated video frame; and fusing the first quality metric and the second quality metric to obtain an interpolated-frame quality metric corresponding to the interpolated video frame.
With the above video frame processing method, apparatus, computer device, storage medium and computer program product, an interpolated video frame is inserted between the first and second video frames according to those two frames, so that the transition between them becomes more natural and coherent. Video frame features are extracted from the interpolated video frame and used for quality detection, yielding a first quality metric that assesses the frame from the perspective of the generated image content. An interpolated optical flow field corresponding to the interpolated video frame is acquired, optical flow features are extracted from it, and quality detection is performed according to those features, yielding a second quality metric; because optical flow describes relative motion between frames and is little affected by the quality of the original frames, this assesses the generated frame from the perspective of its motion information. Fusing the first and second quality metrics therefore determines the interpolated-frame quality metric accurately, so that it reflects the quality of the interpolation result more faithfully.
The application provides a quality detection model processing method, comprising the following steps:
inserting a sample interpolated video frame between a first sample video frame and a second sample video frame according to the first and second sample video frames in a sample video;
extracting sample video frame features from the sample interpolated video frame, and performing video frame quality detection on the sample interpolated video frame based on the sample video frame features, to obtain a first predicted quality metric corresponding to the sample interpolated video frame;
acquiring a sample interpolated optical flow field corresponding to the sample interpolated video frame, extracting sample optical flow features based on the sample interpolated optical flow field, and performing video frame quality detection on the sample interpolated video frame according to the sample optical flow features, to obtain a second predicted quality metric corresponding to the sample interpolated video frame;
and acquiring a reference quality metric corresponding to the sample interpolated video frame, and performing model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
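As an illustrative sketch only, the two differences described above can be combined into a single training objective. L1 distances and equal weighting are assumptions made here; the patent does not fix the form of the loss.

```python
import torch
import torch.nn.functional as F


def quality_training_loss(pred_q_frame: torch.Tensor,
                          pred_q_flow: torch.Tensor,
                          reference_q: torch.Tensor) -> torch.Tensor:
    """Sum of the difference between the first predicted quality metric
    (frame-content branch) and the reference quality metric, and the
    difference between the second predicted quality metric (optical-flow
    branch) and the same reference."""
    return F.l1_loss(pred_q_frame, reference_q) + \
           F.l1_loss(pred_q_flow, reference_q)
```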
The application also provides a quality detection model processing apparatus, comprising:
a sample insertion module, configured to insert a sample interpolated video frame between a first sample video frame and a second sample video frame according to the first and second sample video frames in a sample video;
a first sample detection module, configured to extract sample video frame features from the sample interpolated video frame and perform video frame quality detection on the sample interpolated video frame based on those features, to obtain a first predicted quality metric corresponding to the sample interpolated video frame;
a second sample detection module, configured to acquire a sample interpolated optical flow field corresponding to the sample interpolated video frame, extract sample optical flow features based on the sample interpolated optical flow field, and perform video frame quality detection on the sample interpolated video frame according to the sample optical flow features, to obtain a second predicted quality metric corresponding to the sample interpolated video frame;
and a training module, configured to acquire a reference quality metric corresponding to the sample interpolated video frame and perform model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
In one embodiment, the training module is further configured to acquire, from the sample video, a reference video frame corresponding to the sample interpolated video frame, and to perform video frame quality detection on the sample interpolated video frame based on the reference video frame, obtaining the reference quality metric corresponding to the sample interpolated video frame.
In one embodiment, the training module is further configured to extract from the sample video a reference video segment containing the first sample video frame, the second sample video frame and the reference video frame; to generate a predicted video segment from the first sample video frame, the second sample video frame and the sample interpolated video frame; and to perform video frame quality detection on the sample interpolated video frame according to the reference video segment and the predicted video segment, obtaining the reference quality metric corresponding to the sample interpolated video frame.
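One way to realize such a reference quality metric is to score the sample interpolated frame against the held-out real frame it replaces. The sketch below uses PSNR purely as an illustrative full-reference measure; the patent does not name a specific one.

```python
import numpy as np


def reference_quality_metric(interp_frame: np.ndarray,
                             reference_frame: np.ndarray,
                             max_val: float = 255.0) -> float:
    """PSNR of the sample interpolated video frame against the reference
    video frame taken from the sample video (higher is better)."""
    mse = np.mean((interp_frame.astype(np.float64)
                   - reference_frame.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # frames are identical
    return 10.0 * np.log10(max_val ** 2 / mse)
```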
The application also provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
inserting a sample interpolated video frame between a first sample video frame and a second sample video frame according to the first and second sample video frames in a sample video; extracting sample video frame features from the sample interpolated video frame, and performing video frame quality detection on the sample interpolated video frame based on the sample video frame features, to obtain a first predicted quality metric corresponding to the sample interpolated video frame; acquiring a sample interpolated optical flow field corresponding to the sample interpolated video frame, extracting sample optical flow features based on the sample interpolated optical flow field, and performing video frame quality detection on the sample interpolated video frame according to the sample optical flow features, to obtain a second predicted quality metric corresponding to the sample interpolated video frame; and acquiring a reference quality metric corresponding to the sample interpolated video frame, and performing model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following steps:
inserting a sample interpolated video frame between a first sample video frame and a second sample video frame according to the first and second sample video frames in a sample video; extracting sample video frame features from the sample interpolated video frame, and performing video frame quality detection on the sample interpolated video frame based on the sample video frame features, to obtain a first predicted quality metric corresponding to the sample interpolated video frame; acquiring a sample interpolated optical flow field corresponding to the sample interpolated video frame, extracting sample optical flow features based on the sample interpolated optical flow field, and performing video frame quality detection on the sample interpolated video frame according to the sample optical flow features, to obtain a second predicted quality metric corresponding to the sample interpolated video frame; and acquiring a reference quality metric corresponding to the sample interpolated video frame, and performing model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the following steps:
inserting a sample interpolated video frame between a first sample video frame and a second sample video frame according to the first and second sample video frames in a sample video; extracting sample video frame features from the sample interpolated video frame, and performing video frame quality detection on the sample interpolated video frame based on the sample video frame features, to obtain a first predicted quality metric corresponding to the sample interpolated video frame; acquiring a sample interpolated optical flow field corresponding to the sample interpolated video frame, extracting sample optical flow features based on the sample interpolated optical flow field, and performing video frame quality detection on the sample interpolated video frame according to the sample optical flow features, to obtain a second predicted quality metric corresponding to the sample interpolated video frame; and acquiring a reference quality metric corresponding to the sample interpolated video frame, and performing model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
With the above quality detection model processing method, apparatus, computer device, storage medium and computer program product, a sample interpolated video frame is inserted between the first and second sample video frames according to those two frames, so that the transition between them in the sample video becomes more natural and coherent. Sample video frame features are extracted from the sample interpolated video frame and used for quality detection, yielding a first predicted quality metric that assesses the frame from the generated image content. Because optical flow describes relative motion between frames and is little affected by the quality of the original frames, acquiring the sample interpolated optical flow field, extracting sample optical flow features from it and performing quality detection according to those features yields a second predicted quality metric that assesses the generated frame from its motion information. A reference quality metric corresponding to the sample interpolated video frame is then acquired, and model training based on the difference between each predicted quality metric and the reference quality metric produces a video frame processing model that can determine the interpolated-frame quality metric accurately and reflect the quality of the interpolation result more faithfully.
Drawings
FIG. 1 is an application environment diagram of a video frame processing method in one embodiment;
FIG. 2 is a flow chart of a video frame processing method in one embodiment;
FIG. 3 is a schematic diagram of generating an interpolated video frame at the interpolation time in one embodiment;
FIG. 4 is a schematic diagram of generating an interpolated video frame and an interpolated optical flow field in one embodiment;
FIG. 5 is a flow chart of inserting an interpolated video frame between a first video frame and a second video frame according to the two frames in one embodiment;
FIG. 6 is a schematic diagram of the fusion of a first optical flow field and a second optical flow field in one embodiment;
FIG. 7 is a schematic diagram of performing local quality detection on the foreground region and background region of an interpolated video frame to obtain the corresponding quality metrics in one embodiment;
FIG. 8 is a flow chart of a quality detection model processing method in one embodiment;
FIG. 9 is a flow chart of performing video frame quality detection on a sample interpolated video frame based on a reference video frame to obtain the corresponding reference quality metric in one embodiment;
FIG. 10 is a diagram comparing a reference video segment with a predicted video segment in one embodiment;
FIG. 11 is a flow chart of a quality detection model processing method in another embodiment;
FIG. 12 is a flow chart of a video frame processing method in another embodiment;
FIG. 13 is a block diagram of a video frame processing apparatus in one embodiment;
FIG. 14 is a block diagram of a quality detection model processing apparatus in one embodiment;
FIG. 15 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to illustrate the application and are not intended to limit its scope.
The embodiments of the application can be applied in a variety of scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation and driver assistance. For example, they are applicable in the field of artificial intelligence (AI). Artificial intelligence is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that respond in a manner similar to human intelligence; it studies the design principles and implementation methods of such machines so that they can perceive, reason and make decisions. The solution provided by the embodiments of the application relates to an AI-based video frame processing method and quality detection model processing method, described in detail in the following embodiments.
The video frame processing method provided by the embodiments of the application can be applied in the environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or placed on a cloud or another server. The terminal 102 and the server 104 may each independently perform the video frame processing method provided in the embodiments of the application, or may perform it cooperatively. In the cooperative case, the terminal 102 acquires a first video frame and a second video frame from a video and sends them to the server 104. The server 104 inserts an interpolated video frame between the first and second video frames according to the two frames; extracts video frame features from the interpolated video frame and performs video frame quality detection based on those features, obtaining a first quality metric corresponding to the interpolated video frame; acquires the interpolated optical flow field corresponding to the interpolated video frame, extracts optical flow features from it and performs video frame quality detection according to them, obtaining a second quality metric corresponding to the interpolated video frame; and fuses the first and second quality metrics into the interpolated-frame quality metric corresponding to the interpolated video frame. The terminal 102 may be, without limitation, a desktop computer, notebook computer, smartphone, tablet, Internet-of-Things device or portable wearable device; Internet-of-Things devices include smart speakers, smart televisions, smart air conditioners and smart in-vehicle devices, and portable wearables include smart watches, smart bracelets and headsets. The server 104 may be a stand-alone server or a cluster of multiple servers.
The quality detection model processing method provided by the embodiments of the application can likewise be applied in the environment shown in FIG. 1. The terminal 102 and the server 104 may each independently perform the method, or may perform it cooperatively. In the cooperative case, the terminal 102 acquires a first sample video frame and a second sample video frame from a sample video and sends them to the server 104. The server 104 inserts a sample interpolated video frame between the first and second sample video frames according to the two frames; extracts sample video frame features from the sample interpolated video frame and performs video frame quality detection based on those features, obtaining a first predicted quality metric corresponding to the sample interpolated video frame; acquires the sample interpolated optical flow field corresponding to the sample interpolated video frame, extracts sample optical flow features from it and performs video frame quality detection according to them, obtaining a second predicted quality metric corresponding to the sample interpolated video frame; and acquires a reference quality metric corresponding to the sample interpolated video frame, performing model training based on the difference between the first predicted quality metric and the reference quality metric and the difference between the second predicted quality metric and the reference quality metric, to obtain a video frame processing model.
In one embodiment, as shown in FIG. 2, a video frame processing method is provided. The method is described here as applied to the computer device in FIG. 1 (which may be the terminal or the server of FIG. 1), and comprises the following steps:
Step S202: insert an interpolated video frame between the first video frame and the second video frame according to the first and second video frames in the video.
The interpolated video frame is a generated image that is inserted between, and derived from, the first video frame and the second video frame.
Specifically, the computer device may acquire a video that requires frame interpolation, obtain a first video frame and a second video frame from it, and perform frame interpolation according to the two frames so as to insert an interpolated video frame between them.
In this embodiment, the computer device may take the first and second video frames as adjacent frames of the video, or as non-adjacent frames.
In this embodiment, the interpolated video frame may be any of an RGB (Red, Green, Blue) image, a RAW image, a grayscale image, a depth image, a YUV image, and the like. A RAW image is the raw data produced when an image sensor converts the captured light signal into a digital signal. In a YUV image, "Y" denotes luminance (Luma), i.e. the grayscale value, while "U" and "V" denote chrominance (Chroma), which describes the colour and saturation of each pixel.
In this embodiment, as shown in FIG. 3, the computer device may acquire a first video frame at a first time and a second video frame at a second time in the video, and perform frame interpolation according to the two frames to generate an interpolated video frame at an interpolation time lying between the first time and the second time.
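To make step S202 concrete, here is a deliberately naive Python sketch: a cross-fade between the two frames at a normalized interpolation time t. This is only a placeholder for the optical-flow-based warping and fusion detailed in the later embodiments.

```python
import numpy as np


def naive_interpolated_frame(first_frame: np.ndarray,
                             second_frame: np.ndarray,
                             t: float = 0.5) -> np.ndarray:
    """Generate a frame at interpolation time t in (0, 1), where t=0 is
    the first time and t=1 the second time. A linear blend stands in for
    the flow-based method described in the embodiments below."""
    a = first_frame.astype(np.float32)
    b = second_frame.astype(np.float32)
    return ((1.0 - t) * a + t * b).astype(first_frame.dtype)
```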
Step S204: extract video frame features from the interpolated video frame, and perform video frame quality detection on the interpolated video frame based on the video frame features, obtaining a first quality metric corresponding to the interpolated video frame.
Video frame quality detection refers to detecting the quality of a video frame and may cover at least one of the brightness, contrast, saturation, blurriness, resolution, deformation, integrity or contour sharpness of the frame. Brightness characterizes how light or dark the image is. Contrast characterizes the difference between the brightest and darkest colours in the image; the greater the difference, the higher the contrast.
Saturation refers to the vividness, or purity, of a colour. Blurriness refers to the degree of blur distortion present in the interpolated video frame.
Deformation characterizes whether, and to what degree, content in the interpolated video frame is distorted; it may specifically combine the degrees of deformation of the individual subjects in the frame. Integrity characterizes whether the interpolated video frame suffers content loss, and to what extent. Contour sharpness represents how sharp the edge contours, and the individual subjects, in the interpolated video frame are.
The video frame features characterize the key properties of the interpolated video frame and may include at least one of its brightness, contrast, saturation, blur or resolution features.
The first quality metric characterizes the quality of the interpolated video frame and is obtained by detection performed on the interpolated video frame itself.
Specifically, the computer device performs feature extraction on the interpolated video frame to obtain the video frame features, and then performs video frame quality detection on the interpolated video frame based on those features, obtaining the first quality metric corresponding to the interpolated video frame.
Step S206: acquire an interpolated optical flow field corresponding to the interpolated video frame, extract optical flow features based on the interpolated optical flow field, and perform video frame quality detection on the interpolated video frame according to the optical flow features, obtaining a second quality metric corresponding to the interpolated video frame.
The interpolated optical flow field is the optical flow field corresponding to the interpolated video frame. An optical flow field is the two-dimensional (2D) instantaneous velocity field formed by all pixels of an image, each 2D velocity vector being the projection onto the imaging plane of the three-dimensional velocity vector of a visible point in the scene. The optical flow field therefore contains not only the motion information of the observed objects but also rich information about the three-dimensional structure of the scene. The optical flow features characterize the key properties of the interpolated optical flow field.
The second quality metric characterizes the quality of the interpolated video frame and is obtained by detection performed on the interpolated optical flow field of the frame.
Both the first quality metric and the second quality metric reflect the global quality of the interpolated video frame, i.e. the quality of the frame as a whole.
Specifically, the computer device acquires the interpolated optical flow field corresponding to the interpolated video frame and performs feature extraction on it, obtaining the optical flow features. The computer device then performs video frame quality detection on the interpolated video frame according to the optical flow features, obtaining the second quality metric corresponding to the interpolated video frame.
As shown in FIG. 4, frame interpolation is performed according to the first and second video frames, generating both the interpolated video frame between them and the interpolated optical flow field corresponding to that frame.
Step S208: fuse the first quality metric and the second quality metric, obtaining the interpolated-frame quality metric corresponding to the interpolated video frame.
Specifically, the computer device fuses the first and second quality metrics corresponding to the interpolated video frame, obtaining the interpolated-frame quality metric for that frame.
In this embodiment, the computer device may obtain a weight parameter for the first quality metric and a weight parameter for the second quality metric, and fuse the two metrics according to these weights to obtain the interpolated-frame quality metric corresponding to the interpolated video frame.
Further, the first and second quality metrics may be weighted and summed according to their respective weight parameters, yielding the interpolated-frame quality metric corresponding to the interpolated video frame.
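A minimal sketch of this weighted fusion; the equal default weights are an assumption, since the patent leaves the weight parameters open.

```python
def fuse_quality_metrics(q_frame: float, q_flow: float,
                         w_frame: float = 0.5, w_flow: float = 0.5) -> float:
    """Step S208: weighted sum of the first (frame-content) quality
    metric and the second (optical-flow) quality metric, yielding the
    interpolated-frame quality metric."""
    return w_frame * q_frame + w_flow * q_flow
```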
In this embodiment, inserting an interpolated video frame between the first and second video frames according to those two frames makes the transition between them more natural and coherent. Extracting video frame features from the interpolated video frame and detecting its quality on that basis yields a first quality metric that assesses the frame from the generated image content. Acquiring the interpolated optical flow field, extracting optical flow features from it and detecting the frame's quality according to those features yields a second quality metric; because optical flow describes relative motion between frames and is little affected by the quality of the original frames, this assesses the generated frame from its motion information. Fusing the first and second quality metrics therefore determines the interpolated-frame quality metric accurately, so that it reflects the quality of the interpolation result more faithfully.
In one embodiment, as shown in FIG. 5, inserting an interpolated video frame between the first video frame and the second video frame according to the two frames comprises:
Step S502: acquire a first video frame at a first time and a second video frame at a second time in the video.
Specifically, the computer device acquires two video frames at different times from the video; the two times serve as the first time and the second time, the frame at the first time being the first video frame and the frame at the second time being the second video frame.
In this embodiment, the first time and the second time are adjacent times, and the first and second video frames are adjacent frames.
In this embodiment, of the two acquired frames, the earlier one is taken as the first video frame at the first time and the later one as the second video frame at the second time.
Step S504: transform the first video frame from the first time to the interpolation time, obtaining a first predicted video frame corresponding to the interpolation time; the interpolation time lies between the first time and the second time.
The first predicted video frame is the frame at the interpolation time formed by transforming the first video frame.
Specifically, the computer device may determine an interpolation time between the first time and the second time, and transform the first video frame from the first time to that interpolation time, forming the first predicted video frame corresponding to the interpolation time.
In this embodiment, the computer device extracts first video frame features from the first video frame and reconstructs the frame at the interpolation time based on those features, obtaining the first predicted video frame corresponding to the interpolation time.
Step S506: transform the second video frame from the second time to the interpolation time, obtaining a second predicted video frame corresponding to the interpolation time.
The second predicted video frame is the frame at the interpolation time formed by transforming the second video frame.
Specifically, the computer device may transform the second video frame from the second time to the interpolation time, forming the second predicted video frame corresponding to the interpolation time.
In this embodiment, the computer device extracts second video frame features from the second video frame and reconstructs the frame at the interpolation time based on those features, obtaining the second predicted video frame corresponding to the interpolation time.
Step S508: fuse the first predicted video frame and the second predicted video frame, obtaining the interpolated video frame corresponding to the interpolation time.
Specifically, the computer device fuses the first predicted video frame with the second predicted video frame, obtaining the interpolated video frame corresponding to the interpolation time.
Further, a weight corresponding to the first video frame and a weight corresponding to the second video frame may be obtained, and the two predicted video frames fused according to these weights, yielding the interpolated video frame corresponding to the interpolation time.
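A sketch of the weighted frame fusion of step S508, assuming per-frame scalar weights; in practice the weights might vary per pixel, for example with occlusion estimates.

```python
import numpy as np


def fuse_predicted_frames(pred_first: np.ndarray, pred_second: np.ndarray,
                          w_first: float = 0.5,
                          w_second: float = 0.5) -> np.ndarray:
    """Fuse the first and second predicted video frames into the
    interpolated video frame at the interpolation time."""
    fused = w_first * pred_first.astype(np.float32) \
          + w_second * pred_second.astype(np.float32)
    return fused.astype(pred_first.dtype)
```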
In this embodiment, a first video frame at a first time and a second video frame at a second time are acquired from the video; the first video frame is transformed from the first time to the interpolation time to obtain a preliminary first predicted video frame, and the second video frame is transformed from the second time to the interpolation time to obtain a preliminary second predicted video frame, the interpolation time lying between the two times. Predicted frames are thus reconstructed from the video frames at both the earlier and the later time, and fusing these two predictions generates the interpolated video frame at the interpolation time accurately.
In one embodiment, transforming the first video frame from the first time to the interpolation time to obtain the first predicted video frame corresponding to the interpolation time comprises:
performing spatial position mapping on the first video frame to obtain a first optical flow field from the first time to the interpolation time; and transforming the first video frame, based on the first optical flow field, into the first predicted video frame corresponding to the interpolation time.
The first optical flow field is the optical flow field of the first video frame from the first time to the interpolation time.
Specifically, the computer device performs spatial position mapping on the first video frame so as to map it from the first time to the interpolation time, forming the first optical flow field from the first time to the interpolation time.
Further, the computer device extracts first video frame features from the first video frame and performs the spatial position mapping based on those features, mapping the first video frame from the first time to the interpolation time and forming the first optical flow field.
The computer device then reconstructs the first video frame at the interpolation time based on the first optical flow field, obtaining the first predicted video frame corresponding to the interpolation time.
In this embodiment, because an optical flow field can represent the motion of objects in an image, spatially mapping the first video frame yields the first optical flow field from the first time to the interpolation time, through which the motion of the objects in the first video frame over that interval can be represented. Reconstructing the first video frame based on this motion information therefore generates the first predicted video frame at the interpolation time accurately, as the sketch below illustrates.
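A compact sketch of warping a frame to the interpolation time with an optical flow field. Nearest-neighbour sampling and the sign convention (flow stored as the per-pixel displacement toward the interpolation time) are simplifying assumptions; practical implementations typically use bilinear sampling, e.g. grid_sample in PyTorch.

```python
import numpy as np


def warp_with_flow(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `frame` with `flow` of shape (H, W, 2), where
    flow[..., 0] is the horizontal and flow[..., 1] the vertical
    displacement. Each output pixel samples the source frame at its
    own position minus the displacement."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]
```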
In one embodiment, transforming the second video frame from the second time to the frame inserting time to obtain the second predicted video frame corresponding to the frame inserting time includes:
acquiring a second optical flow field from a second moment to a frame inserting moment, which is obtained by performing spatial position mapping on a second video frame; and based on the second optical flow field, converting the second video frame into a second predicted video frame corresponding to the frame inserting moment.
The second optical flow field refers to an optical flow field from the second moment to the frame inserting moment of the second video frame.
Specifically, the computer device performs spatial location mapping on the second video frame to map the second video frame from the second time to the frame insertion time, and forms a second optical flow field from the second time to the frame insertion time.
Further, the computer device extracts a second video frame feature from the second video frame, and performs spatial location mapping based on the second video frame feature to map the second video frame from the second time to the frame insertion time, so as to form a second optical flow field from the second time to the frame insertion time.
And the computer equipment carries out video frame reconstruction at the frame inserting moment on the second video frame based on the second optical flow field to obtain a second predicted video frame corresponding to the frame inserting moment.
In this embodiment, an optical flow field can represent the motion information of objects in an image. Spatial position mapping is performed on the second video frame to obtain the second optical flow field from the second time to the interpolation time, so the second optical flow field represents the motion of objects in the second video frame from the second time to the interpolation time. Based on this motion information, the second video frame is reconstructed and the second predicted video frame corresponding to the interpolation time is accurately generated.
In one embodiment, obtaining an interpolated optical flow field corresponding to an interpolated video frame includes: and merging the first optical flow field and the second optical flow field to obtain an inserted frame optical flow field corresponding to the inserted frame video frame.
Specifically, the computer device may fuse the first optical flow field and the second optical flow field, both associated with the interpolation time, to obtain the interpolation optical flow field corresponding to the interpolation time, which serves as the interpolation optical flow field corresponding to the interpolated video frame.
In this embodiment, the weights of the first optical flow field and the second optical flow field may be obtained, and the first optical flow field and the second optical flow field are fused according to the weights of the first optical flow field and the second optical flow field, so as to obtain an interpolated optical flow field corresponding to the interpolated video frame.
As shown in fig. 6, the first optical flow field and the second optical flow field are fused, so that an interpolated optical flow field corresponding to the interpolated video frame can be obtained.
In this embodiment, the first optical flow field and the second optical flow field, both defined at the interpolation time, are fused, so that the interpolation optical flow field corresponding to the interpolated video frame can be obtained more accurately.
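A minimal sketch of this flow fusion, assuming per-pixel H x W x 2 flow fields (both the weighted-average reading and the channel-concatenation reading later used for the 4-channel mid_flow are plausible; neither is fixed by the embodiment):

```python
import numpy as np

def fuse_flow_fields(flow_first: np.ndarray, flow_second: np.ndarray,
                     w_first: float = 0.5,
                     concatenate: bool = False) -> np.ndarray:
    """Fuse the first and second optical flow fields (each H x W x 2).
    concatenate=True yields the H x W x 4 field later used as the
    Flow scoring model input; otherwise a weighted average is returned."""
    if concatenate:
        return np.concatenate([flow_first, flow_second], axis=-1)
    return w_first * flow_first + (1.0 - w_first) * flow_second
```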
In one embodiment, performing video frame quality detection on the interpolated video frame based on the video frame features to obtain a first video frame quality characterization quantity corresponding to the interpolated video frame includes:
based on the video frame features, performing video frame quality detection in at least one dimension on the interpolated video frame to obtain a video frame quality characterization quantity in at least one dimension corresponding to the interpolated video frame, where the quality characterization quantity in at least one dimension includes at least one of the brightness, contrast, saturation, blur, or resolution of the interpolated video frame; and fusing the video frame quality characterization quantities of the interpolated video frame across the at least one dimension to obtain the first video frame quality characterization quantity corresponding to the interpolated video frame.
Wherein the at least one dimension includes at least one of a brightness dimension, a contrast dimension, a saturation dimension, a blur dimension, or a resolution dimension of the interpolated video frame.
Specifically, the computer device obtains at least one dimension preset for evaluating video frame quality, including at least one of the brightness, contrast, saturation, blur, or resolution dimensions of the interpolated video frame.
Video frame quality detection is then performed on the interpolated video frame in at least one of the brightness, contrast, saturation, blur, or resolution dimensions based on the video frame features, yielding a video frame quality characterization quantity in each detected dimension for the interpolated video frame.
When only one dimension exists, the video frame quality characterization quantity of the interpolated video frame in that dimension is used as the first video frame quality characterization quantity of the interpolated video frame. When multiple dimensions exist, the quality characterization quantities of the interpolated video frame in those dimensions are fused to form the first video frame quality characterization quantity corresponding to the interpolated video frame.
In this embodiment, the video frame features may include at least one of a brightness feature, a contrast feature, a saturation feature, a blur feature, or a resolution feature of the interpolated video frame. Video frame quality detection in the brightness dimension is performed based on the brightness feature to obtain the brightness corresponding to the interpolated video frame; detection in the contrast dimension is performed based on the contrast feature to obtain the corresponding contrast; detection in the saturation dimension is performed based on the saturation feature to obtain the corresponding saturation; detection in the blur dimension is performed based on the blur feature to obtain the corresponding blur; and detection in the resolution dimension is performed based on the resolution feature to obtain the corresponding resolution.
In this embodiment, video frame quality detection in at least one dimension is performed on the interpolated video frame based on the video frame features, yielding a video frame quality characterization quantity in each such dimension, which includes at least one of the brightness, contrast, saturation, blur, or resolution of the interpolated video frame. The qualities of the interpolated video frame in different dimensions can thus be taken into account, and the first video frame quality characterization quantity corresponding to the interpolated video frame is accurately computed by combining them.
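A toy sketch of dimension-wise detection and fusion: the simple statistics below (mean intensity, standard deviation, Laplacian variance, pixel count) are assumed stand-ins for the learned per-dimension detectors, and the weights are arbitrary:

```python
import numpy as np

def per_dimension_quality(rgb: np.ndarray) -> dict:
    """rgb: H x W x 3 float array in [0, 1]. Returns one quality
    characterization quantity per dimension."""
    gray = rgb.mean(axis=2)
    # 5-point Laplacian; higher variance indicates a sharper frame
    lap = (-4 * gray[1:-1, 1:-1] + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-6), 0.0)
    return {"brightness": float(gray.mean()),
            "contrast": float(gray.std()),
            "saturation": float(sat.mean()),
            "blur": float(lap.var()),
            "resolution": float(gray.size)}

def first_quality_characterization(rgb: np.ndarray, weights: dict) -> float:
    """Weighted fusion of the per-dimension quantities into the first
    video frame quality characterization quantity."""
    q = per_dimension_quality(rgb)
    return sum(weights[k] * q[k] for k in weights)
```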
In one embodiment, performing video frame quality detection on the interpolated video frame based on the video frame features to obtain a first video frame quality characterization quantity corresponding to the interpolated video frame includes:
based on the video frame characteristics, carrying out local quality detection on at least one region of the frame-inserted video frame to obtain a quality characterization quantity corresponding to the at least one region of the frame-inserted video frame; and forming a first video frame quality characterization quantity corresponding to the frame inserting video frame according to the quality characterization quantity corresponding to the at least one region of the frame inserting video frame.
Specifically, the computer equipment divides the frame inserting video frame into a plurality of areas, and performs local quality detection on at least one area in the plurality of areas based on the video frame characteristics to obtain a quality characterization quantity corresponding to the at least one area. When only one region is subjected to local quality detection, the quality characterization quantity corresponding to the region is used as a first video frame quality characterization quantity corresponding to the frame inserting video frame. When the local quality detection is carried out on the plurality of areas, the quality characterization amounts corresponding to the plurality of areas are fused, and a first video frame quality characterization amount corresponding to the frame inserting video frame is formed.
Further, the local quality detection of a region may be performed, based on the video frame features, as quality detection in at least one dimension for that region, yielding a quality characterization quantity for the region in at least one dimension; this quantity includes at least one of the brightness, contrast, saturation, blur, or resolution of the region. The computer device fuses the region's quality characterization quantities across the at least one dimension to obtain the quality characterization quantity corresponding to the region.
In this embodiment, based on the video frame characteristics, performing local quality detection on at least one region of the frame-inserted video frame to obtain a quality characterization quantity corresponding to the at least one region of the frame-inserted video frame, including:
performing subject identification on the interpolated video frame based on the video frame features to obtain a subject region in the interpolated video frame; extracting subject features from the subject region, and performing local quality detection on the subject region based on the subject features to obtain a quality characterization quantity corresponding to the subject region; and taking the quality characterization quantity corresponding to the subject region as the first video frame quality characterization quantity corresponding to the interpolated video frame.
In this embodiment, local quality detection is performed on at least one region of the interpolated video frame based on the video frame features to obtain a quality characterization quantity for each such region, so that the local quality of every region is detected and the image quality of each local region in the interpolated video frame is taken into account; the first video frame quality characterization quantity of the interpolated video frame can therefore be reflected more accurately from these local qualities.
In one embodiment, based on the video frame characteristics, performing local quality detection on at least one region of the interpolated video frame to obtain a quality characterization quantity corresponding to the at least one region of the interpolated video frame, including:
Performing region identification on the frame inserting video frame based on the video frame characteristics to obtain a foreground region and a background region in the frame inserting video frame; according to foreground features of the foreground region, carrying out local quality detection on the foreground region to obtain a quality characterization quantity corresponding to the foreground region; and carrying out local quality detection on the background area according to the background characteristics of the background area to obtain a quality characterization quantity corresponding to the background area.
Wherein, in a video frame, the foreground is the scenery located in front of the subject and closest to the lens, and the background is the scenery located behind the subject and relatively far from the lens. Accordingly, the foreground region is the region in front of the subject, and the background region is the region behind it.
Specifically, the computer equipment performs region identification on the frame-inserted video frame based on the video frame characteristics to obtain a foreground region in the frame-inserted video frame, and takes a region except the foreground region in the frame-inserted video frame as a background region.
The computer equipment can extract foreground features from the foreground region, and perform local quality detection on the foreground region based on the foreground features to obtain quality characterization quantities corresponding to the foreground region.
In this embodiment, the computer device may perform local quality detection on the foreground region based on the foreground region feature in the video frame features, to obtain a quality characterization quantity corresponding to the foreground region. The computer equipment can perform local quality detection on the background area based on the background area characteristics in the video frame characteristics to obtain the quality characterization quantity corresponding to the background area.
In this embodiment, the local quality detection of the foreground region may specifically be quality detection in at least one dimension, yielding a quality characterization quantity for the foreground region in at least one dimension; this quantity includes at least one of the brightness, contrast, saturation, blur, or resolution of the foreground region. The computer device fuses the foreground region's quality characterization quantities across the at least one dimension to obtain the quality characterization quantity corresponding to the foreground region.
Likewise, the local quality detection of the background region is quality detection in at least one dimension, yielding a quality characterization quantity for the background region in at least one dimension, including at least one of the brightness, contrast, saturation, blur, or resolution of the background region. The computer device fuses the background region's quality characterization quantities across the at least one dimension to obtain the quality characterization quantity corresponding to the background region.
In this embodiment, region identification is performed on the interpolated video frame based on the video frame features to obtain its foreground region and background region; local quality detection is performed on the foreground region according to the foreground features to obtain the quality characterization quantity of the foreground region, and on the background region according to the background features to obtain the quality characterization quantity of the background region. The difference between foreground and background is thus taken into account during video frame quality evaluation, so the global quality of the video frame determined from the respective qualities of the two regions is more accurate.
As shown in fig. 7, an interpolated video frame 700 is region identified based on video frame characteristics, resulting in a foreground region 702 and a background region 704 in the interpolated video frame. The portrait area is a foreground area 702, and the areas other than the portrait area are background areas 704. And according to the foreground characteristics of the foreground region 702, carrying out local quality detection on the foreground region to obtain a quality characterization quantity corresponding to the foreground region 702. And according to the background characteristics of the background area 704, carrying out local quality detection on the background area 704 to obtain a quality characterization quantity corresponding to the background area. The quality characterizations of the foreground region 702 and the background region 704 are fused to obtain a first video frame quality characterizations of the interpolated video frame 700.
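A sketch of this region-wise evaluation, assuming a boolean foreground mask (e.g. the portrait region of fig. 7) is already available; the gradient-magnitude statistic and the foreground-biased weight are placeholders for the learned local detectors:

```python
import numpy as np

def region_quality(rgb: np.ndarray, mask: np.ndarray) -> float:
    """Local quality of one region: mean gradient magnitude inside the
    mask, used here purely as an illustrative sharpness proxy."""
    gray = rgb.mean(axis=2)
    gy, gx = np.gradient(gray)
    grad = np.hypot(gx, gy)
    return float(grad[mask].mean()) if mask.any() else 0.0

def fuse_foreground_background(rgb: np.ndarray, fg_mask: np.ndarray,
                               w_fg: float = 0.7) -> float:
    """Fuse foreground and background qualities, weighting the
    foreground more heavily on the assumption it matters more."""
    return (w_fg * region_quality(rgb, fg_mask)
            + (1.0 - w_fg) * region_quality(rgb, ~fg_mask))
```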
In one embodiment, extracting optical flow features based on the interpolation optical flow field and performing video frame quality detection on the interpolated video frame according to the optical flow features to obtain a second video frame quality characterization quantity corresponding to the interpolated video frame includes:
extracting optical flow features based on the interpolation optical flow field, and performing at least one type of video frame quality detection on the interpolated video frame according to the optical flow features to obtain a video frame quality characterization quantity of at least one type for the interpolated video frame, where the at least one type of quality characterization quantity includes at least one of the deformation amount, integrity, or contour sharpness of the interpolated video frame; and fusing the video frame quality characterization quantities of the at least one type to obtain the second video frame quality characterization quantity corresponding to the interpolated video frame.
Wherein the at least one type includes at least one of a deformation amount type, an integrity type, or a contour sharpness type of the interpolated video frame.
Specifically, the computer device performs feature extraction on the interpolated optical flow field to obtain corresponding optical flow features. And detecting the quality of the video frames of at least one of the deformation type, the integrity type or the contour definition type according to the optical flow characteristics to obtain the quality characterization quantity of the video frames of at least one of the deformation type, the integrity type or the contour definition type corresponding to the inserted video frames.
For example, performing video frame quality detection of deformation type on the inserted frame video frame according to the optical flow characteristics to obtain the corresponding deformation of the inserted frame video frame; performing video frame quality detection of the integrity type on the inserted frame video frames according to the optical flow characteristics to obtain the corresponding integrity of the inserted frame video frames; and detecting the quality of the video frames with the contour definition type according to the optical flow characteristics to obtain the contour definition corresponding to the inserted video frames.
When only one type exists, the video frame quality representation quantity corresponding to the type of the frame inserting video frame is used as a second video frame quality representation quantity corresponding to the frame inserting video frame. And when multiple types exist, fusing the quality characterization quantities of the video frames corresponding to the multiple types respectively to form a second video frame quality characterization quantity corresponding to the frame inserting video frames.
In this embodiment, optical flow features are extracted based on the interpolation optical flow field, and at least one type of video frame quality detection is performed on the interpolated video frame according to the optical flow features to obtain a video frame quality characterization quantity of at least one type, including at least one of the deformation amount, integrity, or contour sharpness of the interpolated video frame. The qualities of the interpolated video frame across the different types can thus be considered, and the second video frame quality characterization quantity corresponding to the interpolated video frame is accurately computed by combining them.
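The heuristics below sketch what flow-based type detection could look like; the patent describes learned detection from optical flow features, so these hand-written statistics (divergence and shear for deformation amount, a displacement bound for integrity, flow-edge strength for contour sharpness) are loose illustrative assumptions only:

```python
import numpy as np

def flow_type_characterizations(flow: np.ndarray) -> dict:
    """flow: H x W x 2 (u, v) displacements of the interpolated frame."""
    u, v = flow[..., 0], flow[..., 1]
    du_y, du_x = np.gradient(u)
    dv_y, dv_x = np.gradient(v)
    # Deformation amount: how non-rigid the motion is
    deformation = float(np.abs(du_x + dv_y).mean() + np.abs(du_y + dv_x).mean())
    # Integrity: share of pixels whose displacement stays within a bound
    integrity = float((np.hypot(u, v) < 64.0).mean())
    # Contour sharpness: strength of motion boundaries in the flow field
    contour_sharpness = float(np.hypot(du_x, dv_y).mean())
    return {"deformation": deformation,
            "integrity": integrity,
            "contour_sharpness": contour_sharpness}
```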
In one embodiment, extracting optical flow features based on the optical flow field of the interpolated frame, and performing at least one type of video frame quality detection on the interpolated frame video frame according to the optical flow features to obtain a quality characterization of the interpolated frame video frame corresponding to the at least one type of video frame, including:
performing target identification on the frame inserting video frame to obtain a target object in the frame inserting video frame; and extracting optical flow characteristics based on the interpolated optical flow field, detecting the deformation amount of the target object according to the optical flow characteristics, and taking the deformation amount of the target object as the video frame quality characterization amount of the interpolated video frame.
In one embodiment, the method further comprises:
when the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame meets the frame inserting quality condition, reserving the frame inserting video frame; and when the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame does not meet the frame inserting quality condition, correcting the frame inserting video frame according to the first video frame and the second video frame to obtain a corrected frame inserting video frame.
The frame inserting quality condition is a condition for screening the frame inserting video frames, and the frame inserting video frames meeting the frame inserting quality condition are good in representing quality and can be reserved in the video; the frame-inserted video frames which do not meet the frame-inserted quality condition have poor representation quality and can be deleted from the video or subjected to correction processing.
Specifically, the computer device may obtain an inter-frame quality condition, and match an inter-frame video frame quality corresponding to the inter-frame video frame with the inter-frame quality condition to determine whether the inter-frame video frame quality meets the inter-frame quality condition. When the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame meets the frame inserting quality condition, the frame inserting video frame is reserved in the video frame. And when the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame does not meet the frame inserting quality condition, correcting the frame inserting video frame according to the first video frame and the second video frame to obtain a corrected frame inserting video frame.
In this embodiment, when the quality characterization of the frame-inserted video frame corresponding to the frame-inserted video frame does not satisfy the frame-inserted quality condition, the frame-inserted video frame is deleted from the video.
For example, when the interpolated video frame quality characterization quantity corresponding to the interpolated video frame is greater than or equal to a quality characterization quantity threshold, the interpolated video frame is retained; when it is smaller than the threshold, the interpolated video frame is corrected according to the first video frame and the second video frame to obtain a corrected interpolated video frame.
In this embodiment, when the interpolated video frame quality characterization quantity meets the interpolation quality condition, the interpolated video frame is retained; when it does not, the interpolated video frame is deleted, so that frames of poor interpolation quality are removed from the video. Alternatively, when the quality characterization quantity does not meet the condition, the interpolated video frame is corrected according to the first video frame and the second video frame, so that the corrected frame reaches the interpolation quality condition and its transition to the first and second video frames is more natural and coordinated.
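The screening logic of this embodiment reduces to a threshold comparison; a minimal sketch (the correction hook correct_fn is an assumed callable, e.g. re-blending the two neighboring frames):

```python
def screen_interpolated_frame(score: float, threshold: float,
                              interp_frame, first_frame, second_frame,
                              correct_fn):
    """Keep the interpolated frame when its quality characterization
    quantity meets the interpolation quality condition (score >=
    threshold, as in the example above); otherwise produce a corrected
    frame from the first and second video frames."""
    if score >= threshold:
        return interp_frame                       # retained in the video
    return correct_fn(first_frame, second_frame)  # corrected replacement
```

In the deletion variant, the caller simply drops the frame instead of invoking correct_fn.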
In one embodiment, as shown in fig. 8, a processing method for a video frame quality detection model is provided. The method is described as applied to the computer device in fig. 1 (which may be the terminal or the server in fig. 1) and includes the following steps:
step S802, inserting sample inserting frame video frames between the first sample video frames and the second sample video frames according to the first sample video frames and the second sample video frames in the sample video.
The sample frame inserting video frame refers to an inserting image generated based on the first sample video frame and the second sample video frame.
In particular, the computer device may obtain a sample video that requires frame insertion and obtain a first sample video frame and a second sample video frame from the sample video.
In this embodiment, the computer device may acquire the adjacent first sample video frame and second sample video frame from the sample video, or may acquire the first sample video frame and second sample video frame that are not adjacent.
Step S804, extracting sample video frame characteristics from the sample frame inserting video frames, and carrying out video frame quality detection on the sample frame inserting video frames based on the sample video frame characteristics to obtain first predicted video frame quality characterization values corresponding to the sample frame inserting video frames.
Sample video frame features characterize key features of the sample interpolated video frame. They may include at least one of a brightness feature, a contrast feature, a saturation feature, a blur feature, or a resolution feature of the sample interpolated video frame.
The first predicted video frame quality characterization quantity characterizes the quality of the sample interpolated video frame and is obtained by detection on the sample interpolated video frame itself.
Specifically, the computer equipment performs feature extraction on the sample frame inserting video frame to obtain corresponding sample video frame features. The computer equipment performs video frame quality detection on the sample inserted frame video frame based on the sample video frame characteristics, and a first predicted video frame quality characterization quantity corresponding to the sample inserted frame video frame is obtained.
Step S806, a sample frame inserting optical flow field corresponding to the sample frame inserting video frame is obtained, sample optical flow characteristics are extracted based on the sample frame inserting optical flow field, video frame quality detection is carried out on the sample frame inserting video frame according to the sample optical flow characteristics, and a second predicted video frame quality representation corresponding to the sample frame inserting video frame is obtained.
The sample frame inserting optical flow field refers to an optical flow field corresponding to a sample frame inserting video frame.
The second predicted video frame quality characterization quantity is a characterization quantity for characterizing the quality of the interpolated video frame, and is obtained based on the interpolated optical flow field detection of the interpolated video frame.
The first predicted video frame quality characterization quantity and the second predicted video frame quality characterization quantity are characterization quantities for reflecting the global quality of video frames, namely, the overall quality of the sample frame inserting video frames.
Specifically, the computer equipment acquires a sample frame inserting optical flow field corresponding to a sample frame inserting video frame, and performs feature extraction on the sample frame inserting optical flow field to obtain corresponding sample optical flow features. And the computer equipment performs video frame quality detection on the sample inserted frame video frame according to the sample optical flow characteristics to obtain a second predicted video frame quality characterization quantity corresponding to the sample inserted frame video frame.
Step S808, obtaining a reference quality characterization quantity corresponding to the sample frame inserting video frame, and performing model training based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity to obtain a video frame processing model.
Specifically, the computer equipment acquires a reference quality characterization quantity corresponding to a sample frame inserting video frame from a sample video, and determines a difference between a first predicted video frame quality characterization quantity and the reference quality characterization quantity and a difference between a second predicted video frame quality characterization quantity and the reference quality characterization quantity. The computer equipment carries out model training according to the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity so as to adjust model parameters and continue training until the training stopping condition is met, and a video frame processing model is obtained.
The training stopping condition may be that the number of iterations of training reaches a preset number of iterations, the target loss of training reaches a minimum, etc., but is not limited thereto, and may be set according to requirements.
In this embodiment, the computer device determines a quality characterization quantity loss corresponding to the sample frame inserting video frame according to the difference between the quality characterization quantity of the first predicted video frame and the reference quality characterization quantity and the difference between the quality characterization quantity of the second predicted video frame and the reference quality characterization quantity, and performs model training based on the quality characterization quantity loss to obtain a video frame processing model.
In this embodiment, according to the first sample video frame and the second sample video frame in the sample video, the sample interpolated video frame is inserted between them, so that interpolation is performed in the sample video and the transition between the first and second sample video frames is more natural and coordinated. Sample video frame features are extracted from the sample interpolated video frame, and video frame quality detection is performed on it based on those features to obtain the first predicted video frame quality characterization quantity, so the quality of the generated frame can be detected from its content. Optical flow, as a feature describing relative motion between frames, is not easily affected by the quality of the original video frames; the sample interpolation optical flow field corresponding to the sample interpolated video frame is obtained, sample optical flow features are extracted from it, and video frame quality detection is performed according to those features to obtain the second predicted video frame quality characterization quantity, so the quality of the generated frame can also be detected from its motion information. The reference quality characterization quantity corresponding to the sample interpolated video frame is then obtained, and model training is performed based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity, yielding a video frame processing model that can accurately determine the interpolated video frame quality characterization quantity corresponding to an interpolated video frame and thus more accurately reflect the quality of the interpolation result.
In one embodiment, obtaining the reference quality characterization quantity corresponding to the sample interpolated video frame includes:
acquiring a reference video frame corresponding to a sample frame inserting video frame from a sample video; and carrying out video frame quality detection on the sample frame inserting video frame based on the reference video frame to obtain a reference quality characterization quantity corresponding to the sample frame inserting video frame.
Wherein the reference video frame is a reference for the sample-interleaved video frame.
Specifically, the computer equipment obtains a sample video frame between a first sample video frame and a second sample video frame from the sample video as a reference video frame, and then obtains a reference video frame corresponding to the sample frame inserting video frame. And the computer equipment performs video frame quality detection on the sample frame inserting video frame based on the reference video frame to obtain a reference quality characterization quantity corresponding to the sample frame inserting video frame.
In this embodiment, according to a first sample video frame at a first sample time and a second sample video frame at a second sample time in a sample video, a sample frame inserting video frame is inserted at a sample frame inserting time; the sample frame inserting time is between the first sample time and the second sample time; the computer equipment acquires a sample video frame at the frame inserting moment from the sample video, and the sample video frame is used as a reference video frame corresponding to the sample frame inserting video frame.
In this embodiment, a reference video frame corresponding to a sample frame inserting video frame is obtained from a sample video, and the reference video frame is used as a reference, so that video frame quality detection is performed on the sample frame inserting video frame based on the reference video frame, and a reference quality characterization quantity corresponding to the sample frame inserting video frame is accurately determined.
In one embodiment, as shown in fig. 9, performing video frame quality detection on a sample frame inserting video frame based on a reference video frame to obtain a reference quality representation corresponding to the sample frame inserting video frame, including:
in step S902, a reference video clip is extracted from the sample video, where the reference video clip includes a first sample video frame, a second sample video frame, and a reference video frame.
Specifically, the computer device determines a first sample video frame, a second sample video frame, and a reference video frame in the sample video to determine a reference video clip containing the first sample video frame, the second sample video frame, and the reference video frame, thereby extracting the reference video clip from the sample video.
Step S904, generating a predicted video clip according to the first sample video frame, the second sample video frame and the sample interpolation video frame.
Specifically, the computer device splices the first sample video frame, the second sample video frame, and the sample plug-in video frame to form a predicted video clip.
In this embodiment, as shown in fig. 10, the reference video frame lies between the first sample video frame and the second sample video frame in the reference video clip, and the sample interpolated video frame lies between the first sample video frame and the second sample video frame in the predicted video segment.
Step S906, according to the reference video segment and the predicted video segment, performing video frame quality detection on the sample frame inserting video frame to obtain a reference quality characterization quantity corresponding to the sample frame inserting video frame.
Specifically, the computer equipment determines the similarity between a reference video frame and a sample frame inserting video frame according to the reference video fragment and the predicted video fragment; and obtaining the reference quality characterization quantity corresponding to the sample frame inserting video frame based on the similarity. Further, the higher the similarity, the higher the reference quality characterization quantity; the lower the similarity, the lower the reference quality characterization quantity.
In this embodiment, a reference video segment containing the first sample video frame, the second sample video frame, and the reference video frame is extracted from the sample video and serves as the reference for the predicted video segment. A predicted video segment is generated from the first sample video frame, the second sample video frame, and the sample interpolated video frame, and video frame quality detection is performed on the sample interpolated video frame according to the reference and predicted video segments. The reference quality characterization quantity corresponding to the sample interpolated video frame can thus be accurately detected by comparing the transition between the reference video frame and the other two frames in the reference segment with the transition between the sample interpolated video frame and the other two frames in the predicted segment.
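Where no human scores are available, the segment-level similarity could be approximated frame-to-frame; the sketch below uses PSNR mapped to [0, 1] as an assumed stand-in for the similarity judgment (the embodiment leaves the metric open, and the application example below uses human scoring instead):

```python
import numpy as np

def reference_quality_characterization(interp_frame: np.ndarray,
                                       reference_frame: np.ndarray) -> float:
    """Similarity between the sample interpolated frame and the
    reference frame, both H x W x 3 in [0, 1]; higher similarity gives
    a higher reference quality characterization quantity."""
    mse = float(np.mean((interp_frame - reference_frame) ** 2))
    if mse == 0.0:
        return 1.0
    psnr = 10.0 * np.log10(1.0 / mse)
    return float(np.clip(psnr / 50.0, 0.0, 1.0))  # 50 dB capped as perfect
```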
In one embodiment, a video frame processing method and a quality detection model processing method are provided, applied to a computer device, including:
training process:
and inserting sample inserting frame video frames between the first sample video frames and the second sample video frames according to the first sample video frames and the second sample video frames in the sample video.
Extracting sample video frame characteristics from the sample frame inserting video frames, and carrying out video frame quality detection on the sample frame inserting video frames based on the sample video frame characteristics to obtain first predicted video frame quality characterization quantities corresponding to the sample frame inserting video frames.
And acquiring a sample frame inserting optical flow field corresponding to the sample frame inserting video frame, extracting sample optical flow characteristics based on the sample frame inserting optical flow field, and carrying out video frame quality detection on the sample frame inserting video frame according to the sample optical flow characteristics to obtain a second predicted video frame quality characterization quantity corresponding to the sample frame inserting video frame.
Acquiring a reference video frame corresponding to a sample frame inserting video frame from a sample video; extracting a reference video segment from the sample video, the reference video segment comprising a first sample video frame, a second sample video frame and a reference video frame; generating a predicted video segment according to the first sample video frame, the second sample video frame and the sample interpolation video frame; and carrying out video frame quality detection on the sample frame inserting video frames according to the reference video fragments and the predicted video fragments to obtain reference quality characterization quantities corresponding to the sample frame inserting video frames.
Model training is carried out based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity, and a video frame processing model is obtained.
The application process comprises the following steps:
a first video frame at a first moment and a second video frame at a second moment in a video are acquired.
Performing spatial position mapping on the first video frame to obtain a first optical flow field from a first moment to a frame inserting moment; the frame inserting time is between the first time and the second time; and based on the first optical flow field, converting the first video frame into a first predicted video frame corresponding to the frame inserting moment.
Acquiring a second optical flow field from a second moment to a frame inserting moment, which is obtained by performing spatial position mapping on a second video frame; and based on the second optical flow field, converting the second video frame into a second predicted video frame corresponding to the frame inserting moment.
Fusing the first predicted video frame and the second predicted video frame to obtain an inserted frame video frame corresponding to the inserted frame moment; and merging the first optical flow field and the second optical flow field to obtain an inserted frame optical flow field corresponding to the inserted frame video frame.
Extracting video frame features from the interpolated video frame, and performing video frame quality detection in at least one dimension on the interpolated video frame based on those features to obtain a video frame quality characterization quantity in at least one dimension, which includes at least one of the brightness, contrast, saturation, blur, or resolution of the interpolated video frame; and fusing the quality characterization quantities across the at least one dimension to obtain the first video frame quality characterization quantity corresponding to the interpolated video frame.
Acquiring the interpolation optical flow field corresponding to the interpolated video frame, extracting optical flow features from it, and performing at least one type of video frame quality detection on the interpolated video frame according to the optical flow features to obtain a video frame quality characterization quantity of at least one type, which includes at least one of the deformation amount, integrity, or contour sharpness of the interpolated video frame; and fusing the quality characterization quantities across the at least one type to obtain the second video frame quality characterization quantity corresponding to the interpolated video frame.
And fusing the first video frame quality characterization quantity and the second video frame quality characterization quantity to obtain an interpolated video frame quality characterization quantity corresponding to the interpolated video frame.
When the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame meets the frame inserting quality condition, reserving the frame inserting video frame; and when the quality characterization quantity of the frame inserting video frame corresponding to the frame inserting video frame does not meet the frame inserting quality condition, correcting the frame inserting video frame according to the first video frame and the second video frame to obtain a corrected frame inserting video frame.
In one embodiment, the video frame processing method can be applied to any scenario that uses video frame interpolation, such as live video or video on demand, so as to interpolate frames into the video and detect the quality of the interpolated frames in real time; the quality of the video itself may also be detected. Video frame interpolation (Video Frame Interpolation) is an image generation algorithm that takes two consecutive frames as input and outputs the frame at the intermediate time. It can be used to increase the video frame rate and improve playback smoothness, and also to link similar segments; typical applications include frame rate boosting and segment linking.
In one embodiment, an application scenario of a video frame processing method and a quality detection model processing method is provided, including:
In this embodiment, the frame interpolation algorithm is optical-flow-based: after two adjacent video frames of the video are input into the network, optical flow information (i.e., optical flow fields) from the intermediate time to the adjacent times is output through a series of stacked convolution layers. The two adjacent video frames are the first video frame at the first time and the second video frame at the second time, and the intermediate time is the interpolation time. This optical flow information describes how the video frames at the adjacent times are transformed to the intermediate time, so the interpolated video frame at the intermediate time can be obtained by the following formula:
Î_t = M ⊙ w(I_0, F_{t→0}) + (1 − M) ⊙ w(I_1, F_{t→1}) + R

wherein I_0 is the video frame at time t_0 in the video and I_1 is the video frame at time t_1; F_{t→0} is the optical flow field used to transform I_0 to the intermediate time t, i.e., the optical flow field from time t to time t_0; F_{t→1} is the optical flow field used to transform I_1 to the intermediate time t, i.e., the optical flow field from time t to time t_1; w is the warping (transform) operation; M is the fusion coefficient with which the video frames transformed from I_0 and I_1 to time t are fused together; and R is a residual used to enhance high-frequency details in the fused image.

Specifically, based on the optical flow field F_{t→0}, the video frame I_0 is transformed into the first predicted video frame at time t; based on the optical flow field F_{t→1}, the video frame I_1 is transformed into the second predicted video frame at time t. The first and second predicted video frames are then fused with weights to obtain the final intermediate video frame Î_t, i.e., the interpolated video frame.
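Given the two frames already warped to time t (e.g. with the backward-warping sketch earlier), the formula is a one-line fusion; the mask, residual, and all names here are illustrative:

```python
import torch

def interpolate_frame(warped_0: torch.Tensor, warped_1: torch.Tensor,
                      mask: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
    """I_t = M * w(I_0, F_{t->0}) + (1 - M) * w(I_1, F_{t->1}) + R.
    warped_0 / warped_1: (N,3,H,W) frames warped to time t;
    mask: fusion coefficient M in [0, 1], shape (N,1,H,W);
    residual: R, same shape as the frames."""
    fused = mask * warped_0 + (1.0 - mask) * warped_1 + residual
    return fused.clamp(0.0, 1.0)  # keep the result a valid image
```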
As shown in fig. 11, the quality detection model may include an image scoring model and an optical Flow scoring model, that is, an RGB scoring model and a Flow scoring model, and the specific processing procedure is as follows:
1) Collecting various types of sample video V_set;
2) Decoding the collected video into a video frame set i_set;
3) Sampling adjacent sample video frames I_0 and I_1 from the video frame set at different intervals;
4) Running the frame interpolation algorithm to obtain the intermediate video frame mid_rgb and the intermediate optical flow field mid_flow that it outputs, where mid_rgb is a 3-channel matrix and mid_flow is a 4-channel matrix whose first 2 channels are the optical flow field F_{t→0} used to transform I_0 to the intermediate time t and whose last 2 channels are the optical flow field F_{t→1} used to transform I_1 to the intermediate time t; the intermediate video frame mid_rgb is the sample interpolated video frame, and the intermediate optical flow field mid_flow is the sample interpolation optical flow field;
5) Combining I_0, mid_rgb, and I_1 into a predicted video segment V_int, and extracting from the sample video a reference video segment containing I_0, the reference frame, and I_1;
6) Having m persons independently watch the predicted video segment V_int and the reference video segment, and score the interpolated video frame on a scale of 1 to 10; specifically, if the interpolated video frame is similar to the reference frame, its score gt_score is higher, and if dissimilar, lower;
For example, suppose the video frame set contains frames 1, 2, 3, 4, and 5. When frames 1 and 5 are taken as input, their intermediate frame is frame 3, so frame 3 serves as the reference video frame, and an interpolated video frame is generated from frames 1 and 5. In step 6), the m persons watch the predicted video segment V_int and the reference video segment and score the interpolated video frame; the more similar it is to reference frame 3, the higher its score gt_score.
7) Averaging the m persons' scores and normalizing to obtain the score gt_score of the interpolated video frame, i.e., the reference quality characterization quantity;
8) Taking mid_rgb as the input of the RGB scoring model, whose structure can be any classification model, such as ResNet, DenseNet, or Swin-Transformer, to obtain rgb_score; gt_score and rgb_score are constrained by an L1 loss function: l_rgb = |gt_score − rgb_score|;
9) Taking mid_flow as the input of the Flow scoring model, whose structure can likewise be any classification model, such as ResNet (residual network), DenseNet (densely connected convolutional network), or Swin-Transformer (vision Transformer), to obtain flow_score; gt_score and flow_score are constrained by an L1 loss function: l_flow = |gt_score − flow_score|;
10) Training the RGB scoring model and the Flow scoring model separately by stochastic gradient descent (SGD). Specifically, for the RGB scoring model, the loss value l_rgb of the corresponding score is calculated; if l_rgb is greater than the loss threshold, the parameters are adjusted and training continues, and when l_rgb is less than or equal to the loss threshold, training stops, yielding the trained RGB scoring model. For the Flow scoring model, the loss value l_flow is calculated in the same way; if l_flow is greater than the loss threshold, the parameters are adjusted and training continues, and when l_flow is less than or equal to the loss threshold, training stops, yielding the trained Flow scoring model.
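A compact training sketch under stated assumptions: ResNet-18 backbones regressing a single score stand in for "any classification model", the 4-channel stem for mid_flow is a standard adaptation, and the hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
import torchvision.models as models

rgb_model = models.resnet18(num_classes=1)    # scores 3-channel mid_rgb
flow_model = models.resnet18(num_classes=1)   # scores 4-channel mid_flow
flow_model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                             padding=3, bias=False)

l1 = nn.L1Loss()
opt_rgb = torch.optim.SGD(rgb_model.parameters(), lr=1e-3, momentum=0.9)
opt_flow = torch.optim.SGD(flow_model.parameters(), lr=1e-3, momentum=0.9)

def train_step(mid_rgb, mid_flow, gt_score):
    """One SGD step per scoring model with the L1 constraints
    l_rgb = |gt_score - rgb_score| and l_flow = |gt_score - flow_score|."""
    l_rgb = l1(rgb_model(mid_rgb).squeeze(1), gt_score)
    opt_rgb.zero_grad(); l_rgb.backward(); opt_rgb.step()

    l_flow = l1(flow_model(mid_flow).squeeze(1), gt_score)
    opt_flow.zero_grad(); l_flow.backward(); opt_flow.step()
    return l_rgb.item(), l_flow.item()
```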
In practical application, the processing flow of the video frame processing method is shown in fig. 12, and includes multiple processing procedures such as video decoding, video frame insertion, scoring of frame insertion results, determining whether to retain frame insertion results according to frame insertion scoring results, video re-encoding, and result display.
Specifically, an intermediate-time interpolated video frame and its corresponding interpolation optical flow field are generated from two adjacent video frames; the interpolated video frame is input into the RGB scoring model to obtain its score, and the interpolation optical flow field is input into the Flow scoring model to obtain the corresponding score.
The final interpolation quality score is obtained by weighted fusion of the score output by the RGB scoring model and the score output by the Flow scoring model: score = a · rgb_score + (1 − a) · flow_score, where a is the fusion coefficient, i.e., the weight; the fusion coefficient can be adjusted according to the actual situation.
The quality of the interpolation result is judged from this score to decide whether the result should be retained in the video; the video is then re-encoded and the result displayed, so that the final video quality is better.
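Inference then reduces to scoring, fusing, and thresholding; a sketch with an assumed batch of one frame and a placeholder fusion coefficient:

```python
import torch

@torch.no_grad()
def interpolation_quality_score(mid_rgb, mid_flow, rgb_model, flow_model,
                                a: float = 0.5) -> float:
    """score = a * rgb_score + (1 - a) * flow_score for a single
    interpolated frame (batch size 1); a is the tunable fusion weight."""
    rgb_score = rgb_model(mid_rgb).item()
    flow_score = flow_model(mid_flow).item()
    return a * rgb_score + (1.0 - a) * flow_score

# A frame whose score clears the chosen threshold is retained before
# re-encoding; otherwise it is dropped or corrected.
```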
It can be understood that the network structure and the layer number of the RGB scoring model and the Flow scoring model can be adjusted according to the requirement.
In this embodiment, RGB video frames and optical flow are mixed as input; since optical flow describes the relative motion between frames and is not easily affected by the quality of the original video frames, the quality of the interpolation result can be reflected more accurately. Specifically, the interpolation result generated by the interpolation algorithm is analyzed so that it can be further optimized: interpolation results of good quality and good effect are retained, while those of poor quality and poor effect are discarded, avoiding an impact on the overall appearance of the final video.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a video frame processing device for realizing the video frame processing method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the video frame processing apparatus provided below may be referred to the limitation of the video frame processing method hereinabove, and will not be described herein.
In one embodiment, as shown in fig. 13, there is provided a video frame processing apparatus 1300 comprising: an insertion module 1302, a first detection module 1304, a second detection module 1306, and a fusion module 1308, wherein:
the inserting module 1302 is configured to insert an interpolated video frame between a first video frame and a second video frame in the video according to the first video frame and the second video frame.
The first detection module 1304 is configured to extract video frame features from the interpolated video frame and perform video frame quality detection on the interpolated video frame based on the video frame features, to obtain a first video frame quality characterization quantity corresponding to the interpolated video frame.
The second detection module 1306 is configured to obtain the interpolated optical flow field corresponding to the interpolated video frame, extract optical flow features based on the interpolated optical flow field, and perform video frame quality detection on the interpolated video frame according to the optical flow features, to obtain a second video frame quality characterization quantity corresponding to the interpolated video frame.
The fusion module 1308 is configured to fuse the first video frame quality characterization quantity and the second video frame quality characterization quantity to obtain an interpolated video frame quality characterization quantity corresponding to the interpolated video frame.
In this embodiment, an interpolated video frame is inserted between the first video frame and the second video frame in the video according to those two frames, so that interpolation is formed in the video and the transition between the first video frame and the second video frame is more natural and coordinated. Video frame features are extracted from the interpolated video frame and used for quality detection, giving a first video frame quality characterization quantity; this detects quality from the content of the generated video frame. The interpolated optical flow field corresponding to the interpolated video frame is obtained, optical flow features are extracted from it, and quality detection is performed according to those features, giving a second video frame quality characterization quantity; since optical flow describes the relative motion between frames and is not easily affected by the quality of the original video frames, this detects quality from the motion information of the video frame. Fusing the first and second video frame quality characterization quantities accurately determines the interpolated video frame quality characterization quantity, which therefore reflects the quality of the interpolation result more accurately.
In one embodiment, the inserting module 1302 is further configured to: obtain a first video frame at a first time and a second video frame at a second time in the video; transform the first video frame from the first time to the interpolation time to obtain a first predicted video frame corresponding to the interpolation time, the interpolation time lying between the first time and the second time; transform the second video frame from the second time to the interpolation time to obtain a second predicted video frame corresponding to the interpolation time; and fuse the first predicted video frame and the second predicted video frame to obtain the interpolated video frame corresponding to the interpolation time.
In this embodiment, a first video frame at a first time and a second video frame at a second time in the video are obtained. The first video frame is transformed from the first time to the interpolation time to initially obtain a first predicted video frame, and the second video frame is transformed from the second time to the interpolation time to initially obtain a second predicted video frame; since the interpolation time lies between the first time and the second time, this yields predicted video frames reconstructed from the frames at the two surrounding times. Fusing the two predicted video frames then accurately generates the interpolated video frame corresponding to the interpolation time.
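A minimal sketch of such a two-sided prediction-and-fusion step is shown below; the backward warping via OpenCV's cv2.remap and the equal-weight blend are illustrative assumptions, not the embodiment's prescribed implementation.

    import cv2
    import numpy as np

    def backward_warp(frame, flow_to_src):
        # Warp `frame` to the interpolation time using a flow field that maps
        # each pixel at the interpolation time back into `frame`.
        h, w = flow_to_src.shape[:2]
        grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (grid_x + flow_to_src[..., 0]).astype(np.float32)
        map_y = (grid_y + flow_to_src[..., 1]).astype(np.float32)
        return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

    def interpolate_frame(frame0, frame1, flow_t_to_0, flow_t_to_1, w0=0.5):
        # First and second predicted video frames, then their weighted fusion.
        pred0 = backward_warp(frame0, flow_t_to_0)
        pred1 = backward_warp(frame1, flow_t_to_1)
        return cv2.addWeighted(pred0, w0, pred1, 1.0 - w0, 0)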
In one embodiment, the inserting module 1302 is further configured to perform spatial position mapping on the first video frame to obtain a first optical flow field from the first time to the interpolation time, and to convert the first video frame into the first predicted video frame corresponding to the interpolation time based on the first optical flow field.
In this embodiment, an optical flow field can represent the motion information of objects in an image. Spatial position mapping is performed on the first video frame to obtain a first optical flow field from the first time to the interpolation time; this field represents the motion of objects in the first video frame over that interval, so the first video frame can be reconstructed based on this motion information to accurately generate the first predicted video frame corresponding to the interpolation time.
In one embodiment, the inserting module 1302 is further configured to obtain a second optical flow field from the second time to the interpolation time, obtained by performing spatial position mapping on the second video frame, and to convert the second video frame into the second predicted video frame corresponding to the interpolation time based on the second optical flow field.
In this embodiment, the second optical flow field likewise represents the motion of objects in the second video frame from the second time to the interpolation time, so the second video frame can be reconstructed based on this motion information to accurately generate the second predicted video frame corresponding to the interpolation time.
In one embodiment, the second detection module 1306 is further configured to fuse the first optical flow field and the second optical flow field to obtain the interpolated optical flow field corresponding to the interpolated video frame.
In this embodiment, fusing the first optical flow field and the second optical flow field combines the two flow fields associated with the interpolation time, so the interpolated optical flow field corresponding to the interpolated video frame can be obtained more accurately.
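A minimal sketch of such a flow-field fusion, assuming both fields are expressed in a common reference direction and that a simple linear blend suffices (both assumptions):

    import numpy as np

    def fuse_flow_fields(flow_a, flow_b, w=0.5):
        # Weighted average of the two optical flow fields associated with
        # the interpolation time; assumes both fields are expressed in a
        # common reference direction.
        return w * flow_a + (1.0 - w) * flow_b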
In one embodiment, the first detection module 1304 is further configured to perform video frame quality detection in at least one dimension on the interpolated video frame based on the video frame features, to obtain a video frame quality characterization quantity corresponding to each of the at least one dimension, where the at least one dimension includes at least one of the brightness, contrast, saturation, blurriness, or resolution of the interpolated video frame; and to fuse the per-dimension video frame quality characterization quantities to obtain the first video frame quality characterization quantity corresponding to the interpolated video frame.
In this embodiment, performing quality detection in at least one dimension yields a per-dimension quality characterization quantity covering at least one of brightness, contrast, saturation, blurriness, or resolution of the interpolated video frame. The quality of the interpolated video frame in each dimension can thus be taken into account, and the first video frame quality characterization quantity can be accurately computed by combining the qualities across dimensions.
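One hypothetical way to realize per-dimension detection and fusion is sketched below; the per-dimension statistics, normalization constants, and weights are illustrative assumptions, since in the embodiment these quantities would typically come from the trained scoring model.

    import cv2
    import numpy as np

    def first_quality_score(interp_bgr, weights=(0.3, 0.3, 0.4)):
        # Three of the listed dimensions (brightness, contrast, blurriness),
        # each normalized to roughly [0, 1]; the constants are illustrative.
        gray = cv2.cvtColor(interp_bgr, cv2.COLOR_BGR2GRAY)
        brightness = gray.mean() / 255.0
        contrast = gray.std() / 128.0
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        sharpness = min(sharpness / 1000.0, 1.0)  # crude inverse-blur proxy
        dims = np.clip(np.array([brightness, contrast, sharpness]), 0.0, 1.0)
        # Fuse the per-dimension characterization quantities by weighted sum.
        return float(np.dot(np.array(weights), dims))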
In one embodiment, the first detection module 1304 is further configured to perform local quality detection on at least one region of the interpolated video frame based on the video frame features, to obtain a quality characterization quantity corresponding to each such region, and to form the first video frame quality characterization quantity of the interpolated video frame from these per-region quality characterization quantities.
In this embodiment, local quality detection on at least one region of the interpolated video frame yields per-region quality characterization quantities, so the local quality of each region, and thus the image quality of each local area of the interpolated frame, is taken into account. The first video frame quality characterization quantity can therefore be reflected more accurately from these local qualities.
In one embodiment, the first detection module 1304 is further configured to perform region identification on the interpolated video frame based on the video frame features to obtain a foreground region and a background region in the interpolated video frame;
to perform local quality detection on the foreground region according to the foreground features of the foreground region, obtaining a quality characterization quantity corresponding to the foreground region; and to perform local quality detection on the background region according to the background features of the background region, obtaining a quality characterization quantity corresponding to the background region.
In this embodiment, region identification separates the interpolated video frame into a foreground region and a background region, and local quality detection is performed on each region according to its own features. Because the difference between foreground and background is taken into account in the quality evaluation, the global quality of the video frame determined from the respective qualities of the two regions is more accurate.
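A minimal sketch of combining per-region qualities, assuming a per-pixel quality map and a foreground mask are already available from the detection and region-identification steps (the foreground-over-background weighting is an illustrative assumption):

    import numpy as np

    def fuse_regional_quality(quality_map, fg_mask, w_fg=0.7):
        # quality_map: per-pixel quality scores for the interpolated frame;
        # fg_mask: boolean mask from foreground/background identification.
        fg_quality = quality_map[fg_mask].mean() if fg_mask.any() else 0.0
        bg_quality = quality_map[~fg_mask].mean() if (~fg_mask).any() else 0.0
        # Weight the foreground higher on the assumption that viewers
        # attend to it more; the weight itself is illustrative.
        return w_fg * fg_quality + (1.0 - w_fg) * bg_quality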
In one embodiment, the second detection module 1306 is configured to extract optical flow features based on the interpolated optical flow field and to perform at least one type of video frame quality detection on the interpolated video frame according to the optical flow features, obtaining a video frame quality characterization quantity for each of the at least one type, where the at least one type includes at least one of the deformation, integrity, or contour sharpness of the interpolated video frame; and to fuse the per-type video frame quality characterization quantities to obtain the second video frame quality characterization quantity corresponding to the interpolated video frame.
In this embodiment, performing quality detection of at least one type according to the optical flow features yields per-type quality characterization quantities covering at least one of deformation, integrity, or contour sharpness of the interpolated video frame. The quality of the interpolated video frame across different types can thus be taken into account, and the second video frame quality characterization quantity can be accurately computed by combining the qualities across types.
In one embodiment, the apparatus further comprises a correction module. The correction module is configured to retain the interpolated video frame when its interpolated video frame quality characterization quantity meets the interpolation quality condition, and, when it does not, to correct the interpolated video frame according to the first video frame and the second video frame to obtain a corrected interpolated video frame.
In this embodiment, when the interpolated video frame quality characterization quantity meets the interpolation quality condition, the interpolated video frame is retained. When it does not, the interpolated video frame can be deleted, removing frames of poor interpolation quality from the video; alternatively, the interpolated video frame can be corrected according to the first video frame and the second video frame, so that the corrected frame reaches the interpolation quality condition and its transitions to the first and second video frames are more natural and coordinated.
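A minimal sketch of this retain-or-correct decision, assuming the quality condition is a simple threshold on the fused score and that averaging the two original frames is one possible correction (both are illustrative assumptions):

    import numpy as np

    def postprocess(interp, frame0, frame1, quality, threshold=0.5):
        # Retain the interpolated frame when the fused quality score meets
        # the interpolation quality condition (a plain threshold here).
        if quality >= threshold:
            return interp
        # Otherwise correct it from the two original frames; a simple
        # average is only one possible correction.
        blended = 0.5 * frame0.astype(np.float32) + 0.5 * frame1.astype(np.float32)
        return blended.astype(frame0.dtype)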
In one embodiment, as shown in fig. 14, there is provided a quality detection model processing apparatus 1400 comprising: a sample insertion module 1402, a first sample detection module 1404, a second sample detection module 1406, and a training module 1408, wherein:
the sample inserting module 1402 is configured to insert a sample interpolated video frame between a first sample video frame and a second sample video frame in the sample video according to those two frames.
The first sample detection module 1404 is configured to extract sample video frame features from the sample interpolated video frame and perform video frame quality detection on it based on those features, to obtain a first predicted video frame quality characterization quantity corresponding to the sample interpolated video frame.
The second sample detection module 1406 is configured to obtain the sample interpolated optical flow field corresponding to the sample interpolated video frame, extract sample optical flow features based on it, and perform video frame quality detection on the sample interpolated video frame according to those features, to obtain a second predicted video frame quality characterization quantity corresponding to the sample interpolated video frame.
The training module 1408 is configured to obtain a reference quality characterization quantity corresponding to the sample interpolated video frame and perform model training based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity, and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity, to obtain a video frame processing model.
In this embodiment, a sample interpolated video frame is inserted between the first sample video frame and the second sample video frame in the sample video according to those two frames, so that interpolation is formed in the sample video and the transition between the two sample frames is more natural and coordinated. Sample video frame features are extracted from the sample interpolated video frame and used for quality detection, giving a first predicted video frame quality characterization quantity; this detects quality from the content of the generated video frame. Since optical flow describes the relative motion between frames and is not easily affected by the quality of the original video frames, the sample interpolated optical flow field is obtained, sample optical flow features are extracted from it, and quality detection is performed according to those features, giving a second predicted video frame quality characterization quantity; this detects quality from the motion information of the video frame. A reference quality characterization quantity for the sample interpolated video frame is then obtained, and model training is performed based on the difference between each predicted quality characterization quantity and the reference, yielding a video frame processing model that can accurately determine the interpolated video frame quality characterization quantity and thus reflect the quality of the interpolation result more accurately.
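A minimal sketch of one training step under these two differences, assuming PyTorch, that each branch regresses the reference label directly, and that the differences are measured with mean squared error (all illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def training_step(rgb_model, flow_model, interp_frame, interp_flow,
                      ref_quality, optimizer):
        # interp_frame / interp_flow: batches of sample interpolated frames
        # and their interpolated optical flow fields; ref_quality: reference
        # quality labels with the same shape as each model's output.
        pred_rgb = rgb_model(interp_frame)    # first predicted quality
        pred_flow = flow_model(interp_flow)   # second predicted quality
        loss = (F.mse_loss(pred_rgb, ref_quality)
                + F.mse_loss(pred_flow, ref_quality))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()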
In one embodiment, the training module 1408 is further configured to obtain, from the sample video, a reference video frame corresponding to the sample interpolated video frame, and to perform video frame quality detection on the sample interpolated video frame based on the reference video frame, obtaining the reference quality characterization quantity corresponding to the sample interpolated video frame.
In this embodiment, a reference video frame corresponding to the sample interpolated video frame is obtained from the sample video and used as the reference, so that quality detection against this reference accurately determines the reference quality characterization quantity of the sample interpolated video frame.
In one embodiment, the training module 1408 is further configured to extract a reference video segment from the sample video, the reference video segment including the first sample video frame, the second sample video frame, and the reference video frame; to generate a predicted video segment from the first sample video frame, the second sample video frame, and the sample interpolated video frame; and to perform video frame quality detection on the sample interpolated video frame according to the reference video segment and the predicted video segment, obtaining the reference quality characterization quantity corresponding to the sample interpolated video frame.
In this embodiment, a reference video segment comprising the first sample video frame, the second sample video frame, and the reference video frame is extracted from the sample video and serves as the reference for the predicted video segment, which is generated from the first sample video frame, the second sample video frame, and the sample interpolated video frame. By comparing the transition of the reference video frame with its two neighboring frames in the reference segment against the transition of the sample interpolated video frame with its two neighboring frames in the predicted segment, the reference quality characterization quantity of the sample interpolated video frame can be accurately detected.
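A minimal sketch of deriving such a reference quality characterization quantity by comparing the two segments frame by frame, assuming PSNR mapped into [0, 1] as the comparison measure (an illustrative assumption, not the patent's prescribed metric):

    import numpy as np

    def reference_quality(pred_clip, ref_clip, max_val=255.0):
        # Compare the predicted video segment against the reference video
        # segment frame by frame; mean PSNR scaled into [0, 1] serves as
        # the reference quality label here.
        psnr_values = []
        for pred, ref in zip(pred_clip, ref_clip):
            mse = np.mean((pred.astype(np.float64) - ref.astype(np.float64)) ** 2)
            psnr = 100.0 if mse == 0 else 20.0 * np.log10(max_val / np.sqrt(mse))
            psnr_values.append(psnr)
        return float(np.clip(np.mean(psnr_values) / 50.0, 0.0, 1.0))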
The respective modules in the above video frame processing apparatus and in the quality detection model processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal; its internal structure diagram may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input means are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a video frame processing method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display or an electronic ink display. The input means of the computer device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 15 is merely a block diagram of part of the structure related to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as there is no contradiction between the combinations, they should all be considered within the scope of this specification.
The foregoing examples illustrate only a few embodiments of the application and are described in detail, but they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the scope of protection of the application shall be subject to the appended claims.

Claims (18)

1. A method of video frame processing, the method comprising:
inserting a frame inserting video frame between a first video frame and a second video frame according to the first video frame and the second video frame in the video;
extracting video frame features from the frame inserting video frame, and performing video frame quality detection on the frame inserting video frame based on the video frame features to obtain a first video frame quality characterization quantity corresponding to the frame inserting video frame;
acquiring a frame inserting optical flow field corresponding to the frame inserting video frame, extracting optical flow features based on the frame inserting optical flow field, and performing video frame quality detection on the frame inserting video frame according to the optical flow features to obtain a second video frame quality characterization quantity corresponding to the frame inserting video frame;
and fusing the first video frame quality characterization quantity and the second video frame quality characterization quantity to obtain a frame inserting video frame quality characterization quantity corresponding to the frame inserting video frame.
2. The method of claim 1, wherein the inserting a frame inserting video frame between a first video frame and a second video frame according to the first video frame and the second video frame in the video comprises:
acquiring a first video frame at a first moment and a second video frame at a second moment in a video;
transforming the first video frame from the first time to a frame inserting time to obtain a first predicted video frame corresponding to the frame inserting time; the frame inserting time is between the first time and the second time;
transforming the second video frame from the second time to the frame inserting time to obtain a second predicted video frame corresponding to the frame inserting time;
and fusing the first predicted video frame and the second predicted video frame to obtain a frame inserting video frame corresponding to the frame inserting time.
3. The method of claim 2, wherein the transforming the first video frame from the first time to the frame inserting time to obtain a first predicted video frame corresponding to the frame inserting time comprises:
performing spatial position mapping on the first video frame to obtain a first optical flow field from the first time to the frame inserting time;
and converting, based on the first optical flow field, the first video frame into a first predicted video frame corresponding to the frame inserting time.
4. The method of claim 3, wherein the transforming the second video frame from the second time to the frame inserting time to obtain a second predicted video frame corresponding to the frame inserting time comprises:
acquiring a second optical flow field from the second time to the frame inserting time, obtained by performing spatial position mapping on the second video frame;
and converting, based on the second optical flow field, the second video frame into a second predicted video frame corresponding to the frame inserting time.
5. The method of claim 4, wherein the acquiring the interpolated optical flow field corresponding to the interpolated video frame comprises:
and fusing the first optical flow field and the second optical flow field to obtain the frame inserting optical flow field corresponding to the frame inserting video frame.
6. The method according to claim 1, wherein the performing video frame quality detection on the frame inserting video frame based on the video frame features to obtain the first video frame quality characterization quantity corresponding to the frame inserting video frame comprises:
performing video frame quality detection in at least one dimension on the frame inserting video frame based on the video frame features, to obtain a video frame quality characterization quantity of at least one dimension corresponding to the frame inserting video frame; the video frame quality characterization quantity of the at least one dimension includes at least one of brightness, contrast, saturation, blurriness, or resolution of the frame inserting video frame;
and fusing the video frame quality characterization quantity of the at least one dimension to obtain the first video frame quality characterization quantity corresponding to the frame inserting video frame.
7. The method according to claim 1, wherein the performing video frame quality detection on the frame inserting video frame based on the video frame features to obtain the first video frame quality characterization quantity corresponding to the frame inserting video frame comprises:
performing, based on the video frame features, local quality detection on at least one region of the frame inserting video frame, to obtain a quality characterization quantity corresponding to the at least one region of the frame inserting video frame;
and forming a first video frame quality characterization quantity corresponding to the frame inserting video frame according to the quality characterization quantity corresponding to the at least one region of the frame inserting video frame.
8. The method according to claim 7, wherein the performing local quality detection on at least one region of the interpolated video frame based on the video frame feature to obtain a quality characterization corresponding to the at least one region of the interpolated video frame includes:
performing region identification on the frame inserting video frame based on the video frame characteristics to obtain a foreground region and a background region in the frame inserting video frame;
according to the foreground characteristics of the foreground region, carrying out local quality detection on the foreground region to obtain a quality characterization quantity corresponding to the foreground region;
and carrying out local quality detection on the background region according to the background characteristics of the background region to obtain a quality characterization quantity corresponding to the background region.
9. The method of claim 1, wherein the extracting optical flow features based on the frame inserting optical flow field and performing video frame quality detection on the frame inserting video frame according to the optical flow features to obtain a second video frame quality characterization quantity corresponding to the frame inserting video frame comprises:
extracting optical flow features based on the frame inserting optical flow field, and performing at least one type of video frame quality detection on the frame inserting video frame according to the optical flow features, to obtain a video frame quality characterization quantity of at least one type corresponding to the frame inserting video frame; the video frame quality characterization quantity of the at least one type comprises at least one of a deformation, an integrity, or a contour sharpness of the frame inserting video frame;
and fusing the video frame quality characterization quantity of the at least one type to obtain a second video frame quality characterization quantity corresponding to the frame inserting video frame.
10. The method according to any one of claims 1 to 9, further comprising:
when the frame inserting video frame quality characterization quantity corresponding to the frame inserting video frame meets the frame inserting quality condition, retaining the frame inserting video frame;
and when the frame inserting video frame quality characterization quantity corresponding to the frame inserting video frame does not meet the frame inserting quality condition, correcting the frame inserting video frame according to the first video frame and the second video frame to obtain a corrected frame inserting video frame.
11. A quality detection model processing method, the method comprising:
inserting a sample frame inserting video frame between a first sample video frame and a second sample video frame in a sample video according to the first sample video frame and the second sample video frame;
extracting sample video frame features from the sample frame inserting video frame, and performing video frame quality detection on the sample frame inserting video frame based on the sample video frame features to obtain a first predicted video frame quality characterization quantity corresponding to the sample frame inserting video frame;
acquiring a sample frame inserting optical flow field corresponding to the sample frame inserting video frame, extracting sample optical flow features based on the sample frame inserting optical flow field, and performing video frame quality detection on the sample frame inserting video frame according to the sample optical flow features to obtain a second predicted video frame quality characterization quantity corresponding to the sample frame inserting video frame;
obtaining a reference quality characterization quantity corresponding to the sample frame inserting video frame, and performing model training based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity to obtain a video frame processing model.
12. The method of claim 11, wherein the obtaining the reference quality characterization quantity corresponding to the sample frame inserting video frame comprises:
acquiring a reference video frame corresponding to the sample frame inserting video frame from the sample video;
and performing video frame quality detection on the sample frame inserting video frame based on the reference video frame to obtain the reference quality characterization quantity corresponding to the sample frame inserting video frame.
13. The method of claim 12, wherein the performing video frame quality detection on the sample frame inserting video frame based on the reference video frame to obtain the reference quality characterization quantity corresponding to the sample frame inserting video frame comprises:
extracting a reference video segment from the sample video, the reference video segment comprising the first sample video frame, the second sample video frame, and the reference video frame;
generating a predicted video segment according to the first sample video frame, the second sample video frame and the sample interpolation video frame;
and detecting the video frame quality of the sample frame inserting video frame according to the reference video fragment and the predicted video fragment to obtain a reference quality characterization quantity corresponding to the sample frame inserting video frame.
14. A video frame processing apparatus, the apparatus comprising:
the inserting module is used for inserting an inserting frame video frame between a first video frame and a second video frame according to the first video frame and the second video frame in the video;
The first detection module is used for extracting video frame characteristics from the frame inserting video frames, detecting the video frame quality of the frame inserting video frames based on the video frame characteristics, and obtaining a first video frame quality characterization quantity corresponding to the frame inserting video frames;
the second detection module is used for acquiring a frame inserting optical flow field corresponding to the frame inserting video frame, extracting optical flow features based on the frame inserting optical flow field, and performing video frame quality detection on the frame inserting video frame according to the optical flow features to obtain a second video frame quality characterization quantity corresponding to the frame inserting video frame;
and the fusion module is used for fusing the first video frame quality characterization quantity and the second video frame quality characterization quantity to obtain an inserted frame video frame quality characterization quantity corresponding to the inserted frame video frame.
15. A quality detection model processing apparatus, the apparatus comprising:
the sample inserting module is used for inserting sample inserting frame video frames between a first sample video frame and a second sample video frame according to the first sample video frame and the second sample video frame in the sample video;
the first sample detection module is used for extracting sample video frame features from the sample frame inserting video frame, and performing video frame quality detection on the sample frame inserting video frame based on the sample video frame features to obtain a first predicted video frame quality characterization quantity corresponding to the sample frame inserting video frame;
the second sample detection module is used for acquiring a sample frame inserting optical flow field corresponding to the sample frame inserting video frame, extracting sample optical flow features based on the sample frame inserting optical flow field, and performing video frame quality detection on the sample frame inserting video frame according to the sample optical flow features to obtain a second predicted video frame quality characterization quantity corresponding to the sample frame inserting video frame;
and the training module is used for acquiring a reference quality characterization quantity corresponding to the sample frame inserting video frame, and performing model training based on the difference between the first predicted video frame quality characterization quantity and the reference quality characterization quantity and the difference between the second predicted video frame quality characterization quantity and the reference quality characterization quantity to obtain a video frame processing model.
16. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 13 when the computer program is executed.
17. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 13.
18. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 13.
CN202310253892.5A 2023-03-08 2023-03-08 Video frame processing method, device, computer equipment and storage medium Pending CN116980549A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310253892.5A CN116980549A (en) 2023-03-08 2023-03-08 Video frame processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310253892.5A CN116980549A (en) 2023-03-08 2023-03-08 Video frame processing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116980549A true CN116980549A (en) 2023-10-31

Family

ID=88483819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310253892.5A Pending CN116980549A (en) 2023-03-08 2023-03-08 Video frame processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116980549A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117241065A (en) * 2023-11-14 2023-12-15 腾讯科技(深圳)有限公司 Video plug-in frame image generation method, device, computer equipment and storage medium
CN117241065B (en) * 2023-11-14 2024-03-08 腾讯科技(深圳)有限公司 Video plug-in frame image generation method, device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication