CN116546203A - Video frame processing method and device, electronic equipment and readable storage medium

Video frame processing method and device, electronic equipment and readable storage medium

Info

Publication number
CN116546203A
Authority
CN
China
Prior art keywords
frame
video
video frame
determining
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310457087.4A
Other languages
Chinese (zh)
Inventor
刘晶
谷嘉文
黄博
钟婷婷
肖君实
邵宇超
刘何为
闻兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202310457087.4A
Publication of CN116546203A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 Incoming video signal characteristics or properties
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 Selection of coding mode or of prediction mode
    • H04N 19/146 Data rate or code amount at the encoder output
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field

Abstract

The application relates to a video frame processing method and device, an electronic device, and a storage medium, belonging to the technical field of video coding. The method includes: determining an I frame from among a first video frame determined based on a preset interval and at least one second video frame located after the first video frame, where if scene switching occurs in any second video frame relative to the first video frame, that second video frame is determined as the I frame; otherwise, the first video frame is determined as the I frame. In this method, the I frame is selected from the first video frame and the second video frames near it according to whether scene switching occurs. Because the difference information between the video frames before and after scene switching is large, selecting the I frame on this basis reduces the difference information between the I frame and the video frames after it, which in turn reduces the amount of data after the video frames are compressed, thereby improving the compression rate of the video.

Description

Video frame processing method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of video coding technologies, and in particular, to a video frame processing method and device, an electronic device, and a readable storage medium.
Background
In video coding techniques, the types of video frames include intra-coded frames (I frames), forward-predicted frames (P frames), and bi-directionally predicted frames (B frames). An I frame retains a complete picture when encoded, and the complete picture can be reconstructed from this frame's data alone when decoding, without reference to any other video frame. A P frame retains only the difference information between the current frame and the previous I frame or P frame when encoded, so the previous I frame or P frame must be referenced when decoding. A B frame retains the difference information between the current frame and both the previous I frame (or P frame) and the next I frame (or P frame) when encoded, so both of those reference frames must be referenced when decoding. I frames are therefore the key frames in video coding and serve as the reference frames for P frames and B frames, and the compression rate of I frames is low, so determining appropriate I frames during video encoding is particularly important for improving the compression rate of a video.
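As a rough illustration of these reference relationships, the following is a simplified sketch (not taken from this application; practical encoders use more elaborate, reordered reference structures):

```python
# Simplified sketch of I/P/B decode dependencies in display order.
# Assumes every B frame has an I/P frame on both sides, as in this GOP.
gop = ["I", "B", "B", "P", "B", "B", "P"]

def references(i, types):
    """Indices a frame depends on for decoding (simplified model)."""
    if types[i] == "I":
        return []                                   # decodes standalone
    prev = max(j for j in range(i) if types[j] in ("I", "P"))
    if types[i] == "P":
        return [prev]                               # previous I/P frame only
    nxt = min(j for j in range(i + 1, len(types)) if types[j] in ("I", "P"))
    return [prev, nxt]                              # both neighbouring I/P frames

for i, t in enumerate(gop):
    print(i, t, "references", references(i, gop))
```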
Disclosure of Invention
The application provides a video frame processing method and device, an electronic device, and a readable storage medium, which can improve the compression rate of a video. The technical solution of the application is as follows.
According to a first aspect of embodiments of the present application, there is provided a video frame processing method, including:
determining a candidate frame set in a target video, the candidate frame set including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame;
if a target video frame exists in the candidate frame set, determining the target video frame as an I frame, the target video frame being a video frame in which scene switching occurs relative to the first video frame;
if no target video frame exists in the candidate frame set, determining the first video frame as the I frame.
In this method, the terminal determines an I frame from among a first video frame determined based on a preset interval and at least one second video frame after the first video frame: if scene switching occurs in any second video frame relative to the first video frame, the terminal determines that second video frame as the I frame; otherwise, the terminal determines the first video frame as the I frame. The terminal selects the I frame from the first video frame and the second video frames near it according to whether scene switching occurs. Because the difference information between the video frames before and after scene switching is large, selecting the I frame on this basis reduces the difference information between the I frame and the video frames after it, which in turn reduces the amount of data after the video frames are compressed, thereby improving the compression rate of the video.
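The decision just described can be sketched as follows (an illustrative sketch only; the scene-switch detector is assumed here, and two concrete criteria for it are given in the implementations below):

```python
# Minimal sketch of the I-frame decision: `is_scene_cut` is an assumed
# detector returning True if `second` undergoes scene switching relative
# to `first` (e.g. a cost-ratio or color-histogram criterion).
def choose_i_frame(candidate_set, is_scene_cut):
    first = candidate_set[0]            # first video frame (at the preset interval)
    for second in candidate_set[1:]:    # the second video frames that follow it
        if is_scene_cut(first, second):
            return second               # target video frame found: it becomes the I frame
    return first                        # no scene switching: the first video frame is the I frame
```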
In one possible implementation, determining the candidate frame set in the target video includes either of the following:
determining the first video frame based on the preset interval, and determining at least one video frame after the first video frame as the second video frame(s);
determining a plurality of video frames located within a target position range, determining the first of the plurality of video frames as the first video frame, and determining the remaining video frames as the second video frames, where the start point of the target position range is determined based on the preset interval.
In one possible embodiment, the method further comprises:
determining the intra-frame prediction cost of each second video frame and the inter-frame prediction cost between each second video frame and the first video frame;
if a target video frame exists in the candidate frame set, determining the target video frame as an I frame includes:
if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any second video frame is smaller than a first threshold, determining that second video frame as the I frame.
In the above method, if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any second video frame is smaller than the first threshold, it indicates that scene switching occurs in that second video frame relative to the first video frame. Because the difference information between the video frames before and after scene switching is large, and the I frame is the reference frame for each video frame between it and the next I frame, determining this second video frame as the I frame can improve the compression rate of the target video, reduce the code rate, and save transmission resources.
In one possible embodiment, the method further comprises:
determining the color histogram of each video frame in the candidate frame set;
determining the difference information between each second video frame and the first video frame based on the color histograms;
if a target video frame exists in the candidate frame set, determining the target video frame as an I frame includes:
if the difference information between any second video frame and the first video frame is larger than a second threshold, determining that second video frame as the I frame.
In this method, the color histograms of two video frames are compared to measure how much the frames differ in the frequency distribution of their colors, and thereby to determine whether scene switching occurs.
In one possible implementation, the preset interval is determined based on a service type corresponding to the target video.
In this method, the terminal determines the preset interval based on the service type corresponding to the target video; because the preset interval differs across service types, the requirements that different service types place on the target video can be met flexibly.
In one possible implementation, the number of second video frames in the candidate frame set is determined based on the content type of the target video.
In this method, the terminal determines the number of second video frames based on the content type of the target video, which helps reduce the number of scenes spanned by the video frames in a candidate frame set, so that the second video frame corresponding to the first scene switch in the set is determined as the I frame. This improves the effectiveness of the determined I frame, improving the compression rate of the target video, reducing the code rate, and saving transmission resources.
According to a second aspect of embodiments of the present application, there is provided a video frame processing apparatus, the apparatus comprising:
a candidate frame set determination unit configured to determine a candidate frame set in a target video, the candidate frame set including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame;
an I frame determination unit configured to determine a target video frame as an I frame if the target video frame exists in the candidate frame set, the target video frame being a video frame in which scene switching occurs relative to the first video frame;
the I frame determination unit being further configured to determine the first video frame as the I frame if no target video frame exists in the candidate frame set.
In a possible implementation manner, the candidate frame set determining unit is configured to perform any one of the following:
determining the first video frame based on the preset interval, and determining at least one video frame after the first video frame as the second video frame(s);
determining a plurality of video frames located within a target position range, determining the first of the plurality of video frames as the first video frame, and determining the remaining video frames as the second video frames, where the start point of the target position range is determined based on the preset interval.
In one possible embodiment, the apparatus further comprises:
a target video frame determination unit configured to determine the intra-frame prediction cost of each second video frame and the inter-frame prediction cost between each second video frame and the first video frame;
the I frame determination unit is further configured to:
determine that second video frame as the I frame if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any second video frame is smaller than a first threshold.
In one possible embodiment, the apparatus further comprises:
a color histogram determination unit configured to determine the color histogram of each video frame in the candidate frame set;
a difference information determination unit configured to determine the difference information between each second video frame and the first video frame based on the color histograms;
the I frame determination unit is further configured to:
determine that second video frame as the I frame if the difference information between any second video frame and the first video frame is larger than a second threshold.
In one possible implementation, the preset interval is determined based on a service type corresponding to the target video.
In one possible implementation, the number of second video frames in the candidate frame set is determined based on the content type of the target video.
According to a third aspect of embodiments of the present application, there is provided an electronic device including:
one or more processors;
a memory for storing the processor-executable program code;
wherein the processor is configured to execute the program code to implement the video frame processing method provided in the first aspect or any possible implementation manner of the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing at least one computer program which, when executed by a processor of an electronic device, enables the electronic device to perform the video frame processing method provided in the first aspect or any possible implementation manner of the first aspect.
According to a fifth aspect of embodiments of the present application, there is provided a computer program product comprising one or more computer programs executable by one or more processors of an electronic device, such that the electronic device is capable of performing the video frame processing method provided in the first aspect or any of the possible implementations of the first aspect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application and do not constitute an undue limitation on the application.
Fig. 1 is a schematic diagram of an implementation environment of a video frame processing method according to an embodiment of the present application;
fig. 2 is a flowchart of a video frame processing method according to an embodiment of the present application;
fig. 3 is a flowchart of a video frame processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a video frame processing method according to an embodiment of the present application;
fig. 5 is a flowchart of a video frame processing method according to an embodiment of the present application;
fig. 6 is a block diagram of a video frame processing apparatus according to an embodiment of the present application;
fig. 7 is a block diagram of a terminal according to an embodiment of the present application;
fig. 8 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of methods and systems consistent with aspects of the present application, as recited in the appended claims.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the "first," "second," and "nth" terms, nor is it limited to the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms.
These terms are only used to distinguish one element from another element. For example, the first person threshold can be referred to as a second threshold, and similarly, the second threshold can also be referred to as a first threshold, without departing from the scope of the various examples.
Wherein "at least one" means one or more, for example, at least one second video frame may be an integer number of second video frames of any one or more of one second video frame, two second video frames, three second video frames, and the like. And "plurality" means two or more, for example, the plurality of video frames may be two video frames, three video frames, or any integer number of video frames greater than or equal to two.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, the target video referred to in this application is acquired with sufficient authorization.
In order to facilitate understanding of the content of the embodiments of the present application, several technical terms referred to in the embodiments of the present application are explained below.
Intra prediction: intra-frame prediction refers to predicting uncoded pixels in a video frame by coded pixels in the video frame to remove redundant information in the same video frame and improve the compression rate of the video frame.
Inter prediction: inter-frame prediction refers to predicting an uncoded video frame through a coded video frame to remove redundant information among different video frames and improve the compression rate of the video.
Intra prediction cost (intra cost): the intra-prediction cost indicates the computational cost expended for intra-prediction.
Inter prediction cost (inter): the inter-prediction cost indicates the computational cost expended for inter-prediction.
The following describes an implementation environment of an embodiment of the present application.
Fig. 1 is a schematic diagram of an implementation environment of a video frame processing method according to an embodiment of the present application, as shown in fig. 1, where the implementation environment includes: a terminal 101 and a server 102. The terminal 101 can be connected to the server 102 through a wireless network or a wired network.
The terminal 101 may be at least one of a smartphone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and the like. The terminal 101 has a communication function and can access the Internet. The terminal 101 may refer to any one of a plurality of terminals; this embodiment is described using the terminal 101 only as an example, and those skilled in the art will recognize that the number of terminals may be greater or smaller. Illustratively, the terminal 101 runs an application that provides a video encoding function: through this application, the terminal can determine the I frames in a target video and then encode the other video frames in the target video with the I frames as reference frames, so as to compress the target video. The application may be a video encoder application (e.g., an x265 or x264 video encoder), a video processing application, an audiovisual application, a conferencing application, a social application, and the like, which is not limited here. In some embodiments, the terminal 101 sends the target video to the server 102, the server 102 determines the I frames in the target video, and the server 102 further encodes the other video frames in the target video with the I frames as reference frames to compress the target video, which is not limited in the embodiments of the present application.
The server 102 may be an independent physical server, a server cluster or distributed file system composed of a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), and big data and artificial intelligence platforms. The server 102 is configured to provide background services for the application running in the terminal 101; for example, the server 102 can provide object account login services for that application. In some embodiments, the server 102 receives the target video sent by the terminal 101, determines the I frames in the target video, then encodes the other video frames in the target video with the I frames as reference frames to compress the target video, and sends the compressed target video to the terminal 101, which is not limited in the embodiments of the present application.
In some embodiments, the wired or wireless network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to a LAN (Local Area Network), a MAN (Metropolitan Area Network), a WAN (Wide Area Network), a mobile, wired, or wireless network, a private network, or any combination of virtual private networks. In some embodiments, the data exchanged over the network is represented using techniques and/or formats including HTML (HyperText Markup Language), XML (Extensible Markup Language), and the like. In addition, all or some of the links can be encrypted using conventional encryption techniques such as SSL (Secure Sockets Layer), TLS (Transport Layer Security), VPN (Virtual Private Network), and IPsec (Internet Protocol Security). In other embodiments, custom and/or dedicated data communication techniques can also be used in place of, or in addition to, the data communication techniques described above.
The implementation environment of the embodiments of the present application is described above; a video frame processing method provided by an embodiment of the present application is described below. Fig. 2 is a flowchart of a video frame processing method according to an embodiment of the present application. As shown in fig. 2, taking the method being executed by a terminal as an example, the method includes the following steps 201 to 203.
In step 201, the terminal determines a set of candidate frames in a target video, the set of candidate frames including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame.
The target video is the video to be encoded and includes a plurality of video frames. One video frame in each candidate frame set is determined as an I frame, so the candidate frame set is a set of candidate I frames. The first video frame in the candidate frame set is determined based on a preset interval (keyint), which refers to the preset number of video frames between two adjacent I frames. The second video frames in the candidate frame set are at least one video frame located after the first video frame. The preset interval and the number of second video frames in each candidate frame set may be set according to actual requirements, which is not limited in the embodiments of the present application. The process by which the terminal determines the candidate frame set in the target video is described in detail in the following embodiments and is not repeated here.
In step 202, if a target video frame exists in the candidate frame set, the terminal determines the target video frame as an I frame, where the target video frame is a video frame in which scene switching occurs relative to the first video frame.
The difference information between the video frames before the scene switch in the candidate frame set is smaller than the difference information between the two video frames on either side of the scene switch. Because the I frame is the reference frame for each video frame between it and the next I frame, determining the target video frame as the I frame, compared with determining the first video frame as the I frame, means that less difference information needs to be retained when encoding each video frame between the I frame and the next I frame. This improves the compression rate of those video frames and hence of the target video, reduces the code rate, and saves transmission resources.
In some embodiments, the terminal determines the target video frame as an I frame by recording the frame number, the frame type (I frame), and other coding parameters of the target video frame; in the subsequent encoding process, the terminal queries these coding parameters to obtain the frame type of the target video frame and encodes it in the I frame coding mode. In other embodiments, the terminal marks the target video frame, the mark indicating the frame type (I frame) of the target video frame, and during subsequent encoding the terminal encodes the target video frame in the I frame coding mode based on the mark. The mark may be carried by at least one pixel of the target video frame, and its form may be a symbol, a number, a pattern, or the like; the embodiments of the present application do not limit how the mark is carried or what form it takes. It should be noted that the above description of the process of determining the target video frame as an I frame is only exemplary; the specific process may be determined according to actual requirements, which is not limited in the embodiments of the present application.
In step 203, if the target video frame does not exist in the candidate frame set, the terminal determines the first video frame as an I frame.
Corresponding to step 202, the absence of a target video frame in the candidate frame set means that scene switching does not occur in any second video frame relative to the first video frame, and the difference information between each second video frame and the first video frame is small; in this case, the first video frame is determined as the I frame. In some embodiments, a player displays the scale of the progress bar based on the positions of the I frames in the target video; when the target video is played, the player responds to a drag operation on the progress bar by starting playback from the I frame preceding the video frame corresponding to the drag operation. Because the first video frame is determined based on the preset interval, determining it as the I frame makes it convenient for the player to display the scale of the progress bar and improves the player's response speed to drag operations.
The process by which the terminal determines the first video frame as an I frame is the same as the process of determining the target video frame as an I frame in step 202, and is not repeated here.
In this method, the terminal determines an I frame from among a first video frame determined based on a preset interval and at least one second video frame after the first video frame: if scene switching occurs in any second video frame relative to the first video frame, the terminal determines that second video frame as the I frame; otherwise, the terminal determines the first video frame as the I frame. The terminal selects the I frame from the first video frame and the second video frames near it according to whether scene switching occurs. Because the difference information between the video frames before and after scene switching is large, determining the I frame on this basis reduces the difference information between the I frame and the video frames after it, which in turn reduces the amount of data after the video frames are compressed, thereby improving the compression rate of the video.
The above fig. 2 shows only the basic flow of the embodiments of the present application; the specific flow is described in detail below. In the embodiments of the present application, the terminal determines whether a target video frame exists in the candidate frame set in either of two ways: based on the intra-frame prediction cost and inter-frame prediction cost of the video frames, or based on the color histograms of the video frames. The specific flows of embodiments that determine the target video frame in these two ways are described below.
First, determining the target video frame based on the intra-frame prediction cost and inter-frame prediction cost of the video frames is described. Fig. 3 is a flowchart of a video frame processing method according to an embodiment of the present application. As shown in fig. 3, taking the method being executed by a terminal as an example, the method includes the following steps 301 to 305.
In step 301, the terminal determines a preset interval based on the service type corresponding to the target video.
The size of the preset interval determines the number of I frames in the target video, and the number of I frames determines the compression rate and the degree of distortion of the target video: the smaller the preset interval, the more I frames the target video contains, the lower its compression rate, and the lower its distortion; the larger the preset interval, the fewer I frames the target video contains, the higher its compression rate, and the higher its distortion. The compression rate affects the size of the compressed target video and hence the transmission resources consumed when transmitting it; the degree of distortion affects the playback quality of the target video. Because different service types place different requirements on the compression rate and distortion of the target video, the preset interval can be determined based on the service type corresponding to the target video, so that the compression rate and distortion of the target video meet the service requirements. For example, the service types include live streaming and recorded broadcast: relative to recorded broadcast, live streaming has higher requirements on the transmission efficiency of the target video and lower requirements on its quality, so the preset interval corresponding to the live streaming service can be set smaller than the preset interval corresponding to the recorded broadcast service to meet the service requirements. It should be noted that the above description of the preset intervals of target videos under different service types is only exemplary; the preset interval under each service type may be set according to actual requirements, which is not limited in the embodiments of the present application.
In the method, the terminal determines the preset interval based on the service type corresponding to the target video, and the preset intervals of the target video under different service types are different, so that the requirements of different service types on the target video can be flexibly met.
In some embodiments, the terminal tests different intervals on a video set, determines the video coding index corresponding to each interval, and then determines the interval corresponding to the best video coding index as the preset interval of the target video. The video coding index is used to evaluate the compression rate and degree of distortion that an interval causes; it may be, for example, BD-Rate-PSNR (Bjøntegaard Delta Rate with Peak Signal-to-Noise Ratio, a rate-distortion performance evaluation index), which is not limited by the embodiments of the present application. In this method, the preset interval is determined based on the video coding indexes corresponding to different intervals, so the two indicators of compression rate and degree of distortion can be balanced, improving video coding performance.
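This selection can be sketched as follows (a minimal sketch assuming an evaluation function coding_index(video_set, interval) exists that returns the video coding index of an interval; its internals, e.g. a BD-Rate-PSNR computation, are not shown and would depend on the encoder used):

```python
# Sketch: pick the interval whose video coding index is optimal.
# `coding_index` is an assumed callable; higher is taken to be better here.
def pick_preset_interval(video_set, intervals, coding_index):
    return max(intervals, key=lambda k: coding_index(video_set, k))

# e.g. pick_preset_interval(videos, [30, 60, 120, 250], coding_index)
```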
In some embodiments, the terminal stores service types and the preset interval of the target video corresponding to each service type. The terminal identifies the service type corresponding to the target video through the tag or title of the target video, or by scanning the content of the target video; based on the identified service type, the terminal then queries the stored preset interval corresponding to that service type.
It should be noted that step 301 is optional; in some embodiments, the preset interval is set by the object logged in to the terminal or is determined in other ways unrelated to the service type of the target video, which is not limited in the embodiments of the present application.
In step 302, the terminal determines a set of candidate frames in a target video, the set of candidate frames including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame.
The terminal determines the candidate frame set based on the preset interval in either of the following ways.
The first way: the terminal determines the first video frame based on the preset interval, and determines at least one video frame after the first video frame as the second video frames. For example, the terminal determines the nth frame as the first video frame based on the preset interval, and then determines the (n+1)th to (n+x)th frames as the second video frames, where n and x are integers greater than 0 and x is the number of second video frames in one candidate frame set.
The second way: the terminal determines a plurality of video frames located within a target position range, determines the first of these video frames as the first video frame, and determines the remaining video frames as the second video frames; the start point of the target position range is determined based on the preset interval. For example, the terminal determines, based on the preset interval, that the start point of the target position range is the nth frame; given that the size of the target position range is x+1 frames, the nth to (n+x)th frames form a candidate frame set, in which the first video frame (the nth frame) is determined as the first video frame and the remaining video frames (the (n+1)th to (n+x)th frames) are determined as the second video frames. Here n and x are integers greater than 0 and x is the number of second video frames in one candidate frame set.
Because the number of scene switches differs across videos of different content types, for two videos of the same duration, the interval between two adjacent scene-switch frames is smaller in the video with more scene switches and larger in the video with fewer scene switches. Thus, in some embodiments, the terminal determines the number of second video frames based on the content type of the target video. For example, suppose the content types include open courses and travel videos: a video whose content type is open course has fewer scene switches than a travel video, so the number of second video frames in one candidate frame set of the open-course video may be greater than the number in one candidate frame set of the travel video. The terminal identifies the content type of the target video through the tag or title of the target video, or the like. In this method, determining the number of second video frames based on the content type of the target video helps reduce the number of scenes spanned by the video frames in a candidate frame set, so that the second video frame corresponding to the first scene switch in the set is determined as the I frame; this improves the effectiveness of the determined I frame, improves the compression rate of the target video, reduces the code rate, and saves transmission resources. In other embodiments, the number of second video frames is determined based on the preset interval; for example, the number of second video frames is 1/4 of the preset interval. It should be noted that the above methods for determining the number of second video frames are only exemplary; the number of second video frames may be determined according to actual requirements, which is not limited in the embodiments of the present application.
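Candidate-set construction under the second way can be sketched as follows (a simplified sketch: the window size x is an assumption, e.g. one quarter of the preset interval as mentioned above, and for brevity each new set starts one preset interval after the previous first video frame, whereas the embodiments may also restart from the chosen I frame):

```python
# Sketch: build candidate frame sets from a preset interval (keyint).
# Frame 0 is assumed to be an I frame already; x is the assumed window size.
def candidate_sets(num_frames, keyint, x):
    n = keyint                                       # first candidate position
    while n < num_frames:
        first = n                                    # first video frame
        seconds = list(range(n + 1, min(n + 1 + x, num_frames)))
        yield first, seconds                         # one candidate frame set
        n += keyint                                  # next set one interval later

for first, seconds in candidate_sets(300, keyint=60, x=15):
    print("first:", first, "seconds:", seconds[0], "..", seconds[-1])
```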
In step 303, the terminal determines an intra prediction cost for each second video frame and an inter prediction cost between each second video frame and the first video frame.
The terminal determines the intra-frame prediction cost of each second video frame, that is, the computational cost of predicting the uncoded pixels in the second video frame based on the coded pixels in the same frame. In some embodiments, the terminal determines the intra-frame prediction cost of each second video frame using a chroma-based prediction mode or a luma-based prediction mode. It should be noted that the terminal may also determine the intra-frame prediction cost in other ways according to actual requirements, which is not limited in the embodiments of the present application.
The terminal determines the inter-frame prediction cost between each second video frame and the first video frame, that is, the computational cost of predicting the second video frame based on the first video frame. In some embodiments, the terminal uses motion estimation or motion compensation to determine the inter-frame prediction cost between each second video frame and the first video frame. It should be noted that the terminal may also determine the inter-frame prediction cost in other ways according to actual requirements, which is not limited in the embodiments of the present application.
In step 304, if the ratio of the intra-prediction cost to the inter-prediction cost of any second video frame is smaller than the first threshold, the terminal determines the second video frame as an I frame.
The ratio of the intra-frame prediction cost to the inter-frame prediction cost of a second video frame indicates the likelihood that scene switching occurs in that frame relative to the first video frame: the larger the ratio, the smaller the likelihood of scene switching; the smaller the ratio, the greater the likelihood. The first threshold is a preset threshold, and a ratio smaller than the first threshold indicates that scene switching occurs in the second video frame relative to the first video frame. It should be noted that the first threshold may be set according to actual requirements, which is not limited in the embodiments of the present application.
In some embodiments, the terminal determines the ratio of the intra-frame prediction cost to the inter-frame prediction cost of each second video frame in sequence, comparing each ratio with the first threshold as soon as it is obtained. If any ratio is smaller than the first threshold, the terminal determines the second video frame corresponding to that ratio as the I frame and stops determining the ratios of the subsequent second video frames, which saves the terminal's computing resources and improves the efficiency of I frame determination. In other embodiments, the terminal first determines the ratio of the intra-frame prediction cost to the inter-frame prediction cost of every second video frame, and then compares each ratio with the first threshold. The embodiments of the present application are not limited in this regard.
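The sequential test can be sketched as follows (assuming the prediction costs are already available, e.g. from an encoder lookahead; the threshold value used here is illustrative only):

```python
# Sketch of step 304: sequential intra/inter cost-ratio test with early stop.
def find_target_frame(intra_costs, inter_costs, first_threshold=0.4):
    """intra_costs[i] and inter_costs[i] belong to the i-th second video
    frame; returns the index of the first frame whose ratio falls below
    the threshold (scene switching), or None if there is none."""
    for i, (intra, inter) in enumerate(zip(intra_costs, inter_costs)):
        if inter > 0 and intra / inter < first_threshold:
            return i        # scene switch: stop, skip the remaining ratios
    return None             # no scene switch: keep the first video frame
```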
The process of determining the second video frame as the I frame by the terminal is the same as that of step 202, and will not be described again.
It should be noted that step 304 is one implementation of determining the target video frame as an I frame if the target video frame exists in the candidate frame set; in some embodiments, this process is implemented based on other steps, which is not limited in the embodiments of the present application.
In step 305, if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of every second video frame is greater than or equal to the first threshold, the terminal determines the first video frame as the I frame.
Corresponding to step 304, a ratio of the intra-frame prediction cost to the inter-frame prediction cost greater than or equal to the first threshold indicates that scene switching does not occur in the second video frame relative to the first video frame.
In some embodiments, the terminal determines the ratio of the intra-frame prediction cost to the inter-frame prediction cost of each second video frame in sequence, comparing each ratio with the first threshold as soon as it is obtained, and determines the first video frame as the I frame if every ratio is greater than or equal to the first threshold. In other embodiments, the terminal first determines the ratio for every second video frame and then compares each ratio with the first threshold. The embodiments of the present application are not limited in this regard.
The process of determining the first video frame as the I frame by the terminal is the same as that of step 203, and will not be described again.
It should be noted that step 305 is one implementation of determining the first video frame as an I frame if no target video frame exists in the candidate frame set; in some embodiments, this process is implemented based on other steps, which is not limited in the embodiments of the present application.
It should be noted that, in the embodiments of the present application, each time the terminal determines an I frame, it takes the position of that I frame as the starting position and determines the next candidate frame set based on the preset interval, and then determines the next I frame. In some embodiments, the terminal instead takes the position of the first video frame as the starting position for determining the next candidate frame set, which is not limited in the embodiments of the present application.
The flow shown in steps 301 to 305 above is illustrated by fig. 4. Fig. 4 is a schematic diagram of a video frame processing method according to an embodiment of the present application, where the method is performed by an x265 encoder. The encoder includes a lookahead module, which is configured to determine, during encoding, coding parameters such as the frame types of the target video and the preset interval (keyint) between two I frames. The encoder also provides a scene-cut detection (scenecut) function for identifying scene-switch frames in the target video. The lookahead module determines I frames in sequence starting from the first video frame of the target video: the lookahead module determines a first video frame based on the preset interval and performs scenecut detection on the subsequent keyint/4 frames. If there is no scene-switch frame within the keyint/4 frames after the first video frame, the first video frame is determined as the I frame. If there is a scene-switch frame within that range, the first video frame is skipped and the scene-switch frame is determined as the I frame, which can be understood as temporarily extending keyint to the position of the scene-switch frame. As shown in fig. 4, keyint = 147: the 147th frame is the first video frame, and the 37 frames after it are the second video frames. Scene-switch detection is performed on these 37 frames; the 149th frame is detected as a scene-switch frame (i.e., the target video frame), so the lookahead module determines the 147th frame as a P frame and the 149th frame as the I frame, allowing the 147th frame to improve its compression rate through inter-frame prediction. In addition, the video frame processing method provided by the embodiments of the present application was tested on a video set; the results show that a BD-Rate-PSNR gain of 0.5% is obtained when keyint is 60, indicating that the method can improve the compression performance of the encoder.
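The "temporarily extending keyint" behaviour can be sketched as follows (frame numbers follow the fig. 4 example; the function below is an illustration of the described behaviour, not x265 code):

```python
# Sketch of the lookahead decision in fig. 4: if a scene-switch frame lies
# within the detection window after the regular I-frame position, it takes
# over as the I frame and the regular frame becomes a P frame.
def place_i_frame(first_frame, scene_cut_frames, window):
    for f in range(first_frame + 1, first_frame + 1 + window):
        if f in scene_cut_frames:
            return f              # keyint temporarily extends to the cut
    return first_frame            # no cut detected: keep the regular position

print(place_i_frame(147, scene_cut_frames={149}, window=37))  # -> 149
```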
In this method, the terminal determines the I frame from the first video frame and the second video frames near it according to whether scene switching occurs. Because the difference information between the video frames before and after scene switching is large, determining the I frame on this basis reduces the difference information between the I frame and the video frames after it, which in turn reduces the amount of data after the video frames are compressed, thereby improving the compression rate of the video. Further, the terminal determines the preset interval based on the service type corresponding to the target video; because the preset interval differs across service types, the requirements that different service types place on the target video can be met flexibly. In addition, the terminal determines the number of second video frames based on the content type of the target video, which helps reduce the number of scenes spanned by the video frames in a candidate frame set, so that the second video frame corresponding to the first scene switch in the set is determined as the I frame; this improves the effectiveness of the determined I frame and improves the compression rate of the video.
Next, determining the target video frame based on the color histograms of the video frames is described. Fig. 5 is a flowchart of a video frame processing method according to an embodiment of the present application. As shown in fig. 5, taking the method being executed by a terminal as an example, the method includes the following steps 501 to 506.
In step 501, the terminal determines a preset interval based on the service type corresponding to the target video.
In step 502, the terminal determines a set of candidate frames in a target video, the set of candidate frames including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame.
The steps 501 to 502 are similar to the steps 301 to 302, and are not repeated.
In step 503, the terminal determines a color histogram for each video frame in the set of candidate frames.
The color histogram describes the proportions of different colors in the whole video frame; it can therefore indicate the frequency distribution of colors in the video frame, but not the spatial distribution of colors in the video frame. The terminal obtains the color histogram of each video frame in the candidate frame set by counting the frequency distribution of the colors of each frame. In this method, because the frequency distributions of colors differ greatly across scenes, the difference between the color histograms of two video frames can be compared to measure their difference in color frequency distribution and thereby determine whether scene switching occurs.
In step 504, the terminal determines difference information of each second video frame and the first video frame based on the color histogram.
The difference information between a second video frame and the first video frame is indicated by the similarity of the color histograms of the two frames. The higher the similarity, the smaller the difference information and the less likely it is that scene switching occurs in the second video frame relative to the first video frame; the lower the similarity, the greater the difference information and the more likely it is that scene switching occurs.
In some embodiments, the terminal converts the RGB (Red-Green-Blue) value of each pixel in a video frame into a gray value, divides the gray range from 0 to 255 into a plurality of gray intervals, and counts the number of pixels falling in each gray interval to obtain the gray feature vector of the video frame. The terminal then determines the similarity between the gray feature vectors of the second video frame and the first video frame, and describes the difference information between the two frames as 1 minus the similarity. For example, the terminal divides the gray range into 5 gray intervals; for the first video frame, the numbers of pixels in the intervals are 200, 100, 150, 100, and 300, so the gray feature vector of the first video frame is [200, 100, 150, 100, 300]; for the second video frame, the numbers of pixels in the intervals are 100, 200, 250, 300, and 100, so the gray feature vector of the second video frame is [100, 200, 250, 300, 100]. The terminal determines the similarity of the two gray feature vectors to obtain the difference information between the first video frame and the second video frame, as sketched below. It should be noted that the above description of determining the similarity of color histograms is only exemplary; the embodiments of the present application do not limit how the similarity of color histograms is determined.
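Using the example vectors above, this computation can be sketched as follows (cosine similarity is one reasonable choice of similarity measure; the embodiments do not fix the measure, so this is an assumption):

```python
# Sketch of step 504: difference information from 5-bin gray feature vectors.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

first_vec  = [200, 100, 150, 100, 300]   # first video frame, pixels per gray interval
second_vec = [100, 200, 250, 300, 100]   # second video frame, pixels per gray interval

difference = 1 - cosine_similarity(first_vec, second_vec)
print(f"difference information: {difference:.3f}")  # compared against the second threshold
```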
In step 505, if the difference information between any second video frame and the first video frame is greater than the second threshold, the terminal determines the second video frame as an I frame.
The second threshold is a preset threshold; difference information between the second video frame and the first video frame larger than the second threshold indicates that scene switching occurs in the second video frame relative to the first video frame. It should be noted that the second threshold may be set according to actual requirements, which is not limited in the embodiments of the present application.
In some embodiments, the terminal determines the difference information between each second video frame and the first video frame in sequence, comparing each result with the second threshold as soon as it is obtained. If any difference information is greater than the second threshold, the terminal determines the second video frame corresponding to that difference information as the I frame and stops determining the difference information of the subsequent second video frames, which saves the terminal's computing resources and improves the efficiency of I frame determination. In other embodiments, the terminal first determines the difference information between every second video frame and the first video frame, and then compares each difference information with the second threshold. The embodiments of the present application are not limited in this regard.
The process of determining the second video frame as the I frame by the terminal is the same as that of step 202, and will not be described again.
It should be noted that step 505 is one implementation of determining the target video frame as an I frame if the target video frame exists in the candidate frame set; in some embodiments, the process is implemented based on other steps, which is not limited in the embodiments of the present application.
In step 506, if the difference information between each second video frame and the first video frame is less than or equal to the second threshold, the terminal determines the first video frame as an I frame.
Corresponding to step 505, difference information between the second video frame and the first video frame that is less than or equal to the second threshold indicates that no scene switch occurs in the second video frame relative to the first video frame.
In some embodiments, the terminal determines the difference information of each second video frame and the first video frame in sequence, comparing each piece of difference information with the second threshold as soon as it is obtained, and determines the first video frame as an I frame if every piece of difference information is less than or equal to the second threshold. In other embodiments, the terminal first determines the difference information of every second video frame and the first video frame, and then compares each piece of difference information with the second threshold. The embodiments of the present application are not limited in this regard.
The process of determining the first video frame as the I frame by the terminal is the same as that of step 203, and will not be described again.
It should be noted that step 506 is one implementation of determining the first video frame as an I frame if the target video frame does not exist in the candidate frame set; in some embodiments, the process is implemented based on other steps, which is not limited in the embodiments of the present application.
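Steps 505 and 506 together can be summarized by the following sketch of the early-stopping variant, reusing difference_info from the sketch above; the threshold value of 0.4 is a hypothetical placeholder, since the embodiment leaves the second threshold to actual requirements.

SECOND_THRESHOLD = 0.4  # hypothetical value; set according to actual requirements

def select_i_frame(first_frame, second_frames, threshold=SECOND_THRESHOLD):
    # Step 505: scan the second video frames in order and stop at the first
    # one whose difference information exceeds the threshold (early stop).
    for frame in second_frames:
        if difference_info(first_frame, frame) > threshold:
            return frame  # scene switch detected; this frame becomes the I frame
    # Step 506: no scene switch found, so the first video frame is the I frame.
    return first_frame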
In this method, the terminal determines the I frame from the first video frame and the nearby second video frames according to whether a scene switch occurs. Because the difference information between video frames before and after a scene switch is large, selecting I frames in this way reduces the difference information between an I frame and the video frames that follow it, which in turn reduces the amount of data after video frame compression and improves the compression rate of the video. Further, comparing color histograms measures the difference between two video frames in the frequency distribution of colors in order to decide whether a scene switch occurs; compared with deciding scene switches by comparing the spatial distribution of colors across video frames, this saves the computing resources of the terminal and improves the efficiency of determining the I frame.
It should be noted that the embodiments shown in fig. 3 and fig. 5 above determine the video frame in which a scene switch occurs based on the intra-frame prediction cost and inter-frame prediction cost of the video frame and based on the color histogram of the video frame, respectively. In some embodiments, the video frame in which a scene switch occurs may be determined in other manners, which is not limited in the embodiments of the present application.
Fig. 6 is a block diagram of a video frame processing apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes a candidate frame set determining unit 601 and an I-frame determining unit 602.
A candidate frame set determining unit 601 configured to perform determination of a candidate frame set in a target video, the candidate frame set including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame;
an I-frame determining unit 602 configured to perform determining a target video frame as an I-frame if there is the target video frame in the candidate frame set, the target video frame being a video frame in which scene switching occurs based on the first video frame;
the I-frame determination unit 602 is further configured to perform determining the first video frame as an I-frame if the target video frame is not present in the set of candidate frames.
In a possible implementation manner, the candidate frame set determining unit 601 is configured to perform any one of the following:
determining a first video frame based on a preset interval, and determining at least one video frame after the first video frame as a second video frame;
determining, based on the preset interval, a plurality of video frames located in a target position range, determining the first of the plurality of video frames as the first video frame, and determining the remaining video frames as second video frames, where the start point of the target position range is determined based on the preset interval.
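The two manners above might look as follows; the index arithmetic and the counts num_second and range_len are hypothetical placeholders, since the embodiment ties the preset interval to the service type and the number of second video frames to the content type rather than to fixed values.

def candidate_set_first_then_next(frames, preset_interval, num_second=3):
    # First manner: the frame at the preset interval is the first video frame,
    # and the num_second frames after it are the second video frames.
    first = frames[preset_interval]
    return first, frames[preset_interval + 1 : preset_interval + 1 + num_second]

def candidate_set_position_range(frames, preset_interval, range_len=4):
    # Second manner: a target position range whose start point is determined
    # by the preset interval; the first frame in the range is the first video
    # frame and the remaining frames are the second video frames.
    window = frames[preset_interval : preset_interval + range_len]
    return window[0], window[1:]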
In one possible embodiment, the apparatus further comprises:
a target video frame determination unit configured to perform determination of the intra-frame prediction cost of each second video frame and the inter-frame prediction cost between each second video frame and the first video frame;
the I-frame determining unit 602 is further configured to perform:
and if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any second video frame is smaller than the first threshold value, determining the second video frame as an I frame.
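The ratio test itself is a one-line check, sketched below with a hypothetical first threshold of 1.0 (a ratio below 1 means intra coding is the cheaper choice); the embodiment fixes neither the threshold value nor how the intra- and inter-prediction costs are computed.

FIRST_THRESHOLD = 1.0  # hypothetical value; set according to actual requirements

def is_scene_switch(intra_cost, inter_cost, threshold=FIRST_THRESHOLD):
    # A second video frame whose intra-prediction cost is small relative to
    # its inter-prediction cost against the first video frame gains little
    # from inter prediction, which the embodiment treats as a scene switch.
    return intra_cost / inter_cost < threshold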
In one possible embodiment, the apparatus further comprises:
a color histogram determination unit configured to perform determination of color histograms of respective video frames in the candidate frame set;
A difference information determining unit configured to perform determination of difference information of each of the second video frames and the first video frame based on the color histogram;
the I-frame determining unit 602 is further configured to perform:
and if the difference information of any second video frame and the first video frame is larger than a second threshold value, determining the second video frame as an I frame.
In one possible implementation, the preset interval is determined based on a service type corresponding to the target video.
In one possible implementation, the number of second video frames in the candidate frame set is determined based on the content type of the target video.
In summary, the I frame is determined from the first video frame and the nearby second video frames according to whether a scene switch occurs. Because the difference information between video frames before and after a scene switch is large, selecting I frames in this way reduces the difference information between an I frame and the video frames that follow it, thereby reducing the amount of data after video frame compression and improving the compression rate of the video.
It should be noted that when the video frame processing apparatus provided in the above embodiment performs the corresponding steps, the division into the above functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the video frame processing apparatus provided in the above embodiment and the video frame processing method embodiment belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
In an embodiment of the present disclosure, there is also provided an electronic device including a processor and a memory for storing at least one computer program loaded and executed by the processor to implement the video frame processing method described above.
Taking the electronic device as a terminal as an example, fig. 7 is a block diagram of a terminal provided in an embodiment of the present application. Referring to fig. 7, the terminal 700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be called a user device, a portable terminal, a laptop terminal, a desktop terminal, or another name.
In general, the terminal 700 includes: a processor 701 and a memory 702.
The processor 701 may include one or more processing cores, such as a 4-core processor, a 7-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one program code for execution by processor 701 to implement the processes performed by a terminal in the video frame processing method provided by the method embodiments in the present disclosure.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera assembly 706, audio circuitry 707, and a power supply 708.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in the embodiments of the present disclosure.
The radio frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. In some embodiments, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuitry, which is not limited by the present disclosure.
The display screen 705 is used to display a UI (User Interface) and user pages. The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, it also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display screen 705 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 700. The display screen 705 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 706 is used to capture images or video. In some embodiments, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing, or inputting the electric signals to the radio frequency circuit 704 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The power supply 708 is used to power the various components in the terminal 700. The power source 708 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 708 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 700 further includes one or more sensors 709. The one or more sensors 709 include, but are not limited to: acceleration sensor 710, gyro sensor 711, pressure sensor 712, optical sensor 713, and proximity sensor 714.
The acceleration sensor 710 may detect the magnitudes of accelerations on three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 710 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display a user page in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 710. Acceleration sensor 710 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 711 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 711 may collect a 3D motion of the user on the terminal 700 in cooperation with the acceleration sensor 710. The processor 701 may implement the following functions according to the data collected by the gyro sensor 711: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 712 may be disposed at a side frame of the terminal 700 and/or at a lower layer of the display screen 705. When the pressure sensor 712 is disposed at a side frame of the terminal 700, a grip signal of the user to the terminal 700 may be detected, and the processor 701 performs a left-right hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 712. When the pressure sensor 712 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability control on the UI page according to the pressure operation of the user on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 713 is used to collect the intensity of ambient light. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 713. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 713.
A proximity sensor 714, also known as a distance sensor, is typically provided on the front panel of the terminal 700. The proximity sensor 714 is used to collect the distance between the user and the front of the terminal 700. In one embodiment, when the proximity sensor 714 detects that the distance between the user and the front of the terminal 700 gradually decreases, the processor 701 controls the display 705 to switch from the bright screen state to the off screen state; when the proximity sensor 714 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Taking the electronic device as a server as an example, fig. 8 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 800 may vary considerably depending on configuration or performance, and may include one or more CPUs (Central Processing Units) 801 and one or more memories 802, where at least one computer program is stored in the one or more memories 802 and is loaded and executed by the one or more processors 801 to implement the video frame processing method described above. Of course, the server 800 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an embodiment of the present application, there is further provided a computer-readable storage medium including program code, for example, the memory 702 including program code, where the program code is executable by the processor 701 of the terminal 700 to perform the video frame processing method. Alternatively, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an embodiment of the present application, there is also provided a computer program product comprising one or more computer programs, the one or more computer programs being executed by one or more processors of an electronic device, so that the electronic device is capable of performing the above-mentioned video frame processing method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method of video frame processing, the method comprising:
determining a candidate frame set in a target video, wherein the candidate frame set comprises a first video frame determined based on a preset interval and at least one second video frame positioned after the first video frame;
if a target video frame exists in the candidate frame set, determining the target video frame as an I frame, wherein the target video frame is a video frame subjected to scene switching based on the first video frame;
and if the target video frame does not exist in the candidate frame set, determining the first video frame as an I frame.
2. The method of claim 1, wherein determining the set of candidate frames in the target video comprises any one of:
determining the first video frame based on the preset interval, and determining at least one video frame after the first video frame as a second video frame;
determining, based on the preset interval, a plurality of video frames located in a target position range, determining the first of the plurality of video frames as the first video frame, determining the remaining video frames of the plurality of video frames as second video frames, and determining a start point of the target position range based on the preset interval.
3. The method according to claim 1, wherein the method further comprises:
determining an intra-prediction cost for each of the second video frames and an inter-prediction cost between each of the second video frames and the first video frame;
and if the target video frame exists in the candidate frame set, determining the target video frame as an I frame, including:
and if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any one of the second video frames is smaller than a first threshold value, determining that second video frame as an I frame.
4. The method according to claim 1, wherein the method further comprises:
determining a color histogram of each video frame in the candidate frame set;
determining difference information of each second video frame and the first video frame based on the color histogram;
and if the target video frame exists in the candidate frame set, determining the target video frame as an I frame, including:
and if the difference information of any second video frame and the first video frame is larger than a second threshold value, determining the second video frame as an I frame.
5. The method of claim 1, wherein the preset interval is determined based on a service type corresponding to the target video.
6. The method of claim 1, wherein the number of second video frames in the set of candidate frames is determined based on a content type of the target video.
7. A video frame processing apparatus, the apparatus comprising:
a candidate frame set determination unit configured to perform determination of a candidate frame set in a target video, the candidate frame set including a first video frame determined based on a preset interval and at least one second video frame located after the first video frame;
an I-frame determining unit configured to perform determining a target video frame as an I-frame if there is the target video frame in the candidate frame set, the target video frame being a video frame in which scene switching occurs based on the first video frame;
The I-frame determination unit is further configured to perform determining the first video frame as an I-frame if there is no target video frame in the candidate frame set.
8. The apparatus according to claim 7, wherein the candidate frame set determination unit is configured to perform any one of:
determining the first video frame based on the preset interval, and determining at least one video frame after the first video frame as a second video frame;
determining, based on the preset interval, a plurality of video frames located in a target position range, determining the first of the plurality of video frames as the first video frame, determining the remaining video frames of the plurality of video frames as second video frames, and determining a start point of the target position range based on the preset interval.
9. The apparatus of claim 7, wherein the apparatus further comprises:
a target video frame determination unit configured to perform determination of an intra prediction cost of each of the second video frames and an inter prediction cost between each of the second video frames and the first video frame;
the I-frame determination unit is further configured to perform:
and if the ratio of the intra-frame prediction cost to the inter-frame prediction cost of any one of the second video frames is smaller than a first threshold value, determining that second video frame as an I frame.
10. The apparatus of claim 7, wherein the apparatus further comprises:
a color histogram determination unit configured to perform determination of color histograms of respective video frames in the candidate frame set;
a difference information determining unit configured to perform determination of difference information of each of the second video frames and the first video frame based on the color histogram;
the I-frame determination unit is further configured to perform:
and if the difference information of any second video frame and the first video frame is larger than a second threshold value, determining the second video frame as an I frame.
11. The apparatus of claim 7, wherein the preset interval is determined based on a service type corresponding to the target video.
12. The apparatus of claim 7, wherein a number of second video frames in the set of candidate frames is determined based on a content type of the target video.
13. An electronic device comprising a processor and a memory for storing at least one computer program loaded and executed by the processor to implement the video frame processing method of any one of claims 1 to 6.
14. A computer readable storage medium having stored therein at least one computer program that is loaded and executed by a processor of an electronic device to implement the video frame processing method of any one of claims 1 to 6.