CN113099241B - Reference frame list updating method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113099241B
CN113099241B (application CN202110350791.0A)
Authority
CN
China
Prior art keywords
motion vector
frame
target frame
preset
reference threshold
Prior art date
Legal status
Active
Application number
CN202110350791.0A
Other languages
Chinese (zh)
Other versions
CN113099241A (en)
Inventor
施乐
丁文鹏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110350791.0A priority Critical patent/CN113099241B/en
Publication of CN113099241A publication Critical patent/CN113099241A/en
Application granted granted Critical
Publication of CN113099241B publication Critical patent/CN113099241B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY → H04 ELECTRIC COMMUNICATION TECHNIQUE → H04N PICTORIAL COMMUNICATION, e.g. TELEVISION → H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Predictive coding → H04N 19/503 Temporal prediction → H04N 19/51 Motion estimation or motion compensation → H04N 19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N 19/10 Adaptive coding → H04N 19/102 Characterised by the element, parameter or selection affected or controlled → H04N 19/103 Selection of coding mode or of prediction mode → H04N 19/107 Selection between spatial and temporal predictive coding, e.g. picture refresh
    • H04N 19/10 Adaptive coding → H04N 19/169 Characterised by the coding unit → H04N 19/17 The unit being an image region, e.g. an object → H04N 19/176 The region being a block, e.g. a macroblock
    • H04N 19/50 Predictive coding → H04N 19/503 Temporal prediction → H04N 19/51 Motion estimation or motion compensation → H04N 19/513 Processing of motion vectors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a method, apparatus, device, storage medium, and program product for updating a reference frame list, relating to the technical field of video coding. One embodiment of the method comprises: determining a target frame from a video; performing frame-level classification on the target frame to obtain the category of the target frame; and updating the reference frame list based on the update mode corresponding to that category. In this embodiment, the reference frame list of the video encoder is dynamically updated according to the category of the target frame, which can improve the compression performance of the video encoder while keeping the encoding complexity unchanged.

Description

Reference frame list updating method, device, equipment and storage medium
Technical Field
Embodiments of the present application relate to the field of computers, and in particular, to the field of coding technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for updating a reference frame list.
Background
A video encoder encodes video using a reference frame list. Currently, when the reference frame list of a video encoder is updated, a fixed updating logic is selected according to the B-frame/P-frame (BP) structure, and several already-encoded video frames are retained as reference frames for subsequent frames to be encoded.
Taking a pyramid with BP structure 15B1P as an example: frames 1 to 15 are B frames; frames 2, 4, 6, 8, 10, 12 and 14 are B frames that may be referenced by subsequent frames; frame 16 is a P frame; and frame 0 is the P frame of the previous BP structure. All even-numbered frames are available for reference. The related technique always puts the frames with fixed sequence numbers 0, 8 and 16 into the reference frame list for reference by the frames after frame 16. Encoding operations such as motion estimation and rate-distortion optimization are then performed on the current frame against the reference frames in this list.
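The fixed selection described above can be sketched as follows (a minimal illustration; the function name and the frame-numbering convention, relative to the start of the BP structure as in the example, are assumptions):

```python
def fixed_reference_list(bp_start: int, bp_size: int = 16) -> list:
    """Fixed update logic for a 15B1P BP structure: always keep the two
    boundary P frames and the middle B frame (e.g. frames 0, 8 and 16)."""
    return [bp_start, bp_start + bp_size // 2, bp_start + bp_size]

# Frames after frame 16 would reference this fixed list:
print(fixed_reference_list(0))  # -> [0, 8, 16]
```

The point of the fixed logic is its simplicity: the list does not depend on the content of the frames, which is exactly what the dynamic scheme below improves on.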
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment, a storage medium and a program product for updating a reference frame list.
In a first aspect, an embodiment of the present application provides a method for updating a reference frame list, including: determining a target frame from the video; carrying out frame level classification on the target frame to obtain the category of the target frame; and updating the reference frame list based on the updating mode corresponding to the category of the target frame.
In a second aspect, an embodiment of the present application provides a reference frame list updating apparatus, including: a determination module configured to determine a target frame from a video; the classification module is configured to perform frame level classification on the target frame to obtain the category of the target frame; and the updating module is configured to update the reference frame list based on the updating mode corresponding to the category of the target frame.
In a third aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
In a fourth aspect, embodiments of the present application propose a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product, which includes a computer program that, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The reference frame list updating method, apparatus, device, storage medium and program product provided by the embodiments of the present application dynamically update the reference frame list of the video encoder according to the category of the target frame, and can improve the compression performance of the video encoder while keeping the encoding complexity unchanged.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present application, nor are they intended to limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a reference frame list update method according to the present application;
FIG. 3 is a flow diagram of yet another embodiment of a reference frame list update method according to the present application;
FIG. 4 is a flow diagram of another embodiment of a reference frame list update method according to the present application;
FIG. 5 is a block diagram illustrating an embodiment of a reference frame list updating apparatus according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a reference frame list updating method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details that aid understanding and are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the reference frame list updating method or the reference frame list updating apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or transmit videos or the like.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above and implemented as multiple pieces of software or software modules, or as a single piece of software or software module; no particular limitation is imposed here.
The server 105 may provide various services. For example, the server 105 may perform processing such as analysis on videos acquired from the terminal apparatuses 101, 102, 103 and generate a processing result (e.g., a reference frame list).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the reference frame list updating method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the reference frame list updating apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, a flow 200 of one embodiment of a reference frame list update method according to the present application is shown. The reference frame list updating method comprises the following steps:
step 201, determining a target frame from a video.
In the present embodiment, the subject of execution of the reference frame list updating method (e.g., the server 105 shown in fig. 1) can determine a target frame from a video.
Here, the target frame is obtained from the video to be encoded. For example, all video frames in the video may be taken as target frames. To reduce computational complexity, the number of frames to be processed can be reduced: the video is sampled at preset intervals (for example, at equal intervals of every n frames, where n is a positive integer), and only the sampled frames are used as target frames. To further reduce complexity, the pixels of each sampled frame can also be reduced: a sampled frame is down-sampled in width and height, and the down-sampled frame is taken as the target frame. Down-sampling, also called sub-sampling or image reduction, reduces the number of sampling points: for an N x M image with down-sampling coefficient k, every k-th point in each row and column of the original image is kept. N, M and k are positive integers, and k is smaller than both N and M.
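The temporal and spatial sampling described above can be sketched as follows (a minimal illustration using NumPy; the function names are assumptions):

```python
import numpy as np

def select_target_frames(frames: list, n: int) -> list:
    """Temporal sampling: keep every n-th frame as a target-frame candidate."""
    return frames[::n]

def downsample(frame: np.ndarray, k: int) -> np.ndarray:
    """Spatial down-sampling with coefficient k: keep every k-th pixel of
    each row and column, so an N x M frame becomes roughly (N/k) x (M/k)."""
    return frame[::k, ::k]

frame = np.arange(64).reshape(8, 8)
assert downsample(frame, 2).shape == (4, 4)
```

Classification then runs only on the small down-sampled frames, which is why the scheme can leave the overall encoding complexity essentially unchanged.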
Step 202, performing frame level classification on the target frame to obtain the category of the target frame.
In this embodiment, the execution body may perform frame-level classification on the target frame to obtain a category of the target frame.
The category of a video frame can be determined according to its motion. For example, frames may be divided into two categories: dynamic-area frames (frames with motion) and static-area frames (still frames). As another example, frames may be divided into three categories: high-dynamic-area frames (frames with severe motion), static-area frames (still frames), and normal-area frames (frames with slight motion).
Step 203, updating the reference frame list based on the updating mode corresponding to the category of the target frame.
In this embodiment, the execution subject may update the reference frame list based on an update mode corresponding to the category of the target frame.
The reference frame list may include a plurality of reference frames for prediction, and may support a plurality of update modes, with different frame categories corresponding to different modes. For example, there may be two update modes: an adjacent mode and an image-quality mode. When frames are divided into dynamic-area and static-area categories, the dynamic-area category may correspond to the adjacent mode and the static-area category to the image-quality mode. When frames are divided into high-dynamic-area, static-area, and normal-area categories, the high-dynamic-area category may correspond to the adjacent mode, the static-area category to the image-quality mode, and the normal-area category to either mode. The adjacent mode selects reference frames from the side temporally adjacent to the target frame; the image-quality mode sorts candidate reference frames by quality and selects from the high-quality side.
Taking the pyramid with BP structure 15B1P as an example again: frames 1 to 15 are B frames; frames 2, 4, 6, 8, 10, 12 and 14 are B frames that may be referenced by subsequent frames; frame 16 is a P frame; and frame 0 is the P frame of the previous BP structure. All even-numbered frames are available for reference. However, the pyramid structure means that frames 0 and 16 have the lowest quantization parameter and the highest image quality; frame 8 comes next; then frames 4 and 12; and frames 2, 6, 10 and 14 have the lowest image quality. The adjacent mode may select frames 12, 14 and 16 for the reference frame list. The image-quality mode may select frames -16 (the P frame of the previous BP structure), 0 and 16.
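The two update modes can be sketched for the 15B1P example as follows (a hedged illustration: the category names, the function signature, the quality ordering by pyramid level, and mapping the normal-area category to the adjacent mode are all assumptions based on the example above):

```python
def update_reference_list(category: str, current: int = 16, size: int = 3) -> list:
    """Pick `size` reference frames for the frames following `current` in the
    15B1P example: referenceable frames are the even frames of the current BP
    structure plus the P frame of the previous one (frame -16)."""
    referenceable = [current - 32] + list(range(current - 16, current + 1, 2))
    if category in ("high_dynamic", "normal"):
        # adjacent mode: the referenceable frames closest to the target frame
        return referenceable[-size:]
    # image-quality mode: P frames (multiples of 16) have the lowest
    # quantization parameter, hence the highest quality; the stable sort
    # keeps them in temporal order
    by_quality = sorted(referenceable, key=lambda f: f % 16 != 0)
    return sorted(by_quality[:size])

print(update_reference_list("high_dynamic"))  # -> [12, 14, 16]
print(update_reference_list("static"))        # -> [-16, 0, 16]
```

The intuition matches the bit-cost analysis later in the text: fast-moving frames benefit from temporally close references, while still frames benefit from high-quality references even if they are farther away.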
A video may include I frames, P frames, and B frames. An I frame (intra-coded frame) is an independent frame carrying all of its own information and can be decoded without reference to other frames. A P frame (forward-predicted frame) is predicted from the I frame or P frame preceding it; its data is compressed according to its difference from that preceding frame. A B frame (bidirectionally predicted, interpolation-coded frame) is compressed according to its differences from both the preceding and the following frames.
Further, if all video frames in the video are target frames, then for each frame the execution body may update the reference frame list based on that frame and encode it using the list. If the target frames are obtained by sampling every n frames and down-sampling in width and height, then for each group of n frames the execution body may update the reference frame list based on the target frame corresponding to that group, and encode the n frames using the resulting list.
According to the reference frame list updating method provided by this embodiment of the application, a target frame is first determined from the video; the target frame is then classified at the frame level to obtain its category; and finally the reference frame list is updated based on the update mode corresponding to that category. The reference frame list of the video encoder is thus dynamically updated according to the category of the target frame, which can improve the compression performance of the encoder while keeping the encoding complexity unchanged.
With further reference to fig. 3, a flow 300 of yet another embodiment of a reference frame list update method according to the present application is shown. The reference frame list updating method comprises the following steps:
step 301, determining a target frame from a video.
In this embodiment, the specific operation of step 301 has been described in detail in step 201 in the embodiment shown in fig. 2, and is not described herein again.
Step 302, for each block of the target frame, determining the number of bits consumed by encoding prediction when the target frame is encoded using the inter-prediction and intra-prediction modes, respectively.
In this embodiment, the execution subject of the reference frame list updating method (for example, the server 105 shown in fig. 1) may determine, for each block of the target frame, the number of bits consumed by encoding prediction when the target frame is encoded with each of the two modes, inter prediction and intra prediction.
In some embodiments, the execution body may obtain the number of bits consumed by the inter prediction and the intra prediction for the encoding prediction of each block of the target frame by directly encoding the target frame. For example, the target frame is encoded using two encoding modes, inter-prediction and intra-prediction, respectively. And recording the number of bits consumed by the inter-frame prediction and the intra-frame prediction of each block of the target frame after the encoding is finished.
In some embodiments, the execution body may not directly encode the target frame, but obtains the number of bits consumed by the encoding prediction of the inter prediction and the intra prediction of each block of the target frame by performing the encoding prediction on the target frame.
Inter prediction and intra prediction are two coding modes of video. Encoding the video reduces the data rate while preserving visual quality as far as possible, achieving effective compression. Inter prediction exploits temporal correlation: pixels of the current frame are predicted from pixels of adjacent already-encoded frames, removing temporal redundancy. Block-based motion compensation is the dominant technique: for each block of the current frame, the best-matching block is found in a previously encoded frame (motion estimation). The frame used for prediction is called the reference frame; the displacement from the reference-frame block to the current block is the motion vector; and the difference between the two blocks is the prediction residual. The basic idea of intra prediction is to remove spatial redundancy by exploiting the correlation of neighboring pixels; in video coding, neighboring pixels are the reconstructed pixels of already-encoded blocks surrounding the current block. Each pixel of a block (edge blocks aside, which need special handling) may be predicted as a weighted sum (some weights possibly zero) of several nearby previously coded pixels, typically pixels above and to the left of the block.
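Block-based motion estimation, the core of inter prediction as described above, can be sketched with a full-search sum-of-absolute-differences (SAD) matcher (a simplified illustration; real encoders use faster search patterns and sub-pixel refinement):

```python
import numpy as np

def motion_estimate(cur_block: np.ndarray, ref_frame: np.ndarray,
                    x: int, y: int, search: int = 4) -> tuple:
    """Full search: scan a (2*search+1)^2 window around column x, row y of
    the reference frame and return the motion vector (dx, dy) of the
    candidate block with the smallest SAD."""
    h, w = cur_block.shape
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + h > ref_frame.shape[0] or xx + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(cur_block - ref_frame[yy:yy + h, xx:xx + w]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv
```

A real encoder would also keep the SAD (or a full rate-distortion cost) so that the bit consumption of inter prediction can be compared against intra prediction, as in step 302.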
Whether inter prediction or intra prediction is used, encoding a block of the target frame consumes bits. Taking inter prediction as an example, both finding the best matching block and quantizing the residual consume coding bits. Moreover, when the target frame moves violently, the farther the reference frame is from the target frame, the more bits are consumed in finding the best matching block; when the target frame is static, the higher the quantization parameter of the reference frame (and hence the lower its image quality), the larger the quantized prediction residual and the more bits consumed.
Step 303, for each block of the target frame, selecting the coding mode (inter or intra prediction) whose predicted bit consumption is smaller to encode the block, and recording the motion vector of each block.
In this embodiment, for any block in the target frame, the execution body may compare the number of bits consumed by encoding prediction for inter prediction with that for intra prediction, select the mode with the smaller consumption to encode the block, and record the block's motion vector.
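The per-block mode decision and bookkeeping can be sketched as follows (a minimal illustration; the tuple layout, tie-breaking toward inter prediction, and recording the inter-block ratio, which the classification step uses later, are assumptions):

```python
def encode_frame_blocks(costs: list) -> tuple:
    """costs: one (inter_bits, intra_bits, motion_vector) tuple per block,
    where the bit counts are the predicted encoding costs from step 302.
    Chooses the cheaper mode per block, records every block's motion
    vector, and reports the fraction of inter-coded blocks."""
    modes = ["inter" if inter <= intra else "intra" for inter, intra, _ in costs]
    mvs = [mv for _, _, mv in costs]
    inter_ratio = modes.count("inter") / len(modes)
    return modes, mvs, inter_ratio
```

The recorded motion vectors and the inter-block ratio are exactly the inputs needed by the frame-level classification in step 304.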
And step 304, performing frame level classification based on the motion vector of each block of the target frame to obtain the category of the target frame.
In this embodiment, the execution body may perform frame-level classification based on the motion vector of each block of the target frame, so as to obtain the category of the target frame.
The motion vector of each block of the target frame characterizes the motion of the frame: in general, the larger the motion vectors, the more severe the motion. The category of a video frame can be determined according to its motion. For example, frames may be divided into two categories, dynamic-area frames (frames with motion) and static-area frames (still frames); or into three categories, high-dynamic-area frames (frames with severe motion), static-area frames (still frames), and normal-area frames (frames with slight motion).
In some optional implementations of this embodiment, the execution body may compute motion vector statistics from the motion vectors of the blocks of the target frame and compare them with a motion vector reference threshold, so as to quickly determine the category of the target frame. The motion vector information may include, but is not limited to, at least one of: the accumulated sum of motion vector absolute values, the motion vector standard deviation, the motion vector variance, and the motion vector peak value. The motion vector reference threshold can be set according to actual requirements; for example, it may be set to the square of the maximum search range divided by 100.
The motion vector information is compared with the motion vector reference threshold in at least one of the following ways:
firstly, if the comparison result of the motion vector information and the motion vector reference threshold meets a first preset condition, determining that the target frame is a high dynamic area frame. Wherein the first preset condition may be a preset condition related to the motion vector reference threshold. For example, the first preset condition may include: the value of the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than a motion vector reference threshold value, and the motion vector variance is greater than a first preset multiple (e.g., 3 times) of the motion vector reference threshold value. In this case, the motion vector information includes at least: the motion vector absolute value accumulated sum and the motion vector variance.
Second, if the comparison satisfies a second preset condition, the target frame is likewise determined to be a high-dynamic-area frame. The second preset condition is also defined relative to the reference threshold; for example, it may include: the motion vector variance is smaller than a second preset multiple (e.g., 0.2 times) of the reference threshold, the motion vector peak value exceeds a third preset multiple (e.g., 0.8 times) of the reference threshold, and the absolute-value sum divided by the number of blocks, plus the motion vector standard deviation, exceeds a fourth preset multiple (e.g., 2 times) of the reference threshold. In this case, the motion vector information includes at least the absolute-value sum, the standard deviation, the variance, and the peak value.
Third, if the comparison satisfies a third preset condition and the inter-block ratio is smaller than a preset ratio threshold (e.g., 0.1), the target frame is determined to be a static-area frame. The third preset condition is defined relative to the reference threshold; for example, it may include: the absolute-value sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple (e.g., 0.02 times) of the reference threshold. The inter-block ratio of the target frame is recorded after all of its blocks have been encoded; together with the per-block motion vectors, it characterizes the frame's motion. In general, the higher the inter-block ratio, the more severe the motion. In this case, the motion vector information includes at least the absolute-value sum.
Fourth, if the comparison satisfies neither the first preset condition, nor the second, nor the third taken together with the fourth preset condition (the inter-block ratio being below the preset ratio threshold), the target frame is determined to be a normal-area frame. In this case, the motion vector information includes at least the absolute-value sum, the standard deviation, the variance, and the peak value.
It should be noted that specific contents of the first preset condition, the second preset condition, the third preset condition, and the fourth preset condition, and specific numerical values of the first preset multiple, the second preset multiple, the third preset multiple, the fourth preset multiple, and the fifth preset multiple may be set according to actual situations, and are not limited herein.
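Putting the first to fourth preset conditions together, the frame-level classification can be sketched as follows. The function and parameter names are illustrative only, and the multiples use the example values given above; an actual implementation may choose different ones.

```python
def classify_frame(mv_sum_abs, mv_std, mv_var, mv_peak,
                   num_blocks, inter_ratio, threshold,
                   m1=3.0, m2=0.2, m3=0.8, m4=2.0, m5=0.02, ratio_thr=0.1):
    """Classify a target frame from its motion vector statistics.

    threshold is the motion vector reference threshold; m1..m5 are the
    first..fifth preset multiples and ratio_thr the preset ratio
    threshold (all example values from the text above).
    """
    mean_abs = mv_sum_abs / num_blocks
    # First preset condition -> high dynamic region frame.
    if mean_abs > threshold and mv_var > m1 * threshold:
        return "high_dynamic"
    # Second preset condition -> high dynamic region frame.
    if (mv_var < m2 * threshold and mv_peak > m3 * threshold
            and mean_abs + mv_std > m4 * threshold):
        return "high_dynamic"
    # Third + fourth preset conditions -> static region frame.
    if mean_abs < m5 * threshold and inter_ratio < ratio_thr:
        return "static"
    # Otherwise -> normal (common) region frame.
    return "normal"
```

A frame whose mean motion vector magnitude and variance both exceed the thresholds is classified as high dynamic by the first condition, even when its inter-frame block ratio is large.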
Step 305, updating the reference frame list based on the update mode corresponding to the category of the target frame.
In this embodiment, the specific operation of step 305 has been described in detail in step 203 in the embodiment shown in fig. 2, and is not described herein again.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the reference frame list updating method in the present embodiment highlights the step of classifying the target frame. In the scheme described in this embodiment, for any block of the target frame, the coding mode corresponding to the smaller of the estimated numbers of bits consumed for coding is selected, which reduces the coding overhead for the target frame. The motion vector of each block of the target frame characterizes the motion condition of the target frame, and performing frame-level classification based on the motion vector of each block improves the classification accuracy of the target frame.
With further reference to fig. 4, a flow 400 of yet another embodiment of a reference frame list update method according to the present application is shown. The reference frame list updating method comprises the following steps:
step 401, sampling the video to obtain a sampled video.
In this embodiment, the execution subject of the reference frame list updating method (for example, the server 105 shown in fig. 1) may sample the video to obtain a sampled video.
Wherein the sampling may be, for example, width-height down-sampling. Performing width-height down-sampling on the video reduces the number of pixels of the video frames, and thus reduces the computational complexity. Down-sampling is also called sub-sampling or image reduction, that is, reducing the number of sampling points. For an N x M image with a down-sampling coefficient of k, one point is taken every k points in each row and each column of the original image to form a new image. N, M and k are all positive integers, and k is smaller than N and M.
Step 402, sampling the sampling video at preset intervals to obtain a target frame.
In this embodiment, the execution main body may perform sampling at preset intervals on the sampled video to obtain the target frame.
The preset interval sampling may be, for example, equal-interval sampling: one target frame is taken every n frames, where n is a positive integer. Sampling the sampled video at intervals reduces the number of target frames, which further reduces the computational complexity.
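The two sampling steps can be sketched as follows; the helper names are hypothetical and the frames are represented as plain 2-D lists of pixel values.

```python
def downsample_frame(frame, k):
    """Width-height down-sampling: keep one pixel every k pixels in each
    row and each column, so an N x M frame becomes roughly (N/k) x (M/k)."""
    return [row[::k] for row in frame[::k]]

def sample_target_frames(video, n):
    """Equal-interval sampling: take one target frame every n frames."""
    return video[::n]

# A 4x4 "frame" down-sampled with k = 2 keeps a 2x2 grid of pixels.
frame = [[r * 10 + c for c in range(4)] for r in range(4)]
small = downsample_frame(frame, 2)  # [[0, 2], [20, 22]]
```

Both steps shrink the amount of data the classifier has to touch: the first reduces pixels per frame, the second reduces the number of frames.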
Step 403, in the case that the target frame is coded using the two coding modes of inter-frame prediction and intra-frame prediction, respectively determining, for each block of the target frame, the number of bits consumed by coding prediction under inter-frame prediction and under intra-frame prediction.
In this embodiment, the specific operation of step 403 has been described in detail in step 302 in the embodiment shown in fig. 3, and is not described herein again.
Step 404, for each block in the target frame, selecting the coding mode corresponding to the smaller of the numbers of bits consumed by coding prediction under inter-frame prediction and intra-frame prediction to code the block, and recording the inter-frame block ratio of the target frame and the motion vector of each block.
In this embodiment, for any block in the target frame, the execution main body may compare the number of bits consumed for encoding prediction for inter prediction and the number of bits consumed for encoding prediction for intra prediction of the block, and select an encoding mode with a small number of bits consumed for encoding prediction to encode the block. And recording the inter-frame block ratio of the target frame and the motion vector of each block after all the blocks of the target frame are coded.
The inter-frame block ratio of the target frame is equal to the ratio of the number of blocks coded using the inter-frame prediction coding mode to the total number of blocks of the target frame. The motion vector of a block is the displacement from the matching block in the reference frame to the block in the target frame.
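The per-block mode decision and the resulting inter-frame block ratio can be sketched as follows. The function name is illustrative, and ties are broken in favor of inter prediction here, which the text does not specify.

```python
def choose_modes(inter_bits, intra_bits):
    """For each block, pick the prediction mode whose estimated coding
    cost (bits consumed by coding prediction) is smaller, and return the
    chosen modes together with the inter-frame block ratio of the frame."""
    modes = ["inter" if ib <= ab else "intra"
             for ib, ab in zip(inter_bits, intra_bits)]
    inter_ratio = modes.count("inter") / len(modes)
    return modes, inter_ratio

# Four blocks: inter prediction is cheaper for blocks 0 and 2 only.
modes, ratio = choose_modes([10, 50, 5, 90], [20, 30, 40, 80])
# modes == ["inter", "intra", "inter", "intra"], ratio == 0.5
```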
Step 405, motion vector information is counted based on the motion vector of each block of the target frame.
In this embodiment, the execution body described above may count motion vector information based on a motion vector of each block of the target frame. Wherein the motion vector information may include, but is not limited to, at least one of: motion vector absolute value accumulated sum, motion vector standard deviation, motion vector variance, motion vector peak value and the like.
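A sketch of these statistics, assuming each block's motion vector is a (dx, dy) pair whose absolute value is taken as the Euclidean magnitude — one reasonable reading; the text does not fix the norm.

```python
import math

def mv_statistics(motion_vectors):
    """Compute the per-frame motion vector statistics used for
    classification from a list of (dx, dy) block motion vectors."""
    mags = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    n = len(mags)
    mean = sum(mags) / n
    var = sum((m - mean) ** 2 for m in mags) / n  # population variance
    return {
        "sum_abs": sum(mags),   # motion vector absolute value accumulated sum
        "std": math.sqrt(var),  # motion vector standard deviation
        "var": var,             # motion vector variance
        "peak": max(mags),      # motion vector peak value
    }

stats = mv_statistics([(3, 4), (0, 0)])
# magnitudes are [5.0, 0.0]; sum_abs == 5.0, var == 6.25, std == 2.5, peak == 5.0
```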
Step 406, determining whether the comparison result of the motion vector information and the motion vector reference threshold satisfies a first preset condition.
In this embodiment, the execution body may determine whether a result of comparing the motion vector information with the motion vector reference threshold satisfies a first preset condition. If the first preset condition is satisfied, go to step 408; if the first predetermined condition is not satisfied, go to step 407.
Wherein the first preset condition may be a preset condition related to the motion vector reference threshold. For example, the first preset condition may include: the value of the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than a motion vector reference threshold value, and the motion vector variance is greater than a first preset multiple (e.g., 3 times) of the motion vector reference threshold value.
Step 407, determining whether the comparison result of the motion vector information and the motion vector reference threshold satisfies a second preset condition.
In this embodiment, if the comparison result between the motion vector information and the motion vector reference threshold does not satisfy the first preset condition, the execution subject may determine whether the comparison result between the motion vector information and the motion vector reference threshold satisfies the second preset condition. If the second preset condition is satisfied, go to step 408; if the second predetermined condition is not satisfied, go to step 409.
Wherein the second preset condition may be a preset condition related to the motion vector reference threshold. For example, the second preset condition may include: the motion vector variance is smaller than a second preset multiple (e.g., 0.2 times) of the motion vector reference threshold, the motion vector peak value is larger than a third preset multiple (e.g., 0.8 times) of the motion vector reference threshold, and the value of the motion vector absolute value accumulated sum divided by the number of blocks of the target frame, plus the motion vector standard deviation, is larger than a fourth preset multiple (e.g., 2 times) of the motion vector reference threshold.
Step 408, determine that the target frame is a high dynamic region frame.
In this embodiment, if the comparison result between the motion vector information and the motion vector reference threshold satisfies the first preset condition or the second preset condition, the executing entity may determine that the target frame is a high dynamic region frame, and continue to execute step 413. Wherein the high dynamic region frame is a video frame with violent motion.
Step 409, determining whether the comparison result of the motion vector information and the motion vector reference threshold meets a third preset condition and a fourth preset condition.
In this embodiment, if the comparison result between the motion vector information and the motion vector reference threshold does not satisfy the second preset condition, the execution subject may determine whether the comparison result satisfies the third preset condition and the fourth preset condition. If the third preset condition and the fourth preset condition are both satisfied, execute step 410; if they are not both satisfied, execute step 411.
Wherein the third preset condition and the fourth preset condition may be preset conditions related to the motion vector reference threshold. For example, the third preset condition may include: the value of the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple (e.g., 0.02 times) of the motion vector reference threshold. The fourth preset condition may include: the inter-frame block ratio is smaller than a preset ratio threshold (e.g., 0.1).
It should be noted that specific contents of the first preset condition, the second preset condition, the third preset condition, and the fourth preset condition, and specific numerical values of the first preset multiple, the second preset multiple, the third preset multiple, the fourth preset multiple, and the fifth preset multiple may be set according to actual situations, and are not limited herein.
Step 410, determine that the target frame is a static area frame.
In this embodiment, if the comparison result between the motion vector information and the motion vector reference threshold satisfies the third preset condition and the fourth preset condition, the executing entity may determine that the target frame is a static area frame, and continue to perform step 414. Wherein the static region frame is a static video frame.
In step 411, it is determined that the target frame is a normal area frame.
In this embodiment, if the comparison result of the motion vector information and the motion vector reference threshold does not satisfy the first preset condition, does not satisfy the second preset condition, and does not simultaneously satisfy the third preset condition and the fourth preset condition, the execution subject may determine that the target frame is a normal region frame. Wherein the normal region frame is a slightly moving video frame.
It should be noted that, since the target frame is obtained by performing width-height down-sampling on the video and sampling one frame every n frames, the n frames of video corresponding to the target frame may also be classified into the same category as the target frame.
In step 412, it is determined whether the number of B frames of the normal region frame is less than a preset number threshold.
In this embodiment, if the target frame is a normal region frame, the execution body may determine whether the number of B frames of the normal region frame is less than a preset number threshold. If the number of B frames of the normal region frame is less than the preset number threshold (e.g., 8), go to step 413; if the number of B frames of the normal region frame is not less than the preset number threshold, go to step 414.
A small number of B frames in the normal region frame indicates that the motion of the target frame is relatively violent; a large number of B frames indicates that the motion of the target frame is relatively slight.
In step 413, the reference frame list is updated using the neighbor mode.
In this embodiment, if the target frame is a high dynamic region frame, or a normal region frame whose number of B frames is smaller than a preset number threshold, the execution body may update the reference frame list using the adjacent mode. The adjacent mode may be a mode in which reference frames are selected from the side adjacent to the target frame. Taking a pyramid structure with a BP structure of 15B1P as an example, the adjacent mode may select the frames with sequence numbers 12, 14, and 16 to be placed in the reference frame list.
If the target frame is a high dynamic region frame or a normal region frame whose number of B frames is smaller than the preset number threshold, the motion of the target frame is relatively violent. At this time, the reference frame list is updated using the adjacent mode; since the reference frames in the reference frame list are closer to the target frame, the best matching block can be found quickly, thereby reducing the number of coding bits consumed.
In step 414, the reference frame list is updated using the quality mode.
In this embodiment, if the target frame is a static region frame, or a normal region frame whose number of B frames is not less than a preset number threshold, the execution body may update the reference frame list using the image quality mode. The image quality mode may be a mode in which reference frames are sorted by image quality and selected from the side with higher image quality. Taking a pyramid structure with a BP structure of 15B1P as an example, the image quality mode may select the frames with sequence numbers -16 and 0 (the P frames of preceding BP structures) and 16 to be placed in the reference frame list.
If the target frame is a static region frame or a normal region frame with the number of B frames not less than the preset number threshold, it indicates that the motion of the target frame is relatively slight. At this time, the reference frame list is updated using the image quality mode, and since the image quality of the reference frame in the reference frame list is high, the quantization residual is small in prediction, thereby reducing the number of coding bits consumed.
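The two selection rules can be illustrated with a small sketch. The pyramid-level heuristic used here as the image-quality proxy — frames divisible by larger powers of two in a 15B1P pyramid are assumed to sit at coarser, higher-quality levels — is an assumption of this sketch, not the patent's stated procedure.

```python
def pyramid_level(frame_no, period=16):
    """Assumed quality proxy: in a 15B1P pyramid of period 16, frames
    divisible by larger powers of two sit at coarser (higher quality)
    levels; level 0 corresponds to a P frame."""
    level, step = 0, period
    while step > 1 and frame_no % step:
        step //= 2
        level += 1
    return level

def reference_candidates(target, coded_frames, mode, n=3):
    """Pick n reference frames from the already coded frames: adjacent
    mode takes the frames temporally closest to the target; image
    quality mode takes the frames at the coarsest pyramid levels."""
    if mode == "adjacent":
        key = lambda f: abs(f - target)
    else:  # image quality mode
        key = pyramid_level
    return sorted(sorted(coded_frames, key=key)[:n])

coded = [-16, 0, 8, 12, 14, 16]
# For a B frame with sequence number 13 in a 15B1P structure:
# adjacent mode -> [12, 14, 16]; image quality mode -> [-16, 0, 16]
```

With these rules the sketch reproduces the two example lists given in the text for the 15B1P pyramid.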
In some embodiments, if the target frame is a normal region frame with a number of B frames less than a first predetermined number threshold, the reference frame list may be updated using the neighbor mode. If the target frame is a normal region frame with the number of B frames not less than the second predetermined number threshold, the reference frame list may be updated using the image quality mode. The first preset number threshold and the second preset number threshold may be the same value or different values, and are not limited herein.
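The overall update-mode decision of this flow can be condensed into a small dispatcher; the names and the example threshold of 8 B frames are illustrative.

```python
def select_update_mode(category, num_b_frames, b_threshold=8):
    """Map a target frame's category to the reference frame list update
    mode, following steps 408-414 above; b_threshold is the preset
    number threshold for B frames (example value 8)."""
    if category == "high_dynamic":
        return "adjacent"       # violent motion: nearby references
    if category == "static":
        return "image_quality"  # static: high-quality references
    # Normal region frame: few B frames means relatively violent motion.
    return "adjacent" if num_b_frames < b_threshold else "image_quality"
```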
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 3, the reference frame list updating method in this embodiment highlights the step of acquiring the target frame, the step of classifying the target frame, and the step of updating the reference frame list based on the category of the target frame. The scheme described in this embodiment obtains the target frame by performing width-height down-sampling on the video and equal-interval sampling, thereby reducing the computational complexity. By comparing the motion vector information one by one with a plurality of conditions related to the motion vector reference threshold, the target frame can be classified quickly. If the target frame is a high dynamic region frame or a normal region frame whose number of B frames is smaller than the preset number threshold, the motion of the target frame is relatively violent; at this time, the reference frame list is updated using the adjacent mode, and since the reference frames in the reference frame list are closer to the target frame, the best matching block can be found quickly, thereby reducing the number of coding bits consumed. If the target frame is a static region frame or a normal region frame whose number of B frames is not less than the preset number threshold, the motion of the target frame is relatively slight; at this time, the reference frame list is updated using the image quality mode, and since the image quality of the reference frames in the reference frame list is high, the quantization residual in prediction is small, thereby reducing the number of coding bits consumed.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a reference frame list updating apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the reference frame list updating apparatus 500 of the present embodiment may include: a determination module 501, a classification module 502, and an update module 503. Wherein, the determining module 501 is configured to determine a target frame from a video; a classification module 502 configured to perform frame-level classification on the target frame to obtain a category of the target frame; an updating module 503 configured to update the reference frame list based on the updating mode corresponding to the category of the target frame.
In the present embodiment, in the reference frame list updating apparatus 500: the specific processing of the determining module 501, the classifying module 502 and the updating module 503 and the technical effects thereof can refer to the related descriptions of steps 201-203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the classification module 502 includes: the determining submodule is configured to respectively determine the number of bits consumed by the encoding prediction of the inter-frame prediction and the intra-frame prediction of each block of the target frame under the condition that the target frame is encoded by using the two encoding modes of the inter-frame prediction and the intra-frame prediction; the recording sub-module is configured to select a coding mode corresponding to the smaller one of the coding prediction consumed bit numbers corresponding to the inter-frame prediction and the intra-frame prediction to code the corresponding block in the target frame and record the motion vector of each block of the target frame; and the classification submodule is configured to perform frame level classification on the basis of the motion vector of each block of the target frame to obtain the category of the target frame.
In some optional implementations of this embodiment, the classification sub-module includes: a statistics unit configured to count motion vector information based on a motion vector of each block of the target frame, wherein the motion vector information includes at least one of: the motion vector absolute value accumulation sum, the motion vector standard deviation, the motion vector variance and the motion vector peak value; and the determining unit is configured to compare the motion vector information with a motion vector reference threshold value and determine the category of the target frame.
In some optional implementations of this embodiment, the motion vector information includes: the motion vector absolute value accumulated sum and the motion vector variance; and the determination unit includes: a first determining subunit, configured to determine that the target frame is a high dynamic region frame if a comparison result of the motion vector information and the motion vector reference threshold satisfies a first preset condition, where the first preset condition includes: the value of the motion vector absolute value accumulated sum divided by the block number of the target frame is larger than the motion vector reference threshold, and the motion vector variance is larger than a first preset multiple of the motion vector reference threshold.
In some optional implementations of this embodiment, the motion vector information includes: the motion vector absolute value accumulated sum, the motion vector standard deviation, the motion vector variance and the motion vector peak value; and the determination unit includes: a second determining subunit configured to determine that the target frame is a high dynamic region frame if a comparison result of the motion vector information and the motion vector reference threshold satisfies a second preset condition, where the second preset condition includes: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is larger than a third preset multiple of the motion vector reference threshold, and the value of the motion vector absolute value accumulated sum divided by the block number of the target frame plus the motion vector standard deviation is larger than a fourth preset multiple of the motion vector reference threshold.
In some optional implementations of this embodiment, the motion vector information includes: the motion vector absolute value accumulated sum; and the recording sub-module is further configured to: record the inter-frame block ratio of the target frame; the determination unit includes: a third determining subunit, configured to determine that the target frame is a static area frame if the comparison result of the motion vector information and the motion vector reference threshold satisfies a third preset condition and the inter-frame block ratio is smaller than the preset ratio threshold, where the third preset condition includes: the value of the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold.
In some optional implementations of this embodiment, the motion vector information includes: the motion vector absolute value accumulated sum, the motion vector standard deviation, the motion vector variance and the motion vector peak value; and the determination unit includes: a fourth determining subunit configured to determine that the target frame is a normal region frame if the comparison result of the motion vector information and the motion vector reference threshold does not simultaneously satisfy the following conditions: the first preset condition: the value of the motion vector absolute value accumulated sum divided by the block number of the target frame is larger than the motion vector reference threshold, and the motion vector variance is larger than a first preset multiple of the motion vector reference threshold; the second preset condition: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is larger than a third preset multiple of the motion vector reference threshold, and the value of the motion vector absolute value accumulated sum divided by the block number of the target frame plus the motion vector standard deviation is larger than a fourth preset multiple of the motion vector reference threshold; the third preset condition: the value of the motion vector absolute value accumulated sum divided by the block number of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold; the fourth preset condition: the inter-frame block ratio of the target frame is smaller than a preset ratio threshold.
In some optional implementations of this embodiment, the updating module 503 includes: a first updating sub-module configured to update the reference frame list using the adjacent mode if the target frame is a high dynamic region frame, or a normal region frame whose number of B frames is smaller than a first preset number threshold, wherein the adjacent mode is a mode of selecting reference frames from the side adjacent to the target frame.
In some optional implementations of this embodiment, the updating module 503 includes: and a second updating sub-module configured to update the reference frame list using the picture quality mode if the target frame is a static area frame or a normal area frame with the number of B frames not less than a second preset number threshold, wherein the picture quality mode is a mode of sorting the reference frames according to picture quality and selecting the reference frame from the side with high picture quality.
In some optional implementations of this embodiment, the determining module 501 includes: the first sampling submodule is configured to sample the video to obtain a sampled video; and the second sampling submodule is configured to sample the sampling video at preset intervals to obtain a target frame.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the reference frame list update method. For example, in some embodiments, the reference frame list update method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the reference frame list update method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the reference frame list update method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A reference frame list updating method, comprising:
determining a target frame from the video;
performing frame-level classification on the target frame to obtain a category of the target frame, wherein the category comprises a high dynamic region frame, a static region frame, and a common region frame;
updating a reference frame list based on an updating mode corresponding to the category of the target frame;
wherein the updating the reference frame list based on the update mode corresponding to the category of the target frame includes:
if the target frame is the high dynamic region frame, or is a common region frame whose number of B frames is smaller than a first preset number threshold, updating the reference frame list using a proximity mode, wherein the proximity mode is a mode of selecting reference frames from the side adjacent to the target frame;
if the target frame is the static region frame, or is a common region frame whose number of B frames is not smaller than a second preset number threshold, updating the reference frame list using a picture quality mode, wherein the picture quality mode is a mode of sorting the reference frames by picture quality and selecting reference frames from the high-picture-quality side.
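The mode-selection rule of claim 1 can be sketched in code. The Python below is only an illustrative reading of the claim text: the `FrameCategory` names, the threshold parameters, and the fallback for B-frame counts that fall between the two thresholds are all assumptions, not anything fixed by the patent.

```python
from enum import Enum

class FrameCategory(Enum):
    HIGH_DYNAMIC = "high dynamic region frame"
    STATIC = "static region frame"
    NORMAL = "common region frame"

def choose_update_mode(category, num_b_frames, first_threshold, second_threshold):
    """Select the reference-frame-list update mode per the rule in claim 1."""
    if category is FrameCategory.HIGH_DYNAMIC or (
        category is FrameCategory.NORMAL and num_b_frames < first_threshold
    ):
        # Proximity mode: pick reference frames from the side adjacent to the target frame.
        return "proximity"
    if category is FrameCategory.STATIC or (
        category is FrameCategory.NORMAL and num_b_frames >= second_threshold
    ):
        # Picture-quality mode: sort candidates by picture quality, pick from the high-quality side.
        return "picture_quality"
    # Assumed fallback: the claim leaves B-frame counts between the two thresholds open.
    return "proximity"
```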
2. The method of claim 1, wherein the frame-level classifying the target frame to obtain the class of the target frame comprises:
determining, for each block of the target frame, the estimated number of bits consumed by coding the block with inter-frame prediction and with intra-frame prediction, respectively;
selecting, for each block, the coding mode of inter-frame prediction and intra-frame prediction whose estimated consumed bit number is smaller to code the corresponding block in the target frame, and recording the motion vector of each block of the target frame;
and carrying out frame level classification based on the motion vector of each block of the target frame to obtain the category of the target frame.
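As a sketch, the per-block mode decision of claim 2 amounts to comparing two bit estimates per block. The function below is illustrative; the claim is silent on tie-breaking, so ties are resolved toward inter prediction here by assumption.

```python
def choose_block_modes(inter_bits, intra_bits):
    """Per-block mode decision sketch for claim 2: code each block with
    whichever of inter/intra prediction has the smaller estimated bit cost.
    Ties go to inter prediction by assumption."""
    return ["inter" if inter <= intra else "intra"
            for inter, intra in zip(inter_bits, intra_bits)]
```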
3. The method of claim 2, wherein the performing frame-level classification based on the motion vector of each block of the target frame to obtain the class of the target frame comprises:
computing motion vector information based on the motion vector of each block of the target frame, wherein the motion vector information comprises at least one of: a motion vector absolute value accumulated sum, a motion vector standard deviation, a motion vector variance, and a motion vector peak value;
and comparing the motion vector information with a motion vector reference threshold value to determine the category of the target frame.
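The statistics enumerated in claim 3 can be computed from the per-block motion vectors as below. Reducing each 2-D vector to its magnitude is an assumption on my part, since the claim does not fix how a two-component vector becomes a scalar; the population variance is used, also by assumption.

```python
import math

def motion_vector_stats(motion_vectors):
    """Compute the claim-3 statistics from per-block motion vectors.

    `motion_vectors` is a list of (mvx, mvy) pairs, one per block.
    """
    mags = [math.hypot(x, y) for x, y in motion_vectors]
    n = len(mags)
    abs_sum = sum(mags)                            # absolute value accumulated sum
    mean = abs_sum / n
    var = sum((m - mean) ** 2 for m in mags) / n   # motion vector variance (population)
    return {
        "abs_sum": abs_sum,
        "std": math.sqrt(var),                     # motion vector standard deviation
        "var": var,
        "peak": max(mags),                         # motion vector peak value
        "blocks": n,
    }
```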
4. The method of claim 3, wherein the motion vector information comprises: motion vector absolute value accumulation sum and motion vector variance; and
the comparing the motion vector information with a motion vector reference threshold to determine the category of the target frame includes:
if the comparison result of the motion vector information and the motion vector reference threshold satisfies a first preset condition, determining that the target frame is a high dynamic region frame, wherein the first preset condition comprises: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than the motion vector reference threshold, and the motion vector variance is greater than a first preset multiple of the motion vector reference threshold.
5. The method of claim 3 or 4, wherein the motion vector information comprises: the motion vector absolute value accumulation sum, the motion vector standard deviation, the motion vector variance and the motion vector peak value; and
the comparing the motion vector information with a motion vector reference threshold to determine the category of the target frame includes:
if the comparison result of the motion vector information and the motion vector reference threshold satisfies a second preset condition, determining that the target frame is a high dynamic region frame, wherein the second preset condition comprises: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is greater than a third preset multiple of the motion vector reference threshold, and the motion vector absolute value accumulated sum divided by the number of blocks of the target frame, plus the motion vector standard deviation, is greater than a fourth preset multiple of the motion vector reference threshold.
6. The method of any of claims 3-5, wherein the motion vector information comprises: motion vector absolute value accumulation sum; and
after selecting the coding mode of inter-frame prediction and intra-frame prediction whose estimated consumed bit number is smaller to code the corresponding block in the target frame, the method further comprises:
recording the inter-frame block proportion of the target frame;
the comparing the motion vector information with a motion vector reference threshold to determine the category of the target frame includes:
if the comparison result of the motion vector information and the motion vector reference threshold satisfies a third preset condition and the inter-frame block ratio is smaller than a preset ratio threshold, determining that the target frame is a static region frame, wherein the third preset condition comprises: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold.
7. The method of any of claims 3-6, wherein the motion vector information comprises: motion vector absolute value accumulation sum, motion vector standard deviation, motion vector variance and motion vector peak value; and
the comparing the motion vector information with a motion vector reference threshold to determine the category of the target frame includes:
if the comparison result of the motion vector information and the motion vector reference threshold does not satisfy the following conditions, determining that the target frame is a common region frame:
the first preset condition: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than the motion vector reference threshold, and the motion vector variance is greater than a first preset multiple of the motion vector reference threshold;
the second preset condition: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is greater than a third preset multiple of the motion vector reference threshold, and the motion vector absolute value accumulated sum divided by the number of blocks of the target frame, plus the motion vector standard deviation, is greater than a fourth preset multiple of the motion vector reference threshold;
the third preset condition: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold;
the fourth preset condition: the inter-frame block ratio of the target frame is smaller than a preset ratio threshold.
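Taken together, claims 4 through 7 describe a three-way classifier. The sketch below is one illustrative reading of those preset conditions: the multiples `k1`..`k5` stand in for the first to fifth preset multiples and `ratio_thr` for the preset ratio threshold, none of whose values the claims fix.

```python
def classify_frame(stats, inter_ratio, mv_ref, k1, k2, k3, k4, k5, ratio_thr):
    """Three-way frame classification following the preset conditions of claims 4-7.

    `stats` holds the motion vector statistics (abs_sum, var, std, peak, blocks);
    `mv_ref` is the motion vector reference threshold; all multiples are tunable.
    """
    per_block = stats["abs_sum"] / stats["blocks"]
    # First preset condition: strong motion overall, widely spread.
    cond1 = per_block > mv_ref and stats["var"] > k1 * mv_ref
    # Second preset condition: low variance but a large localized peak.
    cond2 = (stats["var"] < k2 * mv_ref
             and stats["peak"] > k3 * mv_ref
             and per_block + stats["std"] > k4 * mv_ref)
    # Third preset condition: very little motion per block.
    cond3 = per_block < k5 * mv_ref
    if cond1 or cond2:
        return "high_dynamic"
    if cond3 and inter_ratio < ratio_thr:   # fourth preset condition
        return "static"
    return "normal"
```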
8. The method of any of claims 1-7, wherein the determining a target frame from a video comprises:
sampling the video to obtain a sampled video;
and sampling the sampled video at preset intervals to obtain the target frame.
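The two-stage selection of claim 8 (sample the video, then take target frames from the sampled video at preset intervals) might look like the following; both step sizes are illustrative parameters, not values fixed by the claim.

```python
def select_target_frames(frames, sample_step, preset_interval):
    """Two-stage selection per claim 8: sample the video, then sample the
    sampled video at preset intervals to obtain the target frames."""
    sampled_video = frames[::sample_step]       # first stage: sampled video
    return sampled_video[::preset_interval]     # second stage: target frames
```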
9. A reference frame list updating apparatus, comprising:
a determination module configured to determine a target frame from a video;
the classification module is configured to perform frame level classification on the target frame to obtain a class of the target frame, wherein the class comprises a high dynamic region frame, a static region frame and a common region frame;
an updating module configured to update the reference frame list based on an updating mode corresponding to the category of the target frame;
wherein the update module comprises:
a first updating sub-module configured to update the reference frame list using a proximity mode if the target frame is the high dynamic region frame, or is a common region frame whose number of B frames is smaller than a first preset number threshold, wherein the proximity mode is a mode of selecting reference frames from the side adjacent to the target frame;
a second updating sub-module configured to update the reference frame list using a picture quality mode if the target frame is the static region frame, or is a common region frame whose number of B frames is not smaller than a second preset number threshold, wherein the picture quality mode is a mode of sorting the reference frames by picture quality and selecting reference frames from the high-picture-quality side.
10. The apparatus of claim 9, wherein the classification module comprises:
a determining sub-module configured to determine, for each block of the target frame, the estimated number of bits consumed by coding the block with inter-frame prediction and with intra-frame prediction, respectively;
a recording sub-module configured to select, for each block, the coding mode of inter-frame prediction and intra-frame prediction whose estimated consumed bit number is smaller to code the corresponding block in the target frame, and to record the motion vector of each block of the target frame;
a classification sub-module configured to perform frame-level classification based on the motion vector of each block of the target frame, resulting in a class of the target frame.
11. The apparatus of claim 10, wherein the classification submodule comprises:
a statistics unit configured to count motion vector information based on a motion vector of each block of the target frame, wherein the motion vector information includes at least one of: motion vector absolute value accumulation sum, motion vector standard deviation, motion vector variance and motion vector peak value;
a determining unit configured to compare the motion vector information with a motion vector reference threshold value, and determine a category of the target frame.
12. The apparatus of claim 11, wherein the motion vector information comprises: the motion vector absolute value accumulation sum and the motion vector variance; and
the determination unit includes:
a first determining subunit configured to determine that the target frame is a high dynamic region frame if the comparison result of the motion vector information and the motion vector reference threshold satisfies a first preset condition, wherein the first preset condition comprises: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than the motion vector reference threshold, and the motion vector variance is greater than a first preset multiple of the motion vector reference threshold.
13. The apparatus of claim 11 or 12, wherein the motion vector information comprises: the motion vector absolute value accumulation sum, the motion vector standard deviation, the motion vector variance and the motion vector peak value; and
the determination unit includes:
a second determining subunit configured to determine that the target frame is a high dynamic region frame if the comparison result of the motion vector information and the motion vector reference threshold satisfies a second preset condition, wherein the second preset condition comprises: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is greater than a third preset multiple of the motion vector reference threshold, and the motion vector absolute value accumulated sum divided by the number of blocks of the target frame, plus the motion vector standard deviation, is greater than a fourth preset multiple of the motion vector reference threshold.
14. The apparatus of any of claims 11-13, wherein the motion vector information comprises: motion vector absolute value accumulation sum; and
the recording sub-module is further configured to:
recording the inter-frame block proportion of the target frame;
the determination unit includes:
a third determining subunit configured to determine that the target frame is a static region frame if the comparison result of the motion vector information and the motion vector reference threshold satisfies a third preset condition and the inter-frame block ratio is smaller than a preset ratio threshold, wherein the third preset condition comprises: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold.
15. The apparatus of any of claims 11-14, wherein the motion vector information comprises: motion vector absolute value accumulation sum, motion vector standard deviation, motion vector variance and motion vector peak value; and
the determination unit includes:
a fourth determining subunit configured to determine that the target frame is a common region frame if the comparison result of the motion vector information and the motion vector reference threshold does not satisfy the following conditions:
the first preset condition: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is greater than the motion vector reference threshold, and the motion vector variance is greater than a first preset multiple of the motion vector reference threshold;
the second preset condition: the motion vector variance is smaller than a second preset multiple of the motion vector reference threshold, the motion vector peak value is greater than a third preset multiple of the motion vector reference threshold, and the motion vector absolute value accumulated sum divided by the number of blocks of the target frame, plus the motion vector standard deviation, is greater than a fourth preset multiple of the motion vector reference threshold;
the third preset condition: the motion vector absolute value accumulated sum divided by the number of blocks of the target frame is smaller than a fifth preset multiple of the motion vector reference threshold;
the fourth preset condition: the inter-frame block ratio of the target frame is smaller than a preset ratio threshold.
16. The apparatus of any of claims 9-15, wherein the means for determining comprises:
the first sampling submodule is configured to sample the video to obtain a sampled video;
and the second sampling submodule is configured to sample the sampled video at preset intervals to obtain the target frame.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
CN202110350791.0A 2021-03-31 2021-03-31 Reference frame list updating method, device, equipment and storage medium Active CN113099241B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110350791.0A CN113099241B (en) 2021-03-31 2021-03-31 Reference frame list updating method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113099241A CN113099241A (en) 2021-07-09
CN113099241B true CN113099241B (en) 2022-11-01

Family

ID=76672276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110350791.0A Active CN113099241B (en) 2021-03-31 2021-03-31 Reference frame list updating method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113099241B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898175B (en) * 2022-04-29 2023-03-28 北京九章云极科技有限公司 Target detection method, device and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101023678A (en) * 2004-07-21 2007-08-22 高通股份有限公司 Method and apparatus for motion vector assignment
CN101534442A (en) * 2009-04-13 2009-09-16 腾讯科技(深圳)有限公司 Video coding system and video coding method
CN102387361A (en) * 2010-09-02 2012-03-21 乐金电子(中国)研究开发中心有限公司 Reference frame processing method of video coding-decoding and video coder-decoder
CN107087192A (en) * 2016-02-15 2017-08-22 中兴通讯股份有限公司 Target bit rate method of adjustment and device
CN111757107A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Video coding method, device, equipment and medium
CN112567753A (en) * 2018-09-04 2021-03-26 华为技术有限公司 Bidirectional interframe prediction method and device


Also Published As

Publication number Publication date
CN113099241A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
RU2699258C2 (en) Image prediction method and an image prediction device
CN102640496B (en) Picture coding device, picture decoding apparatus, method for encoding images and picture decoding method
RU2708347C1 (en) Image encoding method and device and image decoding method and device
CN109660800B (en) Motion estimation method, motion estimation device, electronic equipment and computer-readable storage medium
TW201830972A (en) Low-complexity sign prediction for video coding
US11451772B2 (en) Intra-frame prediction coding method and apparatus, electronic device, and computer storage medium
EP3614666A1 (en) Coding unit depth determination method and device
CN110839155A (en) Method and device for motion estimation, electronic equipment and computer-readable storage medium
CN113596442B (en) Video processing method and device, electronic equipment and storage medium
CN113099241B (en) Reference frame list updating method, device, equipment and storage medium
US10171804B1 (en) Video frame encoding scheme selection
WO2012160626A1 (en) Image compression device, image restoration device, and program
Mercat et al. Machine learning based choice of characteristics for the one-shot determination of the HEVC intra coding tree
CN112738529B (en) Inter prediction method, device, apparatus, storage medium, and program product
JP2011199868A (en) Adaptive search area in motion estimation process
JP7310148B2 (en) Media encoding method and apparatus
CN102948147A (en) Video rate control based on transform-coefficients histogram
JP2022546774A (en) Interpolation filtering method and device, computer program and electronic device for intra prediction
CN112584143A (en) Video coding method, device and system and computer readable storage medium
CN114513659B (en) Method, apparatus, electronic device and medium for determining picture prediction mode
CN115174908B (en) Transformation quantization method, device, equipment and storage medium for video coding
CN117061753A (en) Method and apparatus for predicting inter-coded motion vector
CN115190309B (en) Video frame processing method, training device, video frame processing equipment and storage medium
CN114040204A (en) Processing method, device and equipment of coding unit and storage medium
CN113438485B (en) Image coding method, image coding device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant