CN112291563A

CN112291563A - Video coding method, video coding equipment and computer readable storage medium

Info

Publication number: CN112291563A
Application number: CN202011136915.7A
Authority: CN
Inventors: 刘俊彦; 王�琦; 潘兴浩; 李康敬
Original assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Video Technology Co Ltd
Current assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Video Technology Co Ltd
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2021-01-29

Abstract

The invention discloses a video coding method, video coding equipment and a computer readable storage medium, relates to the technical field of video processing, and aims to solve the problem of large computation load of the existing video coding technology. The method comprises the following steps: identifying target information of a target object in a current frame image in an acquired video image; determining the importance of the current frame image based on the target information of the target object; determining the target code rate of the current frame image based on the importance and the residual code stream capacity; and encoding the video image by using the target code rate. The embodiment of the invention only needs to calculate the importance of the current frame image in the video image and count the residual code stream capacity, and determines the target code rate of the current frame image based on the importance and the residual code stream capacity of the current frame image, thereby not only reducing the operation amount, but also having novel coding mode.

Description

Video coding method, video coding equipment and computer readable storage medium

Technical Field

The present invention relates to the field of video processing technologies, and in particular, to a video encoding method, a video encoding device, and a computer-readable storage medium.

Background

At present, video coding technology mainly relies on methods such as motion residual coding based on motion vectors, in which the coded data of the current video frame is obtained by calculating the motion residual between a target motion vector and the corresponding predicted motion vector. However, this scheme requires calculating the motion vector of each pixel in each frame of image, and the motion vector of each frame must in turn be calculated from the motion vector of the previous frame, so the amount of motion vector computation is very large at both the coding block level and the pixel level.

Therefore, the problem of large operation amount exists in the existing video coding technology.

Disclosure of Invention

Embodiments of the present invention provide a video encoding method, a video encoding device, and a computer-readable storage medium, so as to solve the problem of a large amount of computation in the existing video encoding technology.

In a first aspect, an embodiment of the present invention provides a video encoding method, including:

identifying target information of a target object in a current frame image in an acquired video image;

determining the importance of the current frame image based on the target information of the target object;

determining the target code rate of the current frame image based on the importance and the residual code stream capacity;

and encoding the video image by using the target code rate.

Optionally, the identifying target information of the target object in the current frame image includes:

identifying n types of target objects in the current frame image, wherein n is an integer larger than 0;

and determining target information of each type of target object, wherein the target information comprises at least one of quantity information, priority and pixel proportion information of the target objects.

Optionally, the target information includes a priority of the target object; the determining of the target information of each type of target object comprises:

determining the priority of each type of target object based on the preset priority corresponding to each type of target object; or

acquiring first information, and calculating the priority of each type of target object based on the first information; the first information comprises the number of each type of target object in the current frame image, the total number of the n types of target objects in the current frame image, the total frame number of the video image and the frame number of each type of target object in the video image.

Optionally, the determining the importance of the current frame image based on the target information of the target object includes:

calculating the importance of each type of target object based on the target information of each type of target object;

and determining the sum of the importance degrees of the n types of target objects as the importance degree of the current frame image.

Optionally, the target information includes quantity information, priority, pixel ratio information, and reliability information of the target object.

Optionally, the number of ith type target objects is m_i, where m_i is a positive integer;

the reliability information of the ith type target object comprises the reliability of each of the m_i ith type target objects in the current frame image;

the pixel proportion information of the ith type target object comprises the pixel proportion of each of the m_i ith type target objects in the current frame image;

the ith type target object is any one of the n type target objects, and i is a positive integer less than or equal to n.

Optionally, the determining the target code rate of the current frame image based on the importance and the residual code stream capacity includes:

determining the residual code stream capacity based on the total code stream capacity of the video images and the code rate distributed by the previous s-1 frame image in the video images, wherein the current frame image is the s-th frame image in the video images;

and determining the target code rate of the current frame image based on the residual frame number, the importance and the residual code stream capacity of the video image, wherein the residual frame number is equal to the total frame number of the video image minus the first s-1 frames.

Optionally, the determining the target code rate of the current frame image based on the remaining frame number, the importance, and the remaining code stream capacity of the video image includes:

determining a code rate distribution coefficient based on the proportion of the residual code stream capacity to the total code stream capacity and the residual frame number of the video image;

and determining the target code rate of the current frame image based on the code rate distribution coefficient, the importance and the residual code stream capacity.

Optionally, the determining the remaining code stream capacity based on the total code stream capacity of the video image and the code rate allocated to the previous s-1 frame image in the video image includes:

and determining the residual code stream capacity based on the total code stream capacity and the target compression ratio of the video image and the code rate distributed by the previous s-1 frame image in the video image.

Optionally, the encoding the video image with the target bitrate includes:

respectively utilizing the target code rate of each frame of image in the video image to encode the corresponding frame of image; or

grouping the video images according to a preset frame number to obtain p groups of video images, wherein p is an integer greater than 1;

determining the average code rate of each group of video images based on the target code rate of each frame of image in each group of video images;

and respectively coding each group of video images by using the average code rate of each group of video images.
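The grouping alternative above can be illustrated with a minimal Python sketch. The function name `group_average_rates` and the handling of a shorter final group are assumptions made for illustration; the text only states that frames are grouped by a preset frame number and that each group is coded at the average of its frames' target code rates.

```python
from typing import List


def group_average_rates(target_rates: List[float], group_size: int) -> List[float]:
    """Split per-frame target code rates into consecutive groups of
    `group_size` frames and return the average rate of each group.

    Assumption: if the frame count is not a multiple of `group_size`,
    the last group is simply shorter.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    averages = []
    for start in range(0, len(target_rates), group_size):
        group = target_rates[start:start + group_size]
        averages.append(sum(group) / len(group))
    return averages


# Five frames grouped two at a time.
rates = [4.0, 6.0, 2.0, 2.0, 8.0]
print(group_average_rates(rates, 2))  # [5.0, 2.0, 8.0]
```

Each value in the returned list would then be used as the code rate for every frame in the corresponding group.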

In a second aspect, an embodiment of the present invention further provides a video encoding apparatus, including: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor, for reading the program in the memory, implements the steps in the video processing method according to any of the previous claims.

In a third aspect, the present invention further provides a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the steps in the video encoding method according to any one of the foregoing embodiments.

In the embodiment of the invention, for a current frame image in an acquired video image, identifying target information of a target object in the current frame image; determining the importance of the current frame image based on the target information of the target object; determining the target code rate of the current frame image based on the importance and the residual code stream capacity; and encoding the video image by using the target code rate. Therefore, the embodiment of the invention only needs to calculate the importance of the current frame image in the video image and count the residual code stream capacity, and determines the target code rate of the current frame image based on the importance and the residual code stream capacity of the current frame image without calculating the motion vector of each pixel point in each frame image and depending on the motion vector of the previous frame image and the next frame image, thereby not only reducing the operation amount compared with the prior art, but also having a novel coding mode.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention;

Fig. 2 is a diagram illustrating the output of target object detection information in a current frame image according to an embodiment of the present invention;

Fig. 3 is a diagram illustrating a relationship between a priority effect limiting coefficient and a number of target objects according to an embodiment of the present invention;

Fig. 4 is a diagram illustrating a calculated target bitrate for each frame of image according to an embodiment of the present invention;

Fig. 5 is a block diagram of a video encoding apparatus according to an embodiment of the present invention;

Fig. 6 is a block diagram of a video encoding apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a flowchart of a video encoding method according to an embodiment of the present invention, as shown in fig. 1, including the following steps:

Step 101, identifying target information of a target object in a current frame image in an acquired video image.

In the embodiment of the present invention, the video image may be any video stream that needs to be encoded, such as a live video stream of a sports event, a video stream of a movie and television class, and the like. The current frame image may refer to any acquired frame image.

For each acquired frame of image, the target information of the target object contained therein may be identified, for example, the target object detection may be performed on each frame of image through an image segmentation and detection algorithm, so as to obtain the detection information of the target object in each frame of image.

The target object may be an object of a preset type, and the target information may be preset information. Which types of objects are to be detected as target objects, and which information is to be detected as target information, may be preset according to the actual needs of the application scene. For example, for a video scene of a football match, the target objects may be football players, the football, the goal, the referee and so on, and the target information may be any information needed to determine the importance of the image, such as category, number, pixel proportion and reliability.

Step 102, determining the importance of the current frame image based on the target information of the target object.

The importance may be an index parameter indicating the importance or complexity of each frame of image. A higher importance indicates that the frame contains target objects that the user cares more about, and generally indicates that the frame is more exciting.

In the embodiment of the present invention, the importance of the current frame image may be determined based on the target information of the target objects identified in the current frame image. For example, if a frame of a football video contains a football player, the football and the goal, the frame may be determined to have a high importance; if a frame of a football video mostly contains spectators, the frame may be determined to have a low importance.

Specifically, the importance of the current frame image may be calculated from the target information of the target objects according to a formula. Taking target information that includes the number, priority and pixel proportion of the target objects as an example, the importance of the current frame image may be the sum, over all types of target objects in the frame, of the product of the number, priority and pixel proportion of each type. Alternatively, different weight coefficients may first be applied to the different items of target information before the importance is calculated in the same way. Of course, other calculation manners may be used according to the actual situation, and the embodiment of the present invention is not particularly limited.

When the current frame image includes target objects of several different categories, each category of target object may be identified and the target information of each category determined. The target information may include at least one of quantity information, priority and pixel proportion information of the target object. The quantity information may indicate the number of times each category of target object appears in the current frame image. The priority may represent the importance of each category of target object in the video image; for example, for a video scene of a tennis match, categories such as tennis players, rackets, nets, spectators and officials may be defined, with the priority order tennis player > racket > net > official > spectator, that is, the tennis player is the most important target object in the scene. The pixel proportion information may indicate the proportion of the pixels of each category of target object to the total pixels of the current frame image; for example, in a close-up shot of a player, the pixel proportion of that player may be large, whereas in a long shot of the playing field, the pixel proportion of a single player may be small.

Similarly, the n types of target objects in each frame of image, and the target information of each type, can be identified through an image segmentation and detection algorithm. Specifically, the object categories to be referred to in the application scene may be preset and used as the classes of the image segmentation and detection model to be trained. A large number of images annotated with the target objects and their target information are then collected as training data, and a preset image segmentation and detection algorithm is trained on this data to obtain a model that outputs the target information of the target objects in an image. To ensure recognition accuracy, mature deep-learning image segmentation and detection algorithms may be used, such as the mask region-based convolutional neural network (Mask R-CNN) and YOLACT++.

Taking the YOLACT++ image segmentation and detection algorithm as an example, after the current frame image passes through the model, the output is essentially a list of the form [[index, label, bbox(x, y, w, h), mask, confidence], ...]. Each sub-list holds the attributes output by the model for one target object: index is the index of the detected target object, such as a number; label is the category of the target object; bbox gives the position of the target object in the current frame image, where x and y are the coordinates of the center point of the rectangular region containing the target object and w and h are the width and height of that region; mask is the mask information of the target object in the current frame image; and confidence is the confidence with which the target object is detected as belonging to that category. For example, in a video scene of a tennis match, the output of the model may be as shown in Fig. 2: the identified target objects, such as players and tennis rackets, are marked in the video frame image 20 together with their category names and reliabilities.

It should be noted that the quantity information of the target objects may be determined from the category in each sub-list of the output; for example, the number of each type of target object may be obtained by counting the target objects that have the same category. The pixel proportion information of a target object may be determined from its mask information in the current frame image; for example, the number of pixels occupied by the target object is determined from the mask information and divided by the total number of pixels of the current frame image. Alternatively, the model may be adjusted to output the pixel proportion of each type of target object directly. The priority of a target object does not need to be output by the model; it may be determined from the preset priority corresponding to each type of target object, or calculated by other methods.
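As a rough illustration of how quantity and pixel proportion might be derived from such a detection list, the following Python sketch counts objects per category and divides each mask's pixel count by the frame's total pixel count. The dictionary layout of a detection, the binary 2D-list mask and the function name `summarize_detections` are assumptions for illustration and are not the output format of any particular model.

```python
from collections import Counter, defaultdict


def summarize_detections(detections, frame_width, frame_height):
    """Return (counts, pixel_ratios) for one frame.

    Assumed input: each detection is a dict with keys 'label' (category
    name), 'mask' (2D list of 0/1 values covering the whole frame) and
    'confidence'. This layout is hypothetical.
    """
    total_pixels = frame_width * frame_height
    counts = Counter()
    pixel_ratios = defaultdict(list)
    for det in detections:
        # Quantity information: count objects sharing the same category.
        counts[det["label"]] += 1
        # Pixel proportion: pixels occupied by the mask / total pixels.
        occupied = sum(sum(row) for row in det["mask"])
        pixel_ratios[det["label"]].append(occupied / total_pixels)
    return dict(counts), dict(pixel_ratios)


# Toy 4x2 frame with one player and one racket.
detections = [
    {"label": "player", "confidence": 0.9,
     "mask": [[1, 1, 0, 0], [1, 1, 0, 0]]},
    {"label": "racket", "confidence": 0.7,
     "mask": [[0, 0, 1, 0], [0, 0, 0, 0]]},
]
counts, ratios = summarize_detections(detections, frame_width=4, frame_height=2)
print(counts, ratios)
```

In this toy frame the player mask covers 4 of 8 pixels and the racket mask covers 1 of 8 pixels.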

In this way, the importance degree of the current frame image may be determined with reference to at least one of the number information, the priority, and the pixel proportion information of each type of target object in the current frame image, for example, the greater the number of a certain type of target object, the higher the priority, or the higher the pixel proportion, the higher the importance degree of the current frame image.

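Further, the target information includes a priority of the target object; the determining of the target information of each type of target object comprises:

determining the priority of each type of target object based on the preset priority corresponding to each type of target object; or

acquiring first information, and calculating the priority of each type of target object based on the first information; the first information comprises the number of each type of target object in the current frame image, the total number of the n types of target objects in the current frame image, the total frame number of the video image and the frame number of each type of target object in the video image.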

That is, in this embodiment, the target information includes the priority of the target object, and may also include other information. When determining the target information of each type of target object, the priority of each type therefore needs to be determined, and there are two ways to do so. The first is to preset the priority corresponding to each type of target object that needs to be referred to in the current scene, and to determine the priority of each type of target object in the current frame image from these preset values. For example, if the priorities of k types of target objects are set to (comp_1, comp_2, ..., comp_k), the priority of each type of target object can be obtained directly from the type of each target object in the current frame image.

The second is to determine the priority of each type of target object based on the number of target objects of that type in the current frame image, the total number of the n types of target objects in the current frame image, the total frame number of the video image, and the number of frames in the video image that contain that type of target object. For example, the priority of the ith type target object may be calculated from the number of ith type target objects in the current frame image, the total number of the n types of target objects in the current frame image, the total frame number of the video image, and the number of frames in the video image that contain the ith type target object, where the ith type target object is any one of the n types of target objects.

Specifically, the priority of each type of target object may be calculated by a specific formula, for example, the priority calculation formula may be

Here, comp_i denotes the priority of the ith type target object in the current frame image, N_i denotes the number of ith type target objects in the current frame image, N denotes the total number of target objects of all types in the current frame image, T denotes the total frame number of the video image, and T_i denotes the number of frames in the video image that contain the ith type target object; the term T_i + 1 is used to prevent the denominator from being 0. In this way, the more often a certain type of target object appears in the current frame image, the higher its priority in the current frame image, and the more frames of the video image contain that type of target object, the lower its priority in the current frame image. However, the number of times the type appears in the current frame image influences the priority more strongly than the number of frames of the video image that contain it.
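The priority formula itself is not reproduced in this text. Purely as an illustration, the following Python sketch uses a TF-IDF-style expression, (N_i / N) * log(T / (T_i + 1)), which is one form that matches the stated properties: it grows with N_i, shrinks as T_i grows, has T_i + 1 in a denominator, and reacts more strongly to N_i than to T_i. The expression, the natural logarithm and the function name `computed_priority` are assumptions, not the formula of the embodiment.

```python
import math


def computed_priority(n_i: int, n_total: int, t_total: int, t_i: int) -> float:
    """Hypothetical TF-IDF-style priority of one object type in one frame.

    n_i     -- number of objects of this type in the current frame (N_i)
    n_total -- number of objects of all types in the current frame (N)
    t_total -- total number of frames in the video (T)
    t_i     -- number of frames that contain this type (T_i)
    """
    if n_total <= 0:
        return 0.0  # no target objects detected in the frame
    # T_i + 1 keeps the denominator non-zero when the type never appears.
    return (n_i / n_total) * math.log(t_total / (t_i + 1))


# A type that fills half the frame's detections and appears in 9 of 100 frames.
print(computed_priority(n_i=2, n_total=4, t_total=100, t_i=9))
```

With this assumed form, a type that appears in nearly every frame can receive a priority close to zero or slightly negative, so a practical implementation would probably clamp the result.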

That is to say, in this manner, the priority of each type of target object is determined after every frame of the video image has been identified: once every frame has been identified, the number of frames in the video image that contain each type of target object can be counted and used to calculate the priorities of the various types of target objects.

Therefore, the priority corresponding to various target objects can be preset to help quickly determine the priority of each type of target object in each frame of image, or the priority of each type of target object can be calculated by counting the information of various target objects in the current video scene, so that the preset priority is not needed, and the calculated priority can be ensured to be more consistent with the priority judgment in the real scene.

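Further, the step 102 includes:

calculating the importance of each type of target object based on the target information of each type of target object;

and determining the sum of the importance degrees of the n types of target objects as the importance degree of the current frame image.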

Specifically, the importance of each type of target object is calculated based on the target information of each type of target object, and then the importance of each type of target object in the n types of target objects is summed to obtain the importance of the current frame image.

That is, in one embodiment, the target information of each type of target object may include quantity information, priority, pixel proportion information, and reliability information of the type of target object.

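Wherein the number of ith type target objects is m_i, where m_i is a positive integer;

the reliability information of the ith type target object comprises the reliability of each of the m_i ith type target objects in the current frame image;

the pixel proportion information of the ith type target object comprises the pixel proportion of each of the m_i ith type target objects in the current frame image;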

That is, when the current frame image includes m_i ith type target objects, the reliability information of the ith type target object comprises the reliability of each ith type target object, and the pixel proportion information of the ith type target object comprises the pixel proportion of each ith type target object.

Therefore, the importance of each type of target object can be correspondingly calculated based on the quantity information, the priority, the reliability information and the pixel proportion information of each type of target object, and then the importance of each type of target object in the n types of target objects is summed to obtain the importance of the current frame image.

Specifically, the importance of the current frame image may be calculated by an importance calculation formula, for example, the importance calculation formula may be

Here, complexity_s denotes the importance of the s-th frame image, m_i denotes the number of ith type target objects in the s-th frame image, comp_i denotes the priority of the ith type target object, confidence_j denotes the reliability of the j-th ith type target object in the s-th frame image, and a corresponding per-object term denotes the pixel proportion of the j-th ith type target object in the s-th frame image.

That is, for each ith type of target object, a first importance part is calculated from the reliabilities and pixel proportions of the m_i target objects of that type, and a second importance part is calculated from the priority and the number of target objects of that type. The two parts are multiplied to obtain the importance of the ith type of target object. The importance of each type of target object is calculated in turn, and finally the importances of the n types of target objects are summed to obtain the importance of the s-th frame image.

A term of the formula that depends on the number m_i can be understood as the limiting coefficient of the priority effect, whose maximum value does not exceed 2. The relationship between this limiting coefficient and the number m_i may be as shown in Fig. 3. That is, the target objects of any category in the current frame image can produce at most twice the effect of their priority. This prevents the calculated importance of a frame from being high merely because the frame contains a large number of target objects of one category, which would not match reality; for example, a frame may contain a large number of spectators, but its importance should not be high.

As can be seen from the above importance calculation formula, the importance of each frame of image is positively correlated with the priority and number of the n types of target objects in the frame and with the reliability and pixel proportion of each target object: the higher the priority, the greater the number, the higher the reliability and the larger the pixel proportion, the higher the calculated importance. Therefore, when a frame contains target objects of a high-priority type, the greater their number, the higher their reliability and the larger their pixel proportion, the higher the importance of the frame.

Therefore, the importance of the current frame image is calculated based on the quantity information, the priority, the reliability information and the pixel proportion information of each type of target object in the current frame image, and the calculated importance can be guaranteed to be more in line with reality and more reliable.
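The structure of the importance calculation described above (a per-type part built from reliabilities and pixel proportions, multiplied by a per-type part built from priority and count, then summed over types) can be sketched in Python as follows. The exact formula is not reproduced in this text, so two pieces are assumptions: the first part is taken as the sum of reliability times pixel proportion over the objects of a type, and the limiting coefficient is taken as 2 - 1/m, a hypothetical saturating function that equals 1 for a single object and never exceeds 2.

```python
def limiting_coefficient(m: int) -> float:
    """Hypothetical priority-effect limiting coefficient: 1 for m == 1,
    approaching but never exceeding 2 as m grows."""
    return 2.0 - 1.0 / m


def frame_importance(objects_by_type, priorities):
    """Sum the importance of every object type in one frame.

    objects_by_type -- {type: [(confidence, pixel_ratio), ...]}
    priorities      -- {type: priority value (comp_i)}
    Both argument layouts are assumptions for illustration.
    """
    total = 0.0
    for obj_type, objects in objects_by_type.items():
        m = len(objects)
        if m == 0:
            continue
        # Part 1 (assumed form): reliability and pixel proportion of each object.
        detail_part = sum(conf * ratio for conf, ratio in objects)
        # Part 2: priority scaled by the count-dependent limiting coefficient.
        priority_part = priorities[obj_type] * limiting_coefficient(m)
        total += detail_part * priority_part
    return total


frame = {
    "player": [(0.9, 0.2), (0.8, 0.1)],
    "racket": [(0.5, 0.02)],
}
priorities = {"player": 3.0, "racket": 2.0}
print(frame_importance(frame, priorities))
```

Because the coefficient saturates, adding many low-priority objects such as spectators raises the priority part of that type by at most a factor of two, which is the behaviour the description attributes to the limiting coefficient.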

Step 103, determining the target code rate of the current frame image based on the importance and the residual code stream capacity.

The residual code stream capacity may be the code stream capacity that remains after the code stream capacity allocated to all frame images before the current frame image is removed from the total code stream capacity. For example, if the total code stream capacity is C and the currently allocated code stream capacity is C₁, the residual code stream capacity is C - C₁.

In the embodiment of the invention, the target code rate of each frame of image may be determined based on the determined importance of that frame and the current residual code stream capacity. Specifically, a larger share of the residual code stream capacity may be allocated to video frame images of higher importance, and a smaller share to video frame images of lower importance. This ensures that more important video frame images have a higher code rate, so that the user obtains a better viewing experience. Alternatively, because the important video frame images keep a higher code rate, the overall size of the transmitted video stream can be reduced without changing the overall viewing effect, which improves the transmission efficiency of the video stream and allows high-definition video to be transmitted effectively under rate-limited or low-bandwidth conditions.

More specifically, the specific target code rate of the current frame image may be calculated based on the importance of the current frame image and the current residual code stream capacity. For example, the allocation coefficient may be determined based on the importance of the current frame image and the current residual code stream capacity, and then the residual code stream capacity is multiplied by the allocation coefficient to obtain the target code rate of the current frame image, where the higher the importance is, or the more the residual code stream capacity is, the higher the allocation coefficient may be. Of course, according to this idea, other calculation methods may be used to determine the target bitrate of each frame of image, and the embodiment of the present invention is not limited in particular.

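Optionally, the step 103 includes:

determining the residual code stream capacity based on the total code stream capacity of the video image and the code rate allocated to the first s-1 frame images in the video image, wherein the current frame image is the s-th frame image in the video image;

and determining the target code rate of the current frame image based on the remaining frame number of the video image, the importance and the residual code stream capacity, wherein the remaining frame number is equal to the total frame number of the video image minus the first s-1 frames.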

That is, assuming that the current frame image is the s-th frame image, the current residual code stream capacity can be determined from the total code stream capacity of the video image and the code rate allocated to the first s-1 frame images in the video image. For example, if the total code stream capacity is code_all and the code rate of the l-th frame image is precoding_l, the code rate allocated to the first s-1 frame images is $\sum_{l=1}^{s-1} precoding_l$, and the residual code stream capacity is $code_{all} - \sum_{l=1}^{s-1} precoding_l$.
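The residual code stream capacity described above, that is, the total capacity minus the code rates already allocated to the first s-1 frames, can be sketched in Python as follows. The function name `residual_capacity` and the error raised on over-allocation are assumptions for illustration.

```python
from typing import Sequence


def residual_capacity(code_all: float, allocated_rates: Sequence[float]) -> float:
    """Return code_all minus the sum of the rates allocated so far.

    allocated_rates holds the rates of frames 1 .. s-1, so the current
    frame is frame s = len(allocated_rates) + 1.
    """
    remaining = code_all - sum(allocated_rates)
    if remaining < 0:
        # Assumption: treat over-allocation as an error rather than clamping.
        raise ValueError("allocated rates exceed the total code stream capacity")
    return remaining


# Three frames already coded; the current frame is the 4th.
print(residual_capacity(1000, [120, 80, 100]))  # 700
```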

In this embodiment, the target code rate of the current frame image may be determined based on the remaining frame number of the video image, the importance and the residual code stream capacity, where the remaining frame number equals the total frame number of the video image minus the first s-1 frames; for example, if the total frame number is T, the current remaining frame number is T - s + 1. Specifically, the code rate allocation proportion of the current frame image may be determined based on the current remaining frame number, the current residual code stream capacity and the importance of the current frame image, and the target code rate of the current frame image is then obtained by multiplying the current residual code stream capacity by the code rate allocation proportion.

Here, complexity_s is the importance of the s-th frame image.

Therefore, the method can ensure that the code rate of each frame of image is reasonably distributed according to the importance, the residual code stream capacity and the residual frame number of each frame of image, and can distribute a slightly higher code rate to the video frame image with higher importance on the basis of approximate average distribution.

Wherein, the determining the target bitrate of the current frame image based on the remaining frame number of the video image, the importance and the remaining code stream capacity may include:

That is, in this embodiment, the code rate allocation coefficient of the current frame image may be determined based on the ratio of the current residual code stream capacity to the total code stream capacity and on the remaining frame number of the video image. For example, the code rate allocation coefficient may be equal to that ratio divided by the remaining frame number, or it may be calculated according to a formula in which $code_{all}$ is the total code stream capacity and $\sum_{l=1}^{s-1} precoding_l$ is the code stream capacity allocated to the first s-1 frame images. The value of the code rate allocation coefficient is therefore between 0 and 1: the more residual code stream capacity there is, the larger the coefficient, and the less residual code stream capacity there is, the smaller the coefficient. In other words, the code rate of each frame image follows the allocation principle of using more when more remains and using less when less remains.
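As a hedged illustration, the simple variant mentioned in the text (the ratio of residual to total capacity, divided by the remaining frame number) could look as follows in Python. The function name and argument names are assumptions; the alternative formula referred to in the text is not modelled here.

```python
def allocation_coefficient(code_all: float, allocated_sum: float,
                           total_frames: int, s: int) -> float:
    """Code rate allocation coefficient for the s-th frame (1-based).

    Implements only the simple variant from the text: the ratio of the
    residual capacity to the total capacity, divided by the remaining
    frame number T - s + 1.
    """
    remaining_frames = total_frames - s + 1
    if code_all <= 0 or remaining_frames <= 0:
        raise ValueError("code_all must be positive and s must not exceed total_frames")
    residual = code_all - allocated_sum   # capacity not yet handed out
    ratio = residual / code_all           # share of the budget still unspent
    return ratio / remaining_frames
```

For a total capacity of 100 with 20 already allocated, a video of 10 frames and s = 3, the ratio is 0.8 and the remaining frame number is 8, so the coefficient is 0.1.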

Then, the target code rate of the current frame image may be determined based on the code rate allocation coefficient, the importance and the residual code stream capacity. Specifically, the product of the code rate allocation coefficient, the importance index corresponding to the importance, and the residual code stream capacity may be determined as the target code rate of the current frame image. The importance index may be given by either of two formulas expressed in terms of $complexity_s$ and $\sum_i comp_i$, where $complexity_s$ is the importance of the s-th frame and $\sum_i comp_i$ is the sum of the priorities of the various types of target objects included in the video image.

Thus, the target code rate $precoding_s$ of the s-th frame image can be calculated for each frame of image by using a code rate calculation formula, namely as the product of the code rate allocation coefficient, the importance index and the residual code stream capacity.
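The per-frame calculation can be sketched as a loop over the frames. This is an illustrative sketch under stated assumptions: the importance index of each frame is passed in as a precomputed number (its formula is not modelled), the coefficient uses the simple ratio-divided-by-remaining-frames variant, and the function name `allocate_code_rates` is hypothetical.

```python
from typing import List


def allocate_code_rates(code_all: float, importance_indices: List[float]) -> List[float]:
    """Return the target code rates of frames 1..T.

    importance_indices[s-1] is an assumed, precomputed importance index
    of the s-th frame; how it is derived is outside this sketch.
    """
    total_frames = len(importance_indices)
    rates: List[float] = []
    allocated = 0.0  # running sum of code rates handed out so far
    for s, index in enumerate(importance_indices, start=1):
        residual = code_all - allocated
        remaining_frames = total_frames - s + 1
        coefficient = (residual / code_all) / remaining_frames
        rate = coefficient * index * residual  # product rule from the text
        rates.append(rate)
        allocated += rate
    return rates
```

The list is used only to keep the example short; in real-time operation each frame would be handled as it arrives, with `allocated` as the only state carried between frames.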

By this implementation, the code rate of each frame of image can be reasonably allocated according to its importance and the remaining capacity, so that video frame images of high importance obtain a higher code rate and the viewing experience of the user is ensured.

That is, in this embodiment, when a target compression ratio is set, the actually available total code stream capacity may be equal to the total code stream capacity of the video image multiplied by the target compression ratio, and the residual code stream capacity is equal to that product minus the code stream capacity allocated to the first s-1 frame images. For example, when the target compression ratio is μ, the residual code stream capacity is

$$\mu \cdot code_{all} - \sum_{l=1}^{s-1} precoding_l$$

Thus, code rate allocation can be performed on each frame of the video image according to the preset target compression ratio, so that the average code rate of all allocated frame images is equal to the original inherent code rate multiplied by the target compression ratio.
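A minimal Python sketch of the residual capacity under a target compression ratio is given below. The function name and the restriction of μ to the interval (0, 1] are assumptions made for this example, not statements of the embodiment.

```python
from typing import Iterable


def residual_capacity_with_ratio(code_all: float, mu: float,
                                 allocated: Iterable[float]) -> float:
    """Residual capacity when only mu * code_all may be spent in total.

    mu is the target compression ratio; limiting it to (0, 1] is an
    assumption of this sketch.
    """
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu is expected to lie in (0, 1]")
    return mu * code_all - sum(allocated)
```

With a total capacity of 100, a ratio of 0.8 and code rates of 10 and 20 already allocated, 50 remains for the following frames.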

For example, according to the method in the embodiment of the present invention, with a compression ratio of 0.8, on the basis of the original fixed code rate of 1.0, code rate allocation is performed on each frame of image in a video image, and a target code rate of each frame of image is obtained as shown in fig. 4 (black part), where the average code rate of all video frame images is 0.8 of the original code rate.

Step 104, encoding the video image by using the target code rate.

After the target code rate of each frame of image is determined, the video image can be encoded by using the target code rate of each frame of image, and a plurality of different encoding modes can be provided, which are specifically described below.

Optionally, the step 104 includes:

In other words, in one encoding method, the target code rate of each frame of image in the video image may be used to encode the corresponding frame of image, that is, the target code rate of the current frame of image determined currently may be used to encode the current frame of image in real time. Therefore, each frame of image in the video image can be encoded by different code rates according to the importance degree of the image, and the important video frame image can be ensured to have a clearer presentation effect.

In another encoding mode, in order to reduce the resource overhead of encoding and decoding on the device, the video image may be divided into segments or blocks and then encoded on the basis of those segments or blocks. That is, the video images may be grouped according to a preset number of frames, for example every 30 frames or every 50 frames of images form one group, giving p groups of video images. Here p may be equal to the total number of frames of the video image divided by the preset number of frames; when the total number of frames is not exactly divisible, p may be equal to the integer quotient plus 1, that is, the last remaining frames form one group. Then the average code rate of each group of video images is calculated from the target code rate of each frame of image in the group, the average code rate of a group being equal to the sum of the target code rates of all frame images in the group divided by the number of frames in the group. Finally, each group of video images is encoded by using its own average code rate. In this way, multiple frames of images can be encoded at one time with a single group average code rate, and on decoding the frames in each group can be decoded quickly by using the average code rate of that group, so that the resource overhead of encoding and decoding is reduced while a good viewing effect is maintained.
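The grouping and averaging step can be illustrated with a short Python sketch. The function name `group_average_rates` is hypothetical, and the per-frame target code rates are assumed to have been computed already.

```python
import math
from typing import List


def group_average_rates(rates: List[float], group_size: int) -> List[float]:
    """Average code rate of each group of `group_size` consecutive frames.

    The last group may be shorter when the frame count is not a multiple
    of group_size; its average is taken over the frames it contains.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    p = math.ceil(len(rates) / group_size)  # number of groups
    averages: List[float] = []
    for k in range(p):
        group = rates[k * group_size:(k + 1) * group_size]
        averages.append(sum(group) / len(group))
    return averages
```

For five frames with target code rates 1, 2, 3, 4, 5 and a group size of 2, this yields three groups with averages 1.5, 3.5 and 5.0.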

The video coding scheme in the embodiment of the invention can be applied to video scenes of sports events. In the video scene of a football match, if a frame contains objects of several types such as football players, the football and the goal, that frame is more important, and the priorities of the object types in the frame differ; for example, the object priority is set as football player > football > goal. In a scene of a tennis match, if objects such as tennis players, rackets, the net, spectators and the referee are defined as important objects to be considered, the object priority may be set to tennis player > racket > net > referee > spectator.

Suppose the following scenes appear in a video of a tennis match:

1) an athlete occupying a large proportion of the frame image is detected, which corresponds to a close-up shot of the athlete;

2) a plurality of athletes are detected in the frame image, which corresponds to a long-distance shot of the whole court;

3) a spectator is detected in the frame image;

4) the referee is detected in the frame image.

According to the above settings and the scheme of the embodiment of the invention, the frame importance ranking of the four scenes can be calculated to be roughly 1 > 2 > 4 > 3; when the available residual code stream capacity is sufficient, the ordering of the frame code rates is basically consistent with the ordering of the frame importance. Of course, in an actual scene, the calculated code rate of the frame to be encoded fluctuates with the number of target object categories, the target objects detected in the frame image, the size of the detected object masks, and the available residual code stream capacity.

In addition, the coding mode provided in the embodiment of the invention does not change the size of the video image; instead, it defines an image importance that the general public can perceive intuitively, and provides a method for calculating it. The code rate of frame coding or video clip coding can be controlled through the calculation method provided by the embodiment of the invention, and when a target compression ratio is set, the overall volume of the transmitted video stream can be reduced without changing the overall viewing experience.

Compared with the prior art, the embodiment of the invention has the following characteristics. The image size is not changed; only the code rate is changed. Each frame of image can be processed and calculated directly, without storing the preceding or following video frames, so real-time operation is possible; the residual code stream capacity depends only on the running statistics of the per-frame code rates, so the space complexity of this bookkeeping is O(1). The priority of objects in the image is used, and a method for calculating the image importance is defined, which better matches how image importance is judged in real scenes; the concept of object priority follows general rules in videos of sports scenes or other scenes and is readily accepted by audiences. Finally, the frame image can be processed by using an image segmentation and detection model, which can greatly reduce the amount of calculation relative to pixel-level calculation over the whole image.

The video coding method of the embodiment of the invention identifies the target information of a target object in a current frame image in an acquired video image; determining the importance of the current frame image based on the target information of the target object; determining the target code rate of the current frame image based on the importance and the residual code stream capacity; and encoding the video image by using the target code rate. Therefore, the embodiment of the invention only needs to calculate the importance of the current frame image in the video image and count the residual code stream capacity, and determines the target code rate of the current frame image based on the importance and the residual code stream capacity of the current frame image without calculating the motion vector of each pixel point in each frame image and depending on the motion vector of the previous frame image and the next frame image, thereby not only reducing the operation amount compared with the prior art, but also having a novel coding mode.

The embodiment of the invention also provides a video coding device. Referring to fig. 5, fig. 5 is a block diagram of a video encoding apparatus according to an embodiment of the present invention. Since the principle of the video encoding apparatus for solving the problem is similar to the video encoding method in the embodiment of the present invention, the implementation of the video encoding apparatus can refer to the implementation of the method, and repeated details are not repeated.

As shown in fig. 5, the video encoding apparatus 500 includes:

the identifying module 501 is configured to identify, for a current frame image in an acquired video image, target information of a target object in the current frame image;

a first determining module 502, configured to determine the importance of the current frame image based on target information of the target object;

a second determining module 503, configured to determine a target code rate of the current frame image based on the importance and the residual code stream capacity;

an encoding module 504, configured to encode the video image with the target bitrate.
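The division into four modules can be pictured as a simple pipeline. The following Python sketch is only an illustration of that structure: the class name, the callable signatures and the way the residual capacity is tracked are all assumptions, and the actual detection, importance and encoding logic is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class VideoEncodingPipeline:
    """Hypothetical wiring of modules 501 to 504; all callables are placeholders."""
    identify: Callable[[Any], Any]                # 501: frame -> target information
    importance_of: Callable[[Any], float]         # 502: target information -> importance
    target_rate: Callable[[float, float], float]  # 503: (importance, residual) -> code rate
    encode: Callable[[Any, float], Any]           # 504: (frame, code rate) -> encoded frame
    total_capacity: float

    def run(self, frames: Iterable[Any]) -> List[Any]:
        residual = self.total_capacity
        encoded: List[Any] = []
        for frame in frames:
            info = self.identify(frame)
            importance = self.importance_of(info)
            rate = self.target_rate(importance, residual)
            encoded.append(self.encode(frame, rate))
            residual -= rate  # only a running total is carried between frames
        return encoded
```

Passing the four stages in as callables keeps the sketch independent of any particular detector or codec.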

Optionally, the identifying module 501 includes:

the identification unit is used for identifying n types of target objects in the current frame image, wherein n is an integer larger than 0;

a first determination unit configured to determine target information of each type of target object, the target information including at least one of number information, priority, and pixel proportion information of the target objects.

Optionally, the target information includes a priority of the target object; the first determining unit is used for determining the priority of each type of target object based on the preset priority corresponding to each type of target object; or

The first determining unit is used for acquiring first information and calculating the priority of each type of target object based on the first information; the first information comprises the number of each type of target object in the current frame image, the total number of the n types of target objects in the current frame image, the total frame number of the video image and the frame number of each type of target object in the video image.

Optionally, the first determining module 502 includes:

a calculating unit for calculating the importance of each type of target object based on the target information of each type of target object;

and the second determining unit is used for determining the sum of the importance degrees of the n types of target objects as the importance degree of the current frame image.

Optionally, the number of i-th type target objects is $m_i$, where $m_i$ is a positive integer;

the reliability information of the i-th type target object comprises the reliabilities of the $m_i$ i-th type target objects in the current frame image;

Optionally, the second determining module 503 includes:

a third determining unit, configured to determine a residual code stream capacity based on a total code stream capacity of the video image and a code rate allocated to a previous s-1 frame image in the video image, where the current frame image is an s-th frame image in the video image;

and a fourth determining unit, configured to determine the target code rate of the current frame image based on the remaining frame number of the video image, the importance, and the residual code stream capacity, where the remaining frame number is equal to the total frame number of the video image minus the first s-1 frames.

Optionally, the fourth determining unit includes:

the first determining subunit is used for determining a code rate distribution coefficient based on the proportion of the residual code stream capacity to the total code stream capacity and the residual frame number of the video image;

and the second determining subunit is configured to determine the target code rate of the current frame image based on the code rate allocation coefficient, the importance, and the residual code stream capacity.

Optionally, the third determining unit is configured to determine a remaining code stream capacity based on the total code stream capacity of the video image, the target compression ratio, and the code rate allocated to the first s-1 frame image in the video image.

Optionally, the encoding module 504 is configured to encode the corresponding frame image by using the target code rate of each frame image in the video image; or

The encoding module 504 includes:

the grouping unit is used for grouping the video images according to a preset frame number to obtain p groups of video images, wherein p is an integer larger than 1;

the fifth determining unit is used for determining the average code rate of each group of video images based on the target code rate of each frame of image in each group of video images;

and the coding unit is used for coding each group of video images by respectively utilizing the average code rate of each group of video images.

The video encoding apparatus provided in the embodiment of the present invention may implement the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.

The video encoding device 500 of the embodiment of the present invention identifies, for a current frame image in an acquired video image, target information of a target object in the current frame image; determining the importance of the current frame image based on the target information of the target object; determining the target code rate of the current frame image based on the importance and the residual code stream capacity; and encoding the video image by using the target code rate. Therefore, the embodiment of the invention only needs to calculate the importance of the current frame image in the video image and count the residual code stream capacity, and determines the target code rate of the current frame image based on the importance and the residual code stream capacity of the current frame image without calculating the motion vector of each pixel point in each frame image and depending on the motion vector of the previous frame image and the next frame image, thereby not only reducing the operation amount compared with the prior art, but also having a novel coding mode.

The embodiment of the invention also provides video coding equipment. Since the principle of the video encoding device for solving the problem is similar to the video encoding method in the embodiment of the present invention, the implementation of the video encoding device may refer to the implementation of the method, and repeated details are not repeated. As shown in fig. 6, the video encoding apparatus according to the embodiment of the present invention includes: the processor 600, which is used to read the program in the memory 620, executes the following processes:

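identifying, for a current frame image in an acquired video image, target information of a target object in the current frame image;

determining the importance of the current frame image based on the target information of the target object;

determining the target code rate of the current frame image based on the importance and the residual code stream capacity;

and encoding the video image by using the target code rate.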

In fig. 6, the bus architecture may include any number of interconnected buses and bridges, which link together various circuits, in particular one or more processors represented by processor 600 and memory represented by memory 620. The bus architecture may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface. The processor 600 is responsible for managing the bus architecture and general processing, and the memory 620 may store data used by the processor 600 in performing operations.

Optionally, the processor 600 is further configured to read the computer program and execute the following steps:

Optionally, the target information includes a priority of the target object; the processor 600 is further adapted to read the computer program and perform the following steps:

The video encoding device provided in the embodiment of the present invention may implement the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.

Furthermore, a computer-readable storage medium of an embodiment of the present invention stores a computer program executable by a processor to implement:

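identifying, for a current frame image in an acquired video image, target information of a target object in the current frame image;

determining the importance of the current frame image based on the target information of the target object;

determining the target code rate of the current frame image based on the importance and the residual code stream capacity;

and encoding the video image by using the target code rate.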

Alternatively, the computer program may be executable by a processor to perform the steps of:

Optionally, the target information includes a priority of the target object; the computer program is executable by a processor to implement the steps of:

In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the method according to various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A video encoding method, comprising:

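identifying, for a current frame image in an acquired video image, target information of a target object in the current frame image;

determining the importance of the current frame image based on the target information of the target object;

determining the target code rate of the current frame image based on the importance and the residual code stream capacity;

and encoding the video image by using the target code rate.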

2. The method of claim 1, wherein the identifying target information of the target object in the current frame image comprises:

3. The method of claim 2, wherein the target information includes a priority of a target object; the determining of the target information of each type of target object comprises:

4. The method of claim 2, wherein the determining the importance of the current frame image based on the target information of the target object comprises:

5. The method of claim 4, wherein the target information comprises quantity information, priority, pixel proportion information, and reliability information of the target object.

6. The method of claim 5, wherein the number of i-th class target objects is $m_i$, where $m_i$ is a positive integer;

7. The method of claim 1, wherein the determining the target code rate of the current frame image based on the importance and the residual code stream capacity comprises:

8. The method of claim 7, wherein the determining the target bitrate of the current frame image based on the remaining frame number of the video image, the importance and the remaining code stream capacity comprises:

9. The method of claim 7, wherein determining the residual code stream capacity based on the total code stream capacity of the video pictures and the code rate allocated to the first s-1 frame of pictures in the video pictures comprises:

10. The method of claim 1, wherein the encoding the video image with the target code rate comprises:

11. A video encoding device comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor, reading a program in the memory, implements the steps in the video coding method according to any of claims 1 to 10.

12. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the video encoding method of any of claims 1 to 10.