CN117750030A

CN117750030A - Video coding method, device, equipment and storage medium

Info

Publication number: CN117750030A
Application number: CN202310094141.3A
Authority: CN
Inventors: 樊星星
Original assignee: Xiaohongshu Technology Co ltd
Current assignee: Xiaohongshu Technology Co ltd
Priority date: 2023-02-07
Filing date: 2023-02-07
Publication date: 2024-03-22

Abstract

The embodiment of the application discloses a video coding method, a video coding device, video coding equipment and a storage medium. The method comprises the following steps: decoding the target video to obtain multi-frame images; determining the value of GOP of the multi-frame images based on the similarity between frames in the multi-frame images; dividing the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjusting the coding sequence of the images contained in each GOP; and coding the image with the coding sequence adjusted to obtain the coded target video. By adopting the embodiment of the invention, the data volume of the video can be reduced under the condition of not reducing the video quality.

Description

Video coding method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of computer application technologies, and in particular, to a video encoding method, apparatus, device, and storage medium.

Background

At present, some videos differ from conventional videos in that they are made from, or converted from, a plurality of electronic images; such videos contain a small number of image frames and place high demands on image quality. Album videos are one example. An album video is a type of electronic album that presents image content in the form of a video. When a user makes an album video from a plurality of electronic images, the images are arranged in a certain preset sequence, but that preset sequence is not the best one for exploiting the content correlation/similarity among the images during further compression coding to reduce the data volume of the video. Therefore, how to reduce the data volume of the video, i.e. to reduce the size of the video, while keeping the video quality unchanged is a technical problem to be solved at present.

Disclosure of Invention

The embodiment of the application provides a video coding method, a video coding device, video coding equipment and a storage medium, which can ensure that the data volume of video is reduced under the condition of not reducing the video quality.

In one aspect, an embodiment of the present application provides a video encoding method, including:

decoding the target video to obtain multi-frame images;

determining a value of a group of pictures (Group of Pictures, GOP) of the multi-frame image based on a similarity between frames in the multi-frame image;

dividing the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjusting the coding sequence of the images contained in each GOP;

and coding the image with the coding sequence adjusted to obtain the coded target video.

In one embodiment, the adjusting the coding sequence of the pictures included in each GOP includes:

for any GOP, acquiring the frame type of each frame image in the GOP and the reference frame of each frame image;

and adjusting the coding sequence of the images contained in any GOP based on the frame type of each frame image and the reference frame of each frame image.

In one embodiment, the adjusting the coding sequence of the pictures included in any GOP based on the frame type of each frame picture and the reference frame of each frame picture includes:

acquiring a first image with a frame type of an I frame and a plurality of second images with a frame type of a B frame in any GOP;

adjusting the coding sequence of the first image; wherein the coding order of the first image precedes the coding order of the plurality of second images;

traversing the plurality of second images, and determining, from the plurality of second images, a second image whose reference frames include the first image;

adjusting the determined coding sequence of the second image; wherein the determined coding order of the second image is prior to the coding order of other second images in the plurality of second images;

determining, from the other second images, a second image whose reference frames include any second image whose coding sequence was most recently adjusted;

adjusting the coding sequence of the second image which is determined recently; wherein the most recently determined coding order of the second image precedes the coding order of other second images of the plurality of second images;

and after the traversal is finished, acquiring the images with the adjusted coding sequence in the GOP.
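The traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `adjust_coding_order`, the dictionary layout of `frame_types` and `refs`, the rule that a B frame is placed only once all of its in-GOP reference frames have been placed, and the tie-break by ascending frame index are all hypothetical. Index 0 stands for the last frame of the previous GOP and is treated as already coded.

```python
def adjust_coding_order(frame_types, refs):
    """Return the frame indices of one GOP in adjusted coding order.

    frame_types: dict mapping frame index -> "I" or "B" (assumed layout).
    refs: dict mapping each B-frame index -> set of reference frame indices.
          Indices outside the GOP (e.g. 0) are treated as already coded.
    """
    in_gop = set(frame_types)
    order, placed = [], set()

    def visit(frame):
        order.append(frame)
        placed.add(frame)
        # Assumed tie-break: examine the remaining B frames in display order.
        for cand in sorted(in_gop):
            if cand in placed or frame_types[cand] != "B":
                continue
            refs_ready = (refs[cand] & in_gop) <= placed
            # Place a B frame right after a frame it references, once all of
            # its in-GOP reference frames are available.
            if frame in refs[cand] and refs_ready:
                visit(cand)

    # The I frame is coded first; B frames follow through the traversal.
    for frame in sorted(in_gop):
        if frame_types[frame] == "I":
            visit(frame)
    return order


# Example: a GOP of 8 frames, frame 8 is the I frame, frames 1-7 are B frames.
frame_types = {i: "B" for i in range(1, 8)}
frame_types[8] = "I"
refs = {4: {0, 8}, 2: {0, 4}, 6: {4, 8}, 1: {0, 2},
        3: {2, 4}, 5: {4, 6}, 7: {6, 8}}
print(adjust_coding_order(frame_types, refs))  # [8, 4, 2, 1, 3, 6, 5, 7]
```

With this example reference map, the I frame comes first and every B frame appears after the in-GOP frames it references.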

In one embodiment, the frame type of the Nth frame image in the GOP is an I frame, and the frame type of the other frame images in the GOP is a B frame. The coding sequence of the Nth frame image precedes the coding sequence of the other frame images in the GOP; the coding sequence of the N/2th frame image follows that of the Nth frame image; the coding sequence of the N/4th frame image follows that of the N/2th frame image; the coding sequence of the N/8th frame image follows that of the N/4th frame image; the coding sequence of the 3N/8th frame image follows that of the N/8th frame image; the coding sequence of the 3N/4th frame image follows that of the 3N/8th frame image; the coding sequence of the 5N/8th frame image follows that of the 3N/4th frame image; and the coding sequence of the 7N/8th frame image follows that of the 5N/8th frame image.

In one embodiment, the acquiring the reference frame of each frame image includes:

determining the reference frame of each frame image based on the frame type of each frame image and a reference frame determination strategy.

In one embodiment, the similarity between frames in the multi-frame image is greater when the GOP of the multi-frame image is valued at the determined value than when the GOP of the multi-frame image is valued at other values.

In one embodiment, the encoding the image after the encoding sequence is adjusted to obtain an encoded target video, including:

determining the frame type of each frame image in each GOP and the reference frame of each frame image;

and encoding each frame image based on the frame type of each frame image and the reference frame of each frame image according to the adjusted encoding sequence of each frame image in each GOP, so as to obtain an encoded target video.

In one embodiment, the method further comprises:

performing resolution downsampling on each frame of image to obtain downsampled multi-frame images;

the determining the value of the group of pictures GOP of the multi-frame image based on the similarity between frames in the multi-frame image includes:

determining the value of the GOP of the multi-frame image based on the similarity between frames in the downsampled multi-frame image.

In one embodiment, the performing resolution downsampling on each frame of image to obtain downsampled multi-frame images includes:

performing resolution downsampling on image parameter components of each frame of image to obtain downsampled multi-frame images, wherein the image parameter components comprise one or more of the following: a luminance component, a chrominance component.

In one embodiment, the method further comprises:

acquiring the frame rate of the target video;

and if the frame rate is smaller than the frame rate threshold, triggering and executing the decoding processing on the target video to obtain a multi-frame image.

In another aspect, an embodiment of the present application provides a video encoding apparatus, including:

the decoding unit is used for decoding the target video to obtain multi-frame images;

a determining unit, configured to determine a value of a GOP of the multi-frame image based on a similarity between frames in the multi-frame image;

a dividing unit, configured to divide the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjust coding orders of the images included in each GOP;

and the encoding unit is used for encoding the images with the encoding sequence adjusted to obtain the encoded target video.

In another aspect, an embodiment of the present application provides a computer device, including a processor, a storage device, and a communication interface, where the processor, the storage device, and the communication interface are connected to each other, where the storage device is configured to store a computer program that supports the computer device to perform the method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following steps:

decoding the target video to obtain multi-frame images;

determining the value of GOP of the multi-frame images based on the similarity between frames in the multi-frame images;

In another aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the video encoding method described above.

In another aspect, embodiments of the present application provide a computer program product comprising a computer program adapted to be loaded by a processor and to perform the video encoding method described above.

In the embodiment of the present application, after decoding a target video to obtain multiple frame images, determining the GOP value of the multiple frame images based on the similarity between frames in the multiple frame images, and dividing the multiple frame images into multiple GOPs based on the determined GOP value of the multiple frame images, so that the content correlation between the images included in each GOP obtained by dividing is the highest, and therefore, the coding sequence of the images included in each GOP is adjusted, and the images after the coding sequence is adjusted are coded, so as to obtain the coded target video. The data volume of the coded video obtained by the coding mode can be reduced, and the video quality can not be reduced. That is, the embodiments of the present application can ensure that the data amount of video is reduced without degrading the video quality.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a video encoding method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a GOP provided in an embodiment of the present application;

Fig. 3 is a flowchart of another video encoding method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a video encoding device according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

Users can upload videos to the content distribution platform to share and spread them. To reduce the size of a video, a content distribution platform generally encodes the video and then distributes the encoded video so that other users can watch it or interact with it, for example by forwarding, liking or bookmarking it. Take album videos as an example: an album video is a type of electronic album that presents image content in the form of a video. An album video has a small number of image frames, its frame rate is far lower than that of a conventional video, and it places high demands on image quality. That is, when the content distribution platform encodes an album video, the quality of the video should be affected as little as possible. Therefore, how to reduce the data volume of the video, i.e. to reduce the size of the video, while keeping the video quality unchanged is a technical problem to be solved at present.

Based on this, in the video encoding method provided in the embodiment of the present application, after decoding a target video to obtain multiple frame images, based on the similarity between frames in the multiple frame images, the value of the GOP of the multiple frame images is determined, and then the multiple frame images are divided into multiple GOPs based on the determined value of the GOP of the multiple frame images, so that the content correlation between the images included in each GOP obtained by such division is the highest, so that the encoding order of the images included in each GOP is adjusted, and the images after the encoding order is adjusted are encoded, so as to obtain the encoded target video. The data volume of the coded video obtained by the coding mode can be reduced, and the video quality can not be reduced. That is, the embodiments of the present application can ensure that the data amount of video is reduced without degrading the video quality.

A GOP is a group of consecutive pictures, i.e. a set of pictures in the target video, and may include multiple pictures. The GOP is used to assist random access, and the GOP value may be understood as the interval between two I frames.

The video coding method provided by the embodiment of the application can be applied to a video coding device, the video coding device can be installed or integrated on a content release platform, the content release platform can be operated in computer equipment, the computer equipment can comprise terminal equipment or a server and the like, and the computer equipment comprises but is not limited to a smart phone, a camera, a wearable device or a computer and the like.

Referring to fig. 1, fig. 1 is a schematic flow chart of a video encoding method provided in an embodiment of the present application, where the video encoding method may be performed by a video encoding apparatus, a content distribution platform, or a computer device; the video coding scheme as shown in fig. 1 includes, but is not limited to, steps S101 to S104, wherein:

S101, decoding the target video to obtain multi-frame images.

The target video may be any video, for example, a video uploaded to the content distribution platform by a user, or a video to be encoded in the content distribution platform.

In one example, the target video may be decoded to obtain multi-frame images in YUV format, i.e., the pixel format of each frame image is YUV. YUV refers to a pixel format in which luminance parameters and chrominance parameters are expressed separately: "Y" represents luminance (Luminance or Luma), that is, the gray value; "U" and "V" represent chrominance (Chrominance or Chroma), which describes the color and saturation of the image and specifies the color of a pixel.

In another example, the target video may be decoded to obtain multi-frame images in RGB format, i.e., the pixel format of each frame image is RGB. RGB represents color using three channels: red, green and blue.

S102, determining the value of GOP of the multi-frame images based on the similarity between frames in the multi-frame images.

When the GOP of the multi-frame images takes the determined value, the similarity between frames in the multi-frame images is greater than when the GOP takes other values.

In one implementation, the similarity between frames in the multi-frame image may be obtained when the GOP value is set to different values, and the GOP value of the multi-frame image is set to a target value, where the target value refers to: the value set when the similarity between frames is highest.

In one implementation, a plurality of values may be traversed, when the value of the GOP is set as the value of the current traversal, the multi-frame image is divided into a plurality of candidate GOPs, for any one of the plurality of candidate GOPs, an inter-frame similarity index of the image contained in the any one candidate GOP is calculated, and if the inter-frame similarity index of the any one candidate GOP is smaller than an index threshold corresponding to the value of the GOP, it is determined that the similarity between frames in the multi-frame image is highest when the value of the GOP is set as the value of the current traversal.

In one implementation, if the inter-frame similarity index of any candidate GOP is greater than or equal to the index threshold corresponding to the value of the GOP, the next value is traversed.

For example, assume that the plurality of values includes 4, 8, 16 and 32, and that the target video is decoded to obtain 8 frames of images. When traversing the value 4, the value of the GOP is set to 4, that is, the 8 frame images are divided into 2 GOPs, namely a first GOP and a second GOP, each including 4 frame images: the first GOP includes the first to fourth frame images, and the second GOP includes the fifth to eighth frame images. The inter-frame similarity index of the first to fourth frame images included in the first GOP may be calculated. If this index is smaller than the index threshold corresponding to a GOP value of 4, it is determined that the similarity between frames in the 8 frame images is highest when the GOP value is set to 4. The GOP value of the 8 frame images may therefore be set to 4: the 8 frame images are divided into 2 GOPs, the coding sequence of the images included in each GOP is adjusted, the images are coded in the adjusted coding sequence to obtain the coded video bitstream, and the coded video bitstream and the audio bitstream of the target video are packaged to obtain the coded target video.

If the inter-frame similarity index of the first to fourth frame images is greater than or equal to the index threshold corresponding to a GOP value of 4, the similarity between frames in the first to fourth frame images is not the highest when the GOP value is set to 4. The inter-frame similarity index of the fifth to eighth frame images included in the second GOP may then be calculated, and if this index is smaller than the index threshold corresponding to a GOP value of 4, it is determined that the similarity between frames in the 8 frame images is highest when the GOP value is set to 4.

If the inter-frame similarity index of the fifth to eighth frame images is also greater than or equal to the index threshold corresponding to a GOP value of 4, the similarity between frames in the fifth to eighth frame images is not the highest when the GOP value is set to 4. The GOP value may then be set to 8, i.e., the 8 frame images are divided into 1 GOP that includes all 8 frame images. If the inter-frame similarity index of the first to eighth frame images contained in that GOP is smaller than the index threshold corresponding to a GOP value of 8, it is determined that the similarity between frames in the 8 frame images is highest when the GOP value is set to 8. The GOP value of the 8 frame images can therefore be set to 8: the 8 frame images are divided into 1 GOP, the coding sequence of the images contained in the GOP is adjusted, the images are coded in the adjusted coding sequence to obtain the coded video bitstream, and the coded video bitstream and the audio bitstream of the target video are packaged to obtain the coded target video.
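The traversal in this example can be sketched as follows. It is a minimal sketch under stated assumptions: `choose_gop_value`, `toy_index` and the threshold numbers are hypothetical names and values, frames are represented by single integers, and `toy_index` is only a stand-in for the inter-frame similarity index. The text does not say what happens when no candidate value passes, so the sketch returns `None` in that case.

```python
def toy_index(gop):
    """Stand-in similarity index (assumption): sum of absolute differences
    between neighbouring frames; a smaller value means more similar frames."""
    return sum(abs(b - a) for a, b in zip(gop, gop[1:]))


def choose_gop_value(frames, candidate_values, thresholds, index_fn=toy_index):
    """Traverse candidate GOP values and return the first one for which some
    candidate GOP has an index below that value's threshold."""
    for value in candidate_values:
        # Divide the frames into candidate GOPs of the current size.
        candidate_gops = [frames[i:i + value]
                          for i in range(0, len(frames), value)]
        for gop in candidate_gops:
            if index_fn(gop) < thresholds[value]:
                return value
        # No candidate GOP passed: traverse the next value.
    return None  # Assumption: the source does not define this case.


thresholds = {4: 30, 8: 400}  # hypothetical per-value index thresholds
smooth_start = [10, 12, 11, 13, 50, 90, 20, 70]
alternating = [0, 40, 0, 40, 0, 40, 0, 40]
print(choose_gop_value(smooth_start, [4, 8], thresholds))  # 4
print(choose_gop_value(alternating, [4, 8], thresholds))   # 8
```

In the first call the first candidate GOP already passes at value 4; in the second call neither 4-frame GOP passes, so value 8 is traversed and accepted.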

In another implementation manner, a plurality of values can be traversed, when the value of the GOP is set as the value of the current traversal, a multi-frame image is divided into a plurality of candidate GOPs, for a first candidate GOP in the plurality of candidate GOPs, an inter-frame similarity index of the image contained in the first candidate GOP is calculated, and if the inter-frame similarity index of the first candidate GOP is smaller than an index threshold corresponding to the value of the GOP, it is determined that the similarity between frames in the multi-frame image is highest when the value of the GOP is set as the value of the current traversal; if the inter-frame similarity index of the first candidate GOP is greater than or equal to the index threshold corresponding to the GOP value, traversing the next value. Setting the value of GOP of the multi-frame image as a target value, wherein the target value refers to: the value set when the similarity between frames is highest. Based on the value of GOP of the multi-frame image, dividing the multi-frame image into a plurality of GOP, adjusting the coding sequence of the image contained in the first GOP, and coding the image with the adjusted coding sequence to obtain the coded first video bit stream. And determining target values of other images except the image contained in the first GOP in the multi-frame images according to the same mode, setting the values of the GOPs of the other images as the target values, dividing the other images into a plurality of GOPs based on the values of the GOPs of the other images, adjusting the coding sequence of the images contained in the first GOP, and coding the images with the adjusted coding sequence to obtain a coded second video bit stream until the multi-frame images are coded.

For example, assume that the plurality of values includes 4, 8, 16 and 32, and that the target video is decoded to obtain 12 frames of images. When traversing the value 4, the value of the GOP is set to 4, that is, the 12 frame images are divided into 3 GOPs, namely a first GOP, a second GOP and a third GOP, each including 4 frame images: the first GOP includes the first to fourth frame images, the second GOP includes the fifth to eighth frame images, and the third GOP includes the ninth to twelfth frame images. The inter-frame similarity index of the first to fourth frame images included in the first GOP may be calculated. If this index is smaller than the index threshold corresponding to a GOP value of 4, it is determined that the similarity between frames in the 12 frame images is highest when the GOP value is set to 4. The GOP value of the 12 frame images may therefore be set to 4: the 12 frame images are divided into 3 GOPs, the coding sequence of the first to fourth frame images is adjusted, and those images are coded in the adjusted coding sequence to obtain the coded first video bitstream.

Then, for the fifth to twelfth frame images, the value of GOP is set to 4 when traversing the value 4, that is, the fifth to twelfth frame images are divided into 2 GOPs, respectively, a first GOP and a second GOP, each GOP includes 4 frame images, wherein the first GOP includes the fifth to eighth frame images, and the second GOP includes the ninth to twelfth frame images. The inter-frame similarity index of the fifth to eighth frame images included in the first GOP may be calculated, if the inter-frame similarity index of the fifth to eighth frame images is greater than or equal to an index threshold corresponding to when the value of GOP is set to 4, the value 8 is traversed, the fifth to twelfth frame images are divided into 1 GOP, the inter-frame similarity index of the fifth to twelfth frame images is calculated, and if the inter-frame similarity index of the fifth to twelfth frame images is less than the index threshold corresponding to when the value of GOP is set to 8, the coding sequence of the fifth to twelfth frame images is adjusted, and the images after the adjusted coding sequence are coded, so as to obtain the coded second video bitstream. And packaging the encoded first video bit stream, the encoded second video bit stream and the audio bit stream of the target video to obtain the encoded target video.
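The segment-by-segment procedure in this example can be sketched as below. This is an illustrative sketch only: `segment_into_gops` and `toy_index` are hypothetical, frames are represented by single integers, and the fallback used when no candidate value passes (take the largest candidate value) is an assumption, since the text does not cover that case.

```python
def toy_index(gop):
    """Stand-in similarity index (assumption): sum of absolute differences
    between neighbouring frames; a smaller value means more similar frames."""
    return sum(abs(b - a) for a, b in zip(gop, gop[1:]))


def segment_into_gops(frames, candidate_values, thresholds, index_fn=toy_index):
    """Pick a GOP value for the first GOP, cut that GOP off, then repeat the
    same search on the remaining frames until all frames are assigned."""
    gops, start = [], 0
    while start < len(frames):
        remaining = frames[start:]
        chosen = None
        for value in candidate_values:
            first_gop = remaining[:value]
            if index_fn(first_gop) < thresholds[value]:
                chosen = len(first_gop)
                break
        if chosen is None:
            # Assumed fallback: use the largest candidate value.
            chosen = min(candidate_values[-1], len(remaining))
        gops.append(remaining[:chosen])  # this GOP would be reordered and coded
        start += chosen
    return gops


thresholds = {4: 30, 8: 400}  # hypothetical per-value index thresholds
frames = [10, 11, 12, 13, 0, 40, 0, 40, 0, 40, 0, 40]
print([len(g) for g in segment_into_gops(frames, [4, 8], thresholds)])  # [4, 8]
```

As in the example, the first four frames form one GOP, and the remaining eight frames fail at value 4 but pass at value 8, so they form a single second GOP.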

In one implementation, inter-frame prediction is performed on each frame image included in any candidate GOP to obtain a predicted pixel value of each frame image, and a matching parameter of each frame image is obtained based on the predicted pixel value and a target pixel value of each frame image. The matching parameter includes a sum of absolute transformed differences (Sum of Absolute Transformed Difference, SATD) or a mean absolute difference (Mean Absolute Difference, MAD). An inter-frame similarity index of the images included in the candidate GOP is then obtained based on the matching parameter of each frame image. The target pixel value of each frame image refers to the pixel value of that frame image obtained by decoding the target video.

In one example, the SATD of each frame image may be obtained based on the predicted pixel value and the target pixel value of each frame image, and the SATDs of each frame image included in any candidate GOP may be added to obtain the inter-frame similarity index of the candidate GOP. The smaller the SATD of each frame image is, the smaller the inter-frame similarity index of the candidate GOP is, which indicates that the similarity between frames is higher.

In another example, the MAD of each frame image may be obtained based on the predicted pixel value and the target pixel value of each frame image, and the MAD of each frame image included in any candidate GOP may be added to obtain the inter-frame similarity index of the candidate GOP. The smaller the MAD of each frame image is, the smaller the inter-frame similarity index of the candidate GOP is, which indicates that the similarity between frames is higher.

In one implementation, each frame of image may be segmented to obtain a plurality of image blocks of each frame of image, inter-frame prediction is performed on each image block to obtain a predicted pixel value of each image block, a matching parameter of each image block is obtained based on the predicted pixel value and a target pixel value of each image block, and a matching parameter of any frame of image is obtained based on a matching parameter of each image block in any frame of image included in any candidate GOP.

In one example, each frame of image may be segmented to obtain a plurality of image blocks of each frame of image, inter-prediction is performed on each image block to obtain a predicted pixel value of each image block, and SATD of each image block is obtained based on the predicted pixel value and the target pixel value of each image block. And adding SATD of each image block contained in each frame image to obtain the inter-frame similarity index of the candidate GOP.

For example, assuming that the candidate GOP includes a first frame image to a fourth frame image, blocking each of the first frame image to the fourth frame image to obtain 64 image blocks of each frame image, inter-predicting each image block to obtain a predicted pixel value of each image block, and obtaining SATD of each image block based on the predicted pixel value and the target pixel value of each image block. The SATD of the 64 image blocks of the first frame image may be added to obtain the SATD of the first frame image, and similarly, the SATD of the second frame image, the SATD of the third frame image, and the SATD of the fourth frame image may be obtained. Then, the SATD of the first frame image, the SATD of the second frame image, the SATD of the third frame image, and the SATD of the fourth frame image are added to obtain the inter-frame similarity index of the candidate GOP.

For example, assuming that the candidate GOP includes a first frame image to a fourth frame image, blocking each of the first frame image to the fourth frame image to obtain 64 image blocks of each frame image, performing inter-frame prediction on each image block to obtain a predicted pixel value of each image block, and obtaining the MAD of each image block based on the predicted pixel value and the target pixel value of each image block. The MADs of the 64 image blocks of the first frame image may be added to obtain the MAD of the first frame image, and similarly, the MAD of the second frame image, the MAD of the third frame image, and the MAD of the fourth frame image may be obtained. Then, the MAD of the first frame image, the MAD of the second frame image, the MAD of the third frame image, and the MAD of the fourth frame image are added to obtain the inter-frame similarity index of the candidate GOP.
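The block-wise accumulation in the two examples above can be sketched as follows. This is an illustration under assumptions, not the disclosed implementation: all function names are hypothetical, frames are plain 2D lists whose dimensions are multiples of 4, blocks are 4x4, the predicted frames are passed in directly instead of being produced by real inter-frame prediction, and the SATD uses an unnormalized 4x4 Hadamard transform (encoders differ in how they scale it).

```python
H4 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def split_blocks(frame, size=4):
    """Yield size x size blocks of a frame in raster order."""
    for y in range(0, len(frame), size):
        for x in range(0, len(frame[0]), size):
            yield [row[x:x + size] for row in frame[y:y + size]]


def block_mad(pred, target):
    """Mean absolute difference between predicted and target block."""
    diffs = [abs(t - p) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def block_satd(pred, target):
    """Sum of absolute Hadamard-transformed differences of a 4x4 block."""
    residual = [[t - p for p, t in zip(pr, tr)] for pr, tr in zip(pred, target)]
    transformed = matmul(matmul(H4, residual), H4)  # H4 is symmetric
    return sum(abs(v) for row in transformed for v in row)


def gop_similarity_index(pred_frames, target_frames, block_cost):
    """Add block costs into frame costs, then frame costs into a GOP index."""
    total = 0
    for pred, target in zip(pred_frames, target_frames):
        total += sum(block_cost(p, t) for p, t in
                     zip(split_blocks(pred), split_blocks(target)))
    return total


target = [[10] * 8 for _ in range(8)]  # 8x8 frame, i.e. four 4x4 blocks
pred = [[8] * 8 for _ in range(8)]     # every predicted pixel is off by 2
print(gop_similarity_index([pred, pred], [target, target], block_mad))   # 16.0
print(gop_similarity_index([pred, pred], [target, target], block_satd))  # 256
```

A smaller index means the prediction matches the decoded frames more closely, which corresponds to higher inter-frame similarity.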

In one implementation, each frame of image may be downsampled in resolution to obtain a downsampled multi-frame image, and then a GOP value of the multi-frame image is determined based on a similarity between frames in the downsampled multi-frame image. The manner of performing resolution downsampling on each frame image may be described in step S203 in the following embodiment.
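As a rough illustration of resolution downsampling applied before the similarity calculation, the sketch below halves the resolution of a luminance plane by averaging each 2x2 neighbourhood. The factor of 2, the averaging filter, the rounding and the function name `downsample_luma_2x` are all assumptions chosen for illustration; the text here does not specify the downsampling method.

```python
def downsample_luma_2x(y_plane):
    """Halve the width and height of a luminance plane (2D list of ints) by
    averaging each 2x2 neighbourhood with rounding. A trailing odd row or
    column is dropped (assumption)."""
    height = len(y_plane) // 2 * 2
    width = len(y_plane[0]) // 2 * 2
    return [
        [(y_plane[r][c] + y_plane[r][c + 1]
          + y_plane[r + 1][c] + y_plane[r + 1][c + 1] + 2) // 4
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]


plane = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 2, 2],
         [4, 4, 2, 6]]
print(downsample_luma_2x(plane))  # [[0, 8], [4, 3]]
```

Working on the smaller plane reduces the number of pixels that the similarity index has to process.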

S103, dividing the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjusting the coding sequence of the images contained in each GOP.

For example, assuming that the determined GOP of the multi-frame image has a value of 4 and the number of multi-frame images has a value of 12, the 12-frame image may be divided into three GOPs, that is, a first GOP, a second GOP, and a third GOP, each GOP including 4 frame images, where the first GOP includes the first frame image to the fourth frame image, the second GOP includes the fifth frame image to the eighth frame image, and the third GOP includes the ninth frame image to the twelfth frame image. And then the coding order of the pictures included in the first GOP, the second GOP and the third GOP is adjusted.
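The division in this example amounts to cutting the decoded frame sequence into consecutive runs of the determined GOP value. A minimal sketch follows; `divide_into_gops` is a hypothetical helper, and letting a shorter final GOP hold any leftover frames is an assumption, since the example only covers a frame count that divides evenly.

```python
def divide_into_gops(frames, gop_value):
    """Split frames into consecutive GOPs of gop_value frames each.
    Assumption: leftover frames form a shorter final GOP."""
    if gop_value <= 0:
        raise ValueError("gop_value must be positive")
    return [frames[i:i + gop_value] for i in range(0, len(frames), gop_value)]


frames = list(range(1, 13))  # frame numbers 1..12
for gop in divide_into_gops(frames, 4):
    print(gop)
# [1, 2, 3, 4]
# [5, 6, 7, 8]
# [9, 10, 11, 12]
```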

In one implementation, for any GOP, a frame type of each frame image and a reference frame of each frame image in any GOP may be acquired, and an encoding order of the images included in any GOP may be adjusted based on the frame type of each frame image and the reference frame of each frame image.

The frame types of an image may include I frames, P frames and B frames.

An I frame is an intra-frame coded frame, also called an intra picture. It is a full-frame compression-coded frame: the full-frame image information is JPEG compression coded and transmitted, and when an I frame is decoded, the complete image can be reconstructed using only the data of that I frame.

A P frame is a forward predictive coded frame, also called a predictive frame. It reduces the amount of transmitted data by exploiting temporal redundancy with respect to previously coded frames in the image sequence. A P frame takes the I frame as its reference frame: the predicted value and motion vector of a given point of the P frame are found in the I frame, and the prediction difference is transmitted together with the motion vector. At the decoding end, the predicted value of that point is found in the I frame according to the motion vector and added to the difference to obtain the sample value of that point, so that the complete P frame can be obtained.

A B frame is a bi-directionally predicted, interpolation-coded frame, also called a bi-directional predictive frame. It reduces the amount of transmitted data by taking into account the temporal redundancy between the source image and both the coded frames before it and the coded frames after it in the image sequence. A B frame takes the preceding I or P frame and the following P frame as reference frames: the predicted value and two motion vectors of a given point of the B frame are found, and the prediction difference is transmitted together with the motion vectors. At the decoding end, the predicted value is found (or calculated) in the two reference frames according to the motion vectors and added to the difference to obtain the sample value of that point, so that the complete B frame can be obtained.

In one implementation, the method for acquiring the reference frame of each frame of image may include: the reference frame of each frame image is determined based on the frame type of each frame image and a reference frame determination policy.

Taking the schematic diagram of the GOP shown in fig. 2 as an example, assume that a GOP includes N frame images, where the frame type of the last (N-th) frame image in the GOP is an I frame and the frame types of the other frame images in the GOP are B frames. Based on the reference frame determination policy, it may be determined that the reference frames of the N/2-th frame image in the GOP include the N-th frame image and the last frame image in the previous GOP. The reference frames of the N/4-th frame image in the GOP include the N/2-th frame image and the last frame image in the previous GOP, and similarly, the reference frames of the 3N/4-th frame image in the GOP include the N/2-th frame image and the N-th frame image. The reference frames of the N/8-th frame image in the GOP include the N/4-th frame image and the last frame image in the previous GOP, the reference frames of the 3N/8-th frame image include the N/4-th frame image and the N/2-th frame image, the reference frames of the 5N/8-th frame image include the 3N/4-th frame image and the N/2-th frame image, and the reference frames of the 7N/8-th frame image include the 3N/4-th frame image and the N-th frame image.
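The reference frame determination policy described for fig. 2 can be sketched as a recursive midpoint split. The following Python sketch is illustrative only and is not taken from the application: the function name `hierarchical_references` is hypothetical, position 0 is used as a stand-in for the last frame image of the previous GOP, and N is assumed to be a power of two.

```python
def hierarchical_references(n):
    """Map each B-frame position (1..n-1) to its two reference positions.

    Assumed convention: position 0 stands for the last frame image of the
    previous GOP, and position n is the I frame that closes the current GOP.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    refs = {}

    def split(lo, hi):
        # Stop when there is no frame strictly between lo and hi.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        refs[mid] = (lo, hi)   # the midpoint references both interval ends
        split(lo, mid)
        split(mid, hi)

    split(0, n)
    return refs
```

For N = 8, the sketch assigns positions 0 and 8 to the 4th frame image, positions 0 and 4 to the 2nd frame image, and positions 4 and 8 to the 6th frame image, which agrees with the fig. 2 description above.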

Optionally, the reference frame determination policy shown in fig. 2 is only an example, and other manners are also possible; that is, the frame type of each frame image may be different, or the number of reference frames of each frame image may be 3 or 4, and so on. For example, the frame type of the N-th frame image in the GOP is an I frame and the frame types of the other frame images are P frames; the reference frame of the 7N/8-th frame image in the GOP includes the N-th frame image; the reference frames of the 6N/8-th frame image include the 7N/8-th frame image and the N-th frame image; the reference frames of the 5N/8-th frame image include the 6N/8-th frame image and the 7N/8-th frame image; the reference frames of the N/2-th frame image include the 5N/8-th frame image and the 6N/8-th frame image; and so on.

Optionally, the method for adjusting the coding order of the images included in any GOP based on the frame type of each frame image and the reference frame of each frame image may include the following. A first image whose frame type is I frame and a plurality of second images whose frame type is B frame are acquired from the GOP. The coding order of the first image is then adjusted so that it precedes the coding order of the plurality of second images. The plurality of second images are traversed: a second image whose reference frames include the first image is determined from the plurality of second images, and its coding order is adjusted so that it precedes the coding order of the other second images. Further, a second image whose reference frames include the most recently determined second image is determined from the other second images, and its coding order is adjusted so that it precedes the coding order of the remaining second images. After the traversal is finished, the images in the GOP with the adjusted coding order are obtained.

Taking fig. 2 as an example with N = 8, since the frame type of the last frame image in the GOP is an I frame, the frame types of the other frame images in the GOP are B frames, and the reference frames of the 4th frame image in the GOP include the 8th frame image and the last frame image in the previous GOP, it can be determined that the last frame image in the GOP is first in coding order and the 4th frame image is second. Since the reference frames of the 2nd frame image in the GOP include the 4th frame image and the last frame image in the previous GOP, the 2nd frame image can be determined to be third in coding order. Since the reference frames of the 1st frame image in the GOP include the 2nd frame image and the last frame image in the previous GOP, the 1st frame image can be determined to be fourth in coding order. Similarly, it can be determined that the 3rd frame image is fifth, the 6th frame image is sixth, the 5th frame image is seventh, and the 7th frame image is eighth in coding order.
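The traversal described above can be sketched as a depth-first walk over the reference relationships. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `FIG2_REFS` and `encoding_order` are hypothetical, position 0 stands for the last frame image of the previous GOP (treated as already encoded), and a B frame is only scheduled once all of its reference frames have been scheduled.

```python
# Reference frames of the B frames in the fig. 2 example (N = 8).
# Position 0 is an assumed stand-in for the last frame of the previous GOP.
FIG2_REFS = {
    4: (0, 8), 2: (0, 4), 6: (4, 8),
    1: (0, 2), 3: (2, 4), 5: (4, 6), 7: (6, 8),
}


def encoding_order(refs, i_frame, prev_anchor=0):
    """Return frame positions in coding order, I frame first."""
    encoded = {prev_anchor, i_frame}
    order = [i_frame]

    def visit(frame):
        # Look for B frames that reference `frame` and whose references
        # are all available, then continue from each newly scheduled frame.
        for cand in sorted(refs):
            if cand in encoded:
                continue
            cand_refs = refs[cand]
            if frame in cand_refs and all(r in encoded for r in cand_refs):
                encoded.add(cand)
                order.append(cand)
                visit(cand)

    visit(i_frame)
    return order
```

Calling `encoding_order(FIG2_REFS, 8)` schedules the 8th frame image first and the 4th frame image second, consistent with the coding order stated for fig. 2.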

Optionally, assume that any GOP includes N frame images, the frame type of the N-th frame image in the GOP is an I frame, and the frame types of the other images in the GOP are B frames. In that case, the coding order of the N-th frame image precedes the coding order of the other images in the GOP, the coding order of the N/2-th frame image follows that of the N-th frame image, the coding order of the N/4-th frame image follows that of the N/2-th frame image, the coding order of the N/8-th frame image follows that of the N/4-th frame image, the coding order of the 3N/8-th frame image follows that of the N/8-th frame image, the coding order of the 3N/4-th frame image follows that of the 3N/8-th frame image, the coding order of the 5N/8-th frame image follows that of the 3N/4-th frame image, and the coding order of the 7N/8-th frame image follows that of the 5N/8-th frame image.

S104, encoding the image with the adjusted encoding sequence to obtain an encoded target video.

In one implementation, a frame type of each frame image in each GOP and a reference frame of each frame image may be determined, and each frame image is encoded according to an adjusted encoding order of each frame image in each GOP based on the frame type of each frame image and the reference frame of each frame image, to obtain an encoded target video.

Alternatively, the image after the adjustment of the coding sequence may be coded to obtain a coded video bitstream, and the coded video bitstream and the audio bitstream of the target video are encapsulated to obtain the coded target video.

In the embodiment of the application, the target video is decoded to obtain multi-frame images, and the value of the GOP of the multi-frame images is determined based on the similarity between frames in the multi-frame images. The multi-frame images are then divided into a plurality of GOPs based on the determined value of the GOP, the coding order of the images contained in each GOP is adjusted, and the images with the adjusted coding order are encoded to obtain the encoded target video, so that the data volume of the video can be reduced without reducing the video quality.

Based on the above description, please refer to fig. 3, fig. 3 is a flowchart of another video encoding method provided in an embodiment of the present application, where the video encoding method may be performed by a video encoding apparatus, a content distribution platform, or a computer device; the video coding scheme as shown in fig. 3 includes, but is not limited to, steps S301 to S309, wherein:

S301, acquiring the frame rate of the target video.

In the embodiment of the present application, since the frame rate of the album video is far smaller than the frame rate of the conventional video, after the target video is acquired, the frame rate of the target video may be acquired, and if the frame rate is smaller than the frame rate threshold, it indicates that the target video is the album video, and then the target video is decoded to obtain the multi-frame image. If the frame rate is greater than or equal to the frame rate threshold, indicating that the target video is a conventional video, the target video may be directly published.
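The frame rate check can be expressed as a single comparison. The sketch below is only an illustration: the function name `should_reencode` is hypothetical, and the application does not state a concrete value for the frame rate threshold, so any number passed in is an assumption of the caller.

```python
def should_reencode(frame_rate, frame_rate_threshold):
    """Return True when the video is treated as an album video.

    A frame rate strictly below the threshold selects the decode and
    re-encode path; otherwise the video is published directly.
    """
    return frame_rate < frame_rate_threshold
```

Note that a frame rate exactly equal to the threshold is treated as a conventional video, matching the "greater than or equal to" condition above.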

S302, if the frame rate is smaller than the frame rate threshold, decoding the target video to obtain multi-frame images.

In the embodiment of the present application, the decoding process is performed on the target video, and the manner of obtaining the multi-frame image may refer to the description of step S101 in the foregoing embodiment, which is not repeated in the embodiment of the present application.

S303, performing resolution downsampling on each frame of image to obtain downsampled multi-frame images.

In one implementation, the resolution downsampling may be performed on image parameter components of each frame of image to obtain downsampled multi-frame images, where the image parameter components include one or more of: a luminance component, a chrominance component.

For example, the luminance component and the chrominance component of each frame image may be subjected to resolution downsampling, so as to obtain downsampled multi-frame images. The resolution of the downsampled multi-frame images is lower than that of the original images.
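As an illustration of resolution downsampling applied to each image parameter component, the following sketch averages non-overlapping blocks of samples. The application does not specify a downsampling filter or factor, so the box average, the default factor of 2, the plane names, and the function names `downsample_plane` and `downsample_frame` are all assumptions made for this example.

```python
def downsample_plane(plane, factor=2):
    """Box-average a 2-D sample plane by `factor` in each dimension.

    Trailing rows and columns that do not fill a whole block are dropped.
    """
    height = len(plane) - len(plane) % factor
    width = len(plane[0]) - len(plane[0]) % factor
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [plane[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def downsample_frame(frame, factor=2):
    """Downsample every component plane (for example 'Y', 'U', 'V')."""
    return {name: downsample_plane(plane, factor)
            for name, plane in frame.items()}
```

Working on reduced-resolution planes keeps the later similarity computation cheap, since each comparison touches fewer samples.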

S304, obtaining the similarity between frames in the downsampled multi-frame images when the GOP values are set to different values.

In the embodiment, the step S304 may refer to the specific description of the step S102 in the above embodiment, which is not repeated herein.

S305, setting the GOP value of the multi-frame image as a target value, where the target value refers to: the value set when the similarity between frames is highest.
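Selecting the target value amounts to taking the candidate with the highest measured similarity. The sketch below assumes the similarity for each candidate GOP value has already been computed as in step S304; the function name `select_gop_value` and the dictionary layout are hypothetical.

```python
def select_gop_value(similarity_by_gop):
    """Return the candidate GOP value whose measured similarity is highest.

    `similarity_by_gop` maps each candidate GOP value to the similarity
    between frames measured with that value. On a tie, the candidate that
    appears first in the mapping is returned.
    """
    if not similarity_by_gop:
        raise ValueError("at least one candidate GOP value is required")
    return max(similarity_by_gop, key=similarity_by_gop.get)
```

For instance, with made-up similarity scores of 0.71, 0.83 and 0.78 for candidate values 4, 8 and 16, the function returns 8.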

S306, dividing the multi-frame image into a plurality of GOPs based on the GOP values of the multi-frame image, and determining the frame type of each frame image in each GOP and the reference frame of each frame image.

Taking the schematic diagram of the GOP shown in fig. 2 as an example, assume that a GOP includes N frame images. Based on the encoding policy, it is determined that the frame type of the last (N-th) frame image in the GOP is an I frame, the frame types of the other frame images in the GOP are B frames, and the reference frames of the N/2-th frame image in the GOP include the N-th frame image and the last frame image in the previous GOP. The reference frames of the N/4-th frame image in the GOP include the N/2-th frame image and the last frame image in the previous GOP, and similarly, the reference frames of the 3N/4-th frame image in the GOP include the N/2-th frame image and the N-th frame image. The reference frames of the N/8-th frame image in the GOP include the N/4-th frame image and the last frame image in the previous GOP, the reference frames of the 3N/8-th frame image include the N/4-th frame image and the N/2-th frame image, the reference frames of the 5N/8-th frame image include the 3N/4-th frame image and the N/2-th frame image, and the reference frames of the 7N/8-th frame image include the 3N/4-th frame image and the N-th frame image.

Optionally, the encoding policy shown in fig. 2 is only an example, and other manners are also possible; that is, the frame type of each frame image may take other forms, or the number of reference frames of each frame image may be 3 or 4, and so on. For example, the frame type of the N-th frame image in the GOP is an I frame and the frame types of the other frame images are P frames; the reference frame of the 7N/8-th frame image in the GOP includes the N-th frame image; the reference frames of the 6N/8-th frame image include the 7N/8-th frame image and the N-th frame image; the reference frames of the 5N/8-th frame image include the 6N/8-th frame image and the 7N/8-th frame image; the reference frames of the N/2-th frame image include the 5N/8-th frame image and the 6N/8-th frame image; and so on.

S307, the coding order of the pictures included in each GOP is adjusted.

Taking fig. 2 as an example, since the frame type of the last frame image in the GOP is an I frame, the frame types of the other frame images in the GOP are B frames, and the reference frames of the N/2-th frame image in the GOP include the N-th frame image and the last frame image in the previous GOP, it can be determined that the last frame image in the GOP is first in coding order and the N/2-th frame image is second. Since the reference frames of the N/4-th frame image in the GOP include the N/2-th frame image and the last frame image in the previous GOP, it can be determined that the N/4-th frame image is third in coding order. Since the reference frames of the N/8-th frame image in the GOP include the N/4-th frame image and the last frame image in the previous GOP, it can be determined that the N/8-th frame image is fourth in coding order. Similarly, it can be determined that the 3N/8-th frame image is fifth, the 3N/4-th frame image is sixth, the 5N/8-th frame image is seventh, and the 7N/8-th frame image is eighth in coding order.
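For a general GOP length, the adjusted coding order can be generated directly by visiting interval midpoints in pre-order. This is a sketch under assumptions rather than the method of the application: the function name `hierarchical_encoding_order` is hypothetical, N is assumed to be a power of two, and levels finer than eighths are assumed to continue the same pattern.

```python
def hierarchical_encoding_order(n):
    """Return 1-based frame positions of an n-frame GOP in coding order."""
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    order = [n]  # the I frame that closes the GOP is coded first

    def visit(lo, hi):
        # Code the midpoint, then the earlier half, then the later half.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)
        visit(lo, mid)
        visit(mid, hi)

    visit(0, n)
    return order
```

With N = 8 the positions come out as N, N/2, N/4, N/8, 3N/8, 3N/4, 5N/8, 7N/8, which is the order derived in the text.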

S308, encoding each frame image based on the frame type of each frame image and the reference frame of each frame image according to the adjusted encoding sequence of each frame image in each GOP, to obtain an encoded video bitstream.
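The requirement behind step S308 is that every reference frame of an image is encoded before the image itself. The following sketch walks an adjusted coding order and enforces that constraint. It is illustrative only: `encode_in_order` is a hypothetical name, `encode_frame` is a placeholder callback standing in for a real encoder, and position 0 is assumed to denote the already-encoded last frame of the previous GOP.

```python
def encode_in_order(order, refs, encode_frame, already_encoded=(0,)):
    """Encode frames in `order`, checking that references come first.

    `refs` maps a frame position to its reference positions; frames that
    are absent from `refs` (such as the I frame) have no references.
    """
    done = set(already_encoded)
    outputs = []
    for frame in order:
        frame_refs = refs.get(frame, ())
        missing = [r for r in frame_refs if r not in done]
        if missing:
            raise ValueError(
                f"frame {frame} needs unencoded references {missing}")
        outputs.append(encode_frame(frame, frame_refs))
        done.add(frame)
    return outputs
```

Passing the display order 1, 2, ..., 8 of the fig. 2 example raises an error at the first B frame, which is why the coding order is adjusted before encoding.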

S309, packaging the coded video bit stream and the audio bit stream of the target video to obtain the coded target video.

In this embodiment of the present application, if the frame rate of the target video is smaller than the frame rate threshold, the target video is decoded to obtain multi-frame images, and resolution downsampling is performed on each frame image to obtain downsampled multi-frame images. The similarity between frames in the downsampled multi-frame images is obtained when the value of the GOP is set to different values, and the value of the GOP of the multi-frame images is set to a target value, where the target value refers to the value set when the similarity between frames is highest. The multi-frame images are then divided into a plurality of GOPs based on the value of the GOP, the frame type and the reference frame of each frame image in each GOP are determined, and the coding order of the images contained in each GOP is adjusted. Each frame image is encoded according to its adjusted coding order in each GOP, based on its frame type and reference frame, to obtain an encoded video bitstream, and the encoded video bitstream and the audio bitstream of the target video are encapsulated to obtain the encoded target video. In this way, the content correlation between the frame images is analyzed and images with higher similarity are encoded as consecutive frames until all frame images of the target video are encoded, which further reduces the size of the video while maintaining the video quality.

The present embodiment also provides a computer storage medium having stored therein program instructions for implementing the corresponding method described in the above embodiments when executed.

Referring to fig. 4 again, fig. 4 is a schematic structural diagram of a video encoding device according to an embodiment of the present application.

In one implementation manner of the video encoding device of the embodiment of the present application, the video encoding device includes the following structure.

A decoding unit 401, configured to perform decoding processing on a target video to obtain a multi-frame image;

a determining unit 402, configured to determine a value of a GOP of the multi-frame image based on a similarity between frames in the multi-frame image;

a dividing unit 403, configured to divide the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjust coding orders of the images included in each GOP;

and the encoding unit 404 is configured to encode the image after the encoding sequence is adjusted, so as to obtain an encoded target video.

In one embodiment, the dividing unit 403 adjusts the coding order of the pictures included in each GOP, including:

In one embodiment, the dividing unit 403 adjusts the coding order of the pictures included in the arbitrary GOP based on the frame type of each frame picture and the reference frame of each frame picture, including:

In one embodiment, the dividing unit 403 acquires a reference frame of each frame image, including:

In one embodiment, the encoding unit 404 encodes the image after the adjustment of the encoding sequence to obtain the encoded target video, including:

In one embodiment, the video encoding apparatus may further include a downsampling unit 405, wherein:

a downsampling unit 405, configured to downsample the resolution of each frame of image, so as to obtain a downsampled multi-frame image;

the determining unit 402 determines, based on the similarity between frames in the multiple frame images, a value of a group of pictures GOP of the multiple frame images, including:

In one embodiment, the downsampling unit 405 performs resolution downsampling on each frame of image to obtain downsampled multi-frame images, including:

In one embodiment, the video encoding apparatus may further include a frame rate acquisition unit 406, wherein:

a frame rate obtaining unit 406, configured to obtain a frame rate of the target video;

if the frame rate is smaller than the frame rate threshold, the decoding unit 401 is triggered to perform decoding processing on the target video, so as to obtain a multi-frame image.

In this embodiment, after the decoding unit 401 decodes the target video to obtain the multi-frame image, the determining unit 402 determines the GOP value of the multi-frame image based on the similarity between frames in the multi-frame image, and then the dividing unit 403 divides the multi-frame image into multiple GOPs based on the determined GOP value of the multi-frame image, adjusts the coding sequence of the images included in each GOP, and the coding unit 404 codes the images after adjusting the coding sequence to obtain the coded target video, so as to ensure that the data amount of the video is reduced without reducing the video quality.

Referring to fig. 5 again, fig. 5 is a schematic structural diagram of a computer device provided in an embodiment of the present application. In addition to structures such as a power supply module, the computer device in the embodiment of the present application includes a processor 501, a storage device 502, and a communication interface 503. Data can be exchanged among the processor 501, the storage device 502, and the communication interface 503, and the corresponding video encoding method is implemented by the processor 501.

The storage device 502 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the storage device 502 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the storage device 502 may also include a combination of the above types of memory.

The processor 501 may be a central processing unit (CPU). The processor 501 may also be a combination of a CPU and a GPU. A server may include a plurality of CPUs and GPUs as required to perform the corresponding video encoding. In one embodiment, the storage device 502 is used to store program instructions. The processor 501 may invoke the program instructions to implement the various methods referred to above in the embodiments of the present application.

In a first possible implementation manner, the processor 501 of the computer device invokes the program instructions stored in the storage device 502 to decode the target video to obtain a multi-frame image; determining the value of GOP of the multi-frame images based on the similarity between frames in the multi-frame images; dividing the multi-frame image into a plurality of GOPs based on the determined GOP values of the multi-frame image, and adjusting the coding sequence of the images contained in each GOP; and coding the image with the coding sequence adjusted to obtain the coded target video.

In one embodiment, the processor 501 may perform the following operations when adjusting the coding order of the pictures included in each GOP:

In one embodiment, the processor 501 may perform the following operations when adjusting the coding order of the pictures included in the GOP based on the frame type of each frame picture and the reference frame of each frame picture:

In one embodiment, the processor 501 is configured to, when acquiring the reference frame of each frame of image, perform the following operations:

In one embodiment, the processor 501 is configured to, when encoding the image after adjusting the encoding sequence, obtain the encoded target video, perform the following operations:

In one embodiment, the processor 501 is further configured to perform the following operations: performing resolution downsampling on each frame of image to obtain downsampled multi-frame images;

The processor 501 is configured to, when determining the value of the group of pictures GOP of the multi-frame image based on the similarity between frames in the multi-frame image, perform the following operations:

In one embodiment, the processor 501 is configured to, when performing resolution downsampling on each frame of image to obtain a downsampled multi-frame image, perform the following operations:

In one embodiment, the processor 501 is further configured to perform the following operations:

acquiring the frame rate of the target video;

In this embodiment of the present application, after performing decoding processing on a target video to obtain multiple frame images, the processor 501 determines the value of a GOP of the multiple frame images based on the similarity between frames in the multiple frame images, then divides the multiple frame images into multiple GOPs based on the determined value of the GOP of the multiple frame images, adjusts the coding sequence of the images included in each GOP, and encodes the images after the coding sequence is adjusted to obtain the encoded target video, so that it is possible to ensure that the data size of the video is reduced without reducing the video quality.

It will be appreciated by those skilled in the art that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored in a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The computer readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like. The computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.

The above disclosure is only a few examples of the present application, and it is not intended to limit the scope of the claims, and those skilled in the art will understand that all or a portion of the above-described embodiments may be implemented and equivalents may be substituted for elements thereof, which are included in the scope of the present invention.

Claims

1. A video encoding method, comprising:

decoding the target video to obtain multi-frame images;

determining the value of a group of pictures (GOP) of the multi-frame image based on the similarity between frames in the multi-frame image;

2. The method of claim 1, wherein said adjusting the coding order of the pictures contained in each GOP comprises:

3. The method according to claim 2, wherein said adjusting the coding order of the pictures contained in the arbitrary GOP based on the frame type of each frame picture and the reference frame of each frame picture comprises:

4. The method of claim 3, wherein the frame type of the N-th frame image in the arbitrary GOP is an I frame, the frame type of the other images in the arbitrary GOP is a B frame, the coding order of the N-th frame image precedes the coding order of the other images in the arbitrary GOP, the coding order of the N/2-th frame image in the arbitrary GOP follows the coding order of the N-th frame image, the coding order of the N/4-th frame image in the arbitrary GOP follows the coding order of the N/2-th frame image, the coding order of the N/8-th frame image in the arbitrary GOP follows the coding order of the N/4-th frame image, the coding order of the 3N/8-th frame image in the arbitrary GOP follows the coding order of the N/8-th frame image, the coding order of the 3N/4-th frame image in the arbitrary GOP follows the coding order of the 3N/8-th frame image, the coding order of the 5N/8-th frame image in the arbitrary GOP follows the coding order of the 3N/4-th frame image, and the coding order of the 7N/8-th frame image in the arbitrary GOP follows the coding order of the 5N/8-th frame image.

5. The method according to any one of claims 2 to 4, wherein the acquiring the reference frame of each frame image comprises:

6. The method according to claim 1, wherein the similarity between frames in the multi-frame image is greater in the case where the GOP of the multi-frame image is valued at the determined value than in the case where the GOP of the multi-frame image is valued at other values.

7. The method according to claim 1, wherein the encoding the image after the adjustment of the encoding order to obtain the encoded target video comprises:

and encoding each frame image based on the frame type of each frame image and the reference frame of each frame image according to the adjusted encoding sequence of each frame image in each GOP, so as to obtain the encoded target video.

8. The method according to claim 1, wherein the method further comprises:

9. The method of claim 8, wherein the performing the resolution downsampling on each frame of image to obtain downsampled multi-frame images comprises:

10. The method according to claim 1, wherein the method further comprises:

acquiring the frame rate of the target video;

11. A video encoding device, the device comprising:

12. A computer device comprising a processor, a storage device, and a communication interface, the processor, storage device, and communication interface being interconnected, wherein:

the storage device is used for storing a computer program, and the computer program comprises program instructions;

the processor being configured to invoke the program instructions to perform the video encoding method of any of claims 1 to 10.

13. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program comprising program instructions for performing the video encoding method according to any of claims 1 to 10 when executed by a processor.