CN111372088A - Video coding method, video coding device, video coder and storage device - Google Patents


Publication number
CN111372088A
Authority
CN
China
Prior art keywords
frame
image
area
region
coding
Prior art date
Legal status
Granted
Application number
CN202010238801.7A
Other languages
Chinese (zh)
Other versions
CN111372088B (en)
Inventor
林聚财
江东
曾飞洋
方诚
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010238801.7A priority Critical patent/CN111372088B/en
Publication of CN111372088A publication Critical patent/CN111372088A/en
Application granted granted Critical
Publication of CN111372088B publication Critical patent/CN111372088B/en
Current status: Active

Classifications

    • H — Electricity
    • H04 — Electric communication technique
    • H04N — Pictorial communication, e.g. television
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/593 — … using predictive coding involving spatial prediction techniques
    • H04N 19/174 — … using adaptive coding characterised by the coding unit, the unit being an image region that is a slice, e.g. a line of blocks or a group of blocks
    • H04N 19/503 — … using predictive coding involving temporal prediction
    • H04N 19/70 — … characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a video coding method, a video coding device, a video encoder and a storage device. The video encoding method includes: acquiring multiple frames of images in a video to be encoded; dividing each of the multiple frames into a plurality of regions; and taking at least one region in each frame as a first region and intra-coding that first region, where the combination of the first regions across the multiple frames is able to cover the entire image. In this way, the code stream of an intra-coded frame can be reduced, avoiding delay or frame loss during transmission of the video code stream.

Description

Video coding method, video coding device, video coder and storage device
Technical Field
The present application relates to the field of video encoding and decoding, and in particular to a video encoding method, a video encoding device, an encoder, and a storage device.
Background
Because video image data volumes are large, video image data usually needs to be encoded and compressed. The compressed data is called a video code stream; it is transmitted to the user end over a wired or wireless network and then decoded for viewing.
The overall video coding flow includes prediction, transformation, quantization, entropy coding, and other processes. Current video encoding generally divides the images of a video into intra-coded frames (I-frames), inter-coded frames (P-frames), and bi-directional inter-coded frames (B-frames). An I-frame is an independent frame carrying all of its own information and can be decoded without reference to other images. An existing intra-coded frame must perform intra predictive coding on every Coding Unit (CU) in the frame, so a large code stream is usually generated to express it, which increases the transmitted data volume of the video code stream. In some application scenarios, such as real-time codec applications, the large data volume of intra-coded frames may cause significant delay or cause some frames to be lost.
Disclosure of Invention
The application provides a video coding method, a video coding device, an encoder and a storage device, which can reduce the code stream of an intra-coded frame and thereby avoid delay or frame loss during transmission of the video code stream.
A first aspect of the present application provides a video encoding method, including: acquiring multiple frames of images in a video to be encoded; dividing each of the multiple frames into a plurality of regions; and taking at least one region in each frame as a first region and intra-coding the first region, wherein the combination of the first regions across the multiple frames is able to cover the entire image.
A second aspect of the present application provides a video encoding apparatus, including: an acquisition module for acquiring multiple frames of images in a video to be encoded; a dividing module for dividing each of the multiple frames into a plurality of regions; and a coding module for taking at least one region in each frame as a first region and intra-coding the first region, wherein the combination of the first regions across the multiple frames is able to cover the entire image.
A third aspect of the present application provides an encoder including a processor and a memory coupled to the processor, wherein the memory stores program instructions for implementing the method of the first aspect, and the processor is configured to execute the program instructions stored in the memory to encode the video to be encoded.
A fourth aspect of the present application provides a storage device storing program instructions capable of implementing the method of the first aspect.
According to the above scheme, each of multiple frames of images in a video to be encoded is divided into a plurality of regions, and at least one region in each frame is intra-coded; because the combination of the intra-coded regions across the multiple frames can cover the entire image, the code streams for intra-coding the different regions are distributed over multiple frames by intra-coding only part of each frame.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a video encoding method of the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a video encoding method of the present application;
FIG. 3 is a schematic diagram of image partitioning and encoding in a first example of the video encoding method of the present application;
FIG. 4 is a diagram illustrating image partitioning and encoding in a second example of the video encoding method of the present application;
FIG. 5 is a schematic diagram of image partitioning and encoding in a third example of the video encoding method of the present application;
FIG. 6 is a schematic diagram of image partitioning and encoding in a fourth example of the video encoding method of the present application;
FIG. 7 is a block diagram of an embodiment of a video encoding apparatus of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of an encoder of the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a storage device according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined as "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two or three, unless explicitly limited otherwise. All directional indications (such as up, down, left, right, front, and rear) in the embodiments of the present application are only used to explain the relative positional relationships, movements, and the like of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may also include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. It should be noted that, in the following method examples, the method of the present application is not limited to the flow sequence shown in the drawings if the results are substantially the same.
The following describes embodiments of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a video encoding method according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
S110: Acquire a plurality of frame images in a video to be encoded.
An intra-coded frame (hereinafter, I-frame) is an independent frame carrying all of its own information and can be decoded without reference to other images. I-frames therefore also serve as reference frames for inter-coded frames such as P-frames and B-frames. If all regions of a single I-frame are intra-coded, the code stream of that single frame is large; therefore, when an I-frame needs to be encoded, the image to be used as the I-frame and at least one adjacent frame image are obtained from the video to be encoded, yielding the multi-frame image. In this embodiment, since the first frame of a video is generally used as an I-frame, this step obtains the first several frames of the video to be encoded.
S120: each frame of the multi-frame images is divided into a plurality of areas respectively.
In this embodiment, the region may include at least one Coding Tree Unit (CTU). Specifically, each region may be, but is not limited to, a Slice (Slice) or a rectangular region (Tile).
It is to be understood that the region division may be the same or different for each frame of the multi-frame image. As shown in fig. 3, the first 3 frames of the video to be encoded are obtained, and each of them is divided into 3 regions in the same manner, where "the same manner" means that the divided regions have the same sizes and the same positions within the image. As shown in fig. 4, the first 3 frames of the video to be encoded are obtained and each is divided into 2 regions, where the first frame frame0 is divided differently from the second frame frame1, while frame1 is divided in the same manner as the third frame frame2. The present application therefore does not limit the manner in which each image is divided, as long as a plurality of regions are obtained.
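The slice- or tile-style division described above can be sketched as follows (a minimal illustration, not the patent's implementation; the 64×64 CTU size and the function name are our own assumptions):

```python
# Hypothetical sketch of step S120: split a frame into horizontal
# slice-like regions aligned to an assumed 64x64 CTU grid. A region is a
# half-open range of CTU rows; names and the CTU size are illustrative.
CTU = 64  # assumed coding-tree-unit size

def divide_into_regions(height_px, num_regions):
    """Split the frame's CTU rows into num_regions contiguous bands."""
    ctu_rows = (height_px + CTU - 1) // CTU      # ceil(height / CTU)
    base, extra = divmod(ctu_rows, num_regions)
    regions, top = [], 0
    for i in range(num_regions):
        rows = base + (1 if i < extra else 0)    # spread the remainder
        regions.append((top, top + rows))        # (first_row, last_row + 1)
        top += rows
    return regions

regions = divide_into_regions(1080, 3)  # a 1080p frame in 3 bands
```

Under these assumptions a 1080p frame has 17 CTU rows, split into bands of 6, 6, and 5 rows; the same helper applies whether every frame uses the same division (fig. 3) or different frames use different divisions (fig. 4).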
S130: at least one area in each frame of image is taken as a first area, and the first area is subjected to intra-frame coding.
The combination of the first regions in the multi-frame image can cover the image. That is, if the first regions of the multiple frames are spliced together according to their positions within the image, they cover the entire picture, so a region at every position of a frame gets intra-coded. As shown in fig. 3, the first 3 frames of the video to be encoded are obtained, and each frame includes three regions 0-2; the first region of the first frame frame0 is region 0, the first region of the second frame frame1 is region 1, and the first region of the third frame frame2 is region 2, so the first regions of the first three frames combine to cover exactly the whole picture.
Specifically, for the multi-frame image acquired in step S110, a preset number of regions may be selected as first regions in each frame, where the preset number is greater than or equal to 1 and less than the number of regions in the image. As shown in fig. 3, the first frame frame0 includes 3 regions, so the number of first regions selected in frame0 is at least 1 and less than 3, which avoids intra-coding all regions. Moreover, the regions selected in different frames are not co-located: two selected regions never occupy exactly the same position on the image. For example, for each frame of the multi-frame image acquired in step S110: if the frame is the first frame of the multi-frame image, a preset number of regions are selected as first regions, either according to a preset sequence strategy or at random; the specific selection manner is not limited. If the frame is not the first frame, a preset number of regions not co-located with the first regions of the already-selected images are chosen as first regions, again either by a preset sequence strategy (e.g., selecting the second region for the second frame) or by choosing randomly among the regions not co-located with the first regions of the already-selected images; the specific selection manner is likewise not limited. A "selected image" here is a frame of the multi-frame image whose first region has already been chosen; in an application scenario in which frames are encoded in order, the selected images are the frames preceding the current frame.
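The "preset sequence strategy" for picking non-co-located first regions can be sketched as a simple round robin (an assumption for illustration; the patent does not mandate this particular ordering):

```python
# Hypothetical sketch of the selection in S130: frame k of an N-frame
# refresh group intra-codes region k, so no two frames of the group pick
# co-located first regions.
def first_region_index(frame_idx, num_regions):
    return frame_idx % num_regions

picks = [first_region_index(k, 3) for k in range(3)]  # frames 0, 1, 2
```

Over one group of num_regions frames the picks enumerate every region exactly once, which is what guarantees the covering property required of the first regions.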
In this embodiment, the first region of each frame of the multi-frame image is intra-coded, so the first region may also be called an I region; if the region is a slice, it is an I slice. The intra coding may use any intra coding mode, such as the Intra Block Copy (IBC) prediction mode or the IBC_merge prediction mode. By intra-coding the first region of each frame, a receiver of the code stream of the video to be encoded can directly decode the image data of the first regions from their code streams. If the frames other than the multi-frame image are inter-coded, the first regions of the multi-frame image can then serve as reference regions during decoding; for example, the co-located region of another frame can be inter-decoded with respect to each first region to obtain that frame's image data.
In this embodiment, each of multiple frames of images in a video to be encoded is divided into a plurality of regions, and at least one region in each frame is intra-coded; because the combination of the intra-coded regions across the multiple frames can cover the entire image, the code streams for intra-coding the different regions are distributed over multiple frames by intra-coding only part of each frame.
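The covering requirement of this embodiment — the union of the first regions across the multi-frame group must equal the whole image — can be checked mechanically (an illustrative sketch; region indices stand in for region positions):

```python
# Hypothetical check of the covering property: given, per frame, the list
# of region indices chosen as first (intra) regions, verify their union
# covers every region position of one frame.
def covers_image(first_regions_per_frame, num_regions):
    covered = set()
    for picks in first_regions_per_frame:
        covered.update(picks)
    return covered == set(range(num_regions))

ok = covers_image([[0], [1], [2]], 3)   # fig. 3 style: one region per frame
bad = covers_image([[0], [1]], 3)       # region 2 would never be intra-coded
```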
Referring to fig. 2, fig. 2 is a flowchart illustrating a video encoding method according to another embodiment of the present application. As shown in fig. 2, the method comprises the steps of:
S210: Acquire a plurality of frame images in a video to be encoded.
S220: each frame of the multi-frame images is divided into a plurality of areas respectively.
For specific descriptions of step S210 and step S220, reference may be made to the descriptions of step S110 and step S120, which are not repeated herein.
S230: at least one area in each frame of image is used as a first area, the first area is subjected to intra-frame coding, and each second area except the first area in each frame of image is not coded or is coded in a preset mode.
The specific description of step S130 may be referred to for the determination of the first area and the description of the intra-frame coding, which is not repeated herein.
For each frame, the regions other than the first region are taken as second regions. To reduce the code stream of each encoded frame, a second region may be left uncoded, or may be encoded in a preset mode to preserve a certain degree of completeness of the multi-frame image.
The preset modes include: homogenizing the pixel values in the second region and then intra-coding it, or inter-coding the second region. After encoding in a preset mode, the code stream of the second region is smaller than the code stream generated by directly intra-coding it; in addition, inter-coding the second region can preserve the validity of the region's image data to a certain extent, and thus the completeness of the image.
Homogenizing the pixel values in the second region means reducing the differences between them; for example, every pixel in the second region may be set to a first preset pixel value before the region is intra-coded. Because the pixel values in the region are then uniform, or even a single constant, the resulting code stream is smaller than that generated by intra-coding the original second region directly.
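The effect of homogenization can be illustrated with DC intra prediction (a simplification we chose; real intra modes are richer): once every pixel equals one constant, the DC predictor matches the region exactly and the residual is all zeros, so almost no coefficient bits are needed.

```python
# Illustrative only: DC intra prediction of a homogenized region leaves a
# zero residual, which is why the homogenized code stream is small.
def dc_intra_residual(block):
    dc = sum(sum(row) for row in block) // (len(block) * len(block[0]))
    return [[p - dc for p in row] for row in block]

flat = [[128] * 4 for _ in range(4)]   # second region set to one constant
residual = dc_intra_residual(flat)     # all zeros
```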
The inter-coding of the second region may be any one of the following: (1) performing first inter-frame coding on the second region using the first region of the same frame as the reference region. For example, a matching block for a current block of the second region is found in the first region by motion estimation, the relative displacement between the matching block and the current block gives a Motion Vector (MV), and the code stream of the second region is obtained from the MV and the residual data. This inter coding references an already-coded region of the same frame and produces a smaller code stream than directly intra-coding the second region. (2) Setting the pixel values in the second region to a second preset pixel value and then performing second inter-frame coding on it in skip mode (an inter prediction mode without residual), where the second preset pixel value is the value set in the co-located second region of an already-encoded image of the multi-frame image obtained in step S210, for example the frame preceding the image containing the second region. Because the pixel values of the second region are made consistent with those of the co-located second region in the encoded image, the code stream obtained by inter prediction can be greatly reduced. (3) Performing third inter-frame coding on the second region using a preset region of an encoded image as the reference region, where the preset region is the first region of the encoded image, or a second region of it that underwent the first or third inter-frame coding.
For example, if the second region is co-located with the first region of the previous frame, the second region is inter-coded using that first region as the reference region; likewise, if the second region is co-located with a second region of the previous frame that was inter-coded as in (1) or (3) above, that co-located second region is used as the reference region. In this embodiment, allowing inter coding to reference already-coded regions of the same frame enlarges the set of referenceable regions.
It is to be understood that, unless otherwise specified, the inter coding may use any inter coding mode, such as the merge inter prediction mode, the Advanced Motion Vector Prediction (AMVP) inter prediction mode, or the skip inter prediction mode.
Various preset modes are thus available for encoding the second region; which one to use can be determined from the code stream costs of the candidate modes. For example, the code stream cost of encoding the second region in each preset mode is determined, and the mode with the minimum cost is selected. The code stream cost can be understood as the size of the generated code stream and, for inter coding, is related to the pixel differences between the reference region and the current second region.
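Minimum-cost selection among the preset modes can be sketched as follows (the mode labels and the callable-based cost interface are our own illustration, not the patent's API):

```python
# Hypothetical sketch: evaluate the code stream cost of each candidate
# preset mode for the second region and keep the cheapest one.
def pick_preset_mode(cost_fns):
    """cost_fns maps a mode label to a zero-arg callable returning its cost."""
    return min(cost_fns, key=lambda mode: cost_fns[mode]())

best = pick_preset_mode({
    "first_inter_ref_same_frame": lambda: 900,  # option (1)
    "skip_with_constant_pixels":  lambda: 40,   # option (2)
    "third_inter_ref_prev_frame": lambda: 700,  # option (3)
})
```

Deferring the cost computation behind callables mirrors the fact that each candidate mode's cost may require a trial encode of the region.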
In addition, the first frame of the multi-frame image has no encoded image to reference, so inter coding against a previous frame cannot be performed on it. If the image containing the second region is the first frame of the multi-frame image, one of the following is performed: leaving the second region uncoded; setting the pixels in the second region to the first preset pixel value and then intra-coding it; or performing the first inter-frame coding on the second region using the first region of the same frame as the reference region. If the image containing the second region is not the first frame of the multi-frame image, any of the processing modes above may be used to encode the second region.
It can be understood that leaving the second region uncoded, intra-coding it after homogenizing its pixel values, or inter-coding it after setting its pixel values to the second preset pixel value further reduces the code stream of a single frame; however, the resulting code stream of the second region carries no original image information and is therefore invalid, leaving the image incomplete. In practice, the impact of incomplete images on video playback at the receiving end can be weighed against the actual conditions to decide whether to process the second regions of each image in these ways. In some application scenarios, to limit the number of incomplete images, the second region may be processed in these ways only in the first few frames of the multi-frame image, while the second regions of the remaining frames are encoded in the other preset modes.
In another application scenario, for the non-first frames of the multi-frame image, modes in which the second region carries valid information may be preferred in order to reduce the number of incomplete images. For example, when the image containing the second region is not the first frame of the multi-frame image, it is determined whether the co-located region of the second region in the previous frame is the preset region; if so, the third inter-frame coding is performed on the second region using that preset region of the previous frame as the reference region; otherwise, one of the following is performed: leaving the second region uncoded; setting its pixels to the first preset pixel value and intra-coding it; performing the first inter-frame coding using the first region of the same frame as the reference region; or setting its pixel values to the second preset pixel value and performing the second inter-frame coding in skip mode.
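That fallback order can be written as a small decision function (hypothetical names; which of the listed fallback options is actually used is up to the encoder):

```python
# Hypothetical sketch of the non-first-frame decision: prefer the third
# inter-frame coding (valid image data) when the co-located region of the
# previous frame is a preset region; otherwise use one of the cheap modes.
def choose_second_region_mode(prev_colocated_is_preset,
                              fallback="skip_with_constant_pixels"):
    if prev_colocated_is_preset:
        return "third_inter_coding"  # reference the co-located preset region
    return fallback  # no coding / constant intra / first inter / skip

mode_a = choose_second_region_mode(True)
mode_b = choose_second_region_mode(False)
```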
In this embodiment, multiple ways of encoding the second region are provided, so the second region can be encoded flexibly according to actual requirements, avoiding problems of error propagation or encoding delay.
Step S240: Determine whether the encoded multi-frame image is complete. If at least some of the encoded frames are incomplete, step S250 is performed.
Step S250: Add a syntax element indicating that those frame images are incomplete.
To avoid playback problems at the receiving end caused by incomplete images, after the multi-frame image is encoded it can be determined whether each encoded frame is complete. For example, if an encoded image contains a region meeting a preset coding condition, the image is determined to be incomplete. The preset coding conditions are: the region was not coded; the region was intra-coded after all its pixels were set to the first preset pixel value; or the region underwent the second inter-frame coding in skip mode after its pixel values were set to the second preset pixel value, the second preset pixel value being the value set in the co-located region of an encoded image. These preset coding conditions correspond to the ways of encoding the second region described above.
If the multi-frame image is determined to contain incomplete images, a syntax element is added to indicate that those images are incomplete. After the receiving end receives the video code stream, it can thus identify the incomplete images among the multiple frames according to the syntax element and choose whether to play them according to the circumstances.
In particular, the addition of syntax elements may be implemented in, but is not limited to, the following ways:
a) Determine the number of incomplete images in the multi-frame image, add a first preset syntax element to the Sequence Parameter Set (SPS), and assign it the number of incomplete images. For example, if the first j frames of the multi-frame image are incomplete, the first preset syntax element is set to j to indicate that the first j frames of the video are incomplete.
b) Add a second preset syntax element to the Picture Parameter Set (PPS). If the second preset syntax element is a first symbol, the frame picture referencing that picture parameter set is incomplete, so the second preset syntax element of a PPS referenced by an incomplete picture may be set to the first symbol. Conversely, if the second preset syntax element is a third symbol, the frame picture referencing the picture parameter set is complete, so the second preset syntax element of a PPS referenced by a complete picture is set to the third symbol. The first and third symbols may be set according to the actual situation, for example "0" and "1" or "true" and "false", and are not limited here. It is also possible to define only the first symbol and not the third: if the second preset syntax element equals the first symbol, the picture referencing the PPS is incomplete; otherwise, it is complete.
c) Adding a third preset syntax element to the region header of the region; for example, if the region is a Slice, the region header is the Slice header. If the third preset syntax element is a second symbol, the image in which the region is located is incomplete, so the third preset syntax element is set to the second symbol in the region headers of some or all regions of an incomplete image. Conversely, if the third preset syntax element is a fourth symbol, the image in which the region is located is complete, so the region headers of all regions of a complete image set the third preset syntax element to the fourth symbol. The second symbol and the fourth symbol may be chosen according to the actual situation, for example "0" and "1", or "true" and "false", respectively, and are not limited herein. It is to be understood that the third preset syntax element may also be set to the second symbol only and never to the fourth symbol; that is, if the third preset syntax element is the second symbol, the image in which the region is located is incomplete, and otherwise that image is complete.
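As an illustrative aid only, the three signalling options a) to c) might be sketched as follows; the parameter sets are modelled as plain dictionaries and the field names (e.g. `incomplete_frame_count`, `frame_incomplete_flag`) are hypothetical, not real bitstream syntax elements:

```python
# Illustrative sketch only: parameter sets are plain dicts; the field names
# are hypothetical and do not correspond to any actual codec syntax.

def mark_sps(sps: dict, num_incomplete: int) -> dict:
    """Option a): record in the SPS how many leading frames are incomplete."""
    out = dict(sps)
    out["incomplete_frame_count"] = num_incomplete  # first preset syntax element
    return out

def mark_pps(pps: dict, complete: bool) -> dict:
    """Option b): flag in the PPS whether frames referring to it are complete."""
    out = dict(pps)
    out["frame_incomplete_flag"] = 0 if complete else 1  # third / first symbol
    return out

def mark_slice_header(header: dict, frame_complete: bool) -> dict:
    """Option c): flag in each region (slice) header."""
    out = dict(header)
    out["frame_incomplete_flag"] = 0 if frame_complete else 1  # fourth / second symbol
    return out
```

For instance, if the first 2 frames are incomplete, `mark_sps({}, 2)` corresponds to assigning the first preset syntax element the value 2.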
Step S260: and performing interframe coding on the residual frame images in the video to be coded by taking at least one frame in the multi-frame images as a reference frame.
In this embodiment, the remaining frame images in the video to be encoded, other than the multi-frame image, are inter-coded with the multi-frame image as reference; that is, the remaining frame images are P frames, B frames, and the like. The remaining frame images can therefore be inter-coded with the normal inter-coding method for P frames and B frames. For example, the remaining frame images are inter-coded with the last frame of the multi-frame image as the reference frame. Alternatively, the co-located regions in the remaining frame images may be inter-coded using different first regions in the multi-frame image as reference regions, respectively; the coding method of the remaining frame images is not limited herein.
The inter-coding of the remaining frame images may use any conventional inter-coding mode, such as the merge inter-prediction mode, the Advanced Motion Vector Prediction (AMVP) inter-prediction mode, or the skip inter-prediction mode, which is not limited herein.
It is understood that the sequence of steps S240-S250 and step S260 may be switched, or executed simultaneously, and is not limited herein.
With the above scheme, all images of the video to be coded are coded to obtain the video code stream. After receiving the video code stream, the receiving end can obtain the image data of the first regions in the multi-frame image at least through intra-frame decoding, and can further obtain, through inter-frame decoding with the first regions as reference regions, the image data of the at least partially co-located second regions of the multi-frame image and the image data of the remaining frame images. The receiving end thereby decodes the video data. Moreover, whether an incomplete image exists in the multi-frame image can be determined through the syntax element, and if one exists, whether to play the incomplete image can be decided according to a preset convention or by user selection. In one application scenario, the receiving end can skip incomplete images: for example, if the first j frames of the video data are incomplete, playback starts from the (j+1)-th frame, so that only complete images are played and discontinuous pictures are avoided.
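The receiver-side skipping behaviour described above can be sketched as a simplified model, assuming the count of incomplete leading frames (the first preset syntax element) has already been parsed:

```python
# Simplified receiver model: 'incomplete_count' plays the role of the first
# preset syntax element (j incomplete leading frames) parsed from the stream.

def first_playable_index(frames: list, incomplete_count: int) -> int:
    """Index of the first complete frame to play, i.e. the (j+1)-th frame."""
    return min(incomplete_count, len(frames))

def playable_frames(frames: list, incomplete_count: int) -> list:
    """Drop the leading incomplete frames so playback starts on a complete one."""
    return frames[first_playable_index(frames, incomplete_count):]
```

For example, with 4 decoded frames and j = 2, playback starts at the third frame.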
In order to better explain the coding method of the present application, the following specific coding schemes are provided for illustration:
the first scheme is as follows:
when the I frame needs to be coded, the first M frame images in the video to be coded are obtained, and each frame image in the first M frame images is divided into N areas. Each region may be a Slice or a Tile. In the preceding M-frame image, from the first frame to the current frame, an area that has been encoded in the intra mode is referred to as a first area, and an area that has not been encoded in the intra mode is referred to as a second area.
Starting from the first frame image, several areas are selected as the first area of the first frame, and are coded by adopting an intra-frame mode. The remaining area in the first frame image is called the first frame second area, and the first frame second area can be coded by 2 methods:
1. not coding, namely, the frame only codes the first area of the first frame;
2. setting the pixel values of the second area of the first frame all to a certain constant, and encoding the constant-valued area in intra mode.
Regions in the second frame image that are co-located with the first area of the first frame are coded in inter mode as the second A area of the second frame, where the referenceable region of the CTUs in these areas is the first area of the first frame. Several of the remaining areas of the second frame are then selected as the first area of the second frame and coded in intra mode.
The remaining area of the second frame image except for the second a area of the second frame and the first area of the second frame is called a second B area of the second frame, and the second B area of the second frame can be encoded by 3 methods:
1. not coding, namely, the frame only codes the first area of the second frame and the second A area of the second frame;
2. setting the pixel values of the second B area of the second frame all to a certain constant, and encoding the constant-valued area in intra mode;
3. setting the pixel values of the second B area of the second frame to the same constant as the co-located area of the previous frame (if the previous frame is the first frame, the second area of the first frame), and encoding in skip inter mode.
Frames from the second to the M-th are each encoded, region by region, in the manner described above for the second frame. By the time the M-th frame image has been coded, all N areas have been coded in intra mode across the M frames, and no area position remains that has never been intra-coded. Therefore, the frames following the M-th frame are encoded according to the prior art.
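The rotation of intra-coded regions across the first M frames can be sketched as follows; the region indices and the in-order assignment are illustrative, since the scheme only requires that the union of intra-coded areas cover the image:

```python
# Illustrative schedule: regions are intra-coded in index order, k per frame,
# so after M = ceil(N / k) frames every region has been intra-coded once.

def intra_schedule(num_regions: int, regions_per_frame: int) -> list:
    """Return, per frame, the list of region indices chosen as 'first regions'."""
    schedule = []
    start = 0
    while start < num_regions:
        end = min(start + regions_per_frame, num_regions)
        schedule.append(list(range(start, end)))
        start = end
    return schedule
```

With N = 3 regions and one intra region per frame this yields the fig. 3 pattern: Slice 0 in the first frame, Slice 1 in the second, Slice 2 in the third.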
Since the images of the first M-1 frames are not completely encoded, syntax elements are added to indicate that the images of these frames are incomplete after decoding, so that the decoding end can choose whether to play them.
The addition method of the syntax element includes 3 types:
1. adding a syntax element K to the SPS, where K = M indicates that the first M frames of images in the video sequence are incomplete;
2. adding a syntax element K in the PPS to indicate whether a frame image referring to the PPS is complete;
3. and adding a syntax element K in the Slice header to indicate whether the frame image of the Slice is complete or not.
To avoid picture discontinuity caused by the playing end not playing frames whose coded images are incomplete, the first M-1 frames serving as the I frame can be encoded into the code stream by the above method, so that playback starting from the M-th frame exhibits no picture discontinuity.
The first embodiment is illustrated with reference to fig. 3.
As shown in fig. 3, the first 3 frames of the video sequence are obtained. The first frame image frame0 (hereinafter referred to as frame0) is only intra-coded and is therefore an I frame; the second frame image frame1 and the third frame image frame2 (hereinafter referred to as frame1 and frame2) are inter-coded and are therefore P frames. Each frame is divided into 3 slices, i.e., Slice 0 to Slice 2.
Slice 0 of frame0 is an I Slice and is coded in intra mode; the pixel values in Slice1 and Slice2 are all set to the constant 128 and are coded in intra mode. Then in frame0, the first region is Slice 0, and the second regions are Slice1 and Slice2.
Slice 0 of frame1 is P Slice, and is coded by adopting an inter mode, and the reference area of its CTU is Slice 0 of frame0, as shown by an arrow in fig. 3; slice1 is I Slice, and adopts intra-frame mode to encode; the pixel values in Slice2 are all set to a constant value of 128 and are encoded in intra mode. In frame1, the first region is Slice1, the second a region is Slice 0, and the second B region is Slice 2.
Similarly, Slice 0 and Slice1 in frame2 are P slices, their reference regions are shown by arrows in fig. 3, and Slice2 is I Slice. Then in frame2, the first region is Slice2, and the second a region is the remaining slices 0 and Slice 1.
Starting from the fourth frame image frame3 of the video sequence, the subsequent frames are encoded using prior art techniques. For example, frame3 is a P frame and is likewise divided into 3 slices, and each Slice in frame3 is encoded in inter mode using the co-located Slice of the previous frame frame2 as the reference region.
There are incomplete images in the above scheme. Therefore, to identify whether a certain frame is completely encoded, the syntax element Incomplete_frames is added to the SPS, indicating that the first Incomplete_frames images of this segment of the video sequence are incomplete. In this example, Incomplete_frames is 2, so frame0 and frame1 are incomplete.
Scheme II:
the second scheme allows the coded area in the same frame to be used as the reference area during inter-frame prediction.
When the I frame needs to be coded, the first M frame image in the video to be coded is obtained, and each frame image in the first M frame image is divided into a plurality of areas.
In the first M frames of images, each frame selects part of its areas as first areas to be coded in intra mode, and the other areas serve as second areas to be coded in inter mode. For the second areas coded in inter mode, the referenceable region of their CTUs is an already-encoded area in the same frame, namely the first area.
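A minimal sketch of scheme two's per-frame plan, where one region index (illustrative) is intra-coded and every other region is inter-coded with that same-frame region as its reference:

```python
# Hypothetical plan for one frame under scheme two: 'intra_region' is the
# first region (intra mode); all other regions are inter-coded referencing it.

def frame_coding_plan(num_regions: int, intra_region: int) -> dict:
    plan = {}
    for r in range(num_regions):
        if r == intra_region:
            plan[r] = ("intra", None)          # first region, no reference
        else:
            plan[r] = ("inter", intra_region)  # same-frame reference region
    return plan
```

For frame1 of fig. 4 (Slice 0 intra, Slice 1 inter referencing Slice 0), the plan would be `frame_coding_plan(2, 0)`.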
A second embodiment of the present invention is illustrated with reference to fig. 4.
As shown in fig. 4, the first 3 frame images of the video sequence are obtained. The first frame image frame0, the second frame image frame1, and the third frame image frame2 (hereinafter referred to as frame0, frame1, and frame2) do not need to reference information outside their own frame and are therefore I frames. Frame0 is divided into left and right Tiles, while frame1 and frame2 are each divided into upper and lower Slices.
Tile 0 of frame0 is intra-coded; Tile1 is coded in inter mode, with Tile 0 as the reference region of Tile1. Then in frame0, the first area is Tile 0 and the second area is Tile1.
Slice 0 of frame1 is I Slice, and intra-frame mode coding is adopted; slice1 is coded in an inter-frame mode, and the reference region of the Slice1 is Slice 0, as shown by an arrow in fig. 4. Then in frame1, the first region is Slice 0 and the second region is Slice 1.
Similarly, Slice 0 in frame2 is P Slice, and Slice1 is I Slice. The reference region of Slice 0 is Slice1, as shown by the arrow in fig. 4. Then in frame2, the first region is Slice1 and the second region is Slice 0.
Starting from the fourth frame image frame3 of the video sequence, the subsequent frames are encoded using prior art techniques. For example, frame3 is a P frame containing a single Slice and is encoded in inter mode with the previous frame frame2 as the reference frame.
The third scheme is as follows:
the third scheme is the flexible combination of the first scheme and the second scheme.
When the I frame needs to be coded, the first M frame images in the video to be coded are obtained, and each frame image in the first M frame images is divided into N areas. Each region may be a Slice or a Tile. In the preceding M-frame image, from the first frame to the current frame, an area that has been encoded in the intra mode is referred to as a first area, and an area that has not been encoded in the intra mode is referred to as a second area.
The first M frames of images may select the same or different areas as first areas for intra mode coding; it is sufficient to ensure that the combination of areas coded in intra mode across the first M frames covers the entire image. The areas in each frame that are not intra-coded serve as second areas and may be encoded in any of the following ways:
1. no coding is performed;
2. setting the pixel values of the second area to a certain constant, and coding in intra mode or skip inter mode;
3. coding by adopting an inter-frame mode by referring to a coded area in the same frame;
the coding modes can be flexibly combined.
By the time the M-th frame image has been coded, all N areas have been coded in intra mode, and no area position remains that has never been intra-coded. Therefore, the frames following the M-th frame are encoded according to the prior art.
The third embodiment is illustrated with reference to fig. 5 and 6.
As shown in fig. 5, the first 3 frame images of the video sequence are obtained. The first frame image frame0 (hereinafter referred to as frame0) is independent of information outside the frame and is an I frame, while the second frame image frame1 and the third frame image frame2 (hereinafter referred to as frame1 and frame2) contain inter-frame coding and are P frames. Each frame is divided into 4 Tiles, and each Tile contains one Slice.
Tile 0 and Tile 2 of frame0 are intra-coded, and the Slices they contain are I Slices. Tile1 and Tile 3 are inter-coded, and the Slices they contain are P Slices, where the reference regions of the CTUs in Tile1 and Tile 3 are Tile 0 and Tile 2, respectively, as shown by the arrows in fig. 5. Then in frame0, the first regions are Tile 0 and Tile 2, and the second regions are Tile1 and Tile 3.
Tile1 of frame1 is intra-coded, and the Slice it contains is an I Slice; Tile 0, Tile 2 and Tile 3 are inter-coded, and the Slices they contain are P Slices. The reference areas of the CTUs in Tile 0, Tile 2 and Tile 3 are Tile 0, Tile 2 and Tile 3 of frame0, respectively, as indicated by the arrows in fig. 5. Then in frame1, the first region is Tile1, the second A regions are Tile 0 and Tile 2, and the second B region is Tile 3.
Similarly, Tile 3 in frame2 is intra-coded, and Tile 0, Tile1 and Tile 2 are inter-coded. Then in frame2, the first area is Tile 3, and the second A areas are Tile 0, Tile1 and Tile 2. A second A area here can be understood as an area whose reference region is the first region, or whose further reference region is the first region of another frame, so that the image data of the second A area can be obtained by inter-frame decoding either directly from the first region of another frame or indirectly through it. For example, Tile 0 of frame1 can be decoded from Tile 0, the first region of frame0, and Tile 0 of frame2 can in turn be decoded from Tile 0 of frame1; Tile 0 of frame2 can therefore be said to be decoded indirectly from Tile 0, the first region of frame0.
Starting from the fourth frame image frame3 of the video sequence, the subsequent frames are encoded using prior art techniques. For example, frame3 is a P frame, likewise divided into 4 Tiles with one Slice per Tile, and each Tile of frame3 is encoded in inter mode with the previous frame frame2 as the reference frame.
As shown in fig. 6, the first 3 frames of the video sequence are obtained. The first frame image frame0 (hereinafter referred to as frame0) is only intra-coded and is therefore an I frame, while the second frame image frame1 and the third frame image frame2 (hereinafter referred to as frame1 and frame2) contain inter-frame coding and are therefore P frames. Each frame is divided into 3 slices, i.e., Slice 0 to Slice 2.
The Slice 0 of the frame0 is I Slice and is coded by adopting an intra-frame mode; the pixel values in Slice1 and Slice2 are both set to a constant value of 128 and are encoded in intra mode. Then in frame0, the first region is Slice 0, and the second regions are Slice1 and Slice 2.
Slice 0 of frame1 is a P Slice coded in inter mode, and the reference region of its CTUs is Slice 0 of frame0; Slice1 is an I Slice coded in intra mode; Slice2 is coded in inter mode, and the reference region of its CTUs is Slice1 of the same frame, as shown by the arrow in fig. 6. Then in frame1, the first region is Slice1, the second A region is Slice 0, and the second B region is Slice2.
Similarly, Slice 0 and Slice1 in frame2 are P Slices and Slice2 is an I Slice; their reference regions are shown by the arrows in fig. 6. Then in frame2, the first region is Slice2, and the second A regions are the remaining Slice 0 and Slice1.
Starting from the fourth frame image frame3 of the video sequence, the subsequent frames are encoded using prior art techniques. For example, frame3 is a P frame and is likewise divided into 3 slices, and each Slice in frame3 is encoded in inter mode using the co-located Slice of the previous frame frame2 as the reference region.
There are incomplete images in the above scheme. Therefore, to identify whether a certain frame is completely encoded, the syntax element Incomplete_frames is added to the Slice header, indicating whether the frame image containing the slice is incomplete. In this embodiment, the slices contained in frame0 set Incomplete_frames to true to indicate that frame0 is incomplete, while the slices of all frames other than frame0 set Incomplete_frames to false to indicate that the remaining frames are complete.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a video encoding device according to the present application. As shown in fig. 7, the apparatus 70 includes an obtaining module 71, a dividing module 72, and an encoding module 73.
The obtaining module 71 is configured to obtain a plurality of frames of images in a video to be encoded;
the dividing module 72 is configured to divide each frame of image in the multiple frames of images into a plurality of regions respectively;
the encoding module 73 is configured to take at least one of the regions in each frame of image as a first region and perform intra-frame coding on the first region, wherein the combination of the first regions in the multi-frame image is capable of covering the image.
In some embodiments, the encoding module 73 takes at least one of the regions in each frame of image as a first region as follows. For each frame image of the plurality of frame images: if the image is the first frame image in the multi-frame image, a preset number of areas are selected as the first areas; if the image is not the first frame image, a preset number of areas not co-located with the first areas of the already-selected images are selected as the first areas, where the already-selected images are those of the multi-frame image whose first areas have been selected, and the preset number is greater than or equal to 1 and less than the number of areas of the image.
In some embodiments, the encoding module 73 is further configured to: and carrying out no coding or coding in a preset mode on each second area except the first area in each frame of image, wherein the preset mode comprises carrying out homogenization processing on pixel values in the second areas and then carrying out intra-frame coding or carrying out inter-frame coding on the second areas.
In some embodiments, when the encoding module 73 performs encoding on each second region of each frame of image except the first region without encoding or in a preset manner, the performing intra-frame encoding after performing homogenization processing on pixel values in the second region includes: after setting all pixels in the second area to be first preset pixel values, carrying out intra-frame coding on the second area; the inter-coding the second region includes any one of the following steps: performing first inter-frame coding on the second area by taking the first area of the frame as a reference area; after the pixel value in the second area is set to a second preset pixel value, performing second inter-frame coding on the second area by adopting a skip mode, wherein the second preset pixel value is the pixel value set in the second area in a coded image, and the coded image is an image which is coded in the multi-frame image; and performing third inter-frame coding on the second area by taking a preset area of the coded image as a reference area, wherein the preset area is a first area in the coded image or the second area subjected to the first inter-frame coding or the third inter-frame coding.
Further, when the encoding module 73 performs no coding, or coding in a preset manner, on each second region of each frame of image except the first region, this may include: if the image in which the second area is located is the first frame image in the multi-frame image, executing any one of the following steps: not encoding the second area; setting all pixels in the second area to the first preset pixel value and then intra-coding the second area; and performing first inter-frame coding on the second area with the first area of the frame as the reference area.
In some embodiments, the encoding module 73 performs the encoding on each second region except for the first region in each frame of image in a preset manner, specifically including: and determining the code stream cost of each preset mode executed by the second area, and selecting the preset mode with the minimum code stream cost to encode the second area.
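The minimum code-stream-cost selection could be sketched as follows, assuming the bit cost of each candidate preset mode has already been measured (the mode names are illustrative, not from the specification):

```python
# Pick whichever preset coding mode produced the fewest bits for this
# second region; mode names and costs below are purely illustrative.

def choose_mode(bit_costs: dict) -> str:
    """Return the preset mode whose measured code-stream cost is smallest."""
    return min(bit_costs, key=bit_costs.get)
```

For example, `choose_mode({"skip_inter": 120, "constant_intra": 480, "same_frame_inter": 300})` selects the skip inter mode for that region.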
In some embodiments, the apparatus may further include an adding module 74 for determining whether the encoded plurality of frames of images are complete; if at least part of frame images in the encoded multi-frame images are incomplete, adding a syntax element to indicate that the at least part of frame images are incomplete.
Wherein the adding module 74 executes the adding syntax element, any of the following steps may be included: determining the number of incomplete images in the multi-frame images, adding a first preset syntax element in a sequence parameter set, and assigning a value to the first preset syntax element as the number of the incomplete images; adding a second preset syntax element in the image parameter set, wherein if the second preset syntax element is a first symbol, the second preset syntax element indicates that a frame image referring to the image parameter set is incomplete; and adding a third preset syntax element in the region header of the region, wherein if the third preset syntax element is a second symbol, the image in which the region is located is incomplete.
When the adding module 74 performs the determining whether the plurality of frames of images are complete, this may include: if the coded image contains an area that meets a preset coding condition, determining that the image is incomplete; the preset coding condition is that the area is not coded, or is intra-coded after all its pixels are set to the first preset pixel value, or is second inter-coded in skip mode after its pixel values are set to the second preset pixel value, where the second preset pixel value is a pixel value set for the area in a coded image.
In some embodiments, the encoding module 73 is further configured to perform inter-frame encoding on the remaining frame images in the video to be encoded by using at least one frame of the multi-frame image as a reference frame.
In some embodiments, the multi-frame image is a previous multi-frame image in the video to be encoded.
In some embodiments, the division of the area of each of the plurality of frames of images is the same or different.
In some embodiments, the region comprises at least one CTU, and the region is Slice or Tile.
The specific execution of each module of the apparatus may refer to the corresponding steps of the method embodiments, which are not described herein again.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of an encoder of the present application. As shown in fig. 8, the encoder 80 includes a processor 81 and a memory 82 coupled to the processor 81.
The memory 82 stores program instructions for implementing the video encoding method or the encoding method described in any of the above embodiments. Processor 81 is operative to execute program instructions stored in memory 82 to encode video to be encoded.
The processor 81 may also be referred to as a CPU (Central Processing Unit). The processor 81 may be an integrated circuit chip having signal processing capabilities. The processor 81 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be appreciated that the processor 81 may also execute program instructions stored in the memory 82 to decode a received video bitstream, and that the decoding process may refer to existing decoding methods. For example, when a video code stream encoded by the above video encoding method is received, a previous multi-frame image serving as an I frame in the video may be decoded by using a corresponding decoding method, and the remaining frame images in the video may be decoded by using at least a first region in the previous multi-frame image, thereby obtaining video data. In addition, whether an incomplete image exists in the previous multi-frame image can be determined according to the syntax element of the video, and if the incomplete image exists, whether the incomplete image is played is determined according to preset convention or user selection. For example, the incomplete picture can be skipped and the video can be played directly from the first complete picture to avoid picture discontinuity.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a memory device according to the present application. The storage device of the embodiment of the present application stores program instructions 91 capable of implementing all the methods described above, where the program instructions 91 may be stored in the storage device in the form of a software product, and include several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes: various media capable of storing program codes, such as a usb disk, a mobile hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or terminal devices, such as a computer, a server, a mobile phone, and a tablet.
In the above embodiment, multiple frames of images in a video to be encoded are each divided into a plurality of regions, and at least one region in each frame of image is intra-coded, with the combination of the intra-coded regions across the multiple frames able to cover the image. In this way, by intra-coding partial regions of each of the multiple frames, the code stream of intra coding for the different regions is distributed over multiple frames.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims (16)

1. A video encoding method, comprising:
acquiring a multi-frame image in a video to be coded;
dividing each frame of image in the multi-frame image into a plurality of areas respectively;
taking at least one region in each frame of image as a first region, and carrying out intra-frame coding on the first region; wherein a combination of the first regions in the multi-frame image is capable of covering the image.
2. The method according to claim 1, wherein the taking at least one of the regions in each frame of image as a first region comprises:
for each frame image of the plurality of frame images: if the image is the first frame image in the multi-frame image, selecting a preset number of areas as the first areas; if the image is not the first frame image, selecting a preset number of areas that are not co-located with the first areas of the already-selected images as the first areas, wherein the already-selected images are the images of the multi-frame image whose first areas have been selected, and the preset number is greater than or equal to 1 and less than the number of areas of the image.
3. The method of claim 1, further comprising:
and carrying out no coding or coding in a preset mode on each second area except the first area in each frame of image, wherein the preset mode comprises carrying out homogenization processing on pixel values in the second areas and then carrying out intra-frame coding or carrying out inter-frame coding on the second areas.
4. The method of claim 3, wherein the homogenizing the pixel values in the second region and then performing intra-frame coding comprises:
setting all pixels in the second region to a first preset pixel value, and then performing intra-frame coding on the second region;
and the performing inter-frame coding on the second region comprises any one of the following:
performing first inter-frame coding on the second region with the first region of the same frame as a reference region;
setting the pixel values in the second region to a second preset pixel value, and then performing second inter-frame coding on the second region in skip mode, wherein the second preset pixel value is the pixel value set for the second region in an encoded image, and an encoded image is an image of the multi-frame images that has already been encoded;
performing third inter-frame coding on the second region with a preset region of an encoded image as the reference region, wherein the preset region is a first region in the encoded image, or a second region that has undergone the first inter-frame coding or the third inter-frame coding.
5. The method according to claim 4, wherein the performing no coding, or coding in a predetermined manner, on each second region other than the first region in each frame of image comprises:
if the image in which the second region is located is the first frame image in the multi-frame images, performing any one of the following: not encoding the second region; setting all pixels in the second region to a first preset pixel value and then performing intra-frame coding on the second region; or performing first inter-frame coding on the second region with the first region of the same frame as the reference region;
if the image in which the second region is located is not the first frame image in the multi-frame images, determining whether the co-located region of the second region in the previous frame image is the preset region; if so, performing third inter-frame coding on the second region with the preset region of the previous frame image as the reference region; otherwise, performing any one of the following: not encoding the second region; setting all pixels in the second region to a first preset pixel value and then performing intra-frame coding on the second region; performing first inter-frame coding on the second region with the first region of the same frame as the reference region; or setting the pixel values in the second region to a second preset pixel value and then performing second inter-frame coding on the second region in skip mode.
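The branching of claims 4 and 5 can be sketched as follows (illustrative Python; the mode names are invented for illustration and do not appear in the claims):

```python
def second_region_options(is_first_frame, colocated_is_preset):
    # Candidate treatments for a "second" (non-refresh) region.
    if is_first_frame:
        # First frame: no previously encoded picture exists to reference.
        return ["no_coding", "uniform_intra", "inter_ref_first_region"]
    if colocated_is_preset:
        # Co-located region of the previous frame is a usable preset region:
        # third inter-frame coding references it directly.
        return ["inter_ref_colocated_preset"]
    # Otherwise all remaining options are available, including skip mode
    # over a uniform second preset pixel value.
    return ["no_coding", "uniform_intra", "inter_ref_first_region",
            "skip_mode_uniform"]
```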
6. The method according to claim 3, wherein the coding each second region other than the first region in each frame of image in a predetermined manner comprises:
determining the bitstream cost of applying each predetermined manner to the second region, and selecting the predetermined manner with the minimum bitstream cost to encode the second region.
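The claim-6 selection reduces to a minimum over per-mode costs. A minimal sketch (the cost table is a stand-in for a real rate estimator, which the claim does not specify):

```python
def choose_coding_mode(costs):
    # costs: mapping of predetermined-manner name -> estimated bitstream cost
    # (bits). Return the cheapest manner.
    return min(costs, key=costs.get)
```

For example, `choose_coding_mode({"uniform_intra": 120, "skip_mode_uniform": 8, "inter_ref_first_region": 45})` selects skip mode over the uniform region, typically the cheapest option for a flat second region.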
7. The method of claim 1, further comprising:
determining whether each encoded image of the multi-frame images is complete;
if at least some of the encoded frame images are incomplete, adding a syntax element to indicate that those frame images are incomplete.
8. The method according to claim 7, wherein the adding a syntax element comprises any one of the following:
determining the number of incomplete images among the multi-frame images, adding a first preset syntax element in the sequence parameter set, and assigning the number of incomplete images as its value;
adding a second preset syntax element in the picture parameter set, wherein the second preset syntax element taking a first symbol indicates that a frame image referring to the picture parameter set is incomplete;
adding a third preset syntax element in the region header of a region, wherein the third preset syntax element taking a second symbol indicates that the image in which the region is located is incomplete.
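The three alternative signalling placements of claim 8 can be sketched as follows (illustrative Python; the syntax element names are invented here, since the claim only calls them first/second/third preset syntax elements):

```python
def signal_incomplete(option, sps, pps, region_header, num_incomplete=0):
    # sps / pps / region_header are dicts standing in for the sequence
    # parameter set, picture parameter set, and region header.
    if option == "sps":
        # Option 1: count of incomplete pictures in the SPS.
        sps["num_incomplete_pics"] = num_incomplete
    elif option == "pps":
        # Option 2: a flag (the "first symbol") in the PPS.
        pps["pic_incomplete_flag"] = 1
    elif option == "region_header":
        # Option 3: a flag (the "second symbol") in the region header.
        region_header["region_pic_incomplete_flag"] = 1
    return sps, pps, region_header
```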
9. The method of claim 7, wherein the determining whether the encoded multi-frame images are complete comprises:
if an encoded image contains a region that satisfies a preset coding condition, determining that the image is incomplete;
wherein the preset coding condition is that the region is not encoded; or that intra-frame coding is performed after all pixels in the region are set to a first preset pixel value; or that second inter-frame coding in skip mode is performed after the pixel values in the region are set to a second preset pixel value, the second preset pixel value being the pixel value set for the region in an encoded image.
10. The method of claim 1, further comprising:
performing inter-frame coding on the remaining frame images in the video to be encoded, using at least one of the multi-frame images as a reference frame.
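Putting claims 1, 2, and 10 together, the overall flow is a refresh period followed by ordinary inter-coded frames. A self-contained sketch (all names and the plan representation are illustrative; real encoder calls are omitted):

```python
import math

def encode_plan(total_frames, num_regions, per_frame):
    # The first ceil(num_regions / per_frame) frames each intra-code
    # per_frame rotating "first regions"; remaining frames are inter-coded
    # against the refreshed frames.
    refresh_frames = math.ceil(num_regions / per_frame)
    plan = []
    pos = 0
    for i in range(total_frames):
        if i < refresh_frames:
            first = list(range(pos, min(pos + per_frame, num_regions)))
            pos += per_frame
            plan.append(("intra_refresh", first))
        else:
            plan.append(("inter", "ref=refreshed_frames"))
    return plan
```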
11. The method according to claim 1, wherein the multi-frame images are the first multiple frames of images in the video to be encoded.
12. The method according to claim 1, wherein the images in the multi-frame images are divided into regions in the same manner or in different manners.
13. The method of claim 1, wherein the region comprises at least one coding tree unit, and wherein the region is a slice or a rectangular region.
14. A video encoding apparatus, comprising:
an acquisition module, configured to acquire multiple frames of images in a video to be encoded;
a dividing module, configured to divide each frame of image in the multi-frame images into a plurality of regions;
a coding module, configured to take at least one region in each frame of image as a first region and perform intra-frame coding on the first region, wherein the combination of the first regions across the multi-frame images covers the entire image.
15. An encoder, comprising a processor and a memory coupled to the processor, wherein
the memory stores program instructions for implementing the method of any one of claims 1-13; and
the processor is configured to execute the program instructions stored in the memory to encode a video to be encoded.
16. A storage device storing program instructions executable by a processor to perform the method of any one of claims 1 to 13.
CN202010238801.7A 2020-03-30 2020-03-30 Video coding method, video coding device, video coder and storage device Active CN111372088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010238801.7A CN111372088B (en) 2020-03-30 2020-03-30 Video coding method, video coding device, video coder and storage device

Publications (2)

Publication Number Publication Date
CN111372088A true CN111372088A (en) 2020-07-03
CN111372088B CN111372088B (en) 2022-09-06

Family

ID=71210655


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112235531A (en) * 2020-10-15 2021-01-15 北京字节跳动网络技术有限公司 Video processing method, device, terminal and storage medium
WO2023226915A1 (en) * 2022-05-23 2023-11-30 阿里巴巴(中国)有限公司 Video transmission method and system, device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049238A (en) * 2012-12-14 2013-04-17 广东威创视讯科技股份有限公司 Method and device for transmitting image data
EP3145187B1 (en) * 2015-09-17 2018-10-17 MediaTek Inc. Method and apparatus for response of feedback information during video call
CN110072144A (en) * 2019-05-07 2019-07-30 威创集团股份有限公司 A kind of image mosaic processing method, device, equipment and computer storage medium
CN110636294A (en) * 2019-09-27 2019-12-31 腾讯科技(深圳)有限公司 Video decoding method and device, and video encoding method and device





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant