WO2013155828A1 - Video image code stream processing method and device

Info

Publication number: WO2013155828A1
Application number: PCT/CN2012/083682
Authority: WO
Inventors: 李斌; 李厚强; 杨海涛
Original assignee: 华为技术有限公司
Priority date: 2012-04-16
Filing date: 2012-10-29
Publication date: 2013-10-24
Also published as: CN103379320B; CN103379320A

Abstract

Embodiments of the present invention provide a video image code stream processing method and device. The method comprises: extracting a code stream generated by an encoding device, where the code stream comprises clean random access (CRA) image information, the CRA image information meets a constraint condition of a CRA image, and the constraint condition comprises that the CRA image must be a reference image. The embodiments tighten the constraint condition of the CRA image in the existing HEVC standard draft: in addition to the existing constraints, the CRA image is required to be a reference image, which avoids abnormal picture order count (POC) calculation and abnormal decoding.

Description

The present invention relates to image processing technologies, and in particular, to a video image code stream processing method and apparatus.

Background

High Efficiency Video Coding (hereinafter referred to as HEVC) is a new-generation video coding standard developed to meet the growing demand for video applications, especially high-definition video.

The design goal of HEVC is to save 50% of the bit rate compared with the previous-generation video coding standard, Advanced Video Coding (AVC), at similar subjective quality. Measured by the objective criterion of Peak Signal-to-Noise Ratio (PSNR), the compression performance of HEVC relative to AVC is as follows: the bit-rate saving is about 23% under all-intra test conditions, about 33% under random-access test conditions, and about 41% under low-delay test conditions. The subjective improvement of HEVC is more pronounced than its objective improvement, mainly because the improved inter-frame prediction and loop filtering techniques in HEVC have an obvious effect on subjective quality.

Given the superior performance of HEVC, it is foreseeable that HEVC will become a very successful new-generation video coding standard in the near future.

Summary of the invention

Embodiments of the present invention provide a video image code stream processing method and device.

An embodiment of the present invention provides a video image code stream processing method, including:

Extracting a code stream generated by an encoding device, where the code stream contains clean random access (CRA) image information, the CRA image information satisfies a constraint condition of the CRA image, and the constraint condition includes that the CRA image must be a reference image.

Correspondingly, an embodiment of the present invention provides a decoding device, including:

An extracting module, configured to extract a code stream generated by an encoding device, where the code stream includes clean random access (CRA) image information, the CRA image information satisfies a constraint condition of the CRA image, and the constraint condition includes that the CRA image must be a reference image.

According to the above embodiment of the present invention, the constraint condition of the CRA image is tightened on the basis of the existing HEVC standard document; that is, in addition to the existing constraints on the CRA image, the CRA image is required to be a reference image. After extracting the code stream generated by the encoding device, the decoding device may determine whether the code stream includes a CRA image and, if so, determine according to the constraint condition of the CRA image whether the CRA image in the code stream is a reference image. If it is a reference image, the code stream is a legal code stream, and the CRA image can be used to assist the picture order count (POC) derivation of subsequent images; otherwise, the code stream is an illegal code stream, and the CRA image cannot be used to assist the POC derivation of subsequent images. POC calculation anomalies and decoding anomalies are thereby avoided.

An embodiment of the present invention provides another video image code stream processing method, including:

Determining whether the picture order counts (POCs) of the sequentially decoded images are arranged in non-descending order; if so, determining the most significant bits of the POC (POCMsb) of the current image by the following method:

Obtaining the least significant bits of the POC (POCLsb) of the current image from the code stream;

If the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb, where the maximum POCLsb = 2^N and N is the bit width of the POCLsb;

Otherwise, the POCMsb of the current image is equal to the POCMsb of the previous image;

Determining the POC of the current image according to the POCMsb of the current image and the POCLsb of the current image.

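Correspondingly, an embodiment of the present invention provides a decoding device, including:

A first determining module, configured to determine whether the picture order counts (POCs) of the sequentially decoded images are arranged in non-descending order;

A processing module, configured to, if the POCs are arranged in non-descending order, determine the POCMsb of the current image by using the following method: obtaining the POCLsb of the current image from the code stream; if the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb, where the maximum POCLsb = 2^N and N is the bit width of the POCLsb; otherwise, the POCMsb of the current image is equal to the POCMsb of the previous image;

And a second determining module, configured to determine the POC of the current image according to the POCMsb of the current image and the POCLsb of the current image.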

In the above embodiment of the present invention, the decoding device may first determine whether the POCs of the sequentially decoded images are arranged in non-descending order. If they are, the improved derivation may be used to determine the POCMsb of the current image: the least significant bits of the POC (POCLsb) of the current image are obtained from the code stream; if the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb; otherwise, the POCMsb of the current image is equal to the POCMsb of the previous image. Finally, the POC of the current image is determined according to the POCMsb and the POCLsb of the current image, and the decoding device can determine the output order based on the POC of the current image and the POC of each image in the decoded picture buffer (DPB). Because the POCs of the sequentially decoded images are arranged in non-descending order, the derivation of the POCMsb is simplified and processing efficiency improves. In addition, the improved derivation makes it possible to remove the limit on the POC difference between any two active images, so that the bit width of the POCLsb in the code stream can be reduced, which reduces transmission overhead. Here, the active images are the current image, all images in the DPB that are marked for short-term reference, and the images waiting to be output.

An embodiment of the present invention provides still another video image code stream processing method, including:

Extracting a code stream generated by an encoding device, where a first type of syntax element and a second type of syntax element are placed at adjacent positions in the code stream, or the second type of syntax element and a third type of syntax element are placed at adjacent positions in the code stream; the first type of syntax element is a syntax element containing information on the number of temporal layers, the second type of syntax element is a syntax element containing code stream characteristic information of each temporal layer, and the third type of syntax element is a syntax element containing information on the size of the storage space occupied by each image;

Extracting at least two types of syntax elements from the adjacent positions in the code stream.

Correspondingly, an embodiment of the present invention provides a processing device, including:

An extraction module, configured to extract a code stream generated by an encoding device, where a first type of syntax element and a second type of syntax element are placed at adjacent positions in the code stream, or the second type of syntax element and a third type of syntax element are placed at adjacent positions in the code stream; the first type of syntax element is a syntax element containing information on the number of temporal layers, the second type of syntax element is a syntax element containing code stream characteristic information of each temporal layer, and the third type of syntax element is a syntax element containing information on the size of the storage space occupied by each image;

And an extracting module, configured to extract at least two types of syntax elements from the adjacent positions in the code stream.

In the foregoing embodiment of the present invention, the decoding device or the media gateway extracts a code stream generated by the encoding device in which the first type of syntax element and the second type of syntax element are placed at adjacent positions, or the second type of syntax element and the third type of syntax element are placed at adjacent positions. Therefore, when extracting the information of each temporal layer, the decoding device or the media gateway does not need to parse other, unrelated syntax elements, but can extract all the syntax elements carrying temporal layer information consecutively, which improves the efficiency of processing temporal layer information.

An embodiment of the present invention provides yet another video image code stream processing method, including:

Extracting the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] from the code stream generated by the encoding device, where max_dec_pic_buffering_diff[i] indicates the difference between the size of the decoded picture buffer required to decode the sub-code stream whose temporal layer identifier does not exceed i and the size of the decoded picture buffer required to decode the sub-code stream whose temporal layer identifier does not exceed i-1, and num_reorder_pics_diff[i] indicates the difference between the number of timing-inverted images in the sub-code stream whose temporal layer identifier does not exceed i and the number of timing-inverted images in the sub-code stream whose temporal layer identifier does not exceed i-1;

If i is 0, MaxDecPicBuffering[i] and NumReorderPics[i] are obtained as follows:

MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];

NumReorderPics[i] = num_reorder_pics_diff[i];

If i is greater than 0, MaxDecPicBuffering[i] and NumReorderPics[i] are obtained as follows:

MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];

NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i];

where MaxDecPicBuffering[i] represents the size of the storage space required by the DPB of the hypothetical reference decoder (HRD) for the code stream of the i-th temporal layer, and NumReorderPics[i] represents the maximum number of timing-inverted images in the code stream of the i-th temporal layer.
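The differential decoding described above amounts to a running sum over the temporal layers. The following Python sketch illustrates this under stated assumptions: the function name decode_dpb_params and the list-based representation of the parsed syntax elements are hypothetical and are not part of the original text.

```python
from typing import List, Tuple


def decode_dpb_params(
    max_dec_pic_buffering_diff: List[int],
    num_reorder_pics_diff: List[int],
) -> Tuple[List[int], List[int]]:
    """Rebuild per-layer values from differentially coded syntax elements.

    Index i of each input list holds the value parsed for temporal layer i
    (an assumed representation, chosen for illustration only).
    """
    if len(max_dec_pic_buffering_diff) != len(num_reorder_pics_diff):
        raise ValueError("both syntax elements need one entry per temporal layer")

    max_dec_pic_buffering: List[int] = []
    num_reorder_pics: List[int] = []
    for i, (d_buf, d_reorder) in enumerate(
        zip(max_dec_pic_buffering_diff, num_reorder_pics_diff)
    ):
        if i == 0:
            # Layer 0 carries the absolute values.
            max_dec_pic_buffering.append(d_buf)
            num_reorder_pics.append(d_reorder)
        else:
            # Higher layers carry the increment over layer i-1.
            max_dec_pic_buffering.append(max_dec_pic_buffering[i - 1] + d_buf)
            num_reorder_pics.append(num_reorder_pics[i - 1] + d_reorder)
    return max_dec_pic_buffering, num_reorder_pics
```

For example, differences of [2, 1, 0] and [0, 1, 1] over three temporal layers yield MaxDecPicBuffering = [2, 3, 3] and NumReorderPics = [0, 1, 2].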

Correspondingly, an embodiment of the present invention provides a decoding device, including:

An extracting module, configured to extract the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] from the code stream generated by the encoding device, where max_dec_pic_buffering_diff[i] indicates the difference between the size of the decoded picture buffer required to decode the sub-code stream whose temporal layer identifier does not exceed i and the size of the decoded picture buffer required to decode the sub-code stream whose temporal layer identifier does not exceed i-1, and num_reorder_pics_diff[i] indicates the difference between the number of timing-inverted images in the sub-code stream whose temporal layer identifier does not exceed i and the number of timing-inverted images in the sub-code stream whose temporal layer identifier does not exceed i-1;

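A differential decoding module, configured to: if i is 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] as follows:

MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];

NumReorderPics[i] = num_reorder_pics_diff[i];

and, if i is greater than 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] as follows:

MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];

NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i].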

In the above embodiment of the present invention, use is made of the fact that both the syntax element indicating the size of the decoded picture buffer required to decode the sub-code stream whose temporal layer identifier does not exceed i and the syntax element indicating the number of timing-inverted images in that sub-code stream are non-decreasing in i. The two syntax elements are therefore differentially encoded, which reduces the number of bits required to represent them in the sequence parameter set (SPS) and improves compression efficiency.

Drawings

The drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of Embodiment 1 of a video image code stream processing method according to the present invention;

FIG. 2 is a flowchart of Embodiment 2 of a video image code stream processing method according to the present invention;

FIG. 3 is a flowchart of Embodiment 3 of a video image code stream processing method according to the present invention;

FIG. 4 is a flowchart of Embodiment 4 of a video image code stream processing method according to the present invention;

FIG. 5 is a schematic structural diagram of Embodiment 2 of a decoding device according to the present invention;

FIG. 6 is a schematic structural diagram of an embodiment of a processing device according to the present invention;

FIG. 7 is a schematic structural diagram of Embodiment 3 of a decoding device according to the present invention.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a video image code stream processing method, where the method includes: extracting a code stream generated by an encoding device, where the code stream includes clean random access (CRA) image information, the CRA image information satisfies the constraint condition of the CRA image, and the constraint condition includes that the CRA image must be a reference image.

The technical solution of this embodiment tightens the constraint condition of the CRA image on the basis of the existing HEVC standard document; that is, in addition to the existing constraints on the CRA image, the CRA image is required to be a reference image. A CRA image that satisfies the constraint can be used to assist the picture order count (hereinafter referred to as POC) derivation of subsequent images; otherwise, the code stream is an illegal code stream, and the CRA image cannot be used to assist the POC derivation of subsequent images. POC calculation anomalies and decoding anomalies are thereby avoided.

The above technical solution will be described in detail below using a specific embodiment.

FIG. 1 is a flowchart of Embodiment 1 of a video image code stream processing method according to the present invention. As shown in FIG. 1, the method in this embodiment may include:

Step 101: Extract a code stream generated by the encoding device, where the code stream includes CRA image information.

Step 102: Determine, according to a constraint condition of the CRA image, whether the code stream is a legal code stream, where the constraint condition includes that the CRA image must be a reference image.

Specifically, HEVC supports random access using both Instantaneous Decoding Refresh (IDR) images and CRA images.

A CRA image can only be an I-frame image. An image that follows the CRA image in both decoding order and output order is not allowed to use, for prediction, any image that precedes the CRA image in decoding order or output order; and any image that precedes the CRA image in decoding order also precedes it in output order. Therefore, the CRA image can be decoded normally without relying on other images, and decoding the images after the CRA image does not depend on any image before the CRA image, so the CRA image can be used as a starting point for decoding a video sequence. Moreover, the images before the CRA image do not affect the output of the images after the CRA image, so the CRA image can be used as a starting point for outputting a video sequence.

In summary, the CRA image can be decoded and output as the first image in a video sequence. The first image of a video sequence specified in HEVC can only be an IDR image or a CRA image, so if an image is neither an IDR image nor a CRA image, it must not be the first image in the video sequence.

Further, each image has a corresponding POC, which indicates the timing relationship between that image and the other images in the Decoded Picture Buffer (DPB). The timing relationship specifies the output order of all images in the DPB: in general, an image with a smaller POC is output before an image with a larger POC. To derive the POC of the current image, it is necessary to first determine the previous image of the current image, then obtain the POC information of the previous image and use it to derive the POC of the current image. The previous image is the closest preceding image in decoding order that satisfies the following two conditions. Condition 1: the syntax element temporal_id of the image is 0; temporal_id can be obtained from the header of the Network Abstraction Layer Unit (hereinafter referred to as NALU). Condition 2: the image is a reference image; the syntax element nal_ref_flag in the NALU header indicates whether an image is a reference image. In the existing HEVC standard document, nal_ref_flag equal to 1 indicates that the image is a reference image, and nal_ref_flag equal to 0 indicates that it is not.
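The search for the previous image can be illustrated with a short Python sketch. The DecodedPicture record and the function find_previous_image are hypothetical names introduced for illustration; only the two conditions on temporal_id and nal_ref_flag come from the text above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DecodedPicture:
    name: str          # label used only for this illustration
    temporal_id: int   # taken from the NALU header
    nal_ref_flag: int  # 1: reference image, 0: non-reference image


def find_previous_image(
    decoded: Sequence[DecodedPicture],
) -> Optional[DecodedPicture]:
    """Return the previous image for POC derivation, or None if none exists.

    `decoded` holds the images decoded before the current one, in decoding
    order; the search walks backwards from the most recently decoded image.
    """
    for pic in reversed(decoded):
        if pic.temporal_id == 0 and pic.nal_ref_flag == 1:
            return pic
    return None
```

If the only earlier image is a CRA image with nal_ref_flag equal to 0, the function returns None, which corresponds to the situation in which a subsequent image cannot find its previous image.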

Therefore, the first decoded image in a video sequence, whether it is an IDR image or a CRA image, must satisfy both of the above conditions, so that every subsequent image in the video sequence can find its corresponding previous image.

However, it has been found in practice that if the first image in a video sequence is a CRA image, subsequent images often cannot find their corresponding previous images.

Investigation shows that the existing HEVC standard document requires the temporal_id of a CRA image to be 0; otherwise, the code stream containing the CRA image is an illegal code stream. However, the existing HEVC standard document does not require the nal_ref_flag of a CRA image to be 1. Consequently, if a CRA image is used as the first image of a video sequence and its nal_ref_flag is 0, that is, the CRA image is not a reference image, the images following the CRA image cannot find their corresponding previous image, which makes the POC calculation abnormal and eventually makes decoding abnormal.

In view of the above problem, this embodiment tightens the constraint condition of the CRA image on the basis of the existing HEVC standard document; that is, in addition to the existing constraints on the CRA image, the CRA image is required to be a reference image. The decoding device may extract the code stream generated by the encoding device and determine whether the code stream includes CRA image information. If so, it determines, according to the constraint condition of the CRA image, whether the CRA image in the code stream is a reference image. If it is, the code stream is considered a legal code stream and the CRA image can be used for the POC derivation of subsequent images; otherwise, the code stream is an illegal code stream. In practical applications, the decoding device can also forcibly treat the CRA image as a reference image, thereby applying fault-tolerant processing to an illegal code stream.

The entire video code stream is encapsulated in multiple NALUs, and each NALU can contain all or part of an image. Therefore, in a specific implementation of this embodiment, for a NALU corresponding to a CRA image, the syntax element nal_unit_type in the NALU header may be used to indicate that the information contained in the NALU belongs to a CRA image, and another syntax element in the NALU header, nal_ref_flag, is set to 1 to indicate that the CRA image is a reference image.

After extracting a NALU generated by the encoding device, the decoding device may determine, according to the syntax element nal_unit_type in the NALU header, whether the information contained in the NALU belongs to a CRA image. If it does, the decoding device determines, according to the syntax element nal_ref_flag in the NALU header, whether the CRA image is a reference image, and thereby determines whether the code stream containing the NALU is a legal code stream.
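The legality check described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the NaluHeader record and the function is_legal_code_stream are hypothetical, and the numeric value of NAL_UNIT_TYPE_CRA is a placeholder assumption, since the text does not state which nal_unit_type value identifies a CRA image. The check also includes the existing requirement that the temporal_id of a CRA image be 0.

```python
from dataclasses import dataclass
from typing import Iterable

# Placeholder value (assumption): the text does not give the nal_unit_type
# code that marks a CRA image.
NAL_UNIT_TYPE_CRA = 4


@dataclass
class NaluHeader:
    nal_unit_type: int
    nal_ref_flag: int
    temporal_id: int


def is_legal_code_stream(headers: Iterable[NaluHeader]) -> bool:
    """Return False if any CRA NALU violates the CRA constraints."""
    for header in headers:
        if header.nal_unit_type != NAL_UNIT_TYPE_CRA:
            continue  # the constraint only applies to CRA images
        if header.temporal_id != 0:
            return False  # existing constraint: temporal_id must be 0
        if header.nal_ref_flag != 1:
            return False  # added constraint: CRA must be a reference image
    return True
```

A decoder that prefers fault tolerance over rejection could instead treat nal_ref_flag as 1 for every CRA NALU, as the embodiment mentions for illegal code streams.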

In this embodiment, the constraint condition of the CRA image is tightened on the basis of the existing HEVC standard document; that is, in addition to the existing constraints on the CRA image, the CRA image is required to be a reference image. After extracting the code stream generated by the encoding device, the decoding device may determine whether the code stream includes a CRA image and, if so, determine according to the constraint condition of the CRA image whether the CRA image in the code stream is a reference image. If it is a reference image, the code stream is a legal code stream, and the CRA image can be used to assist the POC derivation of subsequent images; otherwise, the code stream is an illegal code stream, and the CRA image cannot be used to assist the POC derivation of subsequent images. POC calculation anomalies and decoding anomalies are thereby avoided.

FIG. 2 is a flowchart of Embodiment 2 of a video image code stream processing method according to the present invention. As shown in FIG. 2, the method in this embodiment may include:

Step 201: Determine whether the POCs of the sequentially decoded images are arranged in non-descending order.

Step 202: If so, determine the POCMsb of the current image as follows: obtain the POCLsb of the current image from the code stream; if the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb, where the maximum POCLsb = 2^N and N is the bit width of the POCLsb; otherwise, the POCMsb of the current image is equal to the POCMsb of the previous image.

Step 203: Determine the POC of the current image according to the POCMsb of the current image and the POCLsb of the current image.

As mentioned before, each image has a corresponding POC, which indicates the timing relationship between that image and the other images in the DPB, and the timing relationship defines the output order of all images in the DPB. In general, an image with a smaller POC is output before an image with a larger POC. To derive the POC of the current image, it is necessary to first determine the previous image of the current image, then obtain the POC information of the previous image and use it to derive the POC of the current image.

Specifically, the POC of an image is the combination of its Most Significant Bits (MSB) and Least Significant Bits (LSB). In this embodiment, the MSB part is referred to as POCMsb and the LSB part as POCLsb.

The POC of the current image can be calculated as follows:

PicOrderCntVal = PicOrderCntMsb + pic_order_cnt_lsb

Where pic_order_cnt_lsb is the POCLsb of the current image obtained from the code stream, and PicOrderCntMsb is the POCMsb of the current image derived by the decoding device.

PicOrderCntVal is the POC of the current image.

The existing HEVC standard document derives PicOrderCntMsb using the following method:

1. Obtain the POCLsb of the previous image (hereinafter referred to as prevPicOrderCntLsb) and the POCMsb of the previous image (hereinafter referred to as prevPicOrderCntMsb).

If the current image is an IDR image, or if the current image is the first image in the entire sequence and is a CRA image, prevPicOrderCntLsb and prevPicOrderCntMsb are both set to 0;

Otherwise, the previous image of the current image is determined, and the POCLsb and POCMsb of the previous image are taken as prevPicOrderCntLsb and prevPicOrderCntMsb, respectively. The definition of the previous image has been given in the previous embodiment and is not repeated here.

2. Determine PicOrderCntMsb according to prevPicOrderCntLsb and prevPicOrderCntMsb.

Specifically, the process of determining PicOrderCntMsb can be implemented using the operations represented by the pseudo code below:

if ((pic_order_cnt_lsb < prevPicOrderCntLsb) &&
    ((prevPicOrderCntLsb - pic_order_cnt_lsb) >= (MaxPicOrderCntLsb / 2)))
    PicOrderCntMsb = prevPicOrderCntMsb + MaxPicOrderCntLsb
else if ((pic_order_cnt_lsb > prevPicOrderCntLsb) &&
    ((pic_order_cnt_lsb - prevPicOrderCntLsb) > (MaxPicOrderCntLsb / 2)))
    PicOrderCntMsb = prevPicOrderCntMsb - MaxPicOrderCntLsb
else
    PicOrderCntMsb = prevPicOrderCntMsb

Here, the maximum POCLsb (hereinafter referred to as MaxPicOrderCntLsb) is determined by the bit width N of the POCLsb: MaxPicOrderCntLsb = 2^N. For example, if the POCLsb is represented by 4 bits, MaxPicOrderCntLsb = 2^4 = 16; if the POCLsb is represented by 8 bits, MaxPicOrderCntLsb = 2^8 = 256.

The three branches above have the following meanings:

If the pic_order_cnt_lsb of the current image obtained from the code stream is smaller than the prevPicOrderCntLsb of the previous image, and the difference between prevPicOrderCntLsb and pic_order_cnt_lsb is greater than or equal to MaxPicOrderCntLsb/2, then PicOrderCntMsb = prevPicOrderCntMsb + MaxPicOrderCntLsb;

If the pic_order_cnt_lsb of the current image obtained from the code stream is greater than the prevPicOrderCntLsb of the previous image, and the difference between pic_order_cnt_lsb and prevPicOrderCntLsb is greater than MaxPicOrderCntLsb/2, then PicOrderCntMsb = prevPicOrderCntMsb - MaxPicOrderCntLsb;

Otherwise, PicOrderCntMsb = prevPicOrderCntMsb.
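The three branches can be transcribed into a small Python function for experimentation. The function name derive_poc_msb and its snake_case parameter names are illustrative choices; the logic follows the pseudo code of the existing derivation.

```python
def derive_poc_msb(
    pic_order_cnt_lsb: int,
    prev_pic_order_cnt_lsb: int,
    prev_pic_order_cnt_msb: int,
    max_pic_order_cnt_lsb: int,
) -> int:
    """Existing (three-branch) derivation of PicOrderCntMsb."""
    half = max_pic_order_cnt_lsb // 2
    if (pic_order_cnt_lsb < prev_pic_order_cnt_lsb
            and prev_pic_order_cnt_lsb - pic_order_cnt_lsb >= half):
        # The LSB wrapped around upwards: move to the next MSB period.
        return prev_pic_order_cnt_msb + max_pic_order_cnt_lsb
    if (pic_order_cnt_lsb > prev_pic_order_cnt_lsb
            and pic_order_cnt_lsb - prev_pic_order_cnt_lsb > half):
        # The LSB jumped far upwards: interpreted as a wrap downwards.
        return prev_pic_order_cnt_msb - max_pic_order_cnt_lsb
    return prev_pic_order_cnt_msb
```

With MaxPicOrderCntLsb = 16 and prevPicOrderCntMsb = 0, an LSB step from 14 to 1 takes the first branch and yields 16, while an LSB step from 2 to 11 takes the second branch and yields -16.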

Therefore, the effect of deriving PicOrderCntMsb according to the prior art is that the POC of the current image is greater than the POC of the previous image minus MaxPicOrderCntLsb/2, and less than or equal to the POC of the previous image plus MaxPicOrderCntLsb/2.
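This bound can be checked case by case. The derivation below is a sketch; the shorthand symbols d, M, and Δ are introduced here for illustration and do not appear in the original text, and M is assumed to be even (N ≥ 1).

```latex
\begin{aligned}
d &= \texttt{pic\_order\_cnt\_lsb} - \texttt{prevPicOrderCntLsb}, \qquad
M = \texttt{MaxPicOrderCntLsb}, \qquad -M < d < M \\[4pt]
\Delta &= \mathrm{POC}_{\text{cur}} - \mathrm{POC}_{\text{prev}} =
\begin{cases}
d + M \in (0,\ M/2] & \text{if } d \le -M/2 \quad \text{(first branch)} \\
d - M \in (-M/2,\ 0) & \text{if } d > M/2 \quad \text{(second branch)} \\
d \in (-M/2,\ M/2] & \text{otherwise} \quad \text{(third branch)}
\end{cases}
\end{aligned}
```

In every branch, Δ lies in the interval (−M/2, M/2], which is the stated range.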

For example, suppose that the POCLsb is represented by 4 bits, that is, MaxPicOrderCntLsb is 16, and that three adjacent images A, B, and C, arranged in output order in the video code stream, are all reference images with temporal_id equal to 0. The POCLsb values of the three images are POCLsb(A) = 2, POCLsb(B) = 7, and POCLsb(C) = 11.

Assuming that all three images are correctly received, decoded, and placed in the DPB for output, the decoding device determines the output order of the three images as follows:

1. Determine the POC of image A

The POCMsb of image A is denoted POCMsb(A); it is determined during the decoding of image A.

Therefore, the POC of image A is:

POC(A) = POCMsb(A) + POCLsb(A) = POCMsb(A) + 2

2. Determine the POC of image B

Select B as the current image, so pic_order_cnt_lsb = POCLsb(B) = 7. The previous image of image B is image A; therefore, prevPicOrderCntLsb = 2 and prevPicOrderCntMsb = POCMsb(A).

Since pic_order_cnt_lsb > prevPicOrderCntLsb and pic_order_cnt_lsb - prevPicOrderCntLsb = 5 < MaxPicOrderCntLsb/2, the third conditional branch above applies, so PicOrderCntMsb = prevPicOrderCntMsb and POCMsb(B) = POCMsb(A).

Therefore, the POC of image B is:

POC(B) = POCMsb(B) + pic_order_cnt_lsb = POCMsb(A) + 7

3. Determine the POC of image C

Select C as the current image, so pic_order_cnt_lsb = POCLsb(C) = 11. The previous image of image C is image B; therefore, prevPicOrderCntLsb = 7 and prevPicOrderCntMsb = POCMsb(B) = POCMsb(A).

Since pic_order_cnt_lsb > prevPicOrderCntLsb and pic_order_cnt_lsb - prevPicOrderCntLsb = 4 < MaxPicOrderCntLsb/2, the third conditional branch above applies, so PicOrderCntMsb = prevPicOrderCntMsb and POCMsb(C) = POCMsb(B) = POCMsb(A).

Therefore, the POC of image C is:

POC(C) = POCMsb(C) + pic_order_cnt_lsb = POCMsb(A) + 11

Because the POC values of the three images calculated by the decoding device are POC(A) = POCMsb(A) + 2, POC(B) = POCMsb(A) + 7, and POC(C) = POCMsb(A) + 11, the output order of these three images is A->B->C.

However, if image B is not correctly received by the decoding device, and only images A and C are correctly received, decoded, and placed in the DPB for output, the decoding device determines the output order of the two images as follows:

1. Determine the POC of image A

The POC of image A is:

POC(A) = POCMsb(A) + POCLsb(A) = POCMsb(A) + 2

2. Determine the POC of image C

Select C as the current image, so pic_order_cnt_lsb = POCLsb(C) = 11. The previous image of image C is now image A; therefore, prevPicOrderCntLsb = 2 and prevPicOrderCntMsb = POCMsb(A).

Since pic_order_cnt_lsb > prevPicOrderCntLsb and pic_order_cnt_lsb - prevPicOrderCntLsb = 9 > MaxPicOrderCntLsb/2, the second conditional branch applies, so PicOrderCntMsb = prevPicOrderCntMsb - MaxPicOrderCntLsb and POCMsb(C) = POCMsb(A) - 16.

Therefore, the POC of image C is:

POC(C) = POCMsb(C) + pic_order_cnt_lsb = POCMsb(A) - 16 + 11 = POCMsb(A) - 5

Since the calculated POCs of images A and C are POC(A) = POCMsb(A) + 2 and POC(C) = POCMsb(A) - 5, respectively, the output order of the two images is C->A. Clearly, when image B is lost, the decoding device cannot derive the correct output order of images A and C using the above method.
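The two cases of the worked example can be reproduced with the following Python sketch. It assumes POCMsb(A) = 0 for concreteness and treats each decoded image as the previous image of the next one, as in the example; the helper names derive_poc_msb, poc_sequence, and output_order are hypothetical.

```python
from typing import Dict, List, Tuple


def derive_poc_msb(lsb: int, prev_lsb: int, prev_msb: int, max_lsb: int) -> int:
    """Existing three-branch derivation of PicOrderCntMsb."""
    half = max_lsb // 2
    if lsb < prev_lsb and prev_lsb - lsb >= half:
        return prev_msb + max_lsb
    if lsb > prev_lsb and lsb - prev_lsb > half:
        return prev_msb - max_lsb
    return prev_msb


def poc_sequence(
    images: List[Tuple[str, int]], max_lsb: int = 16, first_msb: int = 0
) -> Dict[str, int]:
    """Compute the POC of each (name, POCLsb) pair in decoding order."""
    pocs: Dict[str, int] = {}
    prev_lsb = prev_msb = None
    for name, lsb in images:
        if prev_lsb is None:
            msb = first_msb  # assumed POCMsb of the first image
        else:
            msb = derive_poc_msb(lsb, prev_lsb, prev_msb, max_lsb)
        pocs[name] = msb + lsb
        prev_lsb, prev_msb = lsb, msb
    return pocs


def output_order(pocs: Dict[str, int]) -> List[str]:
    """Images are output in ascending POC order."""
    return sorted(pocs, key=pocs.get)


all_received = poc_sequence([("A", 2), ("B", 7), ("C", 11)])
b_lost = poc_sequence([("A", 2), ("C", 11)])
print(all_received, output_order(all_received))
print(b_lost, output_order(b_lost))
```

With all three images received, the POCs are 2, 7, and 11 and the output order is A, B, C. With image B lost, image C receives POC -5 and is output before image A, matching the reversal described above.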

To solve the above problem of out-of-order output caused by image loss, the existing HEVC draft requires that the POC difference between any two active images be smaller than MaxPicOrderCntLsb/2.

For example, in the above case where image B is lost, if the difference between POCLsb(C) and POCLsb(A) were less than MaxPicOrderCntLsb/2, then POCMsb(C) = POCMsb(A), and the output order of images A and C would not be reversed.

However, the POC derivation process defined by the existing HEVC draft applies to any coding structure; that is, it allows the POC of the current image to be either larger or smaller than the POC of the previous image. For example, in the above derivation of PicOrderCntMsb, the second conditional branch corresponds to the case where the POCMsb of the current image is smaller than the POCMsb of the previous image. Because the POCLsb takes values in the range [0, MaxPicOrderCntLsb), the POC of the current image calculated in this branch is necessarily smaller than the POC of the previous image.

When the POC of the current image is never smaller than the POC of the previous image, that is, when the POCs of all sequentially decoded images are arranged in non-descending order, the second conditional branch of the above derivation of PicOrderCntMsb is redundant, and there is no need to require that the POC difference between any two active images be less than MaxPicOrderCntLsb/2.

In view of the above analysis, for the case where the POCs of sequentially decoded images are arranged in non-descending order, this embodiment improves the derivation process of PicOrderCntMsb. The specific pseudo code is as follows:

    if( pic_order_cnt_lsb < prevPicOrderCntLsb )
        PicOrderCntMsb = prevPicOrderCntMsb + MaxPicOrderCntLsb
    else
        PicOrderCntMsb = prevPicOrderCntMsb

The meaning of the above pseudo code is:

If the pic_order_cnt_lsb of the current image obtained from the code stream is smaller than the prevPicOrderCntLsb of the previous image, then PicOrderCntMsb = prevPicOrderCntMsb + MaxPicOrderCntLsb;

Otherwise, PicOrderCntMsb = prevPicOrderCntMsb.
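The two-branch rule above can be sketched as follows. The function name is hypothetical, and the numeric values (MaxPicOrderCntLsb = 16, POCMsb(A) = 16) are assumptions reused from the lost-image example for illustration only.

```python
def derive_poc_msb_non_descending(lsb, prev_lsb, prev_msb, max_lsb):
    """Simplified PicOrderCntMsb derivation for POCs in non-descending order."""
    if lsb < prev_lsb:
        # The LSB can only have decreased because it wrapped around.
        return prev_msb + max_lsb
    return prev_msb


MAX_LSB = 16                  # assumed MaxPicOrderCntLsb
prev_msb, prev_lsb = 16, 2    # image A, with an assumed POCMsb(A) of 16

# Image C (POCLsb = 11) decoded directly after A because image B was lost.
msb_c = derive_poc_msb_non_descending(11, prev_lsb, prev_msb, MAX_LSB)
poc_a = prev_msb + prev_lsb
poc_c = msb_c + 11
print(poc_a, poc_c)
```

Under these assumptions the LSB difference of 9 exceeds MaxPicOrderCntLsb/2, yet POC(C) still comes out larger than POC(A), so the output order A->C is preserved.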

In a specific implementation, the value of the syntax element num_reorder_pics in the Sequence Parameter Set (SPS), which indicates the maximum number of timing-inverted images, may be used to indicate whether the POCs of sequentially decoded images are arranged in non-descending order. A timing-inverted image is an image that precedes a certain image in decoding order but follows that image in output order. For example, num_reorder_pics equal to 0 means that the POCs of sequentially decoded images are arranged in non-descending order, and num_reorder_pics greater than 0 means that they are not. It should be noted that the encoding of the syntax element num_reorder_pics is modified in method Embodiment 4 below. The variable NumReorderPics obtained after that modification has the same physical meaning as the syntax element num_reorder_pics before the modification. Therefore, under the conditions of Embodiment 4, whether the POCs of sequentially decoded images are arranged in non-descending order is determined according to the value of that variable.

It should be noted that, since num_reorder_pics is transmitted separately for each time layer, the variables written without an array index in this embodiment all represent the parameter information of the time layer currently being decoded. For example, if time layer 1 is currently being decoded, then num_reorder_pics denotes num_reorder_pics[1], and the remaining variables are treated similarly.

When determining the output order of the current image and the images in the DPB, the decoding device may first determine, according to the value of num_reorder_pics in the SPS, whether the POCs of sequentially decoded images are arranged in non-descending order. If they are, the PicOrderCntMsb of the current image can be obtained with the improved derivation method of this embodiment, and the PicOrderCntVal of the current image is then obtained from PicOrderCntMsb and the pic_order_cnt_lsb of the current image read from the code stream. The output order of the current image and the images in the DPB is then determined from the PicOrderCntVal of the current image and the POC value of each image in the DPB. If the POCs of sequentially decoded images are not arranged in non-descending order, PicOrderCntMsb is derived in the existing manner, which is not described again in this embodiment.
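The decision flow described above might be sketched as follows. All function names are hypothetical; the branch taken when num_reorder_pics is greater than 0 follows the usual three-branch derivation, which is an assumption about the "existing" method rather than text quoted from this passage.

```python
def picture_order_count(lsb, prev_lsb, prev_msb, max_lsb, num_reorder_pics):
    """Return PicOrderCntVal, choosing the derivation from num_reorder_pics."""
    if num_reorder_pics == 0:
        # POCs are in non-descending order: simplified two-branch derivation.
        msb = prev_msb + max_lsb if lsb < prev_lsb else prev_msb
    else:
        # Reordering is possible: fall back to the three-branch derivation.
        half = max_lsb // 2
        if lsb < prev_lsb and prev_lsb - lsb >= half:
            msb = prev_msb + max_lsb
        elif lsb > prev_lsb and lsb - prev_lsb > half:
            msb = prev_msb - max_lsb
        else:
            msb = prev_msb
    return msb + lsb


def output_order(current_poc, dpb_pocs):
    """Images are output in increasing POC order."""
    return sorted(dpb_pocs + [current_poc])


# Assumed values: MaxPicOrderCntLsb = 16, previous image has POCMsb 16 and POCLsb 2.
poc_simple = picture_order_count(11, 2, 16, 16, num_reorder_pics=0)
poc_general = picture_order_count(11, 2, 16, 16, num_reorder_pics=2)
print(poc_simple, poc_general, output_order(poc_simple, [18]))
```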

It can be understood that, when the POCs of sequentially decoded images are arranged in non-descending order, decoding robustness is not affected even if the POC difference between any two images among the active images is not limited; that is, the current decoded image is not output in the wrong position. Therefore, based on the above processing, this embodiment can also improve the constraint that the existing HEVC standard document places on the POC difference between images: where the POCs of sequentially decoded images are arranged in non-descending order, there is no need to constrain the POC difference between any two images among the active images. The advantage of removing this constraint in that case is that the bit width of pic_order_cnt_lsb can be reduced, thereby reducing the transmission overhead.

In this embodiment, the decoding device may first determine whether the POCs of sequentially decoded images are arranged in non-descending order. If they are, the POCMsb of the current image can be determined as follows: the POC least significant bits (POCLsb) of the current image are obtained from the code stream; when the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb; otherwise, the POCMsb of the current image is equal to prevPicOrderCntMsb. Finally, the POC of the current image is determined from the POCMsb and the POCLsb of the current image. The decoding device can therefore determine the output order according to the POC of the current image and the POC of each image in the DPB.

Therefore, for the case where the POCs of sequentially decoded images are arranged in non-descending order, this embodiment simplifies the derivation of POCMsb and improves processing efficiency. Moreover, based on the improved derivation, the limit on the POC difference between any two images among the active images can be removed, which reduces the bit width of POCLsb in the code stream and hence the transmission overhead.

FIG. 3 is a flowchart of Embodiment 3 of a video image code stream processing method according to the present invention. As shown in FIG. 3, the method in this embodiment may include:

Step 301: Extract the code stream generated by the encoding device, where the first type of syntax element and the second type of syntax element are placed in adjacent positions in the code stream, or the second type of syntax element and the third type of syntax element are placed in adjacent positions in the code stream. The first type of syntax element refers to a syntax element that contains time layer quantity information, the second type of syntax element refers to a syntax element that contains code stream feature information of each time layer, and the third type of syntax element refers to a syntax element that contains information about the size of the storage space occupied by each image;

Step 302: Extract at least two types of syntax elements from the adjacent positions in the code stream.

Specifically, the capability of the decoding device is limited. To ensure that the decoding device can decode the code stream normally, the decoding device needs to determine whether the decoding capability required by the code stream is within the range of decoding capabilities that the decoding device supports. If so, the decoding device can receive and decode the code stream normally; otherwise, the decoding device performs exception handling.

The decoding capability can include restrictions on the DPB, such as a limit on the number of images stored in the DPB and a limit on the storage space occupied by all images in the DPB. The storage space occupied by all images in the DPB can be calculated from the parameters indicating the number of images stored in the DPB and the parameters indicating the storage space occupied by each image. The syntax elements indicating the storage space occupied by each image can include: parameters indicating the image size, such as the width and height of the image; image cropping parameters; the bit width of each color component of each sample value; and the like. Exception handling may take two forms: one is to refuse to decode the entire code stream, and the other, where the code stream contains several different time layers, is to extract and decode only the time layer code streams that the decoding device can support while discarding those it cannot.
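As a rough illustration of how the DPB storage space follows from the number of images and the per-image parameters, consider the sketch below. It assumes 4:2:0 chroma sampling, samples stored in whole bytes, and no cropping; none of these assumptions comes from the text, and the function names are hypothetical.

```python
def bytes_per_sample(bit_depth):
    """Bytes needed to store one sample of the given bit depth."""
    return (bit_depth + 7) // 8


def picture_bytes(width, height, bit_depth_luma, bit_depth_chroma):
    """Storage for one image: one luma plane plus two quarter-size chroma planes."""
    luma = width * height * bytes_per_sample(bit_depth_luma)
    chroma = 2 * (width // 2) * (height // 2) * bytes_per_sample(bit_depth_chroma)
    return luma + chroma


def dpb_bytes(num_pictures, width, height, bit_depth_luma=8, bit_depth_chroma=8):
    """Total DPB storage: number of stored images times the size of one image."""
    return num_pictures * picture_bytes(width, height, bit_depth_luma, bit_depth_chroma)


one_picture = picture_bytes(1920, 1080, 8, 8)
whole_dpb = dpb_bytes(4, 1920, 1080)
print(one_picture, whole_dpb)
```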

The above operation of extracting the code stream by time layer can also be performed in a media gateway. During video transmission, the decoding device, such as the decoder in a client, can communicate with the media gateway to inform it of its own decoding capability. In one case, the media gateway finds, by parsing part of the information in the SPS, that the decoding capability required by the current code stream exceeds the decoding capability supported by the decoding device, and can choose not to transmit the code stream to the decoder. In another case, the media gateway finds, by parsing part of the information in the SPS, that although the decoding capability required by the complete code stream exceeds the decoding capability supported by the decoding device, the decoding capability required by some of the time layer code streams is within the supported range. The media gateway may then discard part of the time layer code streams and send only the remaining time layer code streams to the decoding device, which saves transmission bandwidth.

A prerequisite for the decoding device or media gateway to perform the time layer code stream extraction operation is that it obtains the relevant information by decoding some syntax elements in the SPS, and thereby determines which time layer code streams the decoding device can support.
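The gateway-side decision described above could be sketched as follows. The capability is modelled only as a number of DPB images, NALUs are represented as plain dictionaries with a temporal_id key, and both function names are hypothetical; a real gateway would check further limits.

```python
def highest_supported_layer(max_dec_pic_buffering, decoder_dpb_pictures):
    """Return the largest time layer i whose DPB requirement the decoder meets, or None."""
    supported = None
    for i, needed in enumerate(max_dec_pic_buffering):
        if needed <= decoder_dpb_pictures:
            supported = i
        else:
            break  # requirements do not decrease with i, so no higher layer can fit
    return supported


def extract_sub_stream(nalus, max_temporal_id):
    """Keep only the NALUs whose temporal_id does not exceed max_temporal_id."""
    return [n for n in nalus if n["temporal_id"] <= max_temporal_id]


requirements = [2, 3, 5]   # illustrative max_dec_pic_buffering[i] values
stream = [{"temporal_id": t} for t in (0, 2, 1, 2, 0)]

layer = highest_supported_layer(requirements, decoder_dpb_pictures=4)
forwarded = extract_sub_stream(stream, layer) if layer is not None else []
print(layer, forwarded)
```

If no layer fits, the function returns None, which corresponds to the case where the gateway chooses not to transmit the code stream at all.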

In the SPS, the syntax elements related to the time layer code stream extraction operation can be divided into three types, referred to in this embodiment as the first type of syntax element, the second type of syntax element, and the third type of syntax element. The first type of syntax element refers to a syntax element that contains time layer quantity information, such as max_temporal_layers_minus1. The second type of syntax element refers to a syntax element that contains code stream feature information of each time layer, such as max_dec_pic_buffering[i], num_reorder_pics[i], and max_latency_increase[i], where max_dec_pic_buffering[i] represents the size of the decoded image buffer required to decode the substream whose temporal_id does not exceed i, num_reorder_pics[i] represents the number of inverted images in the substream whose temporal_id does not exceed i, and i is the time layer identifier. The third type of syntax element refers to a syntax element that contains information about the size of the storage space occupied by each image, such as pic_width_in_luma_samples (the image width) and pic_height_in_luma_samples (the image height); the image cropping syntax elements pic_cropping_flag, pic_crop_left_offset, pic_crop_right_offset, pic_crop_top_offset, and pic_crop_bottom_offset; and the syntax elements bit_depth_luma_minus8 and bit_depth_chroma_minus8, which give the bit width of the luma and chroma components of each sample value.

In existing HEVC, each time layer has its own parameters, which are passed in the SPS. For example, the SPS can include the following syntax elements:

seq_parameter_set_rbsp( ) {
    max_temporal_layers_minus1
    pic_width_in_luma_samples
    pic_height_in_luma_samples
    pic_cropping_flag
    if( pic_cropping_flag ) {
        pic_crop_left_offset
        pic_crop_right_offset
        pic_crop_top_offset
        pic_crop_bottom_offset
    }
    bit_depth_luma_minus8
    bit_depth_chroma_minus8
    pcm_enabled_flag
    if( pcm_enabled_flag ) {
        pcm_bit_depth_luma_minus1
        pcm_bit_depth_chroma_minus1
    }
    qpprime_y_zero_transquant_bypass_flag
    log2_max_pic_order_cnt_lsb_minus4
    for( i = 0; i <= max_temporal_layers_minus1; i++ ) {
        max_dec_pic_buffering[ i ]
        num_reorder_pics[ i ]
        max_latency_increase[ i ]
    }
}

However, in the prior art, the first type, second type, and third type of syntax elements are located far apart in the SPS. For example, the syntax element max_temporal_layers_minus1, which indicates the number of time layers, is far from the syntax elements max_dec_pic_buffering[i], num_reorder_pics[i], and max_latency_increase[i], which indicate the code stream feature information of each time layer. Between them lie more than 4 syntax elements, such as pcm_enabled_flag, that are irrelevant to the time layer code stream extraction performed by the decoding device or media gateway. To extract the time layer information and complete the extraction operation, the decoding device or media gateway must therefore additionally parse these irrelevant syntax elements, which reduces its processing efficiency.

To solve the above problem and improve the efficiency with which the decoding device or media gateway extracts the time layer information, this embodiment provides the following technical solution:

The encoding device may place the first type of syntax element and the second type of syntax element in adjacent positions in the code stream, or place the second type of syntax element and the third type of syntax element in adjacent positions in the code stream. Correspondingly, after extracting the code stream generated by the encoding device, the decoding device or the media gateway can extract at least two types of syntax elements from adjacent positions in the code stream.

In a specific implementation, the manner in which the above syntax elements are placed in the code stream can be implemented in at least the following four ways:

The first way is to keep the position of the third type of syntax element unchanged and adjust the position of the first type of syntax element in the existing SPS so that the first type of syntax element is placed immediately before the second type of syntax element. The positional relationship of the syntax elements in the SPS obtained in the first way is as follows:

seq_parameter_set_rbsp( ) {
    pic_width_in_luma_samples
    pic_height_in_luma_samples
    pic_cropping_flag
    if( pic_cropping_flag ) {
        pic_crop_left_offset
        pic_crop_right_offset
        pic_crop_top_offset
        pic_crop_bottom_offset
    }
    bit_depth_luma_minus8
    bit_depth_chroma_minus8
    pcm_enabled_flag
    if( pcm_enabled_flag ) {
        pcm_bit_depth_luma_minus1
        pcm_bit_depth_chroma_minus1
    }
    qpprime_y_zero_transquant_bypass_flag
    log2_max_pic_order_cnt_lsb_minus4
    max_temporal_layers_minus1
    for( i = 0; i <= max_temporal_layers_minus1; i++ ) {
        max_dec_pic_buffering[ i ]
        num_reorder_pics[ i ]
        max_latency_increase[ i ]
    }
}

The second way is to keep the position of the third type of syntax element unchanged and adjust the position of the second type of syntax element so that the second type of syntax element is placed immediately after the first type of syntax element.

The positional relationship of each syntax element in the SPS obtained in the second way is as follows:

seq_parameter_set_rbsp( ) {
    max_temporal_layers_minus1
    for( i = 0; i <= max_temporal_layers_minus1; i++ ) {
        max_dec_pic_buffering[ i ]
        num_reorder_pics[ i ]
        max_latency_increase[ i ]
    }
    pic_width_in_luma_samples
    pic_height_in_luma_samples
    pic_cropping_flag
    if( pic_cropping_flag ) {
        pic_crop_left_offset
        pic_crop_right_offset
        pic_crop_top_offset
        pic_crop_bottom_offset
    }
    bit_depth_luma_minus8
    bit_depth_chroma_minus8
    pcm_enabled_flag
    if( pcm_enabled_flag ) {
        pcm_bit_depth_luma_minus1
        pcm_bit_depth_chroma_minus1
    }
    qpprime_y_zero_transquant_bypass_flag
    log2_max_pic_order_cnt_lsb_minus4
}

The third way is to keep the position of the third type of syntax element unchanged and move the first type of syntax element and the second type of syntax element to a middle position, for example, immediately after the third type of syntax element.

The positional relationship of each syntax element in the SPS obtained in the third way is as follows:

seq_parameter_set_rbsp( ) {
    pic_width_in_luma_samples
    pic_height_in_luma_samples
    pic_cropping_flag
    if( pic_cropping_flag ) {
        pic_crop_left_offset
        pic_crop_right_offset
        pic_crop_top_offset
        pic_crop_bottom_offset
    }
    bit_depth_luma_minus8
    bit_depth_chroma_minus8
    max_temporal_layers_minus1
    for( i = 0; i <= max_temporal_layers_minus1; i++ ) {
        max_dec_pic_buffering[ i ]
        num_reorder_pics[ i ]
        max_latency_increase[ i ]
    }
    pcm_enabled_flag
    if( pcm_enabled_flag ) {
        pcm_bit_depth_luma_minus1
        pcm_bit_depth_chroma_minus1
    }

The fourth way is to place the third type of syntax element after the first type of syntax element, and the second type of syntax element after the third type of syntax element.

The positional relationship of each syntax element in the SPS obtained in the fourth way is as follows:

seq_parameter_set_rbsp( ) {
    max_temporal_layers_minus1
    pic_width_in_luma_samples
    pic_height_in_luma_samples
    pic_cropping_flag
    if( pic_cropping_flag ) {
        pic_crop_left_offset
        pic_crop_right_offset
        pic_crop_top_offset
        pic_crop_bottom_offset
    }
    bit_depth_luma_minus8
    bit_depth_chroma_minus8
    for( i = 0; i <= max_temporal_layers_minus1; i++ ) {
        max_dec_pic_buffering[ i ]
        num_reorder_pics[ i ]
        max_latency_increase[ i ]
    }
    pcm_enabled_flag
    if( pcm_enabled_flag ) {
        pcm_bit_depth_luma_minus1
        pcm_bit_depth_chroma_minus1
    }

In this embodiment, the decoding device or the media gateway may extract a code stream generated by the encoding device in which the first type of syntax element and the second type of syntax element are placed in adjacent positions, or the second type of syntax element and the third type of syntax element are placed in adjacent positions. Therefore, when extracting the time layer information, the decoding device or the media gateway does not need to additionally parse other irrelevant syntax elements, but can extract all the syntax elements carrying time layer information consecutively, which improves the efficiency of processing the time layer information.
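The saving can be illustrated by modelling each SPS layout as an ordered list of element names and counting the unrelated elements that must be parsed before all elements of the three types have been read. This is only a sketch: it assumes pic_cropping_flag is 0 and pcm_enabled_flag is 1, it assumes that in the fourth way the remaining elements follow the PCM elements, and all Python names are hypothetical.

```python
# Elements of the first, second and third type, as classified in the text.
FIRST = ["max_temporal_layers_minus1"]
SECOND = ["max_dec_pic_buffering", "num_reorder_pics", "max_latency_increase"]
THIRD = ["pic_width_in_luma_samples", "pic_height_in_luma_samples",
         "pic_cropping_flag", "bit_depth_luma_minus8", "bit_depth_chroma_minus8"]
RELEVANT = set(FIRST + SECOND + THIRD)

# Elements unrelated to time layer extraction (pcm_enabled_flag assumed to be 1).
PCM = ["pcm_enabled_flag", "pcm_bit_depth_luma_minus1", "pcm_bit_depth_chroma_minus1"]
OTHER = ["qpprime_y_zero_transquant_bypass_flag", "log2_max_pic_order_cnt_lsb_minus4"]

EXISTING_LAYOUT = FIRST + THIRD + PCM + OTHER + SECOND
FOURTH_WAY_LAYOUT = FIRST + THIRD + SECOND + PCM + OTHER  # tail order is assumed


def unrelated_elements_parsed(layout):
    """Count unrelated elements read before the last relevant element is reached."""
    last = max(i for i, name in enumerate(layout) if name in RELEVANT)
    return sum(1 for name in layout[:last] if name not in RELEVANT)


print(unrelated_elements_parsed(EXISTING_LAYOUT),
      unrelated_elements_parsed(FOURTH_WAY_LAYOUT))
```

Under these assumptions the existing layout forces five unrelated elements to be parsed, whereas the fourth layout requires none.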

FIG. 4 is a flowchart of Embodiment 4 of a video image code stream processing method according to the present invention. As shown in FIG. 4, the method in this embodiment may include:

Step 401: Extract the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] from the code stream generated by the encoding device, where the syntax element max_dec_pic_buffering_diff[i] represents the difference between the size of the decoded image buffer required to decode the sub-code stream whose time layer identifier does not exceed i and the size of the decoded image buffer required to decode the sub-code stream whose time layer identifier does not exceed i-1, and the syntax element num_reorder_pics_diff[i] represents the difference between the number of inverted images in the sub-code stream whose time layer identifier does not exceed i and the number of inverted images in the sub-code stream whose time layer identifier does not exceed i-1;

Step 402: If i is 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following manner:

    MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];
    NumReorderPics[i] = num_reorder_pics_diff[i];

Step 403: If i is greater than 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following manner:

    MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];
    NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i];

where MaxDecPicBuffering[i] indicates the size of the storage space that the DPB of the Hypothetical Reference Decoder (HRD) requires for the i-th time layer code stream, and NumReorderPics[i] represents the maximum number of timing-inverted images in the i-th time layer code stream.

The i-th time layer code stream has the same meaning as the sub-code stream whose time layer identifier does not exceed i; it refers to the sub-code stream obtained by removing from the original code stream all NALUs whose temporal_id is greater than i.
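Steps 402 and 403 amount to a running sum over the transmitted differences, which can be sketched as follows. The function name and the sample values are hypothetical.

```python
def decode_differences(diffs):
    """Rebuild per-layer values from differentially coded syntax elements."""
    values = []
    for i, diff in enumerate(diffs):
        if i == 0:
            values.append(diff)                  # Step 402: value for layer 0 is sent as-is
        else:
            values.append(values[i - 1] + diff)  # Step 403: add the difference to layer i-1
    return values


max_dec_pic_buffering_diff = [2, 1, 0, 2]   # illustrative values read from the SPS
num_reorder_pics_diff = [0, 1, 1, 0]

MaxDecPicBuffering = decode_differences(max_dec_pic_buffering_diff)
NumReorderPics = decode_differences(num_reorder_pics_diff)
print(MaxDecPicBuffering, NumReorderPics)
```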

As can be seen from the description of method Embodiment 3 shown in FIG. 3 above, the syntax element max_dec_pic_buffering[i] represents the size of the decoded image buffer required to decode the substream whose temporal_id does not exceed i. Therefore, the value of max_dec_pic_buffering[i] is non-decreasing in i, that is, max_dec_pic_buffering[i] ≥ max_dec_pic_buffering[i-1]. In other words, the decoded image buffer size required to decode the images whose temporal_id does not exceed i is never smaller than the size required to decode the images whose temporal_id does not exceed i-1. Similarly, the syntax element num_reorder_pics[i] indicates the number of inverted images in the substream whose temporal_id does not exceed i, so the value of num_reorder_pics[i] is also non-decreasing.

Therefore, because the values of the syntax elements max_dec_pic_buffering[i] and num_reorder_pics[i] are both non-decreasing, this embodiment encodes the two syntax elements differentially, which reduces the bits required to represent max_dec_pic_buffering and num_reorder_pics in the SPS and improves the compression efficiency.
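The bit saving can be illustrated with a small sketch. It assumes that the values are written with unsigned Exp-Golomb codes, as is common for SPS syntax elements of this kind; the passage itself does not state the entropy code, and the function names and sample values are hypothetical.

```python
def encode_differences(values):
    """Encoder side: replace each value by its difference from the previous layer."""
    if not values:
        return []
    diffs = [values[0]] + [values[i] - values[i - 1] for i in range(1, len(values))]
    if any(d < 0 for d in diffs):
        raise ValueError("values must be non-decreasing")
    return diffs


def ue_bits(value):
    """Length in bits of the unsigned Exp-Golomb code for a non-negative integer."""
    return 2 * (value + 1).bit_length() - 1


max_dec_pic_buffering = [2, 3, 4, 5]            # illustrative per-layer values
diffs = encode_differences(max_dec_pic_buffering)

direct_bits = sum(ue_bits(v) for v in max_dec_pic_buffering)
diff_bits = sum(ue_bits(d) for d in diffs)
print(diffs, direct_bits, diff_bits)
```

Because the sequence is non-decreasing, every difference is non-negative and no sign needs to be transmitted; for these sample values the differences also take fewer bits than the direct values.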

The specific implementation may be:

Use the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] to replace the existing syntax elements max_dec_pic_buffering[i] and num_reorder_pics[i].

Use the variable MaxDecPicBuffering[i] to indicate the size of the storage space that the DPB of the HRD requires for the i-th time layer code stream. The storage space is measured in units of the storage space of one image; for example, MaxDecPicBuffering[i] = 2 indicates that the DPB requires storage space for 2 images. The variable NumReorderPics[i] is used to represent the maximum number of timing-inverted images in the i-th time layer code stream.

When i is 0:

    MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i]
    NumReorderPics[i] = num_reorder_pics_diff[i]

When i > 0:

    MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i]
    NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i]

In this embodiment, the value of the syntax element indicating the size of the decoded image buffer required to decode the sub-code stream whose time layer identifier does not exceed i, and the value of the syntax element indicating the number of inverted images in the sub-code stream whose time layer identifier does not exceed i, are both non-decreasing. The two syntax elements are therefore differentially encoded to reduce the bits required to represent them in the SPS, thereby improving the compression efficiency.

In Embodiment 1 of the decoding device of the present invention, the decoding device may include an extracting module, configured to extract a code stream generated by the encoding device, where the code stream includes clean random access (CRA) image information, the CRA image information satisfies the constraint condition of a CRA image, and the constraint condition includes that the CRA image must be a reference image.

In a specific implementation, the extracting module is specifically configured to extract a network abstraction layer unit (NALU) generated by the encoding device, where the syntax element nal_unit_type in the header field of the NALU indicates that the NALU includes CRA image information, and the constraint condition includes: the syntax element nal_ref_flag in the header field of the NALU must be 1.

The decoding device of this embodiment is used to perform the technical solution of the method embodiment shown in FIG. 1. The principle and technical effects are similar, and details are not described herein again.
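A conformance check for the CRA constraint might look like the sketch below. The numeric nal_unit_type value that identifies a CRA image depends on the draft version and is therefore passed in as a parameter; the value 4 used in the usage lines is only a placeholder assumption, and the function name is hypothetical.

```python
def satisfies_cra_constraint(nal_unit_type, nal_ref_flag, cra_nal_unit_type):
    """Return True if the NALU header obeys the rule that a CRA image is a reference image."""
    if nal_unit_type != cra_nal_unit_type:
        return True               # the constraint only applies to CRA NALUs
    return nal_ref_flag == 1      # a CRA image must be marked as a reference image


CRA_TYPE = 4  # placeholder value, assumed for illustration only

ok = satisfies_cra_constraint(nal_unit_type=CRA_TYPE, nal_ref_flag=1,
                              cra_nal_unit_type=CRA_TYPE)
bad = satisfies_cra_constraint(nal_unit_type=CRA_TYPE, nal_ref_flag=0,
                               cra_nal_unit_type=CRA_TYPE)
print(ok, bad)
```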

FIG. 5 is a schematic structural diagram of Embodiment 2 of a decoding device according to the present invention. As shown in FIG. 5, the decoding device in this embodiment may include a first determining module 21, a processing module 22, and a second determining module 23. The first determining module 21 is configured to determine whether the picture order counts (POCs) of the sequentially decoded images among the active images are arranged in non-descending order, where the active images include the current image and all images in the decoded image buffer (DPB) that are marked for short-term reference and those waiting for output. The processing module 22 is configured to, if the POCs are arranged in non-descending order, determine the POC most significant bits (POCMsb) of the current image as follows: obtain the POC least significant bits (POCLsb) of the current image from the code stream; if the POCLsb of the current image is smaller than the POCLsb of the previous image, the POCMsb of the current image is equal to the sum of the POCMsb of the previous image and the maximum POCLsb, where the maximum POCLsb = 2^N and N is the bit width of the POCLsb; otherwise, the POCMsb of the current image is equal to prevPicOrderCntMsb. The second determining module 23 is configured to determine the POC of the current image according to the POCMsb of the current image and the POCLsb of the current image.

In a specific implementation, the first determining module 21 is specifically configured to determine, according to the value of the syntax element num_reorder_pics in the sequence parameter set (SPS), which indicates the maximum number of timing-inverted images, whether the POCs of the sequentially decoded images among the active images are arranged in non-descending order.

The decoding device of this embodiment is used to perform the technical solution of the method embodiment shown in FIG. 2, and the principle and the technical effect are similar, and details are not described herein again.

FIG. 6 is a schematic structural diagram of an embodiment of a processing device according to the present invention. As shown in FIG. 6, the device in this embodiment may include an extracting module 31 and an extraction module 32. The extracting module 31 is configured to extract a code stream generated by the encoding device, where the first type of syntax element and the second type of syntax element are placed in adjacent positions in the code stream, or the second type of syntax element and the third type of syntax element are placed in adjacent positions in the code stream. The first type of syntax element refers to a syntax element that contains time layer quantity information, the second type of syntax element refers to a syntax element that contains code stream feature information of each time layer, and the third type of syntax element refers to a syntax element that contains information about the size of the storage space occupied by each image. The extraction module 32 is configured to extract at least two types of syntax elements from the adjacent positions in the code stream.

In a specific implementation, the positions of the second type of syntax element and the third type of syntax element in the code stream are unchanged, and the first type of syntax element is placed before the second type of syntax element; or, the positions of the first type of syntax element and the third type of syntax element in the code stream are unchanged, and the second type of syntax element is placed after the first type of syntax element; or, the position of the third type of syntax element in the code stream is unchanged, and the first type of syntax element and the second type of syntax element are placed in a middle position of the code stream; or, the third type of syntax element is placed after the first type of syntax element, and the second type of syntax element is placed after the third type of syntax element.

Moreover, the processing device of this embodiment may be a decoding device, such as a decoder in a terminal device, or it may be a media gateway.

The processing device of this embodiment is used to perform the technical solution of the method embodiment shown in FIG. 3, and the principle and the technical effect are similar, and details are not described herein again.

FIG. 7 is a schematic structural diagram of Embodiment 3 of a decoding device according to the present invention. As shown in FIG. 7, the device in this embodiment may include an extracting module 41 and a differential decoding module 42. The extracting module 41 is configured to extract the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] from the code stream generated by the encoding device, where the syntax element max_dec_pic_buffering_diff[i] represents the difference between the size of the decoded image buffer required to decode the sub-code stream whose time layer identifier does not exceed i and the size of the decoded image buffer required to decode the sub-code stream whose time layer identifier does not exceed i-1, and the syntax element num_reorder_pics_diff[i] represents the difference between the number of inverted images in the sub-code stream whose time layer identifier does not exceed i and the number of inverted images in the sub-code stream whose time layer identifier does not exceed i-1. The differential decoding module 42 is configured to, if i is 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following way:

    MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];
    NumReorderPics[i] = num_reorder_pics_diff[i];

and, if i is greater than 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following way:

    MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];
    NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i];

where MaxDecPicBuffering[i] represents the size of the storage space that the DPB of the hypothetical reference decoder (HRD) requires for the i-th time layer code stream, and NumReorderPics[i] represents the maximum number of timing-inverted images in the i-th time layer code stream. The i-th time layer code stream has the same meaning as the sub-code stream whose time layer identifier does not exceed i; it refers to the sub-code stream obtained by removing from the original code stream all NALUs whose temporal_id is greater than i.

The decoding device of this embodiment is used to perform the technical solution of the method embodiment shown in FIG. 4, and the principle and the technical effect are similar, and details are not described herein again.
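The differential decoding rule of this embodiment amounts to a running sum over the temporal layers. The following is a minimal sketch under stated assumptions, not the implementation of the decoding device: the function name `decode_dpb_parameters` and the use of Python lists indexed by temporal layer are hypothetical, and the parsing of the syntax elements from the code stream is assumed to have happened already.

```python
from typing import List, Sequence, Tuple


def decode_dpb_parameters(
    max_dec_pic_buffering_diff: Sequence[int],
    num_reorder_pics_diff: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Recover MaxDecPicBuffering[i] and NumReorderPics[i] from per-layer differences.

    Hypothetical helper: both inputs are indexed by temporal layer i.
    """
    if len(max_dec_pic_buffering_diff) != len(num_reorder_pics_diff):
        raise ValueError("both difference lists must cover the same temporal layers")

    max_dec_pic_buffering: List[int] = []
    num_reorder_pics: List[int] = []
    for i in range(len(max_dec_pic_buffering_diff)):
        if i == 0:
            # Layer 0: the transmitted value is the parameter itself.
            max_dec_pic_buffering.append(max_dec_pic_buffering_diff[0])
            num_reorder_pics.append(num_reorder_pics_diff[0])
        else:
            # Layer i > 0: add the transmitted difference to the value of layer i-1.
            max_dec_pic_buffering.append(
                max_dec_pic_buffering[i - 1] + max_dec_pic_buffering_diff[i])
            num_reorder_pics.append(
                num_reorder_pics[i - 1] + num_reorder_pics_diff[i])
    return max_dec_pic_buffering, num_reorder_pics
```

For instance, with three temporal layers and transmitted differences `[2, 1, 1]` and `[0, 1, 1]`, the sketch yields MaxDecPicBuffering values `[2, 3, 4]` and NumReorderPics values `[0, 1, 2]`.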

It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When the program is executed, the steps of the above method embodiments are performed. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

It should be noted that the above embodiments are merely illustrative of the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A video image code stream processing method, comprising:

2. The method according to claim 1, wherein the extracting the code stream generated by the encoding device, the code stream including CRA image information, comprises:

extracting a network abstraction layer unit NALU generated by the encoding device, where a syntax element nal_unit_type in the header field of the NALU indicates that the NALU includes CRA image information;

and the constraint condition comprises:

the syntax element nal_ref_flag in the header field of the NALU must be 1.

3. A video image code stream processing method, comprising:

4. The method according to claim 3, wherein the determining whether the POCs of the sequentially decoded images are arranged in non-descending order comprises:

determining, according to the value of the syntax element num_reorder_pics in the sequence parameter set SPS, which indicates the maximum number of temporally reordered images, whether the POCs of the sequentially decoded images are arranged in non-descending order.

5. The method according to claim 3 or 4, wherein there is no constraint on the size of the POC difference between any two images among the activated images, and the activated images include the current image, all images in the decoded picture buffer DPB that are marked for short-term reference, and images that are waiting for output.

6. A video image code stream processing method, comprising:

7. The method according to claim 6, wherein the positions of the second class syntax element and the third class syntax element in the code stream are unchanged, and the first class syntax element is placed before the second class syntax element; or,

the positions of the first class syntax element and the third class syntax element in the code stream are unchanged, and the second class syntax element is placed after the first class syntax element; or

the position of the third class syntax element in the code stream is unchanged, and the first class syntax element and the second class syntax element are placed in an intermediate position of the code stream; or

the third class syntax element is placed after the first class syntax element, and the second class syntax element is placed after the third class syntax element.

8. A video image code stream processing method, comprising:

MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];

NumReorderPics[i] = num_reorder_pics_diff[i];

if i is greater than 0, obtaining MaxDecPicBuffering[i] and NumReorderPics[i] in the following way:

MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];

NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i];

where MaxDecPicBuffering[i] represents the size of the storage space required by the DPB of the hypothetical reference decoder HRD for the i-th temporal layer code stream, and NumReorderPics[i] represents the maximum number of temporally reordered images in the i-th temporal layer code stream.

9. A decoding device, comprising:

an extracting module, configured to extract a code stream generated by the encoding device, where the code stream includes clean random access CRA image information, the CRA image information satisfies a constraint condition of the CRA image, and the constraint condition includes that the CRA image must be a reference image.

10. The device according to claim 9, wherein the extracting module is specifically configured to extract a network abstraction layer unit NALU generated by the encoding device, where a syntax element nal_unit_type in a header field of the NALU indicates that the NALU contains CRA image information;

the constraint condition includes: the syntax element nal_ref_flag in the header field of the NALU must be 1.

11. A decoding device, comprising:

And a second determining module, configured to determine a POC of the current image according to the POCMsb of the current image and the POCLsb of the current image.

12. The device according to claim 11, wherein the first determining module is specifically configured to determine, according to the value of the syntax element num_reorder_pics in the sequence parameter set SPS, which indicates the maximum number of temporally reordered images, whether the POCs of the images in the activated images are arranged in non-descending order.

13. The device according to claim 11 or 12, wherein there is no constraint on the size of the POC difference between any two images among the activated images, and the activated images include the current image, all images in the decoded picture buffer DPB that are marked for short-term reference, and images that are waiting for output.

14. A processing device, comprising:

an extraction module, configured to extract a code stream generated by the encoding device, where the first class syntax element and the second class syntax element are placed in adjacent positions in the code stream, or the second class syntax element and the third class syntax element are placed in adjacent positions in the code stream; the first class syntax element is a syntax element that includes information about the number of temporal layers, the second class syntax element is a syntax element that includes code stream characteristic information of each temporal layer, and the third class syntax element is a syntax element that includes information about the size of the storage space occupied by each image;

15. The device according to claim 14, wherein the positions of the second class syntax element and the third class syntax element in the code stream are unchanged, and the first class syntax element is placed before the second class syntax element; or,

16. The device according to claim 14 or 15, wherein the device is a decoding device or a media gateway.

17. A decoding device, comprising:

an extraction module, configured to extract the syntax elements max_dec_pic_buffering_diff[i] and num_reorder_pics_diff[i] from the code stream generated by the encoding device, where the syntax element max_dec_pic_buffering_diff[i] represents the difference between the size of the decoded picture buffer required for decoding the sub-code stream whose temporal layer identifier does not exceed i and the size of the decoded picture buffer required for decoding the sub-code stream whose temporal layer identifier does not exceed i-1, and the syntax element num_reorder_pics_diff[i] represents the difference between the number of temporally reordered images in the sub-code stream whose temporal layer identifier does not exceed i and the number of temporally reordered images in the sub-code stream whose temporal layer identifier does not exceed i-1;

a differential decoding module, configured to, if i is 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following way:

MaxDecPicBuffering[i] = max_dec_pic_buffering_diff[i];

NumReorderPics[i] = num_reorder_pics_diff[i];

and, if i is greater than 0, obtain MaxDecPicBuffering[i] and NumReorderPics[i] in the following way:

MaxDecPicBuffering[i] = MaxDecPicBuffering[i-1] + max_dec_pic_buffering_diff[i];

NumReorderPics[i] = NumReorderPics[i-1] + num_reorder_pics_diff[i];

where MaxDecPicBuffering[i] represents the size of the storage space required by the DPB of the hypothetical reference decoder HRD for the i-th temporal layer code stream, and NumReorderPics[i] represents the maximum number of temporally reordered images in the i-th temporal layer code stream.