WO2019227491A1

WO2019227491A1 - Coding and decoding methods, and coding and decoding devices

Info

Publication number: WO2019227491A1
Application number: PCT/CN2018/089673
Authority: WO
Inventors: 李蔚然; 郑萧桢
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2018-06-01
Filing date: 2018-06-01
Publication date: 2019-12-05
Also published as: CN110366851A; CN110366851B

Abstract

Coding and decoding methods, and coding and decoding devices. The coding method comprises: coding frames of a first type, wherein there are N frames of a second type needing interframe coding after the frames of the first type according to coding order, and the display order of the N frames of the second type is before that of the frames of the first type; performing interframe coding on at least one of the N frames of the second type according to a long-term reference image; and replacing the currently used long-term reference image after the interframe coding of the at least one of the N frames of the second type is completed. The coding method enables the at least one frame of the second type to refer to the long-term reference image before the frames of the first type, and can improve the coding efficiency.

Description

Encoding and decoding method and encoding and decoding equipment

Copyright statement

The content disclosed in this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and archives of the Patent and Trademark Office.

Technical field

The present application relates to the field of image processing, and in particular, to an encoding and decoding method and an encoding and decoding device.

Background art

In inter prediction for image encoding and decoding, the more similar the selected reference image is to the current image to be encoded, the smaller the residual generated by inter prediction, and the higher the coding efficiency of inter prediction. Some existing technologies use the images in a video to construct a high-quality specific reference image containing the background content of the scene, called a long-term reference image. That is, the specific reference image can be used as a reference image for inter prediction for a relatively long period of time. When inter prediction is performed, the background portion of the current image to be encoded / decoded can refer to the long-term reference image, which reduces the residual information of inter prediction and thereby improves encoding efficiency. The long-term reference image is not an encoded / decoded image, but an artificially constructed image. The long-term reference image includes multiple image blocks, and each image block is taken from some encoded / decoded image. Different image blocks in the long-term reference image may be taken from different encoded / decoded images. After the encoding / decoding of a frame is completed, the long-term reference image can be updated based on the encoded / decoded image.

In practical applications of image codecs, a user often starts watching not from the beginning of a video stream but from an intermediate point. The video stream may be, for example, a TV program, a webcast, a local movie, and so on. To support loading a video from different moments, random access points (RAP) are usually inserted into the video stream during encoding / decoding. A random access point guarantees that the decoder can correctly decode the frames whose display order is after the random access point.

If random access points are not considered when the long-term reference image is updated, and the update is simply based on the encoded / decoded images, frames whose display order is after a random access point may not be decoded correctly. If the existing encoding / decoding technology for random access points is applied when the long-term reference image is updated, the encoding / decoding efficiency may be low.

Summary of the Invention

The application provides an encoding and decoding method and an encoding and decoding device, which can improve encoding / decoding efficiency.

According to a first aspect, an encoding method is provided, including: encoding a first type frame, wherein, in the encoding order, N second type frames that need to be inter-frame encoded follow the first type frame, the display order of the N second type frames is before the first type frame, and N is a positive integer; performing inter-frame encoding on at least one second type frame of the N second type frames according to a long-term reference image; and replacing the currently used long-term reference image after the inter-frame encoding of the at least one second type frame of the N second type frames is completed.

According to a second aspect, a decoding method is provided, including: decoding a first type frame, wherein, in the decoding order, N second type frames that need to be inter-frame decoded follow the first type frame, the display order of the N second type frames is before the first type frame, and N is a positive integer; performing inter-frame decoding on at least one second type frame of the N second type frames according to a long-term reference image; and replacing the currently used long-term reference image after the inter-frame decoding of the at least one second type frame of the N second type frames is completed.

According to a third aspect, an encoding device is provided, including: at least one memory for storing computer-executable instructions; and at least one processor, individually or collectively configured to access the at least one memory and execute the computer-executable instructions to implement the following operations: encoding a first type frame, wherein, in the encoding order, N second type frames that need to be inter-frame encoded follow the first type frame, the display order of the N second type frames is before the first type frame, and N is a positive integer; performing inter-frame encoding on at least one second type frame of the N second type frames according to a long-term reference image; and replacing the currently used long-term reference image after the inter-frame encoding of the at least one second type frame of the N second type frames is completed.

According to a fourth aspect, a decoding device is provided, including: at least one memory for storing computer-executable instructions; and at least one processor, individually or collectively configured to access the at least one memory and execute the computer-executable instructions to implement the following operations: decoding a first type frame, wherein, in the decoding order, N second type frames that need to be inter-frame decoded follow the first type frame, the display order of the N second type frames is before the first type frame, and N is a positive integer; performing inter-frame decoding on at least one second type frame of the N second type frames according to a long-term reference image; and replacing the currently used long-term reference image after the inter-frame decoding of the at least one second type frame of the N second type frames is completed.

According to a fifth aspect, a computer-readable storage medium is provided, on which instructions are stored, and when the instructions are run on the computer, the computer is caused to execute the encoding method of the first aspect.

According to a sixth aspect, a computer-readable storage medium is provided, on which instructions are stored, and when the instructions are run on a computer, the computer is caused to execute the decoding method of the second aspect.

This application replaces the currently used long-term reference image only after completing the inter-frame encoding / decoding of at least one second type frame among the N second type frames whose encoding / decoding order is after the first type frame and whose display order is before the first type frame. In this way, the at least one second type frame can refer to the long-term reference image that was in use before the first type frame, which can improve encoding / decoding efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of an embodiment of out-of-order encoding in the present application.

FIG. 2 is a schematic flowchart of an embodiment of a video encoding method according to the present application.

FIG. 3 is a schematic diagram of a relationship between an image block in a current to-be-encoded image and an image block in a long-term reference image.

FIG. 4 is a schematic diagram showing a relationship between multiple images and a long-term reference image in a video.

FIG. 5 is a schematic flowchart of another embodiment of a video encoding method according to the present application.

FIG. 6 is a schematic diagram of a video decoding method according to another embodiment of the present application.

FIG. 7 is a schematic flowchart of another embodiment of a video decoding method according to the present application.

FIG. 8 is a schematic flowchart of an encoding method according to an embodiment of the present application.

FIG. 9 is a schematic flowchart of an encoding method according to an embodiment of the present application.

FIG. 10 is a schematic flowchart of a decoding method according to an embodiment of the present application.

FIG. 11 is a schematic flowchart of a decoding method according to another embodiment of the present application.

FIG. 12 is a schematic flowchart of a decoding method according to another embodiment of the present application.

FIG. 13 is a schematic block diagram of an encoding device according to an embodiment of the present application.

FIG. 14 is a schematic block diagram of an encoding device according to another embodiment of the present application.

FIG. 15 is a schematic block diagram of a decoding device according to an embodiment of the present application.

FIG. 16 is a schematic block diagram of a decoding device according to another embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used herein in the specification of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application.

First, related technologies and concepts involved in the embodiments of the present application are introduced.

A video is made up of multiple images. When a video is encoded / decoded, different images in the video can use different prediction methods. According to the prediction method adopted, images can be divided into intra-prediction images and inter-prediction images, where inter-prediction images include forward-prediction images and bidirectional-prediction images. An I picture is an intra-prediction picture, also called a key frame; a P picture is a forward-prediction picture, that is, it uses a previously encoded / decoded P picture or I picture as a reference picture; a B picture is a bidirectional-prediction picture, that is, it uses both preceding and following pictures as reference pictures. In one implementation, multiple pictures are encoded / decoded to generate successive groups of pictures (GOP). A GOP consists of one I picture and multiple B pictures (or bidirectional-prediction pictures) and / or P pictures (or forward-prediction pictures). During playback, the decoding end reads the GOPs one by one, decodes them, and then reads out the pictures and renders them for display.

In modern video encoding / decoding standards, images of different resolutions can be encoded / decoded by dividing the image into multiple small blocks, that is, into multiple image blocks. The image can be divided into any number of image blocks. For example, the image can be divided into an array of m × n image blocks. An image block may have a rectangular shape, a square shape, a circular shape, or any other shape. An image block can have any size, such as p × q pixels. All image blocks can have the same size and / or shape. Alternatively, two or more image blocks may have different sizes and / or shapes. Image blocks may or may not overlap one another. In some encoding / decoding standards, the image block is referred to as a macroblock or a largest coding unit (LCU). In the H.264 standard, an image block is called a macroblock, and its size can be 16 × 16 pixels. In the high efficiency video coding (HEVC) standard, an image block is called a largest coding unit, and its size can be 64 × 64 pixels.

In other embodiments, an image block may not be a macroblock or a largest coding unit, but may instead contain a portion of a macroblock or largest coding unit, or contain at least two complete macroblocks (or largest coding units), or contain at least one complete macroblock (or largest coding unit) and a portion of another macroblock (or largest coding unit), or contain at least two complete macroblocks (or largest coding units) and portions of some macroblocks (or largest coding units). In this way, after the image is divided into multiple image blocks, the image blocks in the image data can be encoded / decoded separately.

The encoding process includes prediction, transformation, quantization, and entropy encoding. Prediction includes two types, intra prediction and inter prediction, and its purpose is to use prediction block data to remove redundant information from the current image block to be encoded. Intra prediction uses information from the current frame to obtain prediction block data. Inter prediction uses information from a reference image to obtain prediction block data. The inter prediction process includes dividing the current image to be encoded into several image blocks to be encoded, and then dividing each image block to be encoded into several sub-image blocks; then, for each sub-image block, searching the reference image for the image block that best matches the current sub-image block and using it as the predicted image block, where the relative displacement between the predicted image block and the current sub-image block is the motion vector; and thereafter subtracting the corresponding pixel values of the predicted image block from those of the sub-image block to obtain the residual. The residuals of the sub-image blocks are combined to obtain the residual of the image block to be encoded.
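The inter prediction steps described above can be illustrated with a minimal sketch. It assumes a full search over a small window with the sum of absolute differences (SAD) as the matching criterion; the function names `sad` and `motion_search`, the SAD cost, and the search strategy are illustrative assumptions and are not specified by this application.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, size, search_range):
    """Full search (assumed strategy): return (motion_vector, residual) for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference image.
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    dx, dy = best_mv
    # Residual = current sub-image block minus predicted image block, pixel by pixel.
    residual = [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
                 for x in range(size)] for y in range(size)]
    return best_mv, residual


# Hypothetical data: the current image is the reference shifted right by one pixel.
ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = [[ref[y][max(x - 1, 0)] for x in range(6)] for y in range(6)]
mv, residual = motion_search(cur, ref, bx=2, by=2, size=2, search_range=1)
print(mv, residual)
```

In this toy case the best match lies one pixel to the left in the reference image, so the residual of the block is all zeros.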

In each embodiment of the present application, a transformation matrix may be used to remove the correlation of the residuals of the image blocks, that is, to remove the redundant information of the image blocks in order to improve the coding efficiency. The transformation of a data block in an image block usually uses a two-dimensional transformation, that is, at the encoding end the residual information of the data block is multiplied by an N × M transformation matrix and by its transpose, and the transform coefficients are obtained after the multiplication. The transform coefficients are quantized to obtain quantized coefficients. Finally, the quantized coefficients are entropy-coded to obtain an entropy-coded bitstream. The entropy-coded bitstream and the encoded coding mode information, such as the intra prediction mode and the motion vector (or motion vector residual), are stored or sent to the decoding end.
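As a rough illustration of the transform and quantization steps, the sketch below computes C = T · X · Tᵀ for a square residual block X and then divides by a uniform quantization step. The orthonormal DCT-II matrix, the restriction to square blocks, the uniform quantizer, and all function names are assumptions made for the example; this application does not fix a particular transform or quantizer.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (an assumed choice of transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def transform_and_quantize(residual, qstep):
    """Two-dimensional transform T * X * T^T followed by uniform quantization."""
    t = dct_matrix(len(residual))
    coeffs = matmul(matmul(t, residual), transpose(t))
    return [[int(round(c / qstep)) for c in row] for row in coeffs]


# A flat residual block concentrates all of its energy in the DC coefficient.
levels = transform_and_quantize([[8] * 4 for _ in range(4)], qstep=4)
print(levels)
```

The flat block shows why the transform helps: sixteen equal residual values collapse into a single non-zero quantized coefficient, which is cheaper to entropy-code.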

At the decoding end, the entropy-coded bitstream is obtained and entropy-decoded to obtain the corresponding residual; the predicted image block corresponding to the image block is found according to decoded information such as the motion vector and the intra prediction mode; and the value of each pixel in the current sub-image block is obtained from the predicted image block and the residual.

As mentioned above, an image that has already been encoded / decoded is used as the reference image for the image currently being encoded / decoded. In some embodiments, a reference image may also be constructed to improve the similarity between the reference image and the current image to be encoded / decoded. Among the images in a video that can be used as reference images, long-term reference images and short-term reference images can be distinguished. The short-term reference image is the counterpart of the long-term reference image. A short-term reference image exists in the reference image buffer for a period of time; after reference images decoded later than the short-term reference image have been moved into and out of the reference image buffer a number of times, the short-term reference image is removed from the reference image buffer. The reference image buffer may also be referred to as a reference image list buffer, a reference image list, a reference frame list buffer, or a reference frame list; these are collectively referred to herein as the reference image buffer.

There is a specific type of encoding / decoding scene in the video content, in which the background does not change, only the foreground in the video changes or moves. For example, video surveillance belongs to this type of scene. In a video surveillance scene, the surveillance camera is usually stationary or only moves slowly, and it can be considered that the background is basically unchanged. In contrast, objects such as people or cars that are captured in video surveillance lenses often move or change, and it can be considered that the foreground changes frequently. In this type of scene, a specific reference image can be constructed, and the specific reference image contains only high-quality background information. For example, the specific reference image may be a long-term reference image, and may also be referred to as a composite reference frame. The long-term reference image includes multiple image blocks, and any one image block is taken from a decoded image. Different image blocks in the long-term reference image may be taken from different decoded images. When performing inter prediction, the background portion of the current image to be encoded / decoded can be referred to the long-term reference image, thereby reducing the residual information of inter prediction, thereby improving encoding / decoding efficiency.

The long-term reference image (or a part of the data in the long-term reference image) always exists in the reference image buffer. The long-term reference image (or a part of the data in the long-term reference image) is not affected by the move-in and move-out operations of decoded reference images in the reference image buffer; it is removed from the reference image buffer only when the decoding end issues an update instruction.
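The different lifetimes of short-term and long-term reference images can be sketched as follows. The class `ReferenceBuffer`, the fixed-size sliding window for short-term entries, and the single long-term slot are simplifying assumptions for illustration; real codecs manage the reference image buffer with standard-specific rules.

```python
from collections import deque


class ReferenceBuffer:
    """Toy reference image buffer: a sliding window of short-term entries plus one long-term entry."""

    def __init__(self, max_short_term):
        # A bounded deque drops the oldest short-term image when a new one moves in.
        self.short_term = deque(maxlen=max_short_term)
        self.long_term = None

    def add_short_term(self, picture):
        self.short_term.append(picture)

    def set_long_term(self, picture):
        """Explicit update instruction: the only operation that changes the long-term entry."""
        self.long_term = picture

    def references(self):
        refs = list(self.short_term)
        if self.long_term is not None:
            refs.append(self.long_term)
        return refs


buf = ReferenceBuffer(max_short_term=2)
buf.set_long_term("LT0")
for name in ["P0", "P1", "P2"]:
    buf.add_short_term(name)
print(buf.references())  # P0 has been pushed out; LT0 is still present
```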

Short-term reference images and long-term reference images may have different names in different standards. For example, in H.264 / advanced video coding (AVC) or H.265 / HEVC, short-term reference images are called short-term reference frames (short-term references), and long-term reference images are called long-term reference frames (long-term references). For another example, in standards such as audio video coding standard (AVS) 1-P2, AVS2-P2, and Institute of Electrical and Electronics Engineers (IEEE) 1857.9-P4, the long-term reference image is called a background frame (background picture). For another example, in standards such as VP8 and VP9, long-term reference images are called golden frames. The long-term reference image may also be constructed, as described above, from image blocks taken from multiple decoded images, which is not limited in the embodiments of the present application. It should be understood that the long-term reference image in each embodiment of the present application may be an image that is not output.

Common encoding methods include sequential encoding / decoding and out-of-order encoding / decoding. Sequential encoding / decoding means that the encoding / decoding order is consistent with the display order. For example, the structure of a low-delay code stream is I0, P1, P2, P3, P4, P5, and so on (where an I frame is an intra encoding / decoding frame, a P frame is an inter encoding / decoding frame that references preceding frames, and the numbers represent the display order); the display order corresponding to this encoding / decoding structure is also I0, P1, P2, P3, P4, P5, and so on. Out-of-order encoding / decoding departs from the display order in order to maximize the encoding efficiency. FIG. 1 is a schematic diagram of an embodiment of out-of-order encoding / decoding in the present application. As shown in FIG. 1, frames that are later in the display order may be encoded / decoded first, and frames that are earlier in the display order may be encoded / decoded later. For example, the encoding / decoding order of the hierarchical-B code stream structure is I0, P8, B4, B2, B1, B3, B6, B5, B7, P16, B12, B10, B9, B11, B14, B13, B15, and so on (where a B frame is a bidirectionally referenced inter encoding / decoding frame). The corresponding display order is I0, B1, B2, B3, B4, B5, B6, B7, P8, B9, B10, B11, B12, B13, B14, B15, P16, and so on.
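The hierarchical-B pattern can be generated by coding the anchor frame at the end of each group first and then recursively coding the midpoint of each remaining interval. The sketch below is one way to do this, assuming a group size of 8 and the frame labels used in the text; the function name `hierarchical_b_order` is illustrative.

```python
def hierarchical_b_order(num_gops, gop_size=8):
    """Return frame labels in coding order; the number in each label is the display order."""
    order = ["I0"]

    def bisect(lo, hi):
        # Code the middle frame of the open interval (lo, hi), then recurse into both halves.
        mid = (lo + hi) // 2
        if mid == lo:
            return
        order.append(f"B{mid}")
        bisect(lo, mid)
        bisect(mid, hi)

    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        order.append(f"P{hi}")  # anchor frame at the end of the group is coded first
        bisect(lo, hi)
    return order


coding_order = hierarchical_b_order(2)
display_order = sorted(coding_order, key=lambda label: int(label[1:]))
print(coding_order)
print(display_order)
```

Sorting the labels by their number recovers the display order, which makes the gap between the two orders easy to inspect.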

When a random access point needs to be inserted, according to the definition of random access, the encoding / decoding of frames whose display order is after the random access point should not use data from before the random access point. Random access in image sequences is generally divided into two categories. For the first type of random access point, the short-term reference images in the reference image buffer are completely cleared at the random access point, and the reconstructed image of the random access point is used to replace the long-term reference image. If a frame follows the random access point in encoding / decoding order but precedes it in display order, the frame is discarded without being decoded. In some standards, this first type of random access point is called Instantaneous Decoder Refresh (IDR). For the second type of random access point, the short-term reference images in the reference image buffer are not cleared immediately at the random access point, and frames that are encoded / decoded after the random access point but displayed before it can still use short-term reference frames from before the random access point (if they exist in the reference image buffer). Frames whose display order is after the random access point must use, as short-term reference frames, frames whose display order is after the random access point. Furthermore, after the second type of random access point is encoded / decoded, its reconstructed image is also used to replace the long-term reference image. In some standards, this second type of random access point is called Clean Random Access (CRA).

In some embodiments of the present application, after the second type of random access point (the frames corresponding to random access points are all I frames), the short-term reference images in the reference image buffer are not cleared immediately. Frames that are encoded / decoded after the random access point and whose display order is before the random access point can still use reference images from before the random access point (if they exist in the reference image buffer) for inter-frame encoding / decoding. Frames that are encoded / decoded after the random access point and whose display order is after the random access point use, as reference images for inter-frame encoding / decoding, the frames in the reference image buffer whose display order is at or after the random access point.

In some embodiments of the present application, the long-term reference image remains unchanged after the second type of random access point is encoded / decoded and before the frame following the second type of random access point is encoded / decoded, that is, the long-term reference image is not replaced by the reconstructed image of the second type of random access point. At least one frame encoded / decoded after the second type of random access point can use this long-term reference image that has not been replaced for inter-frame encoding / decoding.
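The deferred replacement can be pictured with a small simulation that records which long-term reference image is available to each frame in coding order. This is only a sketch under stated assumptions: frames are identified by their display numbers, the replacement image is assumed to be derived from the random access point and is given the hypothetical label `LT_from_<n>`, and the parameter `replace_after` (how many frames displayed before the random access point are coded before the replacement) is an illustrative knob rather than a term of this application.

```python
def code_with_deferred_replacement(coding_order, rap_display, replace_after=1):
    """Return (display_number, long_term_label) pairs in coding order.

    The long-term reference is replaced only after `replace_after` frames that are
    coded after the random access point but displayed before it have been coded.
    """
    long_term = "LT_old"
    rap_seen, replaced = False, False
    leading_done = 0
    log = []
    for display in coding_order:
        log.append((display, long_term))  # long-term image available while coding this frame
        if display == rap_display:
            rap_seen = True
        elif rap_seen and not replaced and display < rap_display:
            leading_done += 1
            if leading_done >= replace_after:
                long_term = f"LT_from_{rap_display}"  # assumed source of the new image
                replaced = True
    return log


# Hypothetical coding order with frame 8 as the random access point.
for entry in code_with_deferred_replacement([0, 8, 4, 2, 6, 16, 12], rap_display=8, replace_after=3):
    print(entry)
```

With `replace_after=3`, frames 4, 2, and 6 can all still refer to the old long-term reference image, while frames 16 and 12 see only the new one.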

When the encoding method is out-of-order encoding, for example with the hierarchical-B code stream structure, after a random access point is encoded there are still several frames to be encoded whose display order is before the random access point. While these frames are being encoded, the reference image buffer is retained and continues to be maintained. After all frames whose display order is before the random access point have been encoded, the reference image buffer is adjusted to delete all short-term reference images whose display order is before the random access point. During decoding, when a random access point is encountered and the random access function is not used, that is, the video is not played from the random access point, the reference image buffer likewise is retained and continues to be maintained until all frames whose display order is before the random access point have been decoded; the reference image buffer is then adjusted to delete all short-term reference images whose display order is before the random access point. If the random access function of the random access point is used, frames whose display order is before the random access point are not decoded, and the reference image buffer is retained for the time being. When the frames whose display order is after the random access point are to be decoded, all reference images whose display order is before the random access point are deleted, after which decoding proceeds and the reference image buffer is maintained.
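The short-term buffer handling described above, for the case where the random access function is not used, can be sketched as a small simulation. It assumes, for simplicity, that every coded frame is kept as a short-term reference and that the buffer is unbounded; the function name and these simplifications are assumptions for illustration only.

```python
def simulate_short_term_buffer(coding_order, rap_display):
    """coding_order: display numbers in coding order. Returns the buffer contents after each frame."""
    buffer, history = [], []
    rap_seen, purged = False, False
    for idx, display in enumerate(coding_order):
        buffer.append(display)  # assumption: every coded frame becomes a short-term reference
        if display == rap_display:
            rap_seen = True
        if rap_seen and not purged:
            # Frames still to be coded whose display order is before the random access point.
            pending = [d for d in coding_order[idx + 1:] if d < rap_display]
            if not pending:
                # All such frames are done: drop references displayed before the access point.
                buffer = [d for d in buffer if d >= rap_display]
                purged = True
        history.append(list(buffer))
    return history


history = simulate_short_term_buffer([0, 8, 4, 2, 6, 16, 12], rap_display=8)
for state in history:
    print(state)
```

In this example the references with display numbers 0, 4, 2, and 6 survive until frame 6 has been coded, and only then are they removed.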

The above describes how short-term reference images are handled when random access points are set. The characteristics of the long-term reference image differ from those of the short-term reference image. Generally, a short-term reference image is the reconstructed image of a decoded image, whereas the long-term reference image is synthesized through some mechanism, that is, it is obtained by block-level refresh. As mentioned earlier, the long-term reference image can be continuously updated during encoding / decoding. When a random access point is encountered, applying the short-term reference image processing method to the long-term reference image may fail or may reduce encoding efficiency. When there are frames that are encoded / decoded after the random access point but whose display order precedes it, if the random access point is not considered when the long-term reference image is updated and the update is simply based on the encoded / decoded images, frames whose display order is after the random access point may not be decoded correctly. If the existing encoding / decoding technology for random access points is applied when the long-term reference image is updated, the encoding / decoding efficiency may be low.

In this application, the case of "updating all image blocks of the long-term reference image" is referred to as "replacing the long-term reference image"; any other update, for example "updating only a part of the long-term reference image", is referred to as "updating the long-term reference image".

Next, some methods of updating / replacing the long-term reference image involved in this application are described by way of example.

The following describes some methods for updating the long-term reference image with reference to FIG. 2, which is a schematic flowchart of an embodiment of a video encoding method of the present application. The method of updating the long-term reference image is performed by an image processing apparatus, which may be any of various types of image processing chips, image processors, and the like. As shown in FIG. 2, the method includes:

101. When the current image to be encoded can be used as a reference image and an image block can be used to update the long-term reference image, a specific image block in the long-term reference image is updated according to the image block, where the image block is an image block in the current image to be encoded. The position of the specific image block in the long-term reference image is determined by the position of the image block in the current image to be encoded.

In some embodiments, the determination that the current image to be encoded can be used as a reference image may be made before the image is encoded. In that case, based on the determination result, whether each image block in the image satisfies the condition for updating the long-term reference image can be determined while or after that image block is encoded. Alternatively, the determination may be made while or after each image block in the image to be encoded is encoded; that is, when an image block is being encoded or after it has been encoded, it is first determined whether the current image containing the image block can be used as a reference image, and only if so is it then determined whether the image block can be used to update the long-term reference image.

In some embodiments, since I pictures and P pictures (or forward-prediction pictures) can be used as reference pictures for inter prediction of other pictures, when the current image to be encoded is determined to be an I picture or a P picture (or forward-prediction picture), it is determined that the current image to be encoded can be used as a reference image. In some embodiments, some B pictures (or bidirectional-prediction pictures) can also be used as reference pictures for inter prediction of other pictures. For example, in hierarchical B technology, lower-level B pictures can be used as reference frames. Therefore, when the current image to be encoded is determined to be one of these B pictures, it can also be determined that the current image to be encoded can be used as a reference image.

In some embodiments, after determining that the current image to be encoded can be used as a reference image, the encoding end also writes a parameter or flag into at least one of a video parameter set (VPS), a sequence parameter set (SPS), a sequence header, a picture header, a slice header, a reference picture set (RPS), and a reference picture configuration set (RCS), to indicate that the current image to be encoded can be used as a reference image.

In some embodiments, whether an image block can be used to update the long-term reference image may be determined from the pixel information of the image block. For example, when the content difference between an image block and the block at the same position in a previously encoded image is determined to be small, the image block is considered to contain background content and can be used to update the long-term reference image. Other methods can also be used to determine which image blocks can be used to update the long-term reference image.
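One possible form of such a pixel-based check is sketched below. It assumes the mean absolute difference against the co-located block of a previously encoded image as the measure of "content difference"; the measure, the threshold value, and the function name `is_background_block` are assumptions, since this application does not prescribe a specific criterion.

```python
def is_background_block(cur, prev, bx, by, size, threshold):
    """Return True if the block at (bx, by) differs little from the co-located block in `prev`."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - prev[y][x])
    mean_abs_diff = total / (size * size)
    return mean_abs_diff <= threshold  # threshold is an assumed tuning parameter


# Hypothetical 4 x 4 images: mild noise in the top-left block, a moving object in the bottom-right.
prev = [[100] * 4 for _ in range(4)]
cur = [row[:] for row in prev]
cur[0][0] = 101
cur[2][2] = 200
print(is_background_block(cur, prev, 0, 0, 2, threshold=2.0))
print(is_background_block(cur, prev, 2, 2, 2, threshold=2.0))
```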

In some embodiments, the position of the specific image block in the long-term reference image is the same as the position of the image block in the current image to be encoded. FIG. 3 is a schematic diagram of the relationship between image blocks in the current image to be encoded and image blocks in the long-term reference image. The image blocks 210 and 220 in the current image to be encoded shown in FIG. 3 are image blocks satisfying a preset condition, where the position of the image block 210 in the current image to be encoded is the same as the position of the image block 110 in the long-term reference image, and the position of the image block 220 in the current image to be encoded is the same as the position of the image block 120 in the long-term reference image.

In some embodiments, the position in the long-term reference image is offset by a preset offset value from the position of the image block in the current image to be encoded.

In some embodiments, updating a specific image block in the long-term reference image according to the image block specifically means replacing the current pixel values of the specific image block with the pixel values of the image block. In some embodiments, the current pixel values of the specific image block are instead replaced with processed pixel values. The processing may be averaging the pixel values of the image block with the pixel values of the specific image block in the long-term reference image, or computing a weighted average of the two, where the weighting coefficient is a preset value or is parsed from the code stream. The pixel values of the image block used here may be the original pixel values of the image block or its reconstructed pixel values.
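The replacement, averaging, and weighted-averaging variants described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name update_block, the list-of-rows block layout, the weight parameter, and the rounding rule are assumptions introduced here and are not specified by the application.

```python
from typing import List

Block = List[List[int]]  # one colour component of a block, as rows of pixel values


def update_block(ref_block: Block, new_block: Block, weight: float = 1.0) -> Block:
    """Return the updated content of a block of the long-term reference image.

    weight is the (hypothetical) coefficient applied to the new block:
      1.0 -> plain replacement
      0.5 -> simple average
      else -> weighted average of new and existing pixel values
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [int(round(weight * n + (1.0 - weight) * r)) for r, n in zip(ref_row, new_row)]
        for ref_row, new_row in zip(ref_block, new_block)
    ]
```

Calling update_block(ref, new) performs a plain replacement, while update_block(ref, new, 0.5) averages the two blocks. In a real codec the weight would be either preset or parsed from the code stream, as stated above.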

In video technology, the image information of an image that is not used as a reference image is not used in the encoding or decoding of other images. In this embodiment, the image blocks of the current image to be encoded are considered for updating the long-term reference image only when it is determined that the current image to be encoded can be used as a reference image, so that the construction of the long-term reference image does not conflict with the definition of an image that is not used as a reference image. In addition, since an image that is not used as a reference image is not needed when decoding other images, such an image may be discarded without being decoded, which speeds up decoding of the video code stream and enables variable frame rate playback of the video content. Restricting updates to images that can be used as reference images therefore also avoids the situation in which image blocks of a non-reference image were used to update the long-term reference image, and the long-term reference image can no longer be updated once that non-reference image is discarded.

In some embodiments, when the currently to-be-encoded image is not an image that can be used as a reference image, it is determined that the image block is not used to update the long-term reference image.

There are several ways to determine which image blocks can be used to update the long-term reference image. For example, whether an image block can be used to update the long-term reference image is determined based on the pixel values of the image block and the pixel values of a coded block, where the coded block refers to the image block at a specific position in a coded image that precedes the current image to be encoded. For example, the coded block is located in the coded image one frame (or two frames) before the current image to be encoded. The specific position may be the same position as that of the image block in the current image to be encoded, or that position plus a preset offset value.

In some embodiments, when determining whether an image block can be used to update the long-term reference image based on pixel information of the image block and pixel information of the coded block, the determination is specifically made according to at least one of the following:

Pixel value difference of the luminance component between the image block and the coded block;

The total number of pixels of the luminance component of the image block and / or the coded block;

Pixel value difference of the chrominance component between the image block and the coded block;

The total number of pixels of the chroma component of the image block and / or the coded block.

The pixel value difference of the luminance component between the image block and the coded block may be the distribution of the differences between the luminance pixel values of the image block and the coded block at each corresponding position, the sum of those differences over all corresponding positions, or the difference between the average luminance value over the pixels of the image block and the average luminance value over the pixels of the coded block.

The pixel value difference of the chrominance component between the image block and the coded block may be the distribution of the differences between the chrominance pixel values of the image block and the coded block at each corresponding position, the sum of those differences over all corresponding positions, or the difference between the average chrominance value over the pixels of the image block and the average chrominance value over the pixels of the coded block.

Determining whether the image block can be used to update the long-term reference image according to the pixel value difference of the luminance component between the image block and the coded block may specifically be based on the sum of the absolute values of the luminance pixel value differences between the image block and the coded block.

Determining whether the image block can be used to update the long-term reference image according to the pixel value difference of the chrominance component between the image block and the coded block may specifically be based on the sum of the absolute values of the chrominance pixel value differences between the image block and the coded block.
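Two of the difference measures named above, the sum of absolute differences and the difference of averages, can be sketched as follows for a single component (luminance or chrominance). The function names and the list-of-rows layout are assumptions made for illustration, and both planes are assumed to have the same dimensions.

```python
from typing import List

Plane = List[List[int]]  # one colour component of a block, as rows of pixel values


def sum_abs_diff(block: Plane, coded: Plane) -> int:
    """Sum of absolute pixel value differences at corresponding positions."""
    return sum(
        abs(p - q)
        for row_b, row_c in zip(block, coded)
        for p, q in zip(row_b, row_c)
    )


def mean_diff(block: Plane, coded: Plane) -> float:
    """Absolute difference between the average pixel values of the two planes."""
    count = sum(len(row) for row in block)
    return abs(sum(map(sum, block)) - sum(map(sum, coded))) / count
```

Either value can then be compared against a threshold to decide whether the block is likely to contain background content.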

In the embodiment in which whether the image block can be used to update the long-term reference image is determined according to the pixel value difference of the luminance component between the image block and the coded block, optionally, the conditions that the image block must satisfy to be usable for updating the long-term reference image include: the number of specific pixels in the image block is less than a first threshold. A specific pixel is a pixel whose pixel value differs, in a first color channel, from that of the pixel at the same position in the coded block by not less than a second threshold.

Specifically, an image is stored as three components: Y (luminance), U (chrominance 1), and V (chrominance 2). The first color channel is the Y channel. The coded block is an image block located in the frame one frame (or two frames) before the current image to be encoded, whose position is the same as that of the image block, or is offset by a preset value from the position of the image block in the current image to be encoded. The difference between the Y component of the coded block and the Y component of the image block at any given position is Dist. When Dist is not less than the second threshold, the pixel in the image block corresponding to that Dist is a specific pixel. When determining whether the image block satisfies the condition that its number of specific pixels is less than the first threshold, the first threshold may be a preset value, or the product of the total number of pixels of the Y component of the image block and a preset ratio; this is not limited here.

In the embodiment in which whether the image block can be used to update the long-term reference image is determined according to the pixel value difference of the luminance component between the image block and the coded block, optionally, the conditions that the image block must satisfy to be usable for updating the long-term reference image include: the pixel value difference of the luminance component between the image block and the coded block is less than a third threshold.

When determining whether the image block satisfies the condition that the pixel value difference of the luminance component between the image block and the coded block is less than the third threshold, the third threshold may be a preset value, or the product of the total number of pixels of the Y component of the image block and a preset ratio; this is not limited here.

In the embodiment in which whether the image block can be used to update the long-term reference image is determined according to the pixel value difference of the chrominance component between the image block and the coded block, optionally, the conditions that the image block must satisfy to be usable for updating the long-term reference image include: the pixel value difference of the chrominance component between the image block and the coded block is less than a fourth threshold.

The condition that the pixel value difference of the chrominance component between the image block and the coded block is less than the fourth threshold may mean that the pixel value difference of the U component between the image block and the coded block is less than the fourth threshold, or that the pixel value difference of the V component between the image block and the coded block is less than the fourth threshold, or that the pixel value difference of the U component between the image block and the coded block is less than one preset value and the pixel value difference of the V component between the image block and the coded block is less than another preset value.

When determining whether the image block satisfies the condition that the pixel value difference of the chrominance component between the image block and the coded block is less than the fourth threshold, the fourth threshold may be a preset value, or the product of the total number of pixels of the luminance component (or chrominance component) of the image block and a preset ratio; this is not limited here.

The above describes some conditions under which image blocks can be used to update the long-term reference image. In some embodiments, when the current image to be encoded is an I-picture or a random access point, or is both an I-picture and a random access point, all image blocks of the long-term reference image are updated according to all image blocks of the current image to be encoded. All image blocks of the current image to be encoded may refer to all image blocks after encoding and reconstruction of the current image to be encoded, or to all original image blocks of the current image to be encoded. For example, all image blocks in the long-term reference image may be replaced with all image blocks of the current image to be encoded. Alternatively, all image blocks in the current image to be encoded are processed, and the processed image blocks replace all image blocks in the long-term reference image. The processing may be averaging or weighted averaging applied to the pixel values of all image blocks in the current image to be encoded. This is not limited here.

In some embodiments, the number of image blocks in the current image to be encoded that can be used to update the long-term reference image may be one or more than one. In some embodiments, this number may be unlimited, that is, all image blocks in the current image to be encoded that satisfy the conditions for updating the long-term reference image are used to update the long-term reference image.

In some embodiments, the number of image blocks that can be used to update the long-term reference image may be large. Considering the implementation complexity of the codec system, the number of image blocks in the current image to be encoded that are used to update the long-term reference image can be limited to not greater than M, where M is an integer not less than 1.

In this way, after determining all image blocks in the current to-be-encoded image that can be used to update the long-term reference image, if the determined number of image blocks is not greater than M, the long-term reference image is updated according to each determined image block. For a method of updating the long-term reference image according to each image block, reference may be made to the foregoing description, and details are not described herein again. If the determined number of image blocks is greater than M, M image blocks are selected from the determined image blocks, and the long-term reference image is updated according to the M image blocks.

There are multiple methods for determining the value of M corresponding to the current image to be encoded. For example, the value of M corresponding to the current image to be encoded is determined based on the type of the current image to be encoded.

In some embodiments, the value of M corresponding to the current image to be encoded differs according to the type of the current image to be encoded. Specifically, for example, when the current image to be encoded is an I-picture, at most a first preset number of its image blocks are used for updating the long-term reference image; when the current image to be encoded is a P-picture (or forward-predicted picture), at most a second preset number of its image blocks are used for updating the long-term reference image; when the current image to be encoded is a B-picture (or bi-predictive picture), at most a third preset number of its image blocks are used for updating the long-term reference image, where the first preset number, the second preset number, and the third preset number are different.

In some embodiments, the value of M corresponding to the current image to be encoded is determined based on the total number of images of the type to which the current image to be encoded belongs. Specifically, for example, the M value corresponding to an I-picture is half of the total number of I-pictures in the video; the M value corresponding to a P-picture (or forward-predicted picture) is one quarter of the total number of P-pictures (or forward-predicted pictures); and the M value corresponding to a B-picture (or bi-predictive picture) that can be used as a reference picture is one eighth of the total number of B-pictures (or bi-predictive pictures).

There are various methods for selecting M image blocks from all the image blocks, determined from the current image to be encoded, that can be used to update the long-term reference image. For example, in the case that the image is stored as components of different color channels, the image blocks whose cost is less than a preset value are selected from all the image blocks determined from the current image to be encoded that can be used to update the long-term reference image, and among them the M image blocks with the lowest cost are used to update the long-term reference image.

The cost of an image block is the sum of the pixel differences of the image block in each color channel. The pixel difference of the image block in a color channel is the sum, over the pixels of the image block, of the differences in that color channel between each pixel and the pixel at the same position in a third coded block. The third coded block refers to the image block at a specific position in a coded image that precedes the current image to be encoded. For example, the third coded block is located in the coded image one frame (or two frames) before the current image to be encoded. For example, the specific position may be the same position as that of the image block in the current image to be encoded.

For example, the image is stored as three YUV components. The pixel value differences between the image block and the third coded block on the Y, U, and V components are DistY, DistU, and DistV, respectively, and the total number of pixels of the luminance component of the image block is PixCount. Then the cost of the image block is Cost = (DistY + DistU + DistV) / PixCount. When it is determined that the current image to be encoded can be used to update the long-term reference image, all image blocks in the current image to be encoded whose cost is less than a preset value are determined. If the number of determined image blocks is not greater than M, all of them can be used to update the long-term reference image; if the number is greater than M, the M determined image blocks with the lowest cost can be used to update the long-term reference image.
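The cost formula and the "at most M blocks" rule can be illustrated with a short Python sketch. The function names, the dictionary keyed by a block identifier, and the max_cost parameter (standing in for the preset value) are assumptions introduced only for this illustration.

```python
from typing import Dict, Hashable, List


def block_cost(dist_y: float, dist_u: float, dist_v: float, pix_count: int) -> float:
    """Cost = (DistY + DistU + DistV) / PixCount."""
    return (dist_y + dist_u + dist_v) / pix_count


def select_blocks(costs: Dict[Hashable, float], m: int, max_cost: float) -> List[Hashable]:
    """Pick the blocks used to update the long-term reference image.

    Blocks whose cost is not below max_cost are rejected. If at most m
    blocks remain, all of them are returned; otherwise only the m blocks
    with the lowest cost are returned.
    """
    eligible = [block for block, cost in costs.items() if cost < max_cost]
    if len(eligible) <= m:
        return eligible
    return sorted(eligible, key=lambda block: costs[block])[:m]
```

For instance, with costs {"a": 0.5, "b": 2.0, "c": 0.1, "d": 1.0}, max_cost = 1.5 and m = 2, block "b" is rejected by the cost limit and the two cheapest remaining blocks, "c" and "a", are selected.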

In some embodiments, an identification bit of the image block is also encoded, and the identification bit is used to identify whether the image block is used to update the long-term reference image. The code stream sent from the encoding end to the decoding end also includes identification bits of each image, where the identification bits of each image are used to indicate whether each image block in the image is used to update the long-term reference image.

The following describes the video encoding method with specific examples.

As shown in FIG. 4, FIG. 4 is a schematic diagram of a relationship between multiple images and a long-term reference image in a video. In FIG. 4, four images in the video are sequentially encoded as an example. The first three images are encoded images, and the fourth image is the current image to be encoded. Of the first three images, the first and third encoded images can be used as reference images, and the second encoded image cannot be used as a reference image. Among them, the image block 11 and the image block 12 in the first encoded image are used to update the long-term reference image. Image blocks 31, 32, 33, and 34 in the third encoded image are used to update the long-term reference image. For example, the pixel values in the image blocks 11, 12, 31, 32, 33, and 34 are respectively used to replace the pixel values of the image blocks at the same position in the long-term reference image.

The following uses a video encoding method for encoding a current image to be encoded as an example. Before encoding the current image to be encoded, it is determined that the current image to be encoded can be used as a reference image according to the type of the current image to be encoded. For example, since the current image to be encoded is an I image, it is determined that the current image to be encoded can be used as a reference image. Therefore, when encoding each image block in the current image to be encoded separately, for each image block, it is determined whether the image block can be used to update the long-term reference image.

The following describes a method for determining that an image block can be used to update a long-term reference image with two examples.

Example one

Images are stored according to three components, which are Y, U, and V components. Any image block in the current image to be encoded is referred to as image block 1, and the image block in the previous frame of the current image to be encoded that has the same position as image block 1 is referred to as image block 2.

In the Y component of the image block 1 and the image block 2, the sum of the difference between the pixel values of the pixels at the same positions is DistY. The sum of the differences between the pixel values of the pixels at the same positions in the U components of the image block 1 and the image block 2 is DistU. The sum of the difference between the pixel values of the pixels at the same positions in the V components of the image block 1 and the image block 2 is DistV.

The total number of pixels of the luminance component of image block 1 is PixCount, and the count of large-error points between image block 1 and image block 2 is LargeDist1. The initial value of LargeDist1 is set to 0. For each pixel in the luminance component of image block 1 whose pixel value differs from that of the pixel at the same position in the luminance component of image block 2 by more than a preset value (for example, 20), LargeDist1 is incremented by one.

When the following four conditions are satisfied at the same time, image block 1 is considered to be a candidate block for updating the long-term reference image.

a) LargeDist1 is smaller than the preset ratio of PixCount (for example, 1%);

b) DistY is smaller than a preset multiple of PixCount (for example, 4 times);

c) DistU is less than the preset multiple of PixCount (for example, 0.5 times);

d) DistV is smaller than a preset multiple of PixCount (for example, 0.5 times).
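Conditions a) to d) of Example one can be checked as in the following sketch. It assumes that each block is given as a dictionary of Y, U, and V planes stored as lists of rows, and that the "sum of the difference" is a sum of absolute differences; the function names and default parameter values (taken from the example figures above) are illustrative assumptions.

```python
from typing import Dict, List

Plane = List[List[int]]


def sad(a: Plane, b: Plane) -> int:
    """Sum of absolute differences between co-located pixels."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def count_large(a: Plane, b: Plane, limit: int) -> int:
    """Number of co-located pixels whose difference exceeds limit."""
    return sum(1 for ra, rb in zip(a, b) for p, q in zip(ra, rb) if abs(p - q) > limit)


def is_candidate(cur: Dict[str, Plane], prev: Dict[str, Plane],
                 large_limit: int = 20, large_ratio: float = 0.01,
                 y_mult: float = 4.0, uv_mult: float = 0.5) -> bool:
    """True when image block 1 (cur) meets conditions a) to d) against block 2 (prev)."""
    pix_count = sum(len(row) for row in cur["Y"])            # PixCount
    large_dist1 = count_large(cur["Y"], prev["Y"], large_limit)
    return (
        large_dist1 < large_ratio * pix_count                 # a)
        and sad(cur["Y"], prev["Y"]) < y_mult * pix_count     # b)
        and sad(cur["U"], prev["U"]) < uv_mult * pix_count    # c)
        and sad(cur["V"], prev["V"]) < uv_mult * pix_count    # d)
    )
```

Note that with a 1% ratio, a small block tolerates no large-error pixel at all, so a single strongly changed luminance pixel is enough to reject it.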

When the number of candidate blocks in the current image to be encoded for updating the long-term reference image is not greater than M, all the candidate blocks are used to update the long-term reference image.

When the number of candidate blocks in the current image to be encoded for updating the long-term reference image is greater than M, the cost of each candidate block is recorded as Cost = (DistY + DistU + DistV) / PixCount. The M candidate blocks with the lowest cost are selected from the current image to be encoded for updating the long-term reference image.

Example two

Images are stored according to three components, which are Y, U, and V components. Any image block in the current image to be encoded is referred to as image block 1, the image block in the previous frame of the current image to be encoded that has the same position as image block 1 is referred to as image block 2, and the image block in the frame two frames before the current image to be encoded that has the same position as image block 1 is referred to as image block 3.

DistY, DistU, and DistV are defined between image block 1 and image block 2 as in Example one. The sum of the differences between the pixel values of the pixels at the same positions in the Y components of image block 1 and image block 3 is DistY'. The sum of the differences between the pixel values of the pixels at the same positions in the U components of image block 1 and image block 3 is DistU'. The sum of the differences between the pixel values of the pixels at the same positions in the V components of image block 1 and image block 3 is DistV'.

The total number of pixels of the luminance component of image block 1 is PixCount, the count of large-error points between image block 1 and image block 2 is LargeDist1, and the count of large-error points between image block 1 and image block 3 is LargeDist2. The initial values of LargeDist1 and LargeDist2 are set to 0. For each pixel in the luminance component of image block 1 whose pixel value differs from that of the pixel at the same position in the luminance component of image block 2 by more than a preset value (for example, 20), LargeDist1 is incremented by 1. For each pixel in the luminance component of image block 1 whose pixel value differs from that of the pixel at the same position in the luminance component of image block 3 by more than a preset value (for example, 20), LargeDist2 is incremented by 1.

When the following four conditions are satisfied at the same time, image block 1 is considered to be a candidate block for updating the long-term reference image.

a) Both LargeDist1 and LargeDist2 are smaller than the preset ratio of PixCount (for example, 2%);

b) Both DistY and DistY' are smaller than a preset multiple of PixCount (for example, 6 times);

c) Both DistU and DistU' are smaller than a preset multiple of PixCount (for example, 0.5 times);

d) Both DistV and DistV' are smaller than a preset multiple of PixCount (for example, 0.5 times).
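The two-frame check of Example two can be sketched as follows, working on precomputed statistics instead of raw pixels. The BlockStats container and the function names are hypothetical, and the default limits are the example figures quoted in conditions a) to d).

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Differences of image block 1 against one earlier co-located block."""
    dist_y: int      # DistY or DistY'
    dist_u: int      # DistU or DistU'
    dist_v: int      # DistV or DistV'
    large_dist: int  # LargeDist1 or LargeDist2


def within_limits(s: BlockStats, pix_count: int, large_ratio: float = 0.02,
                  y_mult: float = 6.0, uv_mult: float = 0.5) -> bool:
    """Apply conditions a) to d) to one set of statistics."""
    return (
        s.large_dist < large_ratio * pix_count
        and s.dist_y < y_mult * pix_count
        and s.dist_u < uv_mult * pix_count
        and s.dist_v < uv_mult * pix_count
    )


def is_candidate_two_frames(vs_block2: BlockStats, vs_block3: BlockStats,
                            pix_count: int) -> bool:
    """Image block 1 is a candidate only if it is close to both earlier blocks."""
    return within_limits(vs_block2, pix_count) and within_limits(vs_block3, pix_count)
```

Requiring agreement with both earlier frames makes the test stricter with respect to content that is stable for only a single frame, which is presumably why the per-condition limits in this example are slightly looser than in Example one.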

After determining all the image blocks in the current image to be encoded that can be used to update the long-term reference image (specifically image blocks 41, 42, 43, and 44 in FIG. 4), the long-term reference image is updated using these image blocks. For example, as shown in FIG. 4, the pixel values in the image blocks 41, 42, 43, and 44 are respectively used to replace the pixel values of the image blocks at the same positions in the long-term reference image.
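Writing a selected block into the long-term reference image at the co-located position (or at a position shifted by a preset offset, as in the embodiment described earlier) might look like the sketch below. The function name paste_block, the in-place update, and the (rows, columns) offset convention are assumptions for illustration only.

```python
from typing import List, Tuple

Image = List[List[int]]  # one colour component, as rows of pixel values


def paste_block(ref_image: Image, block: Image, top: int, left: int,
                offset: Tuple[int, int] = (0, 0)) -> None:
    """Overwrite part of ref_image in place with the pixel values of block.

    (top, left) is the block position in the current image; offset is an
    optional preset shift applied to obtain the position in ref_image.
    """
    row0, col0 = top + offset[0], left + offset[1]
    height, width = len(block), len(block[0])
    if (row0 < 0 or col0 < 0 or row0 + height > len(ref_image)
            or col0 + width > len(ref_image[0])):
        raise ValueError("block does not fit inside the reference image")
    for r in range(height):
        ref_image[row0 + r][col0:col0 + width] = block[r]
```

Applying paste_block once per selected block (for example blocks 41 to 44 of FIG. 4) reproduces the replacement update described above.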

As shown in FIG. 5, FIG. 5 is a schematic flowchart of still another embodiment of a video encoding method of the present application. As shown in Figure 5, the method includes:

401. When the current image to be encoded can be used to update the long-term reference image and an image block can be used to update the long-term reference image, a specific image block in the long-term reference image is updated according to the image block, where the image block is an image block in the current image to be encoded. The position of the specific image block in the long-term reference image is determined by the position of the image block in the current image to be encoded.

There are multiple methods for determining that the current to-be-encoded image can be used to update the long-term reference image.

In some embodiments, when the current image to be encoded can be used for inter prediction, it is determined that the current image to be encoded can be used to update the long-term reference image. For example, when it is determined that the current image to be encoded is an intra-predicted image or a forward-predicted image, it is determined that the current image to be encoded can be used to update the long-term reference image. For example, since I-pictures and P-pictures (or forward-predicted pictures) can be used as reference pictures for inter prediction of other pictures, when the current image to be encoded is determined to be an I-picture or a P-picture (or forward-predicted picture), it is determined that the current image to be encoded can be used as a reference image. In some embodiments, some B-pictures (or bi-predictive pictures) can also be used as reference pictures for inter prediction of other pictures. For example, in hierarchical B technology, lower-level B-pictures can be used as reference frames. Therefore, when the current image to be encoded is determined to be such a B-picture, it can also be determined that the current image to be encoded can be used as a reference image.

In some embodiments, when the current image to be encoded is an intra-predicted image and/or a random access point, all image blocks of the long-term reference image are updated according to all the image blocks of the current image to be encoded.

In some embodiments, when an image in a video can be used as a reference image (that is, it can be used for inter prediction), no distinction is made between long-term and short-term reference images: as long as the current image to be encoded can be used as a reference image for inter prediction of other images, it is determined that the current image to be encoded can be used as a reference image. In this case, an image block in the image to be encoded can be used to update a specific image block.

In some embodiments, when the current to-be-coded image is not available as a short-term reference image and cannot be used as a long-term reference image, it is determined that an image block in the to-be-coded image is not available for updating a specific image block in the long-term reference image.

In some embodiments, when the current to-be-coded image can be used as a short-term reference image and can be used as a long-term reference image, it is determined that an image block in the to-be-coded image can be used to update the long-term reference image.

In some embodiments, when the current to-be-encoded image is not available as a short-term reference image but can be used as a long-term reference image, it is determined that the to-be-encoded image can be used to update the long-term reference image.

In some embodiments, when the current image to be encoded can be used as a short-term reference image but cannot be used as a long-term reference image, it is determined that the image to be encoded is not available for updating the long-term reference image, that is, image blocks in the image to be encoded are not used to update specific image blocks in the long-term reference image.

In some embodiments, when the current to-be-coded picture can be used as a short-term reference picture but not as a long-term reference picture, it is determined that the to-be-coded picture can be used to update the long-term reference picture.

In some embodiments, when the current to-be-coded image can be used as a short-term reference image and can be used as a long-term reference image, it is determined that the to-be-coded image can be used to update the long-term reference image.

In some embodiments, an identification bit is further added in at least one of the following, and the identification bit is used to identify whether an image that cannot be used as a short-term reference image can be used to update the long-term reference image:

Video parameter set, sequence parameter set, sequence header, image parameter set, image header, slice header, reference image set, reference configuration set.

Optionally, the identification bit is a time-domain (temporal) scalability identification bit. When time-domain scalability is required, the value of the identification bit indicates that an image block in an image that is not a short-term reference image cannot be used to update the long-term reference image; and/or, when time-domain scalability is not required, the value of the identification bit indicates that an image block in an image that is not a short-term reference image can be used to update the long-term reference image.
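The role of such an identification bit can be illustrated with the small sketch below. It assumes the embodiment in which an image usable as a short-term reference image may always contribute to the long-term reference image; the function names and the boolean encoding of the identification bit are hypothetical and do not correspond to any syntax element defined in the application.

```python
def non_short_term_update_flag(temporal_scalability_required: bool) -> bool:
    """Value of the (hypothetical) identification bit.

    True means that blocks of images NOT usable as short-term reference
    images may still update the long-term reference image.
    """
    return not temporal_scalability_required


def may_update_long_term_ref(usable_as_short_term_ref: bool, flag: bool) -> bool:
    """Decide whether blocks of the current image may update the long-term reference.

    Assumption: short-term reference images may always contribute; other
    images contribute only when the identification bit allows it.
    """
    return usable_as_short_term_ref or flag
```

With temporal scalability required, the bit is cleared, so an image that a decoder may drop never influences the long-term reference image and the remaining images stay decodable.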

If image blocks in images that are not available as short-term reference images are never used to update the long-term reference image, the update speed of the specific reference frame is affected, which in turn reduces the benefit of the specific reference frame for coding quality. The scheme in which an image block in a current image to be encoded that cannot be used as a short-term reference image but can be used as a long-term reference image can still be used to update the specific reference frame reduces this impact.

Moreover, the significance of the scheme in which image blocks in a current image to be encoded that cannot be used as a short-term reference image but can be used as a long-term reference image can be used to update the long-term reference image is as follows. When the encoding system does not care about the parallel encoding characteristic of images not used as short-term reference images, or about the characteristic that such images can be directly discarded during decoding, the scheme preserves the benefit of the specific reference frame for coding quality while still respecting the definition of an image that cannot be used as a short-term reference image. When the encoding system does care about the parallel encoding characteristic of images not used as short-term reference images, or relies on directly discarding such images to increase processing speed, image blocks in images that cannot be used as short-term reference images cannot be used to update the specific reference frame.

Further, an identification bit may be added to at least one of a video parameter set, a sequence parameter set, a sequence header, an image parameter set, an image header, a slice header, a reference image set, and a reference configuration set, and the identification bit is used to identify whether image block information in an image that is not used as a short-term reference image can be used to update the specific reference frame. In some embodiments, the identification bit may also be a time-domain scalability identification bit. When the encoding system has a time-domain scalability requirement, the value of the identification bit indicates that image blocks in an image that is not a short-term reference image are not available for updating the specific reference frame; and/or, when the encoding system has no time-domain scalability requirement, the value of the identification bit indicates that image blocks in an image that is not a short-term reference image can be used for updating the specific reference frame.

For the explanation of the long-term reference image and the specific image block, refer to the explanation of the long-term reference image and the specific image block in the above description, and details are not described herein again.

Regarding how to determine that the image block can be used to update the long-term reference image, refer to the explanation of “determining the image block can be used to update the long-term reference image” in the above description, which is not repeated here.

Regarding how to update specific image blocks in the long-term reference image according to the image blocks in the current image to be encoded, refer to the explanation of "updating specific image blocks in the long-term reference image according to the image blocks in the current image to be encoded" in the above description, which is not repeated here.

In some embodiments, when the type of the current image to be encoded is different, the value of M corresponding to the current image to be encoded is different. Specifically, for example, when the current to-be-encoded image is an I picture, at most a first preset number of its image blocks are used for updating the long-term reference image; when the current to-be-encoded image is a P picture (or forward-predicted picture), at most a second preset number of its image blocks are used for updating the long-term reference image; when the current to-be-encoded image is a B picture (or bidirectionally predicted picture), at most a third preset number of its image blocks are used for updating the long-term reference image. The first preset number, the second preset number, and the third preset number are different.

There are multiple methods for selecting M image blocks from all the image blocks that can be used to update the long-term reference image, which are determined from the current image to be encoded. For details, reference may be made to the explanation in the above description of “the method for selecting M image blocks from all the image blocks that can be used to update the long-term reference image determined from the current to-be-encoded image”, and details are not described herein.

In video technology, an image that is not used as a reference image is not used in the encoding and decoding of other images; accordingly, an image that is unavailable for updating the long-term reference image should not be used to update it. In this embodiment, the image blocks in the current to-be-encoded image are considered for updating the long-term reference image only when it is determined that the current to-be-encoded image can be used to update the long-term reference image. This avoids the case in which an image that is not available for updating the long-term reference image is nevertheless used to update it, for example the case in which the construction of the long-term reference image is inconsistent with the definition of an image that is not used as a reference image. In addition, since an image that is not used as a reference image is not needed when decoding other images, such an image may be skipped rather than decoded, which speeds up decoding of the video code stream and enables variable-frame-rate playback of the video content. Further, in some embodiments, the image blocks in the current to-be-encoded image are considered for updating the long-term reference image only when it is determined that the current to-be-encoded image can be used as a reference image. This avoids the situation in which images that cannot be used as reference images can no longer be discarded because their image blocks have been used to update the long-term reference image.

In some embodiments, when the current to-be-coded image is not available for updating the long-term reference image, the specific image block in the long-term reference image is not updated with the image block in the currently-to-be-coded image. For example, when the currently-to-be-encoded image is unavailable for inter prediction, it is determined that the currently-to-be-encoded image is not available for updating the long-term reference image. As another example, when the current to-be-coded image can be used as a short-term reference image but cannot be used as a long-term reference image, it is determined that an image block in the to-be-coded image is not used to update a specific image block in the long-term reference image. For another example, when the current to-be-coded image is not a short-term reference image or a long-term reference image, it is determined that the image block in the to-be-coded image is not used to update a specific image block in the long-term reference image.

In some embodiments, after determining that the current to-be-encoded image can be used to update the long-term reference image, the encoding end further writes a parameter or an identification bit in at least one of a video parameter set (VPS), a sequence parameter set (SPS), a sequence header, a picture header, a slice header, a reference picture set (RPS), and a reference picture configuration set (RCS), to indicate that the current to-be-encoded image is available for updating the long-term reference image.

In some embodiments, after determining that the current to-be-encoded image is available for updating the long-term reference image, an identification bit of an image block in the current to-be-encoded image is further used to identify whether that image block is used for updating the long-term reference image.

The video encoding method according to the embodiment of the present application has been described above from the encoding side with reference to FIG. 2, FIG. 3, FIG. 4, and FIG. 5. The video decoding method according to another embodiment of the present application is described in detail below from the decoding side with reference to FIG. 6. FIG. 6 is a schematic diagram of a video decoding method according to another embodiment of the present application. The method may be performed by an image processing apparatus, which may be any of various types of chips for image processing, an image processor, and the like. As shown in FIG. 6, the video decoding method includes:

501: When a current to-be-decoded image can be used as a reference image, and when an image block can be used to update a long-term reference image, a specific image block in the long-term reference image is updated according to the image block, where the image block is an image block in the current image to be decoded, and the position of the specific image block in the long-term reference image is determined by the position of the image block in the current image to be decoded.
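For illustration only, step 501 can be sketched as follows. This is a minimal sketch and not the claimed implementation. It assumes that images are two-dimensional lists of samples, that image blocks are square with side `block_size`, and that the specific image block occupies the same position in the long-term reference image as the image block occupies in the decoded image. The function names and parameters are hypothetical.

```python
def update_long_term_reference(long_term_ref, picture, block_row, block_col, block_size):
    """Copy one block of `picture` into the same position of `long_term_ref` (in place)."""
    top = block_row * block_size
    left = block_col * block_size
    # Clamp to the picture size so partial blocks at the border are handled.
    for y in range(top, min(top + block_size, len(picture))):
        for x in range(left, min(left + block_size, len(picture[y]))):
            long_term_ref[y][x] = picture[y][x]


def step_501(long_term_ref, picture, usable_as_reference, eligible_blocks, block_size):
    """Apply the two conditions of step 501 and return the number of blocks updated.

    `eligible_blocks` lists the (block_row, block_col) positions already judged
    usable for updating; how that judgment is made is outside this sketch.
    """
    if not usable_as_reference:
        return 0  # the picture cannot be a reference image, so nothing is updated
    for block_row, block_col in eligible_blocks:
        update_long_term_reference(long_term_ref, picture, block_row, block_col, block_size)
    return len(eligible_blocks)
```

In this sketch the picture-level condition is checked once per picture; the per-block check could equally be made while each block is decoded.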

In some embodiments, determining that the current image to be decoded can be used as a reference image may occur before the image is decoded. In this way, according to the determination result, whether each image block meets the condition for updating the long-term reference image is determined while or after that image block is decoded. Alternatively, the determination may occur while or after each image block in the image is decoded: when or after an image block is decoded, it is first determined whether the image to which the image block belongs can be used as a reference image, and only when it can is it determined whether the image block can be used to update the long-term reference image.

In some embodiments, to determine that the current image to be decoded can be used as a reference image, a parameter or an identification bit indicating the reference relationship of the current image to be decoded may be obtained, and whether the current image to be decoded can be used as a reference image is determined according to the parameter or the identification bit. The parameter or identification bit indicating the reference relationship of the current to-be-decoded image can be obtained in various ways, for example, from a video parameter set (VPS), a sequence parameter set (SPS), a sequence header, a picture header, a slice header, a reference picture set (RPS), or a reference picture configuration set (RCS).

In some embodiments, since I pictures and P pictures (or forward-predicted pictures) can be used as reference pictures for inter prediction of other pictures, when the current picture to be decoded is determined to be an I picture or a P picture (or forward-predicted picture), it is determined that the picture to be decoded can be used as a reference picture. In some embodiments, some B pictures (or bi-predictive pictures) can also be used as reference pictures for inter prediction. For example, in the hierarchical B technique, lower-level B pictures can be used as reference frames. Therefore, when the current picture to be decoded is determined to be such a B picture, it can also be determined that the current picture to be decoded can be used as a reference picture.

For the method for determining that an image block can be used to update the long-term reference image, refer to the method described above for determining that an image block in the current to-be-encoded image can be used to update the long-term reference image. Alternatively, in some embodiments, the decoding end further parses the identification bits of each image from the code stream, where the identification bits of each image indicate whether each image block in that image is used to update the long-term reference image. The decoding end can obtain the identification bit of an image block from the code stream and determine, according to the identification bit, whether the image block is used to update the long-term reference image.

For the method for updating a specific image block in the long-term reference image according to the image block, refer to the method described above for updating a specific image block in the long-term reference image according to an image block in the current image to be encoded, which is not repeated here. For how the position of the specific image block in the long-term reference image is determined by the position of the image block in the current image to be decoded, refer to the explanation in step 101 of how the position of the specific image block in the long-term reference image is determined by the position of the image block in the current image to be encoded, which is not repeated here.

In some embodiments, the number of image blocks that can be used to update the long-term reference image in the current to-be-decoded image may be one or more than one. In some embodiments, the number of image blocks in the current to-be-decoded image that can be used to update the long-term reference image may be unlimited, that is, all image blocks in the current to-be-decoded image that meet the conditions for updating the long-term reference image are used for updating the long-term reference image.

In some embodiments, the number of image blocks that can be used to update the long-term reference image is large. Considering the implementation complexity of the codec system, the number of image blocks in the current to-be-decoded image that are used to update the long-term reference image can be limited to not greater than M, where M is an integer not less than 1.

In this way, after determining all image blocks in the current to-be-decoded image that can be used to update the long-term reference image, if the determined number of image blocks is not greater than M, the long-term reference image is updated according to each determined image block. For a method of updating the long-term reference image according to each image block, reference may be made to the foregoing description, and details are not described herein again. If the determined number of image blocks is greater than M, M image blocks are selected from the determined image blocks, and the long-term reference image is updated according to the M image blocks.
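The limit of M image blocks described above can be sketched as follows. This is a minimal sketch under a stated assumption: the description leaves the selection rule open, so the sketch simply keeps the first M eligible blocks in the order given (for example, scan order). The function name is hypothetical.

```python
def select_blocks_for_update(eligible_blocks, m):
    """Return at most `m` of the blocks eligible for updating the long-term reference.

    Assumption: when more than `m` blocks are eligible, the first `m` in the
    given order are kept. Other selection rules are equally possible.
    """
    if m < 1:
        raise ValueError("M must be an integer not less than 1")
    if len(eligible_blocks) <= m:
        return list(eligible_blocks)  # no selection needed
    return list(eligible_blocks[:m])
```

Because the encoding end and the decoding end must construct the same long-term reference image, whichever selection rule is chosen has to be applied identically on both sides.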

The value of M corresponding to the current image to be decoded may be determined in multiple ways. For example, the value of M corresponding to the current image to be decoded is determined based on the type of the current image to be decoded. In some embodiments, when the image types of the current image to be decoded are different, the values of M corresponding to the current image to be decoded are different. Specifically, for example, when the current image to be decoded is an I picture, at most a first preset number of its image blocks are used for updating the long-term reference image; when the current image to be decoded is a P picture (or forward-predicted picture), at most a second preset number of its image blocks are used for updating the long-term reference image; when the current image to be decoded is a B picture (or bidirectionally predicted picture), at most a third preset number of its image blocks are used for updating the long-term reference image. The first preset number, the second preset number, and the third preset number are different.
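As an illustration of choosing M by image type, a lookup table is sufficient. The three numbers below are placeholders chosen only so that they differ from one another; the description does not specify actual preset numbers, and the names are hypothetical.

```python
# Placeholder preset numbers (assumptions, not values from the description).
PRESET_M = {"I": 64, "P": 16, "B": 4}


def max_update_blocks(picture_type):
    """Return the upper limit M for a picture of type 'I', 'P', or 'B'."""
    try:
        return PRESET_M[picture_type]
    except KeyError:
        raise ValueError(f"unknown picture type: {picture_type!r}") from None
```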

In some embodiments, the value of M corresponding to the current image to be decoded is determined based on the total number of images of the type to which the current image to be decoded belongs. Specifically, for example, the value of M corresponding to an I picture is one half of the total number of I pictures in the video; the value of M corresponding to a P picture (or forward-predicted picture) is one quarter of the total number of P pictures (or forward-predicted pictures); the value of M corresponding to a B picture (or bi-predictive picture) that can be used as a reference picture is one eighth of the total number of B pictures (or bi-predictive pictures).

In some embodiments, at least one of the following carries the number of image blocks in the current to-be-decoded image that can be used to update the long-term reference image: the image header of the current to-be-decoded image, the image parameter set of the current to-be-decoded image, the sequence header corresponding to the current to-be-decoded image, the sequence parameter set corresponding to the current to-be-decoded image, and the video parameter set corresponding to the current to-be-decoded image. The decoding end can parse out from it the number of image blocks in the current to-be-decoded image that can be used to update the long-term reference image.

In some embodiments, the value of M corresponding to the current to-be-decoded image is only used to inform the decoding end how many image blocks of the current image can be used for updating the specific reference frame, which facilitates the design of the decoding end and reduces its complexity.

FIG. 7 is a schematic flowchart of another embodiment of a video decoding method of the present application. As shown in FIG. 7, the method includes:

601. When a current to-be-decoded image is available for updating a long-term reference image, and when an image block is available for updating the long-term reference image, update a specific image block in the long-term reference image according to the image block, where the image block is an image block in the current to-be-decoded image, and the position of the specific image block in the long-term reference image is determined by the position of the image block in the current to-be-decoded image.

In some embodiments, when the current to-be-decoded picture is available for inter prediction, it is determined that the current to-be-decoded picture can be used to update the long-term reference picture. For example, since I pictures and P pictures (or forward-predicted pictures) can be used as reference pictures for inter prediction of other pictures, when the current picture to be decoded is determined to be an I picture or a P picture (or forward-predicted picture), it is determined that the current picture to be decoded can be used as a reference picture. In some embodiments, some B pictures (or bi-predictive pictures) can also be used as reference pictures for inter prediction. For example, in the hierarchical B technique, lower-level B pictures can be used as reference frames. Therefore, when the current picture to be decoded is determined to be such a B picture, it can also be determined that the current picture to be decoded can be used as a reference picture.

In some embodiments, when the current to-be-decoded image is an intra-predicted image and/or a random access point, all image blocks of the long-term reference image are updated according to all image blocks of the current to-be-decoded image. For a specific explanation, refer to the above explanation of "updating all image blocks of the long-term reference image according to all image blocks of the current to-be-decoded image", which is not repeated here.

To determine that the current image to be decoded can be used as a reference image (that is, can be used for inter prediction), a parameter or an identification bit indicating the reference relationship of the current image to be decoded may be obtained, and whether the current image to be decoded can be used as a reference image is determined according to the parameter or the identification bit. The reference relationship of the current to-be-decoded image may refer to whether the current to-be-decoded image is a short-term reference image or a long-term reference image. The parameter or identification bit indicating the reference relationship of the current to-be-decoded image can be obtained in various ways, for example, from a video parameter set (VPS), a sequence parameter set (SPS), a sequence header, a picture header, a slice header, a reference picture set (RPS), or a reference picture configuration set (RCS).

In some embodiments, an image in a video that can be used as a reference image is not distinguished as a long-term reference image or a short-term reference image. As long as the current image to be decoded can be used as a reference image for inter prediction of other images, it can be determined that the current image to be decoded can be used as a reference image and can be used to update the long-term reference image.

In some embodiments, when the current to-be-decoded image can be used neither as a short-term reference image nor as a long-term reference image, it is determined that the to-be-decoded image is not available for updating the long-term reference image, that is, that the image blocks in the to-be-decoded image are not used to update specific image blocks in the long-term reference image.

In some embodiments, when the current image to be decoded can be used as a short-term reference image and can be used as a long-term reference image, it is determined that the image block in the image to be decoded can be used to update the long-term reference image.

In some embodiments, when the current to-be-decoded image is not available as a short-term reference image but can be used as a long-term reference image, it is determined that the to-be-decoded image can be used to update the long-term reference image.

In some embodiments, when the current image to be decoded can be used as a short-term reference image but cannot be used as a long-term reference image, it is determined that the image to be decoded is not available for updating the long-term reference image, that is, that the image blocks in the image to be decoded are not used to update specific image blocks in the long-term reference image.

In some embodiments, when the current picture to be decoded can be used as a short-term reference picture but cannot be used as a long-term reference picture, it is determined that the picture to be decoded can be used to update the long-term reference picture.

In some embodiments, when the current picture to be decoded can be used as a short-term reference picture and can be used as a long-term reference picture, it is determined that the picture to be decoded can be used to update the long-term reference picture.

There are various methods for determining whether the current image to be decoded is a short-term reference image or a long-term reference image. For example, this information can be obtained by parsing the video parameter set, sequence parameter set, sequence header, image parameter set, image header, slice header, reference image set, or reference configuration set.

Further, an identification bit may be parsed from at least one of a video parameter set, a sequence parameter set, a sequence header, an image parameter set, an image header, a slice header, a reference image set, and a reference configuration set, where the identification bit is used to identify whether the image to be decoded is available for updating the long-term reference image. In some embodiments, the identification bit may also be a time-domain scalability identification bit. When the decoder has a time-domain scalability requirement, the value of the identification bit indicates that image blocks in an image that is not a short-term reference image are not available for updating the specific reference frame; and/or, when the decoder does not have a time-domain scalability requirement, the value of the identification bit indicates that image blocks in an image that is not a short-term reference image can be used for updating the specific reference frame.

In some embodiments, an identification bit of an image block of the image to be decoded is further obtained, where the identification bit is used to identify whether the image block in the image to be decoded is used to update the long-term reference image. An image block that can be used to update the long-term reference image is determined according to the identification bit.

Regarding how to update specific image blocks in the long-term reference image according to the image blocks in the current image to be decoded, refer to the explanation of "updating specific image blocks in the long-term reference image according to the image blocks in the current image to be decoded" in the above description, which is not repeated here.

In some embodiments, the number of image blocks that can be used to update the long-term reference image in the current to-be-decoded image may be one or more than one. In some embodiments, the number of image blocks in the current to-be-decoded image that can be used to update the long-term reference image may be unlimited, that is, all image blocks in the current to-be-decoded image that meet the conditions for updating the long-term reference image are used for updating the long-term reference image.

In some embodiments, when it is determined that the current to-be-decoded image is not available for inter prediction of an image other than the current to-be-decoded image, it is determined that the image block is not used to update the long-term reference image.

In some embodiments, when the current picture to be decoded can be used as a short-term reference picture but not as a long-term reference picture, it is determined that the image block in the picture to be decoded is not used to update the long-term reference picture.

In some embodiments, when the current to-be-decoded image is not available as a short-term reference image and cannot be used as a long-term reference image, it is determined that an image block in the to-be-decoded image is not available for updating the long-term reference image.
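The picture-level decision described in the preceding embodiments can be sketched for one particular combination of them. This is a minimal sketch and an assumption about how the embodiments might be combined: an image may update the long-term reference image only if it can be used as a long-term reference image, and an image that is not a short-term reference image is additionally excluded when time-domain scalability is required. Other embodiments above differ (for example, one allows an image usable only as a short-term reference image to update the long-term reference image). The function and parameter names are hypothetical.

```python
def picture_can_update_long_term_reference(usable_short_term, usable_long_term,
                                           temporal_scalability=False):
    """Decide whether a decoded picture may update the long-term reference image."""
    if not usable_long_term:
        # Covers both "short-term only" and "neither short-term nor long-term".
        return False
    if not usable_short_term:
        # Long-term only: allowed unless temporal scalability is required, so
        # that pictures not used as short-term references can still be dropped.
        return not temporal_scalability
    return True
```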

In the embodiment of the present application, the code stream includes a first type frame and a second type frame, and may further include a third type frame and a fourth type frame. The first type frame is an I frame or a random access point, or the first type frame is an I frame and a random access point at the same time. The second type frame is an inter-coded frame whose encoding order is after the first type frame and whose display order is before the first type frame. The second type frame may be a P frame or a B frame. The third type frame is an inter-coded frame whose coding order precedes the first type frame. The display order of the third type frame may precede the first type frame. The third type frame may be a P frame or a B frame. The fourth type frame is an inter-coded frame whose encoding order is after the first type frame and whose display order is after the first type frame. The fourth type frame may be a P frame or a B frame.
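The four frame types defined above depend only on whether a frame is inter-coded and on its coding order and display order relative to the first type frame. The following minimal sketch restates that definition; the `Frame` record and the integer order fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    coding_order: int   # position in the code stream
    display_order: int  # position at playback
    inter_coded: bool   # True for P/B frames, False for an I frame


def classify(frame, first_type_frame):
    """Classify `frame` relative to a given first type frame (I frame / random access point)."""
    if frame == first_type_frame:
        return "first"
    if not frame.inter_coded:
        return "other"  # e.g. a different I frame; not covered by the four types
    if frame.coding_order < first_type_frame.coding_order:
        return "third"  # coded before the first type frame
    if frame.display_order < first_type_frame.display_order:
        return "second"  # coded after, displayed before
    return "fourth"  # coded after, displayed after
```

For example, with a first type frame at coding order 8 and display order 8, a B frame coded ninth but displayed sixth is a second type frame.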

The inter-frame coding mentioned herein includes inter-frame coding of an entire frame of images or inter-frame coding of at least one image block in the entire frame of images. This definition of inter-frame coding applies to all types of frames.

The encoding method according to the embodiment of the present application will be described in detail below from the perspective of the encoding end.

FIG. 8 is a schematic flowchart of an encoding method 800 according to an embodiment of the present application. As shown in FIG. 8, the encoding method 800 includes the following steps.

S810. Encode the first type frame. In the encoding order, there are N second type frames that need to be inter-frame encoded after the first type frame. The display order of the N second type frames is before the first type frame. Where N is a positive integer.

S820. Inter-code at least one second type frame of the N second type frames according to the long-term reference image.

S830. After completing inter-frame coding of at least one second type frame among the N second type frames, replace the currently used long-term reference image.

In the encoding method of the embodiment of the present application, the currently used long-term reference image is replaced only after inter-frame encoding is completed for at least one second type frame among the N second type frames whose coding order is after the first type frame and whose display order is before the first type frame. This enables the at least one second type frame to refer to the long-term reference image before the first type frame, which can improve coding efficiency.

Optionally, encoding the first-type frame refers to performing intra-frame encoding on the first-type frame.

Optionally, the first type frame is a random access point. In this embodiment, the currently used long-term reference image is replaced only after inter-frame encoding is completed for at least one of the N second type frames whose coding order is after the random access point and whose display order is before the random access point, so that the at least one second type frame can refer to the long-term reference image before the random access point, which can improve coding efficiency. It should be understood that the method in the embodiment of the present application may be applied only to random access points in a video, or may be applied to I frames in a video.

Generally speaking, the at least one second type frame in S830 is concentrated in one part of the N second type frames in coding order, in particular at the front of the N second type frames. For example, the at least one second type frame may be those of the N second type frames whose coding order is relatively early. The solution in the embodiment of the present application is not limited to the case of S830. That is, the at least one second type frame may refer to the long-term reference image before the random access point, and each of the at least one second type frame may be located at any position among the N second type frames. The second type frames other than the at least one second type frame among the N second type frames refer to the long-term reference image after replacement. Optionally, the at least one second type frame may be those of the N second type frames whose display order is relatively early. The at least one second type frame may also be located at other positions among the N second type frames, which is not limited in this embodiment of the present application.

In a specific embodiment of the present application, S830, replacing the currently used long-term reference image after completing inter-frame encoding of at least one second type frame of the N second type frames, may include: replacing the currently used long-term reference image after completing inter-frame encoding of all second type frames in the N second type frames. That is, all frames whose coding order is after the first type frame and whose display order is before the first type frame refer to the long-term reference image before the first type frame. Having all the second type frames refer to the long-term reference image before the random access point can maximize the coding efficiency.
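The timing of the replacement in S810 to S830 can be sketched as a schedule that records which long-term reference image each frame refers to. This is a minimal sketch under stated assumptions: frames are given in coding order as the labels `"first"`, `"second"`, `"third"`, and `"fourth"`; the first `k` second type frames after the first type frame keep the old long-term reference image (`k` equal to N gives the "all second type frames" variant); and if a frame of another type arrives before `k` second type frames have been encoded, the replacement is performed at that point. Long-term reference images are represented only by a generation counter, and the function name is hypothetical.

```python
def long_term_reference_schedule(kinds, k):
    """Return, per frame in coding order, the long-term reference generation it uses.

    The first type frame is intra-coded and gets None.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    generation = 0    # index of the currently used long-term reference image
    remaining = None  # second type frames still to encode before replacing
    schedule = []
    for kind in kinds:
        if kind == "first":
            schedule.append(None)  # S810: intra-coded, no long-term reference
            remaining = k
            continue
        if remaining is not None and kind != "second":
            # Assumption: no more second type frames to wait for, replace now.
            generation += 1
            remaining = None
        schedule.append(generation)  # S820: inter-code against the current image
        if kind == "second" and remaining is not None:
            remaining -= 1
            if remaining == 0:
                generation += 1  # S830: replace after the k-th second type frame
                remaining = None
    return schedule
```

With the coding order third, first, second, second, fourth and `k = 2`, both second type frames use generation 0 (the long-term reference image before the first type frame) and the fourth type frame uses generation 1.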

The encoding method according to another embodiment of the present application may include the following steps. Perform intra-frame encoding on the first type frame, where, in coding order, there is a third type frame (which may be one or more frames) that is inter-frame encoded before the first type frame, and there are N second type frames after the first type frame that need to be inter-frame encoded and whose display order is before the first type frame; the third type frame is encoded with reference to a long-term reference image. Based on that long-term reference image, perform inter-frame encoding on at least one second type frame of the N second type frames.

At least one second type frame may reference a long-term reference image before the random access point, that is, at least one second type frame is based on the long-term reference image before the random access point. The second type frames other than at least one second type frame in the N second type frames refer to the new long-term reference image after replacement, for example, the long-term reference image after the random access point.

Further, based on the long-term reference image before the random access point is referenced, all the second type frames of the N second type frames may be inter-frame coded.

In some embodiments of the present application, there is a third type frame that needs to be inter-frame encoded before the first type frame in coding order. S820, inter-frame encoding at least one second type frame of the N second type frames based on the long-term reference image, may include: inter-frame encoding at least one second type frame of the N second type frames based on the long-term reference image as updated after inter-frame encoding of the third type frame. The updated long-term reference image may be obtained by updating based on a specific image block of the third type frame after the third type frame has been inter-coded; or it may be obtained by updating based on a specific image block of the third type frame after that specific image block has been inter-coded. Optionally, the display order of the third type frame is before the first type frame.

In a specific embodiment of the present application, the long-term reference image before the first type frame may be a long-term reference image that has been updated using one or more encoded frames preceding the first type frame. The long-term reference image relied on may be, for example, the long-term reference image before the first type frame. The long-term reference image before the first type frame can be continuously updated, for example using one or more encoded images among the third type frames. In this document, the long-term reference image updated in this way is still referred to as the long-term reference image before the first type frame.

In some embodiments of the present application, at least one second type frame of the N second type frames is inter-frame encoded based on the long-term reference image, and the long-term reference image is updated using one or more encoded images among the second type frames, for reference by subsequent frames to be encoded.

In a specific embodiment, performing inter-frame encoding on at least one second type frame of the N second type frames based on the long-term reference image in S820 may include: performing inter-frame encoding on at least one second type frame of the N second type frames based on the long-term reference image, and, when at least some of the second type frames have been encoded, updating the long-term reference image using specific image blocks in those encoded second type frames, and using the updated long-term reference image as the long-term reference image of the next second type frame.

In a specific embodiment, performing inter-frame encoding on at least one second type frame of the N second type frames based on the long-term reference image in S820 may include: performing inter-frame encoding on at least one second type frame of the N second type frames based on the long-term reference image, and, when at least some specific image blocks of a second type frame have been encoded, updating the long-term reference image using those encoded specific image blocks, and using the updated long-term reference image as the long-term reference image of the next second type frame or as the long-term reference image of the current second type frame.

In some embodiments of the present application, replacing the currently used long-term reference image may include: replacing the currently used long-term reference image with the reconstructed image of the first type frame.

In a specific embodiment, replacing the currently used long-term reference image may include: after completing encoding of the first type frame, placing the reconstructed image of the first type frame into a reference image buffer; and after completing inter-frame encoding of at least one second type frame of the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer and replacing the currently used long-term reference image with the reconstructed image of the first type frame. That is, the reconstructed image of the first type frame may be placed in the reference image buffer and kept there as a short-term reference image. After the inter-frame encoding of the at least one second type frame is completed, the reconstructed image of the first type frame is output from the reference image buffer as the new long-term reference image in the encoding process.

In a specific embodiment, placing the reconstructed image of the first type frame into the reference image buffer may include: when the reconstructed image of the first type frame does not exist in the reference image buffer, placing the reconstructed image of the first type frame into the reference image buffer. That is, before the reconstructed image of the first type frame is placed into the reference image buffer, it can be determined whether the buffer already contains it, and the reconstructed image is placed into the buffer only when it does not.
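The buffering and deferred replacement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `ReferenceBuffer`, `put_if_absent`, and `promote_to_long_term` are hypothetical names, and reconstructed images are represented by plain strings.

```python
class ReferenceBuffer:
    """Toy reference image buffer keyed by display order (hypothetical model)."""

    def __init__(self, long_term):
        self.short_term = {}        # display order -> reconstructed image
        self.long_term = long_term  # currently used long-term reference image

    def put_if_absent(self, display_order, recon):
        # Store the reconstructed image only if it is not already buffered.
        if display_order in self.short_term:
            return False
        self.short_term[display_order] = recon
        return True

    def promote_to_long_term(self, display_order):
        # Output the buffered reconstruction and use it to replace the
        # currently used long-term reference image.
        self.long_term = self.short_term[display_order]
        return self.long_term


buf = ReferenceBuffer(long_term="LTR(I0)")
first_put = buf.put_if_absent(16, "recon(I16)")   # after encoding I16
second_put = buf.put_if_absent(16, "recon(I16)")  # a duplicate is ignored
ltr_during_leading = buf.long_term                # second type frames still see the old image
ltr_after_switch = buf.promote_to_long_term(16)   # after the second type frames
```

In this sketch the reconstruction stays in the short-term buffer after promotion, mirroring the statement that it is stored as a short-term reference image.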

In some embodiments of the present application, there is a fourth type frame that needs to be inter-frame encoded after the first type frame in the encoding order, and the display order of the fourth type frame is after the first type frame. The encoding method may further include: performing inter-frame encoding on the fourth type frame based on the replaced long-term reference image.

Optionally, the fourth type frame may include one frame or multiple frames.

Optionally, the fourth type frame may be inter-frame encoded with reference to the replaced long-term reference image. Alternatively, when the fourth type frame includes multiple frames, one or more of the fourth type frames may be inter-frame encoded based on the replaced long-term reference image, and the long-term reference image may be updated using one or more encoded images among the fourth type frames, for reference by subsequent frames to be encoded.

In a specific embodiment, multiple fourth type frames may be inter-frame encoded based on the replaced long-term reference image; each time a fourth type frame has been encoded, the long-term reference image is updated using that encoded fourth type frame, and the updated long-term reference image is used as the long-term reference image of the next fourth type frame.
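The frame-by-frame chaining described above, where each fourth type frame references the long-term reference image left by the previous one, can be sketched as a simple loop. The function name `encode_with_running_ltr` and the `encode` and `update` callbacks are hypothetical stand-ins; real encoding and update rules are outside the scope of this sketch.

```python
def encode_with_running_ltr(frames, ltr, encode, update):
    """Encode frames in order; each frame references the image left by the previous one."""
    references = []
    for frame in frames:
        references.append((frame, ltr))  # long-term reference seen by this frame
        recon = encode(frame, ltr)       # reconstructed image of this frame
        ltr = update(ltr, recon)         # becomes the reference of the next frame
    return references, ltr


# Toy stand-ins: an "image" is a tuple of the frame names folded into it.
refs, final_ltr = encode_with_running_ltr(
    ["P24", "B20", "B18"],
    ltr=("I16",),
    encode=lambda frame, ltr: frame,
    update=lambda ltr, recon: ltr + (recon,),
)
```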

In a specific embodiment, the long-term reference image remains unchanged after the first type frame is encoded and before the next frame of the first type frame is encoded in the encoding order.

In a specific embodiment, updating the long-term reference image includes: after encoding at least some of the image frames other than the first type frame, updating some of the image blocks in the long-term reference image based on some of the image blocks in those image frames; or, after encoding specific image blocks in at least some of the image frames other than the first type frame, updating some of the image blocks in the long-term reference image based on those specific image blocks.
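A block-wise update of this kind can be illustrated as follows. This is only a sketch: how the specific image blocks are selected is not described in this passage, so the `block_ids` argument is an assumption, as are the function name `update_ltr_blocks` and the representation of images as 2-D lists of samples.

```python
def update_ltr_blocks(ltr, frame, block_ids, block_size):
    """Copy the selected blocks of `frame` into a copy of `ltr`.

    `ltr` and `frame` are equally sized 2-D lists of samples; `block_ids`
    holds (block_row, block_col) pairs naming the specific image blocks.
    """
    updated = [row[:] for row in ltr]
    for block_row, block_col in block_ids:
        top, left = block_row * block_size, block_col * block_size
        for y in range(top, min(top + block_size, len(ltr))):
            row_end = min(left + block_size, len(ltr[y]))
            updated[y][left:row_end] = frame[y][left:row_end]
    return updated


ltr = [[0] * 4 for _ in range(4)]
frame = [[9] * 4 for _ in range(4)]
# Replace only the top-right 2x2 block of the long-term reference image.
new_ltr = update_ltr_blocks(ltr, frame, block_ids=[(0, 1)], block_size=2)
```

Only the named blocks change; the rest of the long-term reference image is carried over untouched.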

FIG. 9 is a schematic flowchart of an encoding method 900 according to an embodiment of the present application. As shown in FIG. 9, the method 900 includes the following steps.

S910. Encode a first type frame (for example, a random access point).

S920. After the first type frame (for example, a random access point) is encoded, several second type frames whose display order is before the first type frame still need to be processed. When encoding these second type frames, continue to maintain the reference image buffer (short-term reference image buffer), and make at least one second type frame refer to the long-term reference image used before the first type frame (by frames including the third type frame, and possibly also the first type frame).

S930. During encoding, the remaining second type frames and/or the fourth type frame after the second type frames refer to the long-term reference image constructed from the reconstructed image of the first type frame.

S940. After encoding all the frames of the second type, the reference image buffer is maintained, and all short-term reference images whose display order precedes the frames of the first type are deleted.

During the encoding process, long-term reference images can be continuously updated.

In an optional implementation, when encoding these second type frames, the reference image buffer (short-term reference image buffer) continues to be maintained, and all second type frames refer, during encoding, to the long-term reference image used before the first type frame (by frames including the third type frame, and possibly also the first type frame). After all second type frames are encoded, the reference image buffer is maintained and all short-term reference images whose display order precedes the first type frame are deleted. The original long-term reference image is replaced with the long-term reference image constructed from the reconstructed image of the first type frame. The fourth type frame after the second type frames refers, during encoding, to the long-term reference image constructed from the reconstructed image of the first type frame. During the encoding process, long-term reference images can be continuously updated.
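The optional implementation above, in which every second type frame keeps the earlier long-term reference image, can be sketched as a single pass over the encoding order. The function `encode_around_rap` and the string labels are hypothetical; the sketch tracks only which long-term reference image each frame would use and which short-term references are deleted, and it omits the continuous updating of the long-term reference image.

```python
def encode_around_rap(coding_order, rap_name, old_ltr="LTR(old)"):
    """Return ({frame: long-term reference used}, purged short-term frames).

    `coding_order` is a list of (name, display_order) tuples.
    """
    names = [name for name, _ in coding_order]
    rap_pos = names.index(rap_name)
    rap_display = coding_order[rap_pos][1]
    # Second type frames: coded after the first type frame, displayed before it.
    pending = sum(1 for _, d in coding_order[rap_pos + 1:] if d < rap_display)

    ltr, used, short_term, purged, switched = old_ltr, {}, [], [], False
    for pos, (name, display) in enumerate(coding_order):
        used[name] = None if pos == rap_pos else ltr  # S910: intra, no reference
        short_term.append((name, display))
        if pos > rap_pos and display < rap_display:
            pending -= 1
        if pos >= rap_pos and pending == 0 and not switched:
            # S940: all second type frames done; purge and replace.
            purged = [n for n, d in short_term if d < rap_display]
            short_term = [(n, d) for n, d in short_term if d >= rap_display]
            ltr, switched = f"LTR({rap_name})", True
    return used, purged


order = [("P8", 8), ("I16", 16), ("B12", 12), ("B14", 14), ("P24", 24), ("B20", 20)]
used, purged = encode_around_rap(order, "I16")
```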

The following uses specific video sequence coding to describe the coding method in the embodiment of the present application.

In one example, the coding sequence is I0, P8, B4, B2, B1, B3, B6, B5, B7, I16, B12, B10, B9, B11, B14, B13, B15, P24, B20, B18, B17, B19, B22, B21, B23 ...

Among them, I0 and I16 are random access points. Assuming I16 is the first type frame, then B12, B10, B9, B11, B14, B13, and B15 are second type frames; P8, B4, B2, B1, B3, B6, B5, B7, etc. can be regarded as third type frames; and P24, B20, B18, B17, B19, B22, B21, B23 ... are fourth type frames.
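The frame types in this example follow mechanically from the encoding order and the display order. A minimal sketch, assuming the naming convention of the example (a letter followed by the display order) and using a shortened sequence; `classify_frames` and the "intra" label for other I frames are illustrative choices, not terms from the application:

```python
def classify_frames(coding_order, rap_name):
    """Label each frame relative to the first type frame `rap_name`.

    Frame names follow the example's convention: a letter followed by the
    display order, e.g. "B12" is displayed at position 12.
    """
    rap_pos = coding_order.index(rap_name)
    rap_display = int(rap_name[1:])
    labels = {}
    for pos, name in enumerate(coding_order):
        if name == rap_name:
            labels[name] = "first"
        elif name.startswith("I"):
            labels[name] = "intra"   # another intra frame, not inter coded
        elif pos < rap_pos:
            labels[name] = "third"   # inter coded before the first type frame
        elif int(name[1:]) < rap_display:
            labels[name] = "second"  # coded after, displayed before
        else:
            labels[name] = "fourth"  # coded after, displayed after
    return labels


labels = classify_frames(["I0", "P8", "B4", "I16", "B12", "B14", "P24", "B20"], "I16")
```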

After the encoding of I0 is completed, the reconstructed image of I0 is placed in the reference image buffer. Since no frame after I0 in the encoding order has a display order before I0, the reference image buffer contains no frame that precedes I0 in display order. The reconstructed image of I0 in the reference image buffer is used to construct a long-term reference image, and encoding continues with P8, B4, B2, B1, B3, B6, B5, and B7, whose display order is after I0, while the reference image buffer and the long-term reference image constructed from the reconstructed image of I0 are updated.

After the encoding of I16 is completed, the reconstructed image of I16 is placed in the reference image buffer, and encoding continues with B12, B10, B9, B11, B14, B13, and B15, whose display order is before I16. The long-term reference image referenced by these second type frames is the long-term reference image constructed from the reconstructed image of I0, and after each second type frame is encoded, that long-term reference image may be updated according to the encoded second type frame. After the encoding of B15 is completed, no subsequent frame is displayed before I16. At this point, the reconstructed image of I16 in the reference image buffer is output as the new long-term reference image, that is, the long-term reference image constructed from the reconstructed image of I16 replaces the long-term reference image constructed from the reconstructed image of I0. All short-term reference images in the reference image buffer whose display order is before I16 are cleared. Encoding then continues with P24, B20, B18, B17, B19, B22, B21, B23 ..., while the reference image buffer and the long-term reference image constructed from the reconstructed image of I16 are updated.

In another example, the coding sequence is I0, P8, B4, B2, B1, B3, B6, B5, B7, I16, B12, B10, B9, B11, B14, B13, B15, P24, B20, B18, B17, B19, B22, B21, B23 ...

Among them, I0 and I16 are random access points. Assuming I16 is the first type frame, then B12, B10, B9, B11, B14, B13, and B15 are second type frames; P8, B4, B2, B1, B3, B6, B5, B7, etc. can all be regarded as third type frames; and P24, B20, B18, B17, B19, B22, B21, B23 ... are fourth type frames.

After the encoding of I0 is completed, the reconstructed image of I0 is placed in the reference image buffer. Since no frame after I0 in the encoding order has a display order before I0, the reference image buffer contains no frame that precedes I0 in display order. The reconstructed image of I0 in the reference image buffer is used to construct a long-term reference image, and encoding continues with P8, B4, B2, B1, B3, B6, B5, and B7, whose display order is after I0, while the reference image buffer and the long-term reference image constructed from the reconstructed image of I0 are updated.

After the encoding of I16 is completed, the reconstructed image of I16 is placed in the reference image buffer, and encoding continues with one part of the second type frames, B12, B10, B9, and B11, which are displayed before I16. The long-term reference image referenced by these second type frames is the long-term reference image constructed from the reconstructed image of I0, and after each of these second type frames is encoded, that long-term reference image may be updated according to the encoded second type frame. After the encoding of B11 is completed, the reconstructed image of I16 in the reference image buffer is output as the new long-term reference image, that is, the long-term reference image constructed from the reconstructed image of I0 is replaced with the long-term reference image constructed from the reconstructed image of I16. The other part of the second type frames, B14, B13, and B15, can be encoded with reference to the reconstructed image of I16 (the long-term reference image constructed from the reconstructed image of I16). In some embodiments, an identifier indicating the frame at which the other part of the second type frames begins is also encoded in the code stream, or the encoder and decoder agree by default on which frame is the first frame of the other part of the second type frames. After each of these second type frames is encoded, the long-term reference image constructed from the reconstructed image of I16 may be updated according to the encoded second type frame. After the encoding of B15 is completed, all short-term reference images in the reference image buffer whose display order is before I16 are cleared. The reconstructed image of I16 in the reference image buffer is then output once more as the new long-term reference image, that is, the long-term reference image constructed from the reconstructed image of I16 again replaces the current long-term reference image. Encoding then continues with P24, B20, B18, B17, B19, B22, B21, B23 ..., while the reference image buffer and the long-term reference image constructed from the reconstructed image of I16 are updated.
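The split of the second type frames in this example can be sketched as a mapping from each frame to the frame on which its long-term reference image is based. The function `assign_ltr_basis` and its `switch_at` parameter are hypothetical; `switch_at` stands in for the signalled identifier (or the agreed default) naming the first frame of the other part of the second type frames.

```python
def assign_ltr_basis(second_type_frames, rap_name, old_basis, switch_at=None):
    """Map each second type frame to the basis of its long-term reference image.

    `switch_at` is the first frame that uses the long-term reference image
    built from the first type frame. When it is None, every second type
    frame keeps the old basis.
    """
    basis = {}
    current = old_basis
    for name in second_type_frames:
        if name == switch_at:
            current = rap_name  # replacement happens before this frame
        basis[name] = current
    return basis


leading = ["B12", "B10", "B9", "B11", "B14", "B13", "B15"]
split = assign_ltr_basis(leading, "I16", "I0", switch_at="B14")
unsplit = assign_ltr_basis(leading, "I16", "I0")
```

With `switch_at=None` the sketch reduces to the first example, where all second type frames reference the long-term reference image built from I0.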

The decoding method in the embodiment of the present application will be described in detail from the perspective of the decoding end.

FIG. 10 is a schematic flowchart of a decoding method 1000 at a random access point according to an embodiment of the present application. As shown in FIG. 10, the decoding method 1000 includes the following steps.

S1010: Determine whether the random access function is used at the random access point. When using the random access function, execute 1200 for decoding; when not using the random access function, execute 1100 for decoding.

The specific steps for 1100 and 1200 are expanded below. After completing 1100 or 1200, execute S1020.

S1020. Continue to decode subsequent frames.
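The branch in method 1000 amounts to a simple dispatch. In the sketch below, `decode_at_random_access_point` and its callback parameters are hypothetical placeholders for methods 1100 and 1200 and for the decoding of subsequent frames.

```python
def decode_at_random_access_point(use_random_access, decode_1100, decode_1200, decode_rest):
    """S1010: pick the branch; S1020: continue with the subsequent frames."""
    if use_random_access:
        steps = [decode_1200()]  # random access function is used
    else:
        steps = [decode_1100()]  # normal decoding through the access point
    steps.append(decode_rest())  # S1020
    return steps


trace_seek = decode_at_random_access_point(
    True, lambda: "1100", lambda: "1200", lambda: "S1020")
trace_play = decode_at_random_access_point(
    False, lambda: "1100", lambda: "1200", lambda: "S1020")
```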

FIG. 11 is a schematic flowchart of a decoding method 1100 at a random access point according to an embodiment of the present application. As shown in FIG. 11, 1100 may include the following steps.

S1110. Decode the first type frame. In the decoding order, there are N second type frames that need to be inter-frame decoded after the first type frame. The display order of the N second type frames is before the first type frame. Where N is a positive integer.

S1120: Inter-frame decode at least one second type frame of the N second type frames according to the long-term reference image.

S1130: After completing inter-frame decoding of at least one second type frame among the N second type frames, replace the currently used long-term reference image.

The decoding method in the embodiment of the present application replaces the currently used long-term reference image only after completing the inter-frame decoding of at least one second type frame among the N second type frames whose decoding order is after the first type frame and whose display order is before the first type frame. This enables at least one second type frame to reference the long-term reference image before the first type frame, which can improve decoding efficiency.

In some embodiments, the first type frame is a random access point.

In some embodiments, the first type frame is a clean random access point.

In some embodiments, replacing the currently used long-term reference image in S1130 after completing the inter-frame decoding of at least one second type frame of the N second type frames may include: replacing the currently used long-term reference image after completing the inter-frame decoding of all of the N second type frames.

In some embodiments, the replacing the currently used long-term reference image may include: replacing the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, replacing the currently used long-term reference image includes: after the decoding of the first type frame is completed, placing the reconstructed image of the first type frame into a reference image buffer; and after completing inter-frame decoding of at least one second type frame among the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer and replacing the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, placing the reconstructed image of the first type frame in a reference image buffer may include: when the reconstructed image of the first type frame does not exist in the reference image buffer, Placing the reconstructed image of the first type of frame into the reference image buffer.

In some embodiments, there is a third type frame that needs to be inter-frame decoded before the first type frame in the decoding order, and performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image may include: performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image updated after inter-frame decoding of the third type frame.

In some embodiments, the third type frame may be displayed before the first type frame.

In some embodiments, there is a fourth type frame that needs to be inter-frame decoded after the first type frame in the decoding order, and the display order of the fourth type frame is after the first type frame. The decoding method further includes: performing inter-frame decoding on the fourth type frame according to the replaced long-term reference image.

In some embodiments, the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame decoding of the third type frame; or, the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame decoding of that specific image block.

In some embodiments, performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image includes: performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when at least some of the second type frames have been decoded, updating the long-term reference image using specific image blocks in those decoded second type frames, and using the updated long-term reference image as the long-term reference image of the next second type frame; or, performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when at least some specific image blocks of a second type frame have been decoded, updating the long-term reference image using those decoded specific image blocks, and using the updated long-term reference image as the long-term reference image of the next second type frame or as the long-term reference image of the current second type frame.

In some embodiments, the long-term reference image remains unchanged after the first type frame is decoded and before the next frame of the first type frame is decoded in decoding order.

In some embodiments, the method further comprises: after decoding at least some of the image frames other than the first type frame, updating some of the image blocks in the long-term reference image based on some of the image blocks in those image frames; or, after decoding specific image blocks in at least some of the image frames other than the first type frame, updating some of the image blocks in the long-term reference image based on those specific image blocks.

The specific implementation of the decoding method 1100 may correspond to the specific implementation of the encoding method 800, and details are not described herein again.

The following uses specific video sequence coding to describe the decoding method 1100 in the embodiment of the present application without using a random access function.

In one example, the decoding sequence is I0, P8, B4, B2, B1, B3, B6, B5, B7, I16, B12, B10, B9, B11, B14, B13, B15, P24, B20, B18, B17, B19, B22, B21, B23 ...

After the decoding of I0 is completed, the reconstructed image of I0 is placed in the reference image buffer. Since no frame after I0 in the decoding order has a display order before I0, the reference image buffer contains no frame that precedes I0 in display order. The reconstructed image of I0 in the reference image buffer is used to construct a long-term reference image, and decoding continues with P8, B4, B2, B1, B3, B6, B5, and B7, whose display order is after I0, while the reference image buffer and the long-term reference image constructed from the reconstructed image of I0 are updated.

After the decoding of I16 is completed, the reconstructed image of I16 is placed in the reference image buffer, and decoding continues with B12, B10, B9, B11, B14, B13, and B15, whose display order is before I16. The long-term reference image referenced by these second type frames is the long-term reference image referenced by the third type frames, and after each second type frame is decoded, the long-term reference image may be updated according to the decoded second type frame. After the decoding of B15 is completed, no subsequent frame is displayed before I16. At this point, the reconstructed image of I16 in the reference image buffer is output as the new long-term reference image, that is, the long-term reference image constructed from the reconstructed image of I16 replaces the current long-term reference image. All short-term reference images in the reference image buffer whose display order is before I16 are cleared. Decoding then continues with P24, B20, B18, B17, B19, B22, B21, B23 ..., while the reference image buffer and the long-term reference image constructed from the reconstructed image of I16 are updated.

In another example, the decoding sequence is I0, P8, B4, B2, B1, B3, B6, B5, B7, I16, B12, B10, B9, B11, B14, B13, B15, P24, B20, B18, B17, B19, B22, B21, B23 ...

Among them, I0 and I16 are random access points. Assuming I16 is the first type frame, then B12, B10, B9, B11, B14, B13, and B15 are second type frames; P8, B4, B2, B1, B3, B6, B5, B7, etc. can all be regarded as third type frames; and P24, B20, B18, B17, B19, B22, B21, B23 ... are fourth type frames.

After the decoding of I0 is completed, the reconstructed image of I0 is placed in the reference image buffer. Since no frame after I0 in the decoding order has a display order before I0, the reference image buffer contains no frame that precedes I0 in display order. The reconstructed image of I0 in the reference image buffer is used to construct a long-term reference image, and decoding continues with P8, B4, B2, B1, B3, B6, B5, and B7, whose display order is after I0, while the reference image buffer and the long-term reference image constructed from the reconstructed image of I0 are updated.

After the decoding of I16 is completed, the reconstructed image of I16 is placed in the reference image buffer, and decoding continues with one part of the second type frames, B12, B10, B9, and B11, which are displayed before I16. The long-term reference image referenced by these second type frames is the long-term reference image constructed from the reconstructed image of I0, and after each of these second type frames is decoded, that long-term reference image may be updated according to the decoded second type frame. After the decoding of B11 is completed, the reconstructed image of I16 in the reference image buffer is output as the new long-term reference image, that is, the long-term reference image constructed from the reconstructed image of I0 is replaced with the long-term reference image constructed from the reconstructed image of I16. The other part of the second type frames, B14, B13, and B15, can be decoded with reference to the reconstructed image of I16 (the long-term reference image constructed from the reconstructed image of I16). After each of these second type frames is decoded, the long-term reference image constructed from the reconstructed image of I16 may be updated according to the decoded second type frame. After the decoding of B15 is completed, all short-term reference images in the reference image buffer whose display order is before I16 are cleared, and the reconstructed image of I16 is used again to replace the current long-term reference image. Decoding then continues with P24, B20, B18, B17, B19, B22, B21, B23 ..., while the reference image buffer and the long-term reference image constructed from the reconstructed image of I16 are updated.

FIG. 12 is a schematic flowchart of a decoding method 1200 at a random access point according to an embodiment of the present application. As shown in FIG. 12, 1200 may include the following steps.

S1210. Determine that the random access function is used at the first type frame, and perform intra-frame decoding on the first type frame. In the decoding order, N second type frames whose display order is before the first type frame exist after the first type frame, and a fourth type frame that needs to be inter-frame decoded and whose display order is after the first type frame also exists after the first type frame.

S1220: Perform inter-frame decoding on the fourth type frame using the long-term reference image formed from the reconstructed image of the first type frame; each time a fourth type frame has been decoded, update the long-term reference image using that decoded fourth type frame, and use the updated long-term reference image as the long-term reference image of the next frame.

In some embodiments, performing inter-frame decoding on the fourth type frame in S1220 using the long-term reference image formed from the reconstructed image of the first type frame may include: after completing the intra-frame decoding of the first type frame, placing the reconstructed image of the first type frame into a reference image buffer; outputting the reconstructed image of the first type frame from the reference image buffer; and performing inter-frame decoding on the fourth type frame using the reconstructed image of the first type frame as the currently used long-term reference image.

The following uses specific video sequence coding to describe the decoding method 1200 using the random access function in the embodiment of the present application.

In the same example sequence as above, I0 and I16 are random access points. Assuming I16 is the first type frame, that is, the random access function is used and decoding starts from I16, then B12, B10, B9, B11, B14, B13, and B15 are second type frames; P8, B4, B2, B1, B3, B6, B5, B7, etc. can be regarded as third type frames; and P24, B20, B18, B17, B19, B22, B21, B23 ... are fourth type frames.

After the decoding of I16 is completed, the reconstructed image of I16 is placed in the reference image buffer, and all short-term reference images in the reference image buffer whose display order is before I16 are cleared. B12, B10, B9, B11, B14, B13, and B15, whose display order is before I16, are not decoded; instead, P24, B20, B18, B17, B19, B22, B21, B23 ... are decoded directly, while the reference image buffer and the long-term reference image constructed from the reconstructed image of I16 are updated.
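The frame selection in this random access example can be sketched as a filter on display order. The function `frames_to_decode` is hypothetical, and the sketch relies on the example's naming convention, in which the number after the first letter is the display order.

```python
def frames_to_decode(coding_order_from_rap, random_access):
    """List the frames decoded from the first type frame onward.

    Frame names carry the display order after the first letter ("B12" -> 12).
    With random access, frames displayed before the first type frame are
    skipped; without it, every frame is decoded.
    """
    rap_display = int(coding_order_from_rap[0][1:])
    if not random_access:
        return list(coding_order_from_rap)
    return [n for n in coding_order_from_rap if int(n[1:]) >= rap_display]


stream = ["I16", "B12", "B10", "B9", "B11", "B14", "B13", "B15", "P24", "B20"]
decoded = frames_to_decode(stream, random_access=True)
decoded_all = frames_to_decode(stream, random_access=False)
```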

The method of the embodiment of the present application has been described in detail above, and the encoding device and the decoding device of the embodiment of the present application are described in detail below.

FIG. 13 is a schematic block diagram of an encoding device 1300 according to an embodiment of the present application. As shown in FIG. 13, the encoding device 1300 includes a first encoding module 1310, a second encoding module 1320, and a replacement module 1330. The first encoding module 1310 is configured to encode a first type frame, wherein, in the encoding order, there are N second type frames that need to be inter-frame encoded after the first type frame, and the display order of the N second type frames is before the first type frame, where N is a positive integer. The second encoding module 1320 is configured to perform inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image. The replacement module 1330 is configured to replace the currently used long-term reference image after completing inter-frame encoding of at least one second type frame among the N second type frames.

The encoding device in the embodiment of the present application replaces the currently used long-term reference image only after completing inter-frame encoding of at least one second type frame among the N second type frames whose encoding order is after the first type frame and whose display order is before the first type frame. This enables at least one second type frame to reference the long-term reference image before the first type frame, which can improve coding efficiency.

In some embodiments, the replacement module 1330 is specifically configured to: after completing the inter-coding of all the second type frames of the N second type frames, replace the currently used long-term reference image.

In some embodiments, the replacement module 1330 is specifically configured to: replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the replacement module 1330 is specifically configured to: after the encoding of the first type frame is completed, place the reconstructed image of the first type frame into a reference image buffer; and after the inter-frame encoding of at least one second type frame among the N second type frames is completed, output the reconstructed image of the first type frame from the reference image buffer and replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the replacement module 1330 placing the reconstructed image of the first type frame into the reference image buffer includes: when the reconstructed image of the first type frame does not exist in the reference image buffer, placing the reconstructed image of the first type frame into the reference image buffer.

In some embodiments, there is a third type frame that needs to be inter-frame encoded before the first type frame in the encoding order. The second encoding module 1320 is specifically configured to: perform inter-frame encoding on at least one second type frame among the N second type frames according to the long-term reference image updated after inter-frame encoding of the third type frame.

In some embodiments, the third type frame is displayed before the first type frame.

In some embodiments, the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame encoding of the third type frame; or, the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame encoding of that specific image block.

In some embodiments, there is a fourth type frame that needs to be inter-frame encoded after the first type frame in the encoding order, and the display order of the fourth type frame is after the first type frame. The encoding device 1300 may also include a third encoding module configured to perform inter-frame encoding on the fourth type frame according to the replaced long-term reference image.

In some embodiments, the first type frame is a random access point.

In some embodiments, the first type frame is a clean random access (CRA) point.

In some embodiments, the second encoding module 1320 is specifically configured to:

Perform inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of at least part of the second type frames is completed, update the long-term reference image using specific image blocks in the at least part of the second type frames whose encoding is completed, and use the updated long-term reference image as the long-term reference image of the next second type frame; or, perform inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of specific image blocks of at least part of the second type frames is completed, update the long-term reference image using the encoded specific image blocks, and use the updated long-term reference image as the long-term reference image of the next second type frame or of the current second type frame.
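
The block-wise update of the long-term reference image can be illustrated with a short sketch. It assumes, purely for illustration, that an image is a two-dimensional grid of blocks and that the positions of the "specific image blocks" are already known; how those blocks are selected is not covered here. The function name `update_long_term_reference` is hypothetical.

```python
from copy import deepcopy


def update_long_term_reference(ltr_blocks, frame_blocks, specific_positions):
    """Return a new long-term reference in which the blocks at
    specific_positions are taken from the just-encoded frame."""
    updated = deepcopy(ltr_blocks)
    for row, col in specific_positions:
        updated[row][col] = frame_blocks[row][col]
    return updated


# A 2x2 grid of blocks; strings stand in for block pixel data.
ltr = [["bg", "bg"], ["bg", "bg"]]
frame = [["f00", "f01"], ["f10", "f11"]]
new_ltr = update_long_term_reference(ltr, frame, [(0, 1), (1, 0)])
```

Returning a copy keeps the previous long-term reference image intact, so the caller can decide whether the updated image applies to the next second type frame or, in the block-level variant, to the remainder of the current frame.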

In some embodiments, the long-term reference image remains unchanged after the first type frame is encoded and before the next frame of the first type frame is encoded in the encoding order.

In some embodiments, the encoding device 1300 is further configured to: after encoding at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on some image blocks in the image frame; or, after encoding specific image blocks in at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on the specific image blocks. The long-term reference image remains unchanged until the next frame is encoded.

FIG. 14 is a schematic block diagram of an encoding device 1400 according to another embodiment of the present application. The encoding device 1400 shown in FIG. 14 may include at least one processor 1410 and at least one memory 1420 for storing computer-executable instructions. The at least one processor 1410 is configured, alone or collectively, to access the at least one memory 1420 and execute the computer-executable instructions to perform the following operations: encoding a first type frame, where there are N second type frames that require inter-frame encoding after the first type frame in the encoding order, and the display order of the N second type frames is before the first type frame, where N is a positive integer; performing inter-frame encoding on at least one second type frame among the N second type frames according to the long-term reference image; and after completing inter-frame encoding of at least one second type frame among the N second type frames, replacing the currently used long-term reference image.
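
The order of operations above can be sketched as a simple loop over frames in encoding order. This is an illustrative model under stated assumptions, not the processor 1410's actual logic: frames are `(name, type)` tuples, the type labels and the function `encode_in_coding_order` are hypothetical, the first type frame is assumed to be coded without the long-term reference image, the replacement is assumed to use the reconstruction of the first type frame and to occur after all N second type frames, and block-wise updates of the long-term reference image are omitted.

```python
def encode_in_coding_order(frames, ltr):
    """Return (log, final_ltr), where log lists which long-term reference
    each frame used. frames is a list of (name, type) in encoding order."""
    log = []
    pending = None  # reconstruction of a first type frame awaiting promotion
    for name, ftype in frames:
        if ftype == "first":
            if pending is not None:
                ltr = pending  # promote any earlier pending reconstruction
            log.append((name, None))  # assumed coded without the LTR
            pending = f"recon({name})"
            continue
        if ftype != "second" and pending is not None:
            # All second type frames are done: replace the LTR now.
            ltr, pending = pending, None
        log.append((name, ltr))
    if pending is not None:
        ltr = pending
    return log, ltr


frames = [("P4", "third"), ("I8", "first"), ("B6", "second"),
          ("B7", "second"), ("P12", "fourth")]
log, final_ltr = encode_in_coding_order(frames, "LTR0")
```

Here the two second type frames still see `LTR0`, while the fourth type frame, which follows the first type frame in both encoding and display order, sees the replaced reference.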

In some embodiments, the processor 1410 is specifically configured to: after completing inter-frame encoding of all the second type frames of the N second type frames, replace the currently used long-term reference image.

In some embodiments, the processor 1410 is specifically configured to: replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the processor 1410 is specifically configured to: after the encoding of the first type frame is completed, place the reconstructed image of the first type frame into a reference image buffer; and after completing inter-frame encoding of at least one second type frame among the N second type frames, output the reconstructed image of the first type frame from the reference image buffer and replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the processor 1410 is specifically configured to: when the reconstructed image of the first type frame does not exist in the reference image buffer, place the reconstructed image of the first type frame into the reference image buffer.

In some embodiments, there is a third type frame that needs to be inter-frame encoded before the first type frame in the encoding order, and the processor 1410 is specifically configured to: perform inter-frame encoding on at least one second type frame among the N second type frames according to the long-term reference image updated after inter-frame encoding of the third type frame.

In some embodiments, there is a fourth type frame that needs to be inter-frame encoded after the first type frame in the coding order, and the display order of the fourth type frame is after the first type frame, and the processor 1410 is further configured to perform inter-frame coding on the fourth type frame according to the replaced long-term reference image.

In some embodiments, the first type frame is a random access point.

In some embodiments, the first type frame is a clean random access (CRA) point.

In some embodiments, the processor 1410 is specifically configured to: perform inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of at least part of the second type frames is completed, update the long-term reference image using specific image blocks in the at least part of the second type frames whose encoding is completed, and use the updated long-term reference image as the long-term reference image of the next second type frame; or, perform inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of specific image blocks of at least part of the second type frames is completed, update the long-term reference image using the encoded specific image blocks, and use the updated long-term reference image as the long-term reference image of the next second type frame or of the current second type frame.

In some embodiments, the processor 1410 is further configured to: after encoding at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on some image blocks in the image frame; or, after encoding specific image blocks in at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on the specific image blocks. The long-term reference image remains unchanged until the next frame is encoded.

FIG. 15 is a schematic block diagram of a decoding device 1500 according to an embodiment of the present application. As shown in FIG. 15, the decoding device 1500 includes a first decoding module 1510, a second decoding module 1520, and a replacement module 1530. The first decoding module 1510 is configured to decode a first type frame, where there are N second type frames that need to be inter-frame decoded after the first type frame in the decoding order, and the display order of the N second type frames is before the first type frame, where N is a positive integer. The second decoding module 1520 is configured to perform inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image. The replacement module 1530 is configured to replace the currently used long-term reference image after completing inter-frame decoding of at least one second type frame among the N second type frames.
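
The relationship between decoding order and display order that defines the N second type frames can be illustrated with a small sketch. It assumes that each frame carries a display-order number and that frames are listed in decoding order; the function name `second_type_indices` is hypothetical, and the sketch does not check whether a frame actually requires inter-frame decoding.

```python
def second_type_indices(display_orders, first_index):
    """display_orders[i] is the display-order number of the i-th frame in
    decoding order. Return the indices of frames decoded after the first
    type frame but displayed before it."""
    pivot = display_orders[first_index]
    return [i for i in range(first_index + 1, len(display_orders))
            if display_orders[i] < pivot]


# Decoding order on the left-to-right axis, display numbers as values.
display_orders = [0, 4, 8, 6, 5, 7, 12]
indices = second_type_indices(display_orders, first_index=2)
n = len(indices)
```

In this example the frame displayed at position 8 plays the role of the first type frame, and the three frames displayed at positions 6, 5, and 7 are decoded after it but shown before it, so N is 3.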

The decoding device in this embodiment of the present application replaces the currently used long-term reference image only after completing inter-frame decoding of at least one second type frame among the N second type frames, whose decoding order is after the first type frame and whose display order is before the first type frame. This enables the at least one second type frame to reference the long-term reference image from before the first type frame, which can improve decoding efficiency.

In some embodiments, the replacement module 1530 is specifically configured to: after completing the inter-frame decoding of all the second type frames of the N second type frames, replace the currently used long-term reference image.

In some embodiments, the replacement module 1530 is specifically configured to: replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the replacement module 1530 is specifically configured to: after the decoding of the first type frame is completed, place the reconstructed image of the first type frame into a reference image buffer; and after completing inter-frame decoding of at least one second type frame among the N second type frames, output the reconstructed image of the first type frame from the reference image buffer and replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the replacement module 1530 placing the reconstructed image of the first type frame into the reference image buffer includes: when the reconstructed image of the first type frame does not exist in the reference image buffer, placing the reconstructed image of the first type frame into the reference image buffer.

In some embodiments, there is a third type frame that needs to be inter-frame decoded before the first type frame in the decoding order, and the second decoding module 1520 is specifically configured to: perform inter-frame decoding on at least one second type frame among the N second type frames according to the long-term reference image updated after inter-frame decoding of the third type frame.

In some embodiments, there is a fourth type frame that requires inter-frame decoding after the first type frame in the decoding order, and the display order of the fourth type frame is after the first type frame. The decoding device 1500 may also include a third decoding module configured to perform inter-frame decoding on the fourth type frame according to the replaced long-term reference image.

In some embodiments, the first type frame is a random access point.

In some embodiments, the first type frame is a clean random access (CRA) point.

In some embodiments, the second decoding module 1520 is specifically configured to:

Perform inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and when decoding of at least part of the second type frames is completed, update the long-term reference image using specific image blocks in the at least part of the second type frames whose decoding is completed, and use the updated long-term reference image as the long-term reference image of the next second type frame; or, perform inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and when decoding of specific image blocks of at least part of the second type frames is completed, update the long-term reference image using the decoded specific image blocks, and use the updated long-term reference image as the long-term reference image of the next second type frame or of the current second type frame.

In some embodiments, the decoding device 1500 is further configured to: after decoding at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on some image blocks in the image frame; or, after decoding specific image blocks in at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on the specific image blocks. The long-term reference image remains unchanged until the next frame is decoded.

FIG. 16 is a schematic block diagram of a decoding device 1600 according to another embodiment of the present application. The decoding device 1600 shown in FIG. 16 may include at least one processor 1610 and at least one memory 1620 for storing computer-executable instructions. The at least one processor 1610 is configured, alone or collectively, to access the at least one memory 1620 and execute the computer-executable instructions to perform the following operations: decoding a first type frame, where there are N second type frames that require inter-frame decoding after the first type frame in the decoding order, and the display order of the N second type frames is before the first type frame, where N is a positive integer; performing inter-frame decoding on at least one second type frame among the N second type frames according to the long-term reference image; and after completing inter-frame decoding of at least one second type frame among the N second type frames, replacing the currently used long-term reference image.

In some embodiments, the processor 1610 is specifically configured to: after completing inter-frame decoding of all the second type frames of the N second type frames, replace the currently used long-term reference image.

In some embodiments, the processor 1610 is specifically configured to: replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the processor 1610 is specifically configured to: after the decoding of the first type frame is completed, place the reconstructed image of the first type frame into a reference image buffer; and after completing inter-frame decoding of at least one second type frame among the N second type frames, output the reconstructed image of the first type frame from the reference image buffer and replace the currently used long-term reference image with the reconstructed image of the first type frame.

In some embodiments, the processor 1610 is specifically configured to: when the reconstructed image of the first type frame does not exist in the reference image buffer, place the reconstructed image of the first type frame into the reference image buffer.

In some embodiments, there is a third type frame that requires inter-frame decoding before the first type frame in the decoding order, and the processor 1610 is specifically configured to: perform inter-frame decoding on at least one second type frame among the N second type frames according to the long-term reference image updated after inter-frame decoding of the third type frame.

In some embodiments, there is a fourth type frame that needs to be inter-frame decoded after the first type frame in the decoding order, and the display order of the fourth type frame is after the first type frame, and the processor 1610 is further configured to perform inter-frame decoding on the fourth type frame according to the replaced long-term reference image.

In some embodiments, the first type frame is a random access point.

In some embodiments, the first type frame is a clean random access (CRA) point.

In some embodiments, the processor 1610 is specifically configured to: perform inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and when decoding of at least part of the second type frames is completed, update the long-term reference image using specific image blocks in the at least part of the second type frames whose decoding is completed, and use the updated long-term reference image as the long-term reference image of the next second type frame; or, perform inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and when decoding of specific image blocks of at least part of the second type frames is completed, update the long-term reference image using the decoded specific image blocks, and use the updated long-term reference image as the long-term reference image of the next second type frame or of the current second type frame.

In some embodiments, the processor 1610 is further configured to: after decoding at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on some image blocks in the image frame; or, after decoding specific image blocks in at least part of the image frames other than the first type frame, update some image blocks in the long-term reference frame based on the specific image blocks. The long-term reference image remains unchanged until the next frame is decoded.

It should be understood that the devices of the embodiments of the present application may be implemented based on a memory and a processor. The memory is used to store instructions for executing the methods of the embodiments of the present application, and the processor executes those instructions so that the device performs the methods of the embodiments of the present application.

It should be understood that the processor mentioned in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It should also be understood that the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM).

It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.

It should be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memory.

An embodiment of the present application further provides a computer-readable storage medium having instructions stored thereon. When the instructions are run on the computer, the computer is caused to execute the methods of the foregoing method embodiments.

An embodiment of the present application further provides a computer program, which causes a computer to execute the methods of the foregoing method embodiments.

An embodiment of the present application further provides a computing device, where the computing device includes the computer-readable storage medium described above.

The embodiments of the present application can be applied in the field of aircraft, especially in the field of drones.

It should be understood that the division of circuits, sub-circuits, and sub-units in the embodiments of the present application is merely schematic. Those of ordinary skill in the art may realize that the circuits, sub-circuits, and sub-units of the examples described in the embodiments disclosed herein can be split or combined again.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). A computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), and so on.

It should be understood that each embodiment of the present application is described by taking a total bit width of 16 bits as an example, and the embodiments of the present application may be applicable to other bit widths.

It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Thus, the appearances of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

It should be understood that in the embodiment of the present application, “B corresponding to A” means that B is associated with A, and B can be determined according to A. However, it should also be understood that determining B based on A does not mean determining B based solely on A, but also determining B based on A and / or other information.

It should be understood that the term “and / or” herein only describes an association relationship between associated objects and indicates that three kinds of relationships are possible. For example, A and / or B can mean three cases: A exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this text generally indicates that the related objects are in an "or" relationship.

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A professional technician can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices, and units described above can refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division into units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection displayed or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.

The above is only a specific implementation of this application, but the scope of protection of this application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

An encoding method, comprising:

Encoding a first type frame, wherein there are N second type frames that need to be inter-frame encoded after the first type frame in the encoding order, and the display order of the N second type frames is before the first type frame, where N is a positive integer;

Performing inter-frame coding on at least one second type frame of the N second type frames according to a long-term reference image;

After the inter-frame encoding of at least one second type frame among the N second type frames is completed, the currently used long-term reference image is replaced.
The encoding method according to claim 1, wherein after the completion of inter-frame encoding of at least one second type frame of the N second type frames, replacing the currently used long-term reference image comprises:

After the inter-frame encoding of all the second type frames in the N second type frames is completed, the currently used long-term reference image is replaced.
The encoding method according to claim 1 or 2, wherein the replacing a long-term reference image currently used comprises:

Replace the currently used long-term reference image with the reconstructed image of the first type of frame.
The encoding method according to claim 3, wherein the replacing a currently used long-term reference image comprises:

After the encoding of the first type frame is completed, placing the reconstructed image of the first type frame in a reference image buffer;

After completing inter-frame encoding of at least one second type frame among the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer, and replacing the currently used long-term reference image with the reconstructed image of the first type frame.
The encoding method according to claim 4, wherein the placing the reconstructed image of the first type frame in a reference image buffer comprises:

When the reconstructed image of the first type frame does not exist in the reference image buffer, the reconstructed image of the first type frame is placed in the reference image buffer.
The encoding method according to any one of claims 1 to 5, characterized in that, there is a third type frame that needs to be inter-frame encoded before the first type frame in the encoding order,

Inter-coding the at least one second type frame of the N second type frames according to the long-term reference image includes:

Performing inter-frame coding on at least one second-type frame of the N second-type frames according to the long-term reference image updated after inter-coding of the third-type frame.
The encoding method according to claim 6, wherein the third type frame is displayed before the first type frame.
The encoding method according to claim 6, wherein the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame encoding of the third type frame;

or,

The updated long-term reference image is updated based on the specific image block after inter-frame encoding of the specific image block of the third type frame.
The encoding method according to any one of claims 1 to 8, characterized in that, in the encoding order, there is a fourth type frame that requires inter-frame encoding after the first type frame, and the display order of the fourth type frame is after the first type frame, and the encoding method further includes:

Inter-coding the fourth type frame according to the replaced long-term reference image.
The encoding method according to any one of claims 1 to 9, wherein the first type frame is a random access point.
The encoding method according to claim 10, wherein the first type frame is a clean random access (CRA) point.
The encoding method according to any one of claims 1 to 11, wherein the performing inter-frame encoding on at least one second type frame of the N second type frames according to a long-term reference image includes:

Performing inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of at least part of the second type frames is completed, updating the long-term reference image using specific image blocks in the at least part of the second type frames whose encoding is completed, and using the updated long-term reference image as the long-term reference image of the next second type frame;

or,

Performing inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and when encoding of specific image blocks of at least part of the second type frames is completed, updating the long-term reference image using the encoded specific image blocks, and using the updated long-term reference image as the long-term reference image of the next second type frame or of the current second type frame.
The encoding method according to any one of claims 1 to 12, wherein after the first type frame is encoded and before the next frame of the first type frame is encoded in the encoding order, the long-term reference image remains unchanged.
The encoding method according to any one of claims 1 to 13, wherein the encoding method further comprises:

After encoding at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference frame based on the part of image blocks in the image frame;

or,

After encoding a specific image block in at least a part of the image frame other than the first type frame, updating a part of the image block in the long-term reference frame based on the specific image block.
A decoding method, comprising:

Decoding a first type frame, wherein there are N second type frames that need to be inter-frame decoded after the first type frame in the decoding order, and the display order of the N second type frames is before the first type frame, where N is a positive integer;

Performing inter-frame decoding on at least one second type frame of the N second type frames according to a long-term reference image;

After the inter-frame decoding of at least one second type frame among the N second type frames is completed, the currently used long-term reference image is replaced.
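The ordering in this claim can be illustrated with a minimal sketch. It is not the claimed implementation: the `Frame` class, the `kind` labels, and the string stand-ins for images are hypothetical names introduced here, and the first type frame is assumed to be decoded without reference to the long-term reference image. The sketch also adopts the variant in which replacement waits until all N second type frames are decoded.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    kind: str         # "first", "second", or "other" (labels assumed for this sketch)
    display_idx: int  # position in display order


def decode_sequence(frames, initial_ltr):
    """Walk frames in decoding order and record which long-term
    reference image (LTR) each inter-decoded frame would use."""
    ltr = initial_ltr
    pending = None  # reconstructed first type frame waiting to become the LTR
    log = []
    for i, frame in enumerate(frames):
        if frame.kind == "first":
            # Assumption: the first type frame does not use the LTR.
            log.append((frame.name, "intra"))
            pending = f"recon({frame.name})"
        else:
            log.append((frame.name, ltr))
        # Defer replacement until no second type frame follows immediately.
        next_is_second = i + 1 < len(frames) and frames[i + 1].kind == "second"
        if pending is not None and not next_is_second:
            ltr = pending
            pending = None
    return log


frames = [
    Frame("P0", "other", 0),
    Frame("I8", "first", 8),
    Frame("B4", "second", 4),   # decoded after I8, displayed before it
    Frame("B2", "second", 2),
    Frame("P12", "other", 12),  # displayed after I8
]
usage = decode_sequence(frames, "LTR0")
```

In this sketch the two second type frames still reference the earlier long-term reference image, and only the frame displayed after the first type frame sees the replaced one.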
The decoding method according to claim 15, wherein after the completion of inter-frame decoding of at least one second type frame of the N second type frames, replacing the currently used long-term reference image comprises:

After the inter-frame decoding of all the second type frames in the N second type frames is completed, the currently used long-term reference image is replaced.
The decoding method according to claim 15 or 16, wherein the replacing a currently used long-term reference image comprises:

Replace the currently used long-term reference image with the reconstructed image of the first type of frame.
The decoding method according to claim 17, wherein the replacing a long-term reference image currently used comprises:

After the decoding of the first type frame is completed, placing the reconstructed image of the first type frame into a reference image buffer;

After completing the inter-frame decoding of at least one second type frame among the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer, and replacing the currently used long-term reference image with the reconstructed image of the first type frame.
The decoding method according to claim 18, wherein the putting the reconstructed image of the first type frame into a reference image buffer comprises:

When the reconstructed image of the first type frame does not exist in the reference image buffer, the reconstructed image of the first type frame is placed in the reference image buffer.
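The buffering behaviour in the two preceding claims might be sketched as follows. The class name `ReferenceImageBuffer`, its methods, and the frame identifier are hypothetical; whether outputting an image also removes it from the buffer is not stated in the claims, so this sketch leaves the image in place.

```python
class ReferenceImageBuffer:
    """Toy reference image buffer keyed by a frame identifier (assumed)."""

    def __init__(self):
        self._images = {}

    def put_if_absent(self, frame_id, recon):
        """Store the reconstructed image only if it is not already present.
        Returns True when the image was newly stored."""
        if frame_id in self._images:
            return False
        self._images[frame_id] = recon
        return True

    def output(self, frame_id):
        """Return the stored image (assumption: it stays in the buffer)."""
        return self._images[frame_id]


buffer = ReferenceImageBuffer()
first_put = buffer.put_if_absent("I8", "recon(I8)")       # stored
second_put = buffer.put_if_absent("I8", "recon(I8)-dup")  # skipped, already present

# Later, once the second type frames have been inter-frame decoded,
# the stored image replaces the currently used long-term reference image.
current_ltr = "LTR0"
current_ltr = buffer.output("I8")
```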
The decoding method according to any one of claims 15 to 19, wherein there is a third type frame that needs to be inter-frame decoded before the first type frame in the decoding order,

The inter-frame decoding of at least one second type frame of the N second type frames according to the long-term reference image includes:

Performing inter-frame decoding on at least one second-type frame of the N second-type frames according to the long-term reference image updated after inter-frame decoding of the third-type frame.
The decoding method according to claim 20, wherein the third type frame is displayed before the first type frame.
The decoding method according to claim 20, wherein the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame decoding of the third type frame;

or,

The updated long-term reference image is updated based on the specific image block after inter-frame decoding of the specific image block of the third type frame.
The decoding method according to any one of claims 15 to 22, wherein there is a fourth type frame that needs to be inter-frame decoded after the first type frame in the decoding order, the fourth type frame follows the first type frame in display order, and the decoding method further comprises:

Performing inter-frame decoding on the fourth type frame according to the replaced long-term reference image.
The decoding method according to any one of claims 15 to 23, wherein the first type frame is a random access point.
The decoding method according to claim 24, wherein the first type frame is a clean random access point.
The decoding method according to any one of claims 15 to 25, wherein the performing inter-frame decoding on at least one second type frame of the N second type frames according to a long-term reference image comprises:

Performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when decoding of at least part of the second type frames is completed, updating the long-term reference image with specific image blocks of the at least part of the second type frames whose decoding has been completed, and using the updated long-term reference image as the long-term reference image for a next second type frame;

or,

Performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when decoding of a specific image block of at least part of the second type frames is completed, updating the long-term reference image with the specific image block whose decoding has been completed, and using the updated long-term reference image as the long-term reference image of a next second type frame or of the current second type frame.
The decoding method according to any one of claims 15 to 26, wherein after the first type frame is decoded and before the frame following the first type frame in the decoding order is decoded, the long-term reference image remains unchanged.
The decoding method according to any one of claims 15 to 27, wherein the decoding method further comprises:

After decoding at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on part of the image blocks in those image frames;

or,

After decoding a specific image block in at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on the specific image block.
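A block-wise update of the long-term reference image, as recited above, could look roughly like the following sketch. The function name, the list-of-lists image representation, and the way the blocks to copy are chosen are all assumptions for illustration; the claims do not say here how the specific image blocks are selected.

```python
def update_long_term_reference(ltr, recon, blocks, block_size):
    """Return a copy of `ltr` in which the listed blocks are overwritten
    with the co-located blocks of the reconstructed image `recon`.

    ltr, recon : 2-D lists of samples with identical dimensions
    blocks     : iterable of (block_row, block_col) indices (hypothetical selection)
    block_size : block edge length in samples
    """
    updated = [row[:] for row in ltr]  # leave the input image untouched
    height = len(ltr)
    for block_row, block_col in blocks:
        y0 = block_row * block_size
        x0 = block_col * block_size
        for y in range(y0, min(y0 + block_size, height)):
            updated[y][x0:x0 + block_size] = recon[y][x0:x0 + block_size]
    return updated


ltr = [[0] * 4 for _ in range(4)]    # current long-term reference image
recon = [[9] * 4 for _ in range(4)]  # reconstructed image of a decoded frame
new_ltr = update_long_term_reference(ltr, recon, blocks=[(0, 1)], block_size=2)
```

Only the top-right 2x2 block is taken from the decoded frame; the rest of the long-term reference image keeps its previous content.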
An encoding device, comprising:

At least one memory for storing computer-executable instructions;

At least one processor, individually or collectively, for accessing the at least one memory and executing the computer-executable instructions to perform the following operations:

Encoding a first type frame, wherein there are N second type frames that need to be inter-frame encoded after the first type frame in the encoding order, and the N second type frames precede the first type frame in display order, where N is a positive integer;

Performing inter-frame coding on at least one second type frame of the N second type frames according to a long-term reference image;

After the inter-frame encoding of at least one second type frame among the N second type frames is completed, the currently used long-term reference image is replaced.
The encoding device according to claim 29, wherein the processor is specifically configured to:

After the inter-frame encoding of all the second type frames in the N second type frames is completed, the currently used long-term reference image is replaced.
The encoding device according to claim 29 or 30, wherein the processor is specifically configured to:

Replace the currently used long-term reference image with the reconstructed image of the first type of frame.
The encoding device according to claim 31, wherein the processor is specifically configured to:

After the encoding of the first type frame is completed, placing the reconstructed image of the first type frame in a reference image buffer;

After completing the inter-frame encoding of at least one second type frame among the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer, and replacing the currently used long-term reference image with the reconstructed image of the first type frame.
The encoding device according to claim 32, wherein the processor is specifically configured to:

When the reconstructed image of the first type frame does not exist in the reference image buffer, the reconstructed image of the first type frame is placed in the reference image buffer.
The encoding device according to any one of claims 29 to 33, wherein a third type frame that needs to be inter-frame encoded exists before the first type frame in the encoding order,

The processor is specifically configured to:

Performing inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image updated after inter-frame encoding of the third type frame.
The encoding device according to claim 34, wherein the third type frame is displayed before the first type frame.
The encoding device according to claim 34, wherein the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame encoding of the third type frame;

or,

The updated long-term reference image is updated based on the specific image block after inter-frame encoding of the specific image block of the third type frame.
The encoding device according to any one of claims 29 to 36, wherein there is a fourth type frame that needs to be inter-frame encoded after the first type frame in the encoding order, the fourth type frame follows the first type frame in display order, and the processor is further configured to:

Performing inter-frame encoding on the fourth type frame according to the replaced long-term reference image.
The encoding device according to any one of claims 29 to 37, wherein the first type frame is a random access point.
The encoding device according to claim 38, wherein the first type frame is a clean random access point.
The encoding device according to any one of claims 29 to 39, wherein the processor is specifically configured to:

Performing inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when encoding of at least part of the second type frames is completed, updating the long-term reference image with specific image blocks of the at least part of the second type frames whose encoding has been completed, and using the updated long-term reference image as the long-term reference image for a next second type frame;

or,

Performing inter-frame encoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when encoding of a specific image block of at least part of the second type frames is completed, updating the long-term reference image with the specific image block whose encoding has been completed, and using the updated long-term reference image as the long-term reference image of a next second type frame or of the current second type frame.
The encoding device according to any one of claims 29 to 40, wherein after the first type frame is encoded and before the frame following the first type frame in the encoding order is encoded, the long-term reference image remains unchanged.
The encoding device according to any one of claims 29 to 41, wherein the processor is further configured to:

After encoding at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on part of the image blocks in those image frames;

or,

After encoding a specific image block in at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on the specific image block.
A decoding device, comprising:

At least one memory for storing computer-executable instructions;

At least one processor, individually or collectively, for accessing the at least one memory and executing the computer-executable instructions to perform the following operations:

Decoding a first type frame, wherein there are N second type frames that need to be inter-frame decoded after the first type frame in decoding order, and the N second type frames precede the first type frame in display order, where N is a positive integer;

Performing inter-frame decoding on at least one second type frame of the N second type frames according to a long-term reference image;

After the inter-frame decoding of at least one second type frame among the N second type frames is completed, the currently used long-term reference image is replaced.
The decoding device according to claim 43, wherein the processor is specifically configured to:

After the inter-frame decoding of all the second type frames in the N second type frames is completed, the currently used long-term reference image is replaced.
The decoding device according to claim 43 or 44, wherein the processor is specifically configured to:

Replace the currently used long-term reference image with the reconstructed image of the first type of frame.
The decoding device according to claim 45, wherein the processor is specifically configured to:

After the decoding of the first type frame is completed, placing the reconstructed image of the first type frame into a reference image buffer;

After completing the inter-frame decoding of at least one second type frame among the N second type frames, outputting the reconstructed image of the first type frame from the reference image buffer, and replacing the currently used long-term reference image with the reconstructed image of the first type frame.
The decoding device according to claim 46, wherein the processor is specifically configured to:

When the reconstructed image of the first type frame does not exist in the reference image buffer, the reconstructed image of the first type frame is placed in the reference image buffer.
The decoding device according to any one of claims 43 to 47, wherein a third type frame that needs to be inter-frame decoded exists before the first type frame in the decoding order,

The processor is specifically configured to:

Performing inter-frame decoding on at least one second-type frame of the N second-type frames according to the long-term reference image updated after inter-frame decoding of the third-type frame.
The decoding device according to claim 48, wherein the third type frame is displayed before the first type frame.
The decoding device according to claim 48, wherein the updated long-term reference image is updated based on a specific image block of the third type frame after inter-frame decoding of the third type frame;

or,

The updated long-term reference image is updated based on the specific image block after inter-frame decoding of the specific image block of the third type frame.
The decoding device according to any one of claims 43 to 50, wherein there is a fourth type frame that needs to be inter-frame decoded after the first type frame in the decoding order, the fourth type frame follows the first type frame in display order, and the processor is further configured to:

Performing inter-frame decoding on the fourth type frame according to the replaced long-term reference image.
The decoding device according to any one of claims 43 to 51, wherein the first type frame is a random access point.
The decoding device according to claim 52, wherein the first type frame is a clean random access point.
The decoding device according to any one of claims 43 to 53, wherein the processor is specifically configured to:

Performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when decoding of at least part of the second type frames is completed, updating the long-term reference image with specific image blocks of the at least part of the second type frames whose decoding has been completed, and using the updated long-term reference image as the long-term reference image for a next second type frame;

or,

Performing inter-frame decoding on at least one second type frame of the N second type frames according to the long-term reference image, and, when decoding of a specific image block of at least part of the second type frames is completed, updating the long-term reference image with the specific image block whose decoding has been completed, and using the updated long-term reference image as the long-term reference image of a next second type frame or of the current second type frame.
The decoding device according to any one of claims 43 to 54, wherein after the first type frame is decoded and before the frame following the first type frame in the decoding order is decoded, the long-term reference image remains unchanged.
The decoding device according to any one of claims 43 to 55, wherein the processor is further configured to:

After decoding at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on part of the image blocks in those image frames;

or,

After decoding a specific image block in at least part of the image frames other than the first type frame, updating part of the image blocks in the long-term reference image based on the specific image block.
A computer-readable storage medium, characterized in that instructions are stored thereon, and when the instructions are run on a computer, the computer is caused to execute the encoding method according to any one of claims 1 to 14.
A computer-readable storage medium, characterized in that instructions are stored thereon, and when the instructions are run on a computer, the computer is caused to execute the decoding method according to any one of claims 15 to 28.