WO2021196238A1

WO2021196238A1 - Video processing method, video processing device, and computer-readable storage medium

Info

Publication number: WO2021196238A1
Application number: PCT/CN2020/083376
Authority: WO
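Inventors: Zheng Xiaozhen (郑萧桢); Wang Suhong (王苏红); Ma Siwei (马思伟); Wang Shanshe (王苫社)
Original assignee: SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司); Peking University (北京大学)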
Priority date: 2020-04-03
Filing date: 2020-04-03
Publication date: 2021-10-07
Also published as: CN112868235A

Abstract

A video processing method, a video processing device, and a computer-readable storage medium. The video processing method comprises: when a first image block of a current frame satisfies a preset condition identified in high-level syntax, using history-based motion vector prediction (HMVP) to construct a first motion information candidate list for the first image block; and, when the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, using the motion information of spatially adjacent blocks together with HMVP to construct a second motion information candidate list for the first image block. The embodiments of the present application reduce the complexity of constructing motion information candidate lists and improve the efficiency of image encoding and decoding.

Description

Video processing method, video processing device, and computer-readable storage medium

Technical field

This application relates to the field of communication technology, and in particular to a video processing method, video processing device, and computer-readable storage medium.

Background

Video data is a sequence of consecutive frames, each frame being one image. Because successive images are strongly correlated, video data contains a large amount of redundant information, which can be divided into spatial redundancy and temporal redundancy. Image encoding removes the redundant information from each frame of the video data (that is, removes the correlation between the data) to obtain an encoded image, and image decoding recovers the original image from the encoded image. However, traditional image encoding and decoding methods must use the motion information of the spatial neighboring blocks of an image block to construct the motion information candidate list for that block, which complicates the construction of the candidate list and reduces the efficiency of image encoding and decoding.

Summary of the invention

The embodiments of the application provide a video processing method, a video processing device, and a computer-readable storage medium. When an image block satisfies a preset condition, its motion information candidate list is constructed without using the motion information of spatial neighboring blocks. This reduces the complexity of constructing the motion information candidate list, improves the efficiency of image encoding and decoding, and allows the candidate lists of image blocks that meet the preset condition to be constructed in parallel.

In the first aspect, an embodiment of the present application provides a video processing method, and the video processing method includes:

When the first image block of the current frame satisfies the preset condition identified in the high-level syntax, a first motion information candidate list is constructed for the first image block by using HMVP;

When the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, a second motion information candidate list is constructed for the first image block by using the motion information of spatial neighboring blocks and HMVP.

In the second aspect, an embodiment of the present application provides another video processing method, and the video processing method includes:

When the first image block of the current frame meets a preset condition, construct a first motion information candidate list for the first image block by using HMVP;

When the first image block of the current frame does not satisfy the preset condition, a second motion information candidate list is constructed for the first image block by using the motion information of the neighboring blocks in the spatial domain and the HMVP.

In a third aspect, an embodiment of the present application provides a video processing device, the video processing device includes a memory and a processor, wherein:

The memory is used to store a computer program, and the computer program includes program instructions;

The processor calls the program instructions to perform the following steps:

In a fourth aspect, an embodiment of the present application provides a video processing device. The video processing device includes a memory and a processor, wherein:

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to execute the video processing method described in the first aspect.

In a sixth aspect, an embodiment of the present application provides another computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to execute the video processing method described in the second aspect.

In the embodiments of this application, when an image block in the current frame meets a preset condition, HMVP can be used to construct the motion information candidate list for that image block, without using the motion information of spatial neighboring blocks. This reduces the complexity of constructing the motion information candidate list and improves the efficiency of image encoding and decoding.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings that need to be used in the embodiments of the present application. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.

FIG. 1 is a schematic diagram of a codec system framework provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of image blocks provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a framework of an encoder provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the architecture of a video processing system provided by an embodiment of the present application;

FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a traditional construction of a motion information candidate list provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of the present application;

FIG. 8 is a schematic flowchart of another video processing method provided by an embodiment of the present application;

FIG. 9 is a schematic flowchart of another video processing method provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a video processing device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application. In the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.

The video processing method proposed in the embodiments of the present invention can be applied to a video processing device. The video processing device can be arranged in a smart terminal (such as a mobile phone or a tablet computer) and can be used at the encoding end or the decoding end; that is, it can be an encoder or a decoder. In some embodiments, the embodiments of the present invention can be applied to aircraft (such as drones). In other embodiments, they can also be applied to other movable platforms (such as unmanned ships, unmanned vehicles, and robots). The embodiments of the present invention impose no specific limitation in this respect.

Specifically, Figure 1 can be used as an example to illustrate the coding and decoding system framework. Figure 1 is an architecture diagram of a coding and decoding system. As shown in FIG. 1, the system 100 can receive the data 102 to be processed, process the data 102 to be processed, and generate processed data 108. For example, the system 100 may receive the data to be encoded and encode the data to be encoded to generate encoded data, or the system 100 may receive the data to be decoded and decode the data to be decoded to generate decoded data. In some embodiments, the components in the system 100 may be implemented by one or more processors. The processor may be a processor in a computing device or a processor in a mobile device (such as a drone). The processor may be any type of processor, which is not limited in the embodiment of the present invention. In some possible designs, the processor may include an encoder, a decoder, or a codec, etc. One or more memories may also be included in the system 100. The memory can be used to store instructions and data, for example, computer-executable instructions that implement the technical solutions of the embodiments of the present invention, to-be-processed data 102, processed data 108, and so on. The memory may be any type of memory, which is not limited in the embodiment of the present invention.

The data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensor data from sensors, such as vision sensors (for example, cameras and infrared sensors), microphones, near-field sensors (for example, ultrasonic sensors and radars), position sensors, temperature sensors, touch sensors, and the like. In some cases, the data to be encoded may include information from the user, for example, biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.

Generally speaking, a video is a sequence of consecutive frames, and each frame is one image. A frame can be divided into multiple coding tree units (Coding Tree Unit, CTU) of identical size, for example 64x64 or 128x128. Each CTU may be further divided into multiple coding units (Coding Unit, CU), and a CU may be square or rectangular. For ease of understanding, the CU is used as the image block in the following description; that is, "image block" below refers to a CU. Taking the schematic diagram of image blocks in Figure 2 as an example, the image shown in Figure 2 consists of 4 CTUs, and each CTU consists of multiple image blocks. The image blocks in the image may all differ in size, or some of them may have the same size.
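The partitioning above can be illustrated with a small sketch that counts the CTUs covering a frame. It assumes square CTUs of the sizes mentioned (64 or 128) and that CTUs on the right and bottom edges may extend past the picture boundary; the function name `ctu_grid` is hypothetical and not taken from any codec.

```python
def ctu_grid(frame_width: int, frame_height: int, ctu_size: int = 128) -> tuple[int, int]:
    """Return (columns, rows) of square CTUs needed to cover a frame.

    Edge CTUs may be only partly inside the picture, so both counts are
    rounded up with integer ceiling division.
    """
    if min(frame_width, frame_height, ctu_size) <= 0:
        raise ValueError("dimensions must be positive")
    cols = (frame_width + ctu_size - 1) // ctu_size
    rows = (frame_height + ctu_size - 1) // ctu_size
    return cols, rows


if __name__ == "__main__":
    # 1080 is not a multiple of 128 or 64, so the bottom CTU row is partial.
    print(ctu_grid(1920, 1080, 128))  # (15, 9)
    print(ctu_grid(1920, 1080, 64))   # (30, 17)
```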

In an embodiment, the framework of the encoder is illustrated in FIG. 3, which is a framework diagram of an encoder. The inter-frame coding process is introduced below by way of example with reference to FIG. 3.

As shown in Figure 3, the process of inter-frame encoding and decoding can be as follows:

In 301, the current frame image is acquired. In 302, a reference frame image is acquired. In 303a, motion estimation is performed using the reference frame image to obtain a motion vector (Motion Vector, MV) for each image block of the current frame image. In 304a, motion compensation is performed using the motion vector obtained by motion estimation to obtain the predicted value of the current image block. In 305, the predicted value of the current image block is subtracted from the current image block to obtain the residual. In 306, the residual is transformed to obtain transform coefficients. In 307, the transform coefficients are quantized to obtain quantized coefficients. In 308, the quantized coefficients are entropy coded, and the resulting bitstream, together with the encoded coding mode information, is stored or sent to the decoding end. In 309, the quantization result is inverse quantized. In 310, the inverse quantization result is inverse transformed. In 311, reconstructed pixels are obtained from the inverse transform result and the motion compensation result. In 312, the reconstructed pixels are filtered. In 313, the filtered reconstructed pixels are output.
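The residual and reconstruction loop in steps 305 to 311 can be sketched for one block as follows. This is a toy model under stated assumptions: motion estimation and compensation (303a, 304a) are taken as already done, so the prediction block is passed in; an orthonormal floating-point DCT-II and a uniform scalar quantizer with step `qstep` stand in for whatever transform and quantizer a real codec uses; entropy coding (308) and filtering (312) are omitted. All function names are hypothetical.

```python
import math

Block = list[list[float]]


def dct_matrix(n: int) -> Block:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a: Block, b: Block) -> Block:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Block) -> Block:
    return [list(col) for col in zip(*a)]


def code_block(current: Block, prediction: Block, qstep: float) -> tuple[list[list[int]], Block]:
    """Toy version of steps 305-311 for one NxN block.

    Returns the quantized levels (what 308 would entropy code) and the
    reconstructed block (what 311 would produce before filtering).
    """
    n = len(current)
    c = dct_matrix(n)
    ct = transpose(c)
    # 305: residual = current block - prediction from motion compensation (304a)
    residual = [[current[i][j] - prediction[i][j] for j in range(n)] for i in range(n)]
    # 306: separable 2D transform, C * R * C^T
    coeffs = matmul(matmul(c, residual), ct)
    # 307: uniform scalar quantization
    levels = [[round(v / qstep) for v in row] for row in coeffs]
    # 309: inverse quantization
    dequant = [[lv * qstep for lv in row] for row in levels]
    # 310: inverse transform, C^T * D * C (C is orthonormal, so C^T is its inverse)
    rec_residual = matmul(matmul(ct, dequant), c)
    # 311: reconstruction = prediction + decoded residual
    recon = [[prediction[i][j] + rec_residual[i][j] for j in range(n)] for i in range(n)]
    return levels, recon
```

The encoder reconstructs from the dequantized coefficients (309 to 311) rather than from the original residual, so its reconstructed pixels match what the decoder will rebuild from the same bitstream.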

As shown in Figure 3, the intra-frame encoding and decoding process can be as follows:

In 302, the current frame image is obtained. In 303b, intra-frame prediction selection is performed for the current frame image. In 304b, intra-frame prediction is performed for the current image block in the current frame. In 305, the predicted value of the current image block is subtracted from the current image block to obtain the residual. In 306, the residual of the image block is transformed to obtain transform coefficients. In 307, the transform coefficients are quantized to obtain quantized coefficients. In 308, the quantized coefficients are entropy coded, and the resulting bitstream, together with the encoded coding mode information, is stored or sent to the decoding end. In 309, the quantization result is inverse quantized. In 310, the inverse quantization result is inverse transformed. In 311, reconstructed pixels are obtained from the inverse transform result and the intra-frame prediction result. In 312, the reconstructed pixels are filtered. In 313, the filtered reconstructed pixels are output.
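The text does not say which intra prediction modes are chosen in 303b. As one familiar example only, DC prediction fills the block with the mean of the reconstructed neighboring pixels above and to the left. The sketch assumes a square block and integer samples, and it omits the boundary smoothing and non-square handling that real codecs apply; `dc_intra_prediction` is a hypothetical name.

```python
def dc_intra_prediction(top: list[int], left: list[int]) -> list[list[int]]:
    """Predict an NxN block as the rounded mean of its top and left neighbors.

    `top` holds the N reconstructed pixels directly above the block and
    `left` the N reconstructed pixels directly to its left.
    """
    n = len(top)
    if n == 0 or len(left) != n:
        raise ValueError("expects a square block: len(top) == len(left) > 0")
    total = sum(top) + sum(left)
    dc = (total + n) // (2 * n)  # integer mean of 2N samples, rounded half up
    return [[dc] * n for _ in range(n)]
```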

As shown in Figure 3, images can be predicted during encoding in order to remove redundancy. Different images in a video can use different prediction methods, and according to the prediction method adopted, an image can be classified as an intra-frame predicted image or an inter-frame predicted image. Inter-frame prediction exploits the temporal correlation of the video: the pixels of neighboring, already-encoded images are used to predict the pixels of the current image, effectively removing temporal redundancy from the video. Because consecutive frames are highly similar (strong temporal correlation), inter-frame prediction can be used to encode and compress the original video, removing redundancy in the time dimension and facilitating storage and transmission.

In addition to the prediction modes described above, intra block copy (IBC) can also be used. IBC exploits spatial correlation within the same frame: the pixels of already-coded CUs in the current frame are used to predict the pixels of the current CU to be coded, effectively removing redundant information in the image space. When a frame contains repeated textures, it has strong spatial correlation, and IBC can be used to encode and compress the original image, removing redundancy in the spatial dimension and facilitating storage and transmission.
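As a minimal sketch of the IBC idea, the function below copies a prediction block from the already-reconstructed part of the same frame using a block vector (an offset within the frame, analogous to an MV). The validity rule it enforces is a simplified assumption chosen for illustration; real codecs such as VVC restrict the reference area much more tightly. `ibc_predict` is a hypothetical name.

```python
Frame = list[list[int]]


def ibc_predict(frame: Frame, x: int, y: int, w: int, h: int,
                bv: tuple[int, int]) -> Frame:
    """Copy a w x h prediction block for the block at (x, y) using block vector bv.

    Simplified validity rule (an assumption): the reference block must lie
    inside the frame, must not start below the current block, and must lie
    entirely above it or entirely to its left, so that it has already been
    reconstructed in block raster-scan order.
    """
    bvx, bvy = bv
    rx, ry = x + bvx, y + bvy
    height, width = len(frame), len(frame[0])
    inside = rx >= 0 and ry >= 0 and rx + w <= width and ry + h <= height
    already_coded = ry <= y and (ry + h <= y or rx + w <= x)
    if not (inside and already_coded):
        raise ValueError(f"block vector {bv} points outside the reconstructed area")
    return [row[rx:rx + w] for row in frame[ry:ry + h]]
```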

Both inter prediction and IBC may include a merge (Merge) mode and a non-Merge mode (for example, advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode). In Merge mode, the MV of the image block equals its motion vector prediction (Motion Vector Prediction, MVP), so no motion vector difference (MVD) needs to be transmitted in the bitstream; only the MVP index and the reference frame index need to be passed to the decoder. In non-Merge mode, the MVD, the MVP index, and the reference frame index all need to be transmitted to the decoder in the bitstream.

For example, in Merge mode, the MVP can be determined first and then used directly as the MV. To obtain the MVP, an MVP candidate list (merge candidate list) can first be constructed; it contains at least one candidate MVP, each with an index. After selecting an MVP from the list, the encoder writes its index into the bitstream, and the decoder uses the index to find the corresponding MVP in its own MVP candidate list, thereby decoding the image block.
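The difference between the two modes, and the decoder-side index lookup described above, can be sketched as follows. The field layout follows the text (Merge: MVP index and reference frame index; non-Merge: additionally an MVD). The class and function names are hypothetical, MVs are integer (x, y) pairs, and the reference index is carried but not used by the lookup.

```python
from dataclasses import dataclass
from typing import Union

MV = tuple[int, int]


@dataclass(frozen=True)
class MergeSyntax:
    """Merge mode as described: MVP index and reference frame index, no MVD."""
    mvp_idx: int
    ref_idx: int


@dataclass(frozen=True)
class AmvpSyntax:
    """Non-Merge (AMVP) mode: MVD, MVP index and reference frame index."""
    mvp_idx: int
    ref_idx: int
    mvd: MV


def decode_mv(syntax: Union[MergeSyntax, AmvpSyntax], candidates: list[MV]) -> MV:
    """Recover a block's MV from the parsed fields and the decoder's own MVP list."""
    mvp = candidates[syntax.mvp_idx]   # same list, same order as on the encoder side
    if isinstance(syntax, MergeSyntax):
        return mvp                     # Merge: MV equals the MVP
    dx, dy = syntax.mvd
    return (mvp[0] + dx, mvp[1] + dy)  # AMVP: MV = MVP + MVD
```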

In order to understand the Merge mode more clearly, the following will introduce the operation process of using the Merge mode for encoding.

Step 1: Obtain the MVP candidate list;

Step 2: Select an optimal MVP from the MVP candidate list, and at the same time obtain the index of the MVP in the MVP candidate list;

Step 3: Use the MVP as the MV of the current block;

Step 4: Determine the position of the reference block (also called the prediction block) in the reference frame image according to the MV;

Step 5: Subtract the reference block from the current block to obtain the residual data;

Step 6: Pass the residual data and the MVP index to the decoding end.

It should be understood that the above process is only a specific implementation of the Merge mode. Merge mode can also have other implementations.
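Steps 1 to 6 can be sketched for a single square block as follows. Assumptions, for illustration only: the candidate list from step 1 is passed in, MVs are integer-pel, every candidate points inside the reference frame, the "optimal" MVP in step 2 is the one with the lowest sum of absolute differences (SAD) (real encoders typically use a rate-distortion cost), and the residual is returned untransformed. All names are hypothetical.

```python
Block = list[list[int]]
MV = tuple[int, int]


def fetch_block(ref: Block, x: int, y: int, size: int, mv: MV) -> Block:
    """Step 4: locate the reference (prediction) block displaced by an integer MV."""
    rx, ry = x + mv[0], y + mv[1]
    if rx < 0 or ry < 0 or rx + size > len(ref[0]) or ry + size > len(ref):
        raise ValueError(f"MV {mv} points outside the reference frame")
    return [row[rx:rx + size] for row in ref[ry:ry + size]]


def sad(a: Block, b: Block) -> int:
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def merge_encode(cur: Block, ref: Block, x: int, y: int, size: int,
                 candidates: list[MV]) -> tuple[int, Block]:
    """Steps 2-6 for one block; returns (mvp_idx, residual) to be transmitted."""
    current = [row[x:x + size] for row in cur[y:y + size]]
    # Step 2: pick the candidate whose prediction has the lowest SAD (assumed cost).
    best_idx = min(range(len(candidates)),
                   key=lambda i: sad(current, fetch_block(ref, x, y, size, candidates[i])))
    mv = candidates[best_idx]                 # Step 3: MV = MVP
    pred = fetch_block(ref, x, y, size, mv)   # Step 4
    residual = [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(current, pred)]  # Step 5
    return best_idx, residual                 # Step 6: index + residual go to the decoder
```

Because the decoder builds the same candidate list, transmitting `best_idx` is enough for it to recover the MV.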

In current coding standards, the motion information of spatial neighboring blocks is usually added when constructing the MVP candidate list, and it has the highest priority in the order of addition. However, adding the motion information of spatial neighboring blocks to the MVP candidate list makes the encoding or decoding of the current image block depend on those neighboring blocks, so multiple image blocks cannot be processed in parallel, which limits encoding and decoding efficiency. Moreover, parallel construction of the MVP candidate list is not signaled in the high-level syntax (for example, the sequence header, picture header, or slice header); that is, it cannot be switched on or off through a high-level syntax flag. This makes it difficult to adjust the conditions that image blocks must meet and to adapt encoding or decoding flexibly.

Therefore, an embodiment of the present application discloses a video processing method. When the first image block of the current frame satisfies the preset condition identified in the high-level syntax, the video processing device can use history-based motion vector prediction (History-based Motion Vector Prediction, HMVP) to construct a first motion information candidate list for the first image block, and encode or decode the first image block according to the motion information in the first motion information candidate list. When the first image block does not satisfy the preset condition identified in the high-level syntax, the video processing device can construct a second motion information candidate list for the first image block by using the motion information of spatial neighboring blocks and HMVP, and encode or decode the first image block according to the motion information in the second motion information candidate list. Here, the current frame is the frame currently being encoded or decoded, and the first image block can be any image block in the current frame.
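HMVP keeps a table of the motion information of recently coded blocks and offers those entries as candidates, so it draws on blocks coded earlier rather than on the current block's spatial neighbors. The sketch below shows one common way to maintain such a table, a FIFO with a redundancy check as used in VVC: a duplicate is moved to the newest position, otherwise the oldest entry is dropped when the table is full. The table size of 5, storing bare MVs instead of full motion information, the newest-first read-out order, and all names are assumptions for illustration.

```python
MV = tuple[int, int]


class HmvpTable:
    """Motion vectors of recently coded blocks, kept as a constrained FIFO."""

    def __init__(self, max_size: int = 5) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.entries: list[MV] = []  # oldest first, newest last

    def update(self, mv: MV) -> None:
        """Record the MV of a block that has just been coded."""
        if mv in self.entries:
            # Redundancy check: drop the older copy so the MV moves to the newest slot.
            self.entries.remove(mv)
        elif len(self.entries) == self.max_size:
            # Table full: discard the oldest entry (FIFO).
            self.entries.pop(0)
        self.entries.append(mv)

    def candidates(self) -> list[MV]:
        """Entries newest first, the order in which they are offered to a list here."""
        return list(reversed(self.entries))

    def reset(self) -> None:
        """Empty the table at whatever reset points the codec defines."""
        self.entries.clear()
```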

In the embodiments of this application, when an image block of the current frame satisfies the preset condition identified in the high-level syntax, HMVP can be used to construct its motion information candidate list. Compared with traditional video processing methods, which must use the motion information of spatial neighboring blocks to construct the candidate list, the embodiments of the present application reduce the complexity of constructing the motion information candidate list and improve the efficiency of image encoding and decoding.

The embodiment of the present application also discloses another video processing method. The video processing device can use the HMVP to construct the first motion information candidate list for the first image block when the first image block of the current frame meets the preset condition. The video processing device may also construct a second motion information candidate list for the first image block by using the motion information of the neighboring blocks in the spatial domain and the HMVP when the first image block does not meet the preset condition.

In the embodiments of this application, when an image block in the current frame meets a preset condition, HMVP can be used to construct its motion information candidate list. Compared with traditional video processing methods, which must use the motion information of spatial neighboring blocks to construct the candidate list, the embodiments of the present application reduce the complexity of constructing the motion information candidate list and improve the efficiency of image encoding and decoding.

An embodiment of the application also discloses another video processing method. When the first image block of the current frame meets the preset condition identified in the high-level syntax, the video processing device constructs a first motion information candidate list for the first image block according to a preset rule, where the preset rule indicates that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or HMVP. The device then encodes or decodes the first image block according to the motion information in the first motion information candidate list.
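The preset rule only fixes which source supplies the first entry of the list. The sketch below is one possible reading, with hypothetical names: when the condition holds, the list is filled from temporal and HMVP candidates in the chosen order and, as an additional assumption, spatial candidates are left out; otherwise the traditional order with spatial candidates first is used. Duplicates are skipped, and the maximum length of 6 is an assumed default.

```python
MV = tuple[int, int]


def build_list_preset_rule(meets_hls_condition: bool, spatial: list[MV],
                           temporal: list[MV], hmvp: list[MV],
                           first: str = "temporal", max_len: int = 6) -> list[MV]:
    """Order candidate sources according to the (assumed) preset rule."""
    if meets_hls_condition:
        if first == "temporal":
            sources = [temporal, hmvp]
        elif first == "hmvp":
            sources = [hmvp, temporal]
        else:
            raise ValueError("first must be 'temporal' or 'hmvp'")
    else:
        sources = [spatial, temporal, hmvp]  # traditional order: spatial first
    cands: list[MV] = []
    for source in sources:
        for mv in source:
            if len(cands) == max_len:
                return cands
            if mv not in cands:  # skip duplicates
                cands.append(mv)
    return cands
```

With `spatial=[(1, 1)]`, `temporal=[(5, 5)]` and `hmvp=[(3, 3), (1, 1)]`, the condition-met list is `[(5, 5), (3, 3), (1, 1)]`; `(1, 1)` still appears there because it arrives through the HMVP table, not as a spatial candidate.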

In the embodiments of the present application, when an image block of the current frame satisfies the preset condition identified in the high-level syntax, the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or HMVP. Compared with traditional video processing methods, in which the first motion information added to the candidate list is that of a spatial neighboring block, the embodiments of the present application reduce the complexity of constructing the motion information candidate list and improve the efficiency of image encoding and decoding.

An embodiment of the application also discloses another video processing method. When the first image block of the current frame meets a preset condition, the video processing device constructs a first motion information candidate list for the first image block according to a preset rule, where the preset rule indicates that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or HMVP. The device then encodes or decodes the first image block according to the motion information in the first motion information candidate list.

In the embodiments of the present application, when an image block in the current frame meets a preset condition, the motion information candidate list is constructed with the motion information of a temporal neighboring block or HMVP as the first motion information added to it. Compared with traditional video processing methods, in which the first motion information added to the candidate list is that of a spatial neighboring block, this reduces the complexity of constructing the motion information candidate list and improves the efficiency of image encoding and decoding.

Based on the foregoing description, please refer to FIG. 4, which is a schematic structural diagram of a video processing system provided by an embodiment of the present application. As shown in FIG. 4, the video processing system includes an encoding terminal 401 and a decoding terminal 402. The encoding terminal 401 is used to encode original video data to obtain encoded video data, or to encode original image data to obtain encoded image data. The encoding terminal 401 sends the encoded video data to the decoding terminal 402. The decoding terminal 402 is used to decode the encoded video data to obtain original video data, or to decode the encoded image data to obtain original image data.

In one example, the encoding terminal 401 and the decoding terminal 402 may run in the same video processing device. For example, after collecting original video data, the video processing device may encode it through the encoding terminal 401 to obtain encoded video data and then store the encoded video data. Before playing the video data through a player, the video processing device may decode the encoded video data through the decoding terminal 402 to obtain the original video data, and then play the decoded video data through the player. As another example, after collecting original image data, the video processing device may encode it through the encoding terminal 401 to obtain encoded image data and then store the encoded image data. Before displaying the image data on a display screen, the video processing device may decode the encoded image data through the decoding terminal 402 to obtain the original image data, and then display the decoded image data on the display screen.

In another example, the encoding terminal 401 and the decoding terminal 402 may run in different video processing devices. For example, the encoding terminal 401 runs in a first video processing device, and the decoding terminal 402 runs in a second video processing device. After collecting original video data, the first video processing device can encode it through the encoding terminal 401 to obtain encoded video data and then send the encoded video data to the second video processing device. The second video processing device can decode the encoded video data through the decoding terminal 402 to obtain the original video data.

In a video processing system composed of the encoding terminal 401 and the decoding terminal 402, the video processing method may be as follows. When the first image block of the current frame meets the preset condition identified in the high-level syntax, the encoding terminal 401 uses HMVP to construct a first motion information candidate list for the first image block and encodes the first image block according to the motion information in the first motion information candidate list; when the first image block does not meet the preset condition, the encoding terminal 401 uses the motion information of spatial neighboring blocks and HMVP to construct a second motion information candidate list for the first image block and encodes the first image block according to the motion information in the second motion information candidate list. Correspondingly, when the first image block of the current frame satisfies the preset condition identified in the high-level syntax, the decoding terminal 402 uses HMVP to construct the first motion information candidate list and decodes the first image block according to the motion information in that list; when the first image block does not meet the preset condition, the decoding terminal 402 uses the motion information of spatial neighboring blocks and HMVP to construct the second motion information candidate list and decodes the first image block according to the motion information in that list.

It is understandable that the video processing system described in the embodiments of the present application is intended to illustrate the technical solutions of the embodiments more clearly and does not limit them. Those of ordinary skill in the art will know that, as system architectures evolve and new business scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

Based on the foregoing description, please refer to FIG. 5. FIG. 5 is a schematic flowchart of a video processing method provided by an embodiment of the present application. The video processing method may include the following steps S501 to S503:

Step S501: When the first image block of the current frame satisfies the preset condition identified in the high-level syntax, use HMVP to construct a first motion information candidate list for the first image block. That is, when the first image block satisfies this condition, the video processing device does not add the motion information of spatial neighboring blocks to the first motion information candidate list, but constructs the list using HMVP or another method that does not involve spatial dependence.
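A minimal sketch of how the condition selects the candidate sources in S501 and in the alternative branch (step S502, which the text executes when the condition is not met). Assumptions: candidates are bare integer MVs, spatial candidates precede HMVP candidates in the second list, duplicates are skipped, the maximum list length of 6 is an illustrative default, and padding or other candidate types are omitted. `build_candidate_list` and `append_unique` are hypothetical names.

```python
MV = tuple[int, int]


def append_unique(cands: list[MV], new: list[MV], max_len: int) -> None:
    """Append candidates in order, skipping duplicates, until the list is full."""
    for mv in new:
        if len(cands) >= max_len:
            return
        if mv not in cands:
            cands.append(mv)


def build_candidate_list(meets_hls_condition: bool, spatial: list[MV],
                         hmvp: list[MV], max_len: int = 6) -> list[MV]:
    """Choose candidate sources depending on the high-level-syntax condition."""
    cands: list[MV] = []
    if meets_hls_condition:
        # S501: first list from HMVP only; spatial neighbors are not consulted,
        # so the list does not depend on neighboring blocks of the current block.
        append_unique(cands, hmvp, max_len)
    else:
        # S502 (assumed order): second list, spatial neighbors first, then HMVP.
        append_unique(cands, spatial, max_len)
        append_unique(cands, hmvp, max_len)
    return cands
```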

In one implementation, the first image block satisfying the preset condition identified in the high-level syntax includes: the size of the first image block is less than or equal to the image block size identified in the high-level syntax. If so, the video processing device may determine that the first image block satisfies the preset condition. For example, if the image block size identified in the high-level syntax is 64x64, then the image blocks in the current frame whose size is less than or equal to 64x64 (for example, 4x8, 8x4, or 16x32) meet the preset condition, and the motion information candidate lists constructed for these blocks may use HMVP but do not include the motion information of spatial neighboring blocks.

Optionally, if the size of the first image block is greater than the image block size identified in the high-level syntax, the video processing device may determine that the first image block does not meet the preset condition identified in the high-level syntax, and then execute step S502. For example, if the image block size identified in the high-level syntax is 64x64, then the image blocks in the current frame whose size is larger than 64x64 (for example, 64x128, 128x64, or 128x128) do not meet the preset condition, and the motion information candidate lists constructed for these blocks use the motion information of spatial neighboring blocks together with HMVP.
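The size-threshold condition can be expressed as a simple predicate. The high-level syntax container and its field names are hypothetical, and comparing width and height separately is an assumption, since the text does not say how two sizes are compared; the examples in the text come out the same under an area comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HighLevelSyntax:
    """Hypothetical fields parsed from a sequence, picture or slice header."""
    max_block_width: int
    max_block_height: int


def meets_size_threshold(width: int, height: int, hls: HighLevelSyntax) -> bool:
    """Preset condition: the block is no larger than the signalled size.

    "No larger" is read here as a per-dimension comparison (an assumption).
    """
    return width <= hls.max_block_width and height <= hls.max_block_height
```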

In one implementation, the first image block satisfying the preset condition identified in the high-level syntax includes: the sizes of image blocks identified in the high-level syntax include the size of the first image block. If the identified sizes include the size of the first image block, the video processing device may determine that the first image block satisfies the preset condition identified in the high-level syntax. In a specific implementation, at least one image block size may be identified in the high-level syntax. For example, if the sizes identified in the high-level syntax are 4x8, 8x4, and 64x64, then all image blocks in the current frame whose size is 4x8, 8x4, or 64x64 satisfy the preset condition, and the motion information candidate list constructed for these image blocks may use HMVP but does not include the motion information of spatial neighboring blocks.

Optionally, if the sizes identified in the high-level syntax do not include the size of the first image block, the video processing device may determine that the first image block does not satisfy the preset condition identified in the high-level syntax. For example, if the identified sizes are 4x8, 8x4, and 64x64, then no image block in the current frame whose size is other than 4x8, 8x4, or 64x64 satisfies the preset condition, and the motion information candidate list constructed for these image blocks uses the motion information of spatial neighboring blocks.
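The set form of the condition is an exact-match test rather than a bound. A minimal sketch, assuming sizes are (width, height) tuples and the identified sizes are held in a set; the function name is hypothetical.

```python
from typing import FrozenSet, Tuple

Size = Tuple[int, int]  # assumed (width, height)

def meets_size_set(block: Size, identified: FrozenSet[Size]) -> bool:
    # The condition holds only when the block size equals one of the identified sizes
    return block in identified

IDENTIFIED = frozenset({(4, 8), (8, 4), (64, 64)})  # example sizes from the text
```

Unlike the threshold form, a block such as 8x8 does not qualify here even though it is smaller than 64x64.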

Each identified image block size is M*N, where both M and N are greater than or equal to 4, and M and N may or may not be equal. Specifically:

1. When the prediction mode of the first image block is inter prediction, M is greater than or equal to 4, N is greater than or equal to 4, and M and N may or may not be equal. For example, the high-level syntax can identify image blocks of size 4x4, 8x4, 16x32, 32x16, 64x128, and 128x128.

2. When the prediction mode of the first image block is IBC, M is greater than or equal to 4, N is greater than or equal to 4, and M and N may be unequal. For example, the high-level syntax can identify image blocks of size 8x4, 16x32, 32x16, and 64x128.

3. Alternatively, when the prediction mode of the first image block is IBC, one of M and N is greater than or equal to 4 and the other is greater than 4. For example, the high-level syntax can identify image blocks of size 4x8, 8x4, 16x32, 32x16, 64x128, and 128x128.
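The size constraints in items 1 and 3 above can be checked with a short validator. This is a sketch that assumes the hypothetical mode labels "inter" and "ibc" and implements only variant 3 for IBC; it relies on the observation that "one of M and N is at least 4 and the other is greater than 4" is equivalent to min(M, N) >= 4 and max(M, N) > 4, which excludes 4x4.

```python
def is_allowed_size(m: int, n: int, mode: str) -> bool:
    """Check an identified M*N size against the constraints for a prediction mode.

    mode: "inter" or "ibc" (hypothetical labels). For IBC this follows variant 3.
    """
    if mode == "inter":
        # Item 1: both dimensions at least 4; square and non-square sizes allowed
        return m >= 4 and n >= 4
    if mode == "ibc":
        # Item 3: one dimension >= 4 and the other > 4, i.e. 4x4 is excluded
        return min(m, n) >= 4 and max(m, n) > 4
    raise ValueError(f"unknown prediction mode: {mode}")
```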

In one implementation, the video processing device may add a syntax element to the high-level syntax that identifies the image block sizes for which HMVP is used to construct the motion information candidate list. Setting this syntax element determines the sizes of image blocks for which, in the subsequent encoding process, the motion information candidate list can be constructed using HMVP instead of the motion information of spatial neighboring blocks.

Take the scenario in which the sizes identified in the high-level syntax include the size of the first image block as an example. In one example, the syntax element set by the video processing device may include the index value of at least one image block size. For example, the index value of 4x4 is 0, that of 4x8 is 1, and that of 4x16 is 2. If the video processing device sets the syntax element to include 0 and 1, it can be determined that for image blocks of size 4x4 or 4x8, HMVP can be used to construct the motion information candidate list instead of the motion information of spatial neighboring blocks. In another example, the syntax element set by the video processing device may directly include at least one image block size. If the video processing device sets the syntax element to include 4x4 and 4x8, it can be determined that for image blocks of size 4x4 or 4x8 in the image, HMVP can be used to construct the motion information candidate list instead of the motion information of spatial neighboring blocks.

Take the scenario in which the size of the first image block is smaller than or equal to the size of the image block identified in the high-level syntax as an example. In one example, the syntax element set by the video processing device may include the index value of an image block size. For example, the index value of 4x4 is 0, that of 4x8 is 1, and that of 4x16 is 2. If the video processing device sets the syntax element to include 2, it can be determined that for image blocks in the image whose size is smaller than or equal to 4x16, HMVP is used to construct the motion information candidate list. In another example, the syntax element set by the video processing device may directly include an image block size. If the video processing device sets the syntax element to include 4x16, it can be determined that for image blocks in the image whose size is smaller than or equal to 4x16, the motion information candidate list can be constructed using HMVP instead of the motion information of spatial neighboring blocks.
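Both index-based signaling examples can be sketched with one lookup table. This is a minimal sketch under stated assumptions: the table below contains only the three example entries from the text, sizes are (width, height) tuples, and in threshold mode the single signaled index is treated as a component-wise upper bound. The names `INDEX_TO_SIZE` and `hmvp_only` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Size = Tuple[int, int]  # assumed (width, height)

# Hypothetical index table built from the example values in the text
INDEX_TO_SIZE: Dict[int, Size] = {0: (4, 4), 1: (4, 8), 2: (4, 16)}

def sizes_from_indices(indices: Iterable[int]) -> Set[Size]:
    """Set-membership signaling: each signaled index names one allowed size."""
    return {INDEX_TO_SIZE[i] for i in indices}

def hmvp_only(block: Size, indices: Iterable[int], threshold_mode: bool) -> bool:
    """Decide whether a block builds its candidate list from HMVP only.

    threshold_mode=False: the block size must be one of the signaled sizes.
    threshold_mode=True: the first signaled index is an upper bound,
    compared component-wise (an assumption; the text does not define it).
    """
    indices = list(indices)
    if not threshold_mode:
        return block in sizes_from_indices(indices)
    bound_w, bound_h = INDEX_TO_SIZE[indices[0]]
    return block[0] <= bound_w and block[1] <= bound_h
```

For example, signaling indices 0 and 1 in set mode selects exactly 4x4 and 4x8 blocks, whereas signaling index 2 in threshold mode selects every block no larger than 4x16.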

It can be understood that the preset condition identified in the high-level syntax in the embodiments of this application includes, but is not limited to, the foregoing. For example, the first image block satisfying the preset condition identified in the high-level syntax may include: the positions of image blocks identified in the high-level syntax include the position of the first image block in the image. If the identified positions include the position of the first image block in the image, the video processing device can determine that the first image block satisfies the preset condition; if they do not, the video processing device may determine that the first image block does not satisfy the preset condition. For example, if the positions identified in the high-level syntax are the upper left corner and the lower right corner, the video processing device can obtain the position of the first image block in the image. If the first image block is located in the upper left corner of the image, the video processing device may determine that the first image block satisfies the preset condition identified in the high-level syntax; if the first image block is located in the upper right corner of the image, the video processing device may determine that it does not. At least one position may be identified in the high-level syntax.
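The position-based condition can be sketched in the same way. The text does not define what "upper left corner" means for a block, so this sketch assumes the picture is split into four quadrants and a block belongs to the quadrant that contains its top-left sample; the region labels and function names are hypothetical.

```python
from typing import Set

def block_region(x: int, y: int, pic_w: int, pic_h: int) -> str:
    """Classify a block by the picture quadrant containing its top-left sample (x, y).

    Assumption: "upper left corner" etc. are interpreted as picture quadrants.
    """
    vertical = "upper" if y < pic_h // 2 else "lower"
    horizontal = "left" if x < pic_w // 2 else "right"
    return f"{vertical}_{horizontal}"

def meets_position(x: int, y: int, pic_w: int, pic_h: int, identified: Set[str]) -> bool:
    # The condition holds when the block's region is one of the identified positions
    return block_region(x, y, pic_w, pic_h) in identified
```

With the identified positions {"upper_left", "lower_right"} in a 1920x1080 picture, a block at (0, 0) qualifies and a block at (1900, 0) does not.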

In one implementation, the video processing device may add the syntax element to the picture header, the sequence header, or the slice header. Each picture corresponds to one picture header. If the syntax element is added to the picture header of a frame, then for the image blocks in that frame whose size is the size indicated by the syntax element, the motion information candidate list is constructed using HMVP or another method that does not involve spatial dependency, and the motion information of spatial neighboring blocks is not used. Each sequence of video data corresponds to one sequence header. If the syntax element is added to the sequence header of a sequence of video data, then for the image blocks in every frame of that sequence whose size is the indicated size, the motion information candidate list can be constructed using HMVP instead of the motion information of spatial neighboring blocks. Each frame may correspond to at least one slice header. If the syntax element is added to a slice header in a frame, then for the image blocks in that slice whose size is the indicated size, the motion information candidate list can be constructed using HMVP instead of the motion information of spatial neighboring blocks.

It can be understood that the way the preset condition is identified in the high-level syntax in the embodiments of this application includes, but is not limited to, the foregoing, as long as it can be used to determine for which image blocks of the current frame the motion information candidate list is constructed without using the motion information of spatial neighboring blocks.

In one implementation, when the first image block satisfies the preset condition identified in the high-level syntax, the type of motion information that the video processing device uses to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.

The prediction mode may be inter prediction or IBC.

When the prediction mode is IBC, the motion information that the video processing device uses to construct the first motion information candidate list for the first image block may include HMVP.

In one example, the video processing device may select HMVP candidates from the HMVP list as the motion information in the first motion information candidate list. If the first motion information candidate list is still not full after the selected HMVP candidates have been filled in, the list is padded with the zero motion vector (0, 0) until it is full.

In another example, when the HMVP list is empty, the motion information in the first motion information candidate list consists of zero motion vectors. For example, if every first image block satisfies the preset condition identified in the high-level syntax, and the motion information used to encode or decode each first image block is not used to update the HMVP list after that block has been encoded or decoded, the HMVP list may remain empty. In that case, the motion information in the first motion information candidate list may be zero motion vectors only.
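The two IBC examples above (HMVP candidates followed by zero padding, and the all-zero list when the HMVP list is empty) can be sketched as one function. Assumptions not stated in the text: motion information is reduced to a single (x, y) vector, HMVP entries are scanned from the most recent (end of the list) to the oldest, and duplicate candidates are skipped.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # simplified motion information: a single (x, y) vector

def build_first_list_ibc(hmvp_list: List[MV], list_size: int) -> List[MV]:
    """First candidate list for an IBC block that meets the preset condition.

    No spatial neighbor is read. Assumptions: most recent HMVP entry first,
    duplicates skipped, remaining slots padded with the zero vector (0, 0).
    """
    candidates: List[MV] = []
    for mv in reversed(hmvp_list):
        if len(candidates) == list_size:
            break
        if mv not in candidates:
            candidates.append(mv)
    while len(candidates) < list_size:
        candidates.append((0, 0))
    return candidates
```

Because the function never touches neighboring blocks, its output depends only on the HMVP list; an empty HMVP list yields a list of zero vectors, matching the second example.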

When the prediction mode is inter prediction, the motion information that the video processing device uses to construct the first motion information candidate list for the first image block may also include the motion information of temporal neighboring blocks.

In one example, if the prediction mode is the merge mode of inter prediction, the motion information in the first motion information candidate list constructed by the video processing device for the first image block may include the motion information of temporal neighboring blocks, HMVP, and pairwise average candidate MVs. If the first motion information candidate list is still not full after the motion information of temporal neighboring blocks, HMVP, and the pairwise average candidate MVs have been filled in, the list is padded with the zero motion vector (0, 0) until it is full.

In another example, if the prediction mode is the merge mode of inter prediction and the HMVP list is empty, the first motion information candidate list constructed by the video processing device for the first image block may include the motion information of temporal neighboring blocks and pairwise average candidate MVs. Further, the first motion information candidate list may also include zero motion vectors.

In one example, if the prediction mode is a non-merge mode of inter prediction, the first motion information candidate list constructed by the video processing device for the first image block may include the motion information of temporal neighboring blocks and HMVP. If the first motion information candidate list is still not full after the motion information of temporal neighboring blocks and the selected HMVP candidates have been filled in, the list is padded with the zero motion vector (0, 0) until it is full.

In another example, if the prediction mode is a non-merge mode of inter prediction and the HMVP list is empty, the first motion information candidate list constructed by the video processing device for the first image block may include the motion information of temporal neighboring blocks. Further, the first motion information candidate list may also include zero motion vectors.
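The four inter-prediction examples can be sketched with a single builder that follows the order temporal neighbor, then HMVP, then (merge only) pairwise average, then zero padding. This is a simplified sketch, not a codec implementation. Assumptions: motion information is reduced to one vector (no reference index), HMVP is scanned from most recent to oldest, duplicates are pruned, and the pairwise average uses the first two candidates with a plain floor average; a real codec's rounding and candidate rules may differ.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # simplified motion information

def build_first_list_inter(temporal: Optional[MV], hmvp_list: List[MV],
                           list_size: int, merge: bool) -> List[MV]:
    """First candidate list for an inter block meeting the preset condition.

    Order: temporal -> HMVP -> (merge only) pairwise average -> zero MV.
    No spatial neighbor is used.
    """
    cands: List[MV] = []

    def push(mv: MV) -> None:
        # Add a candidate if there is room and it is not a duplicate
        if len(cands) < list_size and mv not in cands:
            cands.append(mv)

    if temporal is not None:
        push(temporal)
    for mv in reversed(hmvp_list):  # assumed: most recent HMVP entry first
        push(mv)
    if merge and len(cands) >= 2:
        a, b = cands[0], cands[1]
        push(((a[0] + b[0]) >> 1, (a[1] + b[1]) >> 1))  # simplified average
    while len(cands) < list_size:
        cands.append((0, 0))
    return cands
```

With an empty HMVP list the builder naturally reduces to the "temporal (and pairwise average) plus zero vectors" cases described above.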

In one implementation, the encoding of the first image block is performed in parallel with the encoding of a second image block of the current frame, or the decoding of the first image block is performed in parallel with the decoding of the second image block, where the second image block is a spatial neighboring block of the first image block.

In the embodiments of this application, because the video processing device does not need the motion information of spatial neighboring blocks to construct the first motion information candidate list when the first image block satisfies the preset condition identified in the high-level syntax, the construction of the first motion information candidate list of the first image block can be parallelized with the construction of the motion information candidate list of the second image block. The encoding of the first image block can therefore proceed in parallel with the encoding of the second image block of the current frame, or the decoding of the first image block in parallel with the decoding of the second image block. In addition, adding the syntax element to the high-level syntax for this operation allows image blocks of certain sizes to be processed in parallel with other image blocks, so that the block size at which motion information candidate lists are constructed in parallel is adjustable. Specifically, setting the syntax element determines the sizes of image blocks whose motion information candidate list construction can be parallelized with that of other image blocks in the subsequent encoding process.

In one implementation, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the video processing device may refrain from updating the HMVP list with the motion information used to encode or decode the first image block. In a specific implementation, when the first image block satisfies the preset condition identified in the high-level syntax, the video processing device constructs the first motion information candidate list for the first image block using HMVP and then encodes or decodes the first image block according to the motion information in that list. Afterwards, it does not update the HMVP list with the motion information used to encode or decode the first image block; that is, the HMVP list does not include that motion information.

In the embodiments of this application, for image blocks that satisfy the preset condition identified in the high-level syntax, the step of updating the HMVP list with the motion information used to encode or decode the image block is skipped. This reduces the encoding and decoding complexity of such image blocks and increases their throughput during encoding or decoding.

In one implementation, after the first image block has been encoded or decoded according to the motion information in the first motion information candidate list, the HMVP list may be handled according to the prediction mode of the first image block.

The prediction mode may be inter prediction or IBC. When the prediction mode is inter prediction, the video processing device may update the HMVP list with the motion information used to encode or decode the first image block. When the prediction mode is IBC, the video processing device may keep the HMVP list unchanged.
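The mode-dependent HMVP handling can be sketched as follows. The text does not say how the list is updated, so this sketch assumes a constrained FIFO of the kind used by VVC-style HMVP tables: an identical entry is moved to the newest position, and the oldest entry is dropped when the list is full. The capacity of 5 and the mode labels are assumptions.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # simplified motion information

def update_hmvp(hmvp: List[MV], used_mv: MV, mode: str, capacity: int = 5) -> List[MV]:
    """Return the HMVP list after coding a block that met the preset condition.

    Inter ("inter"): the block's motion information is added.
    IBC ("ibc"): the list is returned unchanged.
    Assumption: constrained FIFO with an assumed capacity of 5 entries.
    """
    if mode == "ibc":
        return list(hmvp)
    updated = [mv for mv in hmvp if mv != used_mv]  # remove an identical entry
    updated.append(used_mv)                          # newest entry goes last
    if len(updated) > capacity:
        updated.pop(0)                               # drop the oldest entry
    return updated
```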

Step S502: When the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, construct a second motion information candidate list for the first image block using the motion information of spatial neighboring blocks and HMVP.

Taking IBC as the prediction mode as an example, the video processing device may construct the second motion information candidate list as follows:

1. Determine the motion information of the spatial neighboring blocks of the first image block and fill it into the second motion information candidate list.

Taking the image blocks shown in FIG. 6 as an example, the spatial neighboring blocks of CU1 include CU2 and CU3. When constructing the second motion information candidate list for CU1, the motion information of the already coded CU2 and the motion information of the already coded CU3 may be filled into the list. It follows that, before the motion information of spatial neighboring blocks and HMVP are used to construct the second motion information candidate list for an image block, the spatial neighboring blocks of that image block must already have been encoded or decoded. If they have not, the image block cannot be encoded or decoded, which lowers the encoding and decoding throughput of the image block.

2. Select HMVP candidates from the HMVP list as motion information in the motion information candidate list. If the list is still not full after the motion information of the coded neighboring CUs and the selected HMVP candidates have been filled in, pad it with the zero motion vector (0, 0) until it is full.
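The two steps above can be sketched as one builder for the second list. Assumptions: `spatial` holds the motion information of already coded spatial neighbors (for example CU2 and CU3 of FIG. 6) in a fixed order, `None` marks a neighbor that is hypothetically unavailable (for example outside the picture), HMVP is scanned from most recent to oldest, and duplicates are pruned.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]  # simplified motion information

def build_second_list_ibc(spatial: List[Optional[MV]], hmvp_list: List[MV],
                          list_size: int) -> List[MV]:
    """Second candidate list: spatial neighbors -> HMVP -> zero MV.

    The caller must only invoke this once the spatial neighbors are coded,
    which is the dependency the first candidate list avoids.
    """
    cands: List[MV] = []
    for mv in spatial + list(reversed(hmvp_list)):
        if mv is not None and mv not in cands and len(cands) < list_size:
            cands.append(mv)
    while len(cands) < list_size:
        cands.append((0, 0))  # pad with the zero motion vector
    return cands
```

Compared with the first list, the only structural difference is the leading spatial step, and that step is what forces the block to wait for its neighbors.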

It can be understood that, for any image block, the motion information candidate list constructed during encoding or decoding is either the first or the second motion information candidate list, depending on whether the image block satisfies the preset condition identified in the high-level syntax.

For ease of understanding, take the MVP candidate list as an example of the motion information candidate list: the construction methods of the MVP candidate list described above are the construction methods of the first or second motion information candidate list. The motion information candidate list mentioned in the embodiments of this application (for example, the first or the second motion information candidate list) may be a set of candidate motion information of the image block. The candidates in the list may be stored in the same buffer or in different buffers; this is not restricted here. The index of a piece of motion information in the motion information candidate list may be its index within the set of candidate motion information of the image block. For example, if the set of candidate motion information contains 5 candidates, their indexes in the motion information candidate list may be 0, 1, 2, 3, and 4, respectively.

The motion information mentioned in the embodiments of the present application may include a motion vector, or include a motion vector and reference frame information (for example, a reference frame index), and so on.

Step S503: encode or decode the first image block according to the motion information in the first motion information candidate list or the second motion information candidate list.

In a specific implementation, when the first image block of the current frame satisfies the preset condition identified in the high-level syntax, the video processing device can construct a first motion information candidate list for the first image block using HMVP and then encode or decode the first image block according to the motion information in the first motion information candidate list. When the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, the video processing device can construct a second motion information candidate list for the first image block using the motion information of spatial neighboring blocks and HMVP and then encode or decode the first image block according to the motion information in the second motion information candidate list.

In the embodiments of this application, when the first image block of the current frame satisfies the preset condition identified in the high-level syntax, the video processing device constructs the first motion information candidate list for the first image block using HMVP, which reduces the complexity of constructing motion information candidate lists and improves the efficiency of image encoding and image decoding. In addition, the sizes of image blocks that satisfy the preset condition can be adjusted through the high-level syntax, which increases the flexibility and adaptability of encoding and decoding.

Please refer to FIG. 7. FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of the present application. The video processing method may include the following steps S701 to S703:

Step S701: When the first image block of the current frame satisfies a preset condition, construct a first motion information candidate list for the first image block using HMVP. That is, when the first image block of the current frame satisfies the preset condition, the video processing device does not fill the motion information of spatial neighboring blocks into the first motion information candidate list, but may construct the first motion information candidate list for the first image block using HMVP.

The first image block satisfying the preset condition includes: the size of the first image block meets the preset size. The preset size may be set in advance by the video processing device, may be a default value of the encoder or decoder, or may be agreed upon by both the encoder and the decoder.

In one implementation, the size of the first image block meeting the preset size includes: the size of the first image block is smaller than or equal to the preset size, or the preset sizes include the size of the first image block. That is, if either of these holds, the video processing device may determine that the size of the first image block meets the preset size.

In one implementation, there is at least one preset size, each preset size is M*N, and both M and N are greater than or equal to 4. M and N may or may not be equal.

If the size of the first image block is less than or equal to the preset size, the video processing device may determine that the first image block meets the preset condition. For example, if the preset size is 64x64, then all image blocks with a size smaller than or equal to 64x64 (for example, 4x8, 8x4, 16x32, etc.) in the current frame meet the preset condition.

When the prediction mode of the first image block is inter prediction, M is greater than or equal to 4, N is greater than or equal to 4, and M and N may or may not be equal. For example, if the preset sizes include 4x4, 8x4, 16x32, 32x16, 64x128, and 128x128, then the image blocks in the current frame whose size is 4x4, 8x4, 16x32, 32x16, 64x128, or 128x128 all meet the preset condition.

When the prediction mode of the first image block is IBC, M is greater than or equal to 4, N is greater than or equal to 4, and M and N may be unequal. For example, if the preset sizes are 8x4, 16x32, 32x16, and 64x128, then the image blocks in the current frame whose size is 8x4, 16x32, 32x16, or 64x128 all meet the preset condition.

In one implementation, when the prediction mode of the first image block is IBC, the size of the first image block is K*L, where one of K and L is greater than or equal to 4 and the other is greater than 4.

For example, when the prediction mode of the first image block is IBC and the size of the first image block is 4x8, 8x4, 16x32, 32x16, 64x128, or 128x128, the first motion information candidate list can be constructed for the first image block using HMVP instead of the motion information of spatial neighboring blocks.

In one implementation, when the first image block satisfies the preset condition, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block, which may be inter prediction or IBC.

For example, when the prediction mode is inter-frame prediction, the motion information for constructing the first motion information candidate list for the first image block may also include motion information of temporal neighboring blocks.

For another example, when the prediction mode is IBC, the motion information for constructing the first motion information candidate list for the first image block may include HMVP.

In one implementation, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the video processing device does not update the HMVP list with the motion information used to encode or decode the first image block.

It can be understood that step S701 differs from step S501 in that the preset condition is not necessarily identified in the high-level syntax: in step S701, the preset condition may not be identified in the high-level syntax, and whether the first image block satisfies it may instead be determined in real time during the encoding or decoding of the first image block. For the parts that step S701 shares with step S501, refer to the corresponding description of S501; details are not repeated in this embodiment of the application.

Step S702: When the first image block of the current frame does not satisfy the preset condition, construct a second motion information candidate list for the first image block using the motion information of spatial neighboring blocks and HMVP.

It can be understood that step S702 differs from step S502 in that the preset condition is not necessarily identified in the high-level syntax: in step S702, the preset condition may not be identified in the high-level syntax, and whether the first image block satisfies it may instead be determined in real time during the encoding or decoding of the first image block. For the parts that step S702 shares with step S502, refer to the corresponding description of S502; details are not repeated in this embodiment of the application.

Step S703: encoding or decoding the first image block according to the motion information in the first motion information candidate list or the second motion information candidate list.

In a specific implementation, when the first image block of the current frame satisfies the preset condition, the video processing device can construct a first motion information candidate list for the first image block using HMVP and then encode or decode the first image block according to the motion information in the first motion information candidate list. When the first image block of the current frame does not satisfy the preset condition, the video processing device can construct a second motion information candidate list for the first image block using the motion information of spatial neighboring blocks and HMVP and then encode or decode the first image block according to the motion information in the second motion information candidate list.

In the embodiments of this application, the video processing device constructs the first motion information candidate list for the first image block using HMVP when the first image block of the current frame satisfies the preset condition, which reduces the complexity of constructing motion information candidate lists and improves the efficiency of image encoding and image decoding.

Please refer to FIG. 8. FIG. 8 is a schematic flowchart of another video processing method provided by an embodiment of the present application. The video processing method may include the following steps S801 and S802:

Step S801: When the first image block of the current frame satisfies the preset condition identified in the high-level syntax, construct a first motion information candidate list for the first image block according to a preset rule, where the preset rule indicates that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or HMVP.

It can be understood that if the prediction mode is the merge mode of inter prediction, the motion information candidate list constructed by the video processing device for an image block may include the motion information of spatial neighboring blocks, the motion information of temporal neighboring blocks, HMVP, and pairwise average candidate MVs. The video processing device may add motion information to the list in the order: motion information of spatial neighboring blocks → motion information of temporal neighboring blocks → HMVP → pairwise average candidate MVs. That is, the motion information of spatial neighboring blocks is filled in first, then the motion information of temporal neighboring blocks, then HMVP, and finally the pairwise average candidate MVs. If the list is still not full after the motion information of temporal neighboring blocks, HMVP, and the pairwise average candidate MVs have been filled in, it is padded with zero motion vectors until it is full.

If the prediction mode is a non-merge mode of inter prediction, the motion information candidate list constructed by the video processing device for an image block may include the motion information of spatial neighboring blocks, the motion information of temporal neighboring blocks, and HMVP candidates. The video processing device may add them in the order: spatial neighboring blocks → temporal neighboring blocks → HMVP. That is, the motion information of the spatial neighboring blocks is added to the motion information candidate list first, then the motion information of the temporal neighboring blocks, and then the HMVP candidates. If the list is still not full after the temporal candidates and the HMVP candidates have been added, it is padded with zero motion vectors until it is full.

It follows that, whether the prediction mode is the merge mode or the non-merge mode of inter prediction, the preset rule specifies that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block; in other words, the motion information of spatial neighboring blocks is excluded from the first motion information candidate list.

Similarly, if the prediction mode is the merge mode or the non-merge mode of IBC, the motion information candidate list constructed by the video processing device for an image block may include the motion information of spatial neighboring blocks and HMVP candidates. If the list is still not full after the motion information of the spatial neighboring blocks and the HMVP candidates have been added, it is padded with zero motion vectors until it is full. For example, the video processing device may add motion information to the list in the order: spatial neighboring blocks → HMVP → zero motion vectors.

It follows that, whether the prediction mode is the merge mode or the non-merge mode of IBC, the preset rule specifies that the first motion information added to the first motion information candidate list is an HMVP candidate; in other words, the motion information of spatial neighboring blocks is excluded from the first motion information candidate list.

Thus, in a conventional video processing method, the first motion information added to a motion information candidate list is the motion information of a spatial neighboring block. Because this embodiment of the application excludes the motion information of spatial neighboring blocks from the construction of the first motion information candidate list, the first motion information added to that list is instead the motion information of a temporal neighboring block or an HMVP candidate.
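The contrast between the conventional order and the preset rule of step S801 can be expressed as a small lookup. In this sketch the mode names and candidate-type labels are hypothetical; the orders themselves follow the descriptions above, and applying the preset rule simply drops spatial candidates, so the first type added becomes a temporal candidate (inter prediction) or an HMVP candidate (IBC).

```python
from typing import Dict, Tuple

# Conventional candidate-type order per prediction mode, as described above.
# The string labels are hypothetical names chosen for this sketch.
CONVENTIONAL_ORDER: Dict[str, Tuple[str, ...]] = {
    "inter_merge": ("spatial", "temporal", "hmvp", "pairwise", "zero"),
    "inter_non_merge": ("spatial", "temporal", "hmvp", "zero"),
    "ibc": ("spatial", "hmvp", "zero"),  # IBC merge and non-merge modes
}


def candidate_order(mode: str, preset_condition_met: bool) -> Tuple[str, ...]:
    """Return the order in which candidate types are added to the list.

    When the preset condition holds (first list), spatial neighbors are
    excluded, so the first type added is 'temporal' or 'hmvp'. Otherwise
    (second list) the conventional order starting with 'spatial' is kept."""
    order = CONVENTIONAL_ORDER[mode]
    if preset_condition_met:
        order = tuple(t for t in order if t != "spatial")
    return order


for mode in CONVENTIONAL_ORDER:
    print(mode, candidate_order(mode, True)[0])
# inter_merge temporal
# inter_non_merge temporal
# ibc hmvp
```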

It can be understood that step S801 differs from step S501 in how the first motion information candidate list is constrained: rather than restricting the types of motion information used, step S801 restricts the order in which motion information is added to the first motion information candidate list, so that the motion information of spatial neighboring blocks is not used to construct that list. For the parts that step S801 shares with step S501, refer to the corresponding description of S501; details are not repeated in this embodiment of the application.

Step S802: encode or decode the first image block according to the motion information in the first motion information candidate list.

In one implementation, when the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, the video processing device constructs a second motion information candidate list for the first image block according to another preset rule, which specifies that the first motion information added to the second motion information candidate list is the motion information of a spatial neighboring block. The video processing device then encodes or decodes the first image block according to the motion information in the second motion information candidate list.
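For the IBC case, choosing between the two lists amounts to whether spatial candidates are consulted at all, which matches the method summarized in the abstract (HMVP only for the first list; spatial neighbors plus HMVP for the second). The sketch below assumes a hypothetical `(x, y)` block-vector representation, exact-match pruning, zero-vector padding, and an assumed list length; none of these details is fixed by this text.

```python
from typing import List, Tuple

BV = Tuple[int, int]  # hypothetical (x, y) block vector


def build_ibc_list(preset_condition_met: bool, spatial: List[BV],
                   hmvp: List[BV], max_len: int = 6) -> List[BV]:
    """First list (condition met): HMVP candidates only.
    Second list (condition not met): spatial candidates, then HMVP.
    Either list is padded with zero vectors until it is full."""
    sources = [hmvp] if preset_condition_met else [spatial, hmvp]
    cands: List[BV] = []
    for group in sources:
        for bv in group:
            if len(cands) < max_len and bv not in cands:
                cands.append(bv)
    while len(cands) < max_len:
        cands.append((0, 0))
    return cands


spatial, hmvp = [(-16, 0)], [(-32, -8)]
print(build_ibc_list(True, spatial, hmvp, max_len=3))   # [(-32, -8), (0, 0), (0, 0)]
print(build_ibc_list(False, spatial, hmvp, max_len=3))  # [(-16, 0), (-32, -8), (0, 0)]
```

Because the first list never reads a spatial neighbor's vector in this sketch, its construction does not have to wait for those neighbors to be coded.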

In this embodiment of the application, when the first image block of the current frame satisfies the preset condition identified in the high-level syntax, the first motion information candidate list is constructed for the first image block according to the preset rule, which specifies that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or an HMVP candidate rather than the motion information of a spatial neighboring block. Spatial neighboring blocks are therefore not used to construct the first motion information candidate list, which reduces the complexity of constructing the motion information candidate list and improves the efficiency of image encoding and decoding.

Please refer to FIG. 9. FIG. 9 is a schematic flowchart of another video processing method provided by an embodiment of the present application. The video processing method may include the following steps S901 and S902:

Step S901: When the first image block of the current frame satisfies a preset condition, construct a first motion information candidate list for the first image block according to a preset rule, where the preset rule specifies that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or an HMVP candidate.

It can be understood that step S901 differs from step S801 in that the preset condition is not necessarily identified in the high-level syntax: in step S901, whether the first image block satisfies the preset condition may instead be determined in real time while the first image block is being encoded or decoded. For the parts that step S901 shares with step S801, refer to the corresponding description of S801; details are not repeated in this embodiment of the application.

Step S902: encode or decode the first image block according to the motion information in the first motion information candidate list.

In one implementation, when the first image block of the current frame does not satisfy the preset condition, the video processing device constructs a second motion information candidate list for the first image block according to another preset rule, which specifies that the first motion information added to the second motion information candidate list is the motion information of a spatial neighboring block. The video processing device then encodes or decodes the first image block according to the motion information in the second motion information candidate list.

In this embodiment of the application, when the first image block of the current frame satisfies a preset condition, a first motion information candidate list is constructed for the first image block according to a preset rule, which specifies that the first motion information added to the first motion information candidate list is the motion information of a temporal neighboring block or an HMVP candidate rather than the motion information of a spatial neighboring block. This reduces the complexity of constructing the motion information candidate list and improves the efficiency of image encoding and decoding.

Please refer to FIG. 10, which is a schematic structural diagram of a video processing device according to an embodiment of the present application. The video processing device described in this embodiment of the application includes at least a processor 1001 and a memory 1002, where:

The memory 1002 is configured to store a computer program, and the computer program includes program instructions;

The processor 1001 calls the program instructions to execute the following steps:

In one implementation, the first image block satisfying the preset condition identified in the high-level syntax includes: the image block sizes identified in the high-level syntax include the size of the first image block.

In one implementation, the first image block satisfying the preset condition identified in the high-level syntax includes: the size of the first image block is smaller than or equal to the image block size identified in the high-level syntax.

In one implementation, at least one image block size is identified in the high-level syntax, each identified size is M*N, and M and N are both greater than or equal to 4.

In one implementation, M and N are not equal.
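The two high-level-syntax size tests above can be sketched as a single predicate. How the sizes are actually signalled is not specified here, so the `signalled` list is a hypothetical stand-in for sizes parsed from the high-level syntax; reading "smaller than or equal to" as width ≤ M and height ≤ N is likewise an assumption.

```python
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) of a block, i.e. M*N


def satisfies_hls_size(block: Size, signalled: Iterable[Size],
                       test: str = "member") -> bool:
    """Return True if the block meets the size condition.

    test="member": the block size is one of the signalled sizes.
    test="le": the block fits within a signalled size (assumed to mean
    width <= M and height <= N)."""
    sizes = list(signalled)
    if any(m < 4 or n < 4 for m, n in sizes):
        raise ValueError("each signalled size must have M >= 4 and N >= 4")
    w, h = block
    if test == "member":
        return (w, h) in sizes
    if test == "le":
        return any(w <= m and h <= n for m, n in sizes)
    raise ValueError(f"unknown test: {test}")


print(satisfies_hls_size((8, 4), [(8, 4), (4, 8)]))      # True
print(satisfies_hls_size((4, 4), [(8, 4)], test="le"))   # True
print(satisfies_hls_size((16, 4), [(8, 4)], test="le"))  # False
```

The signalled sizes in the usage lines are non-square (M ≠ N), matching the last implementation above, but the predicate itself accepts square sizes as well.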

In one implementation, when the first image block satisfies the preset condition identified in the high-level syntax, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.

In one implementation, the prediction mode includes inter prediction or IBC. When the prediction mode is inter prediction, the motion information used to construct the first motion information candidate list for the first image block further includes the motion information of temporal neighboring blocks; when the prediction mode is IBC, the motion information used to construct the first motion information candidate list for the first image block includes the HMVP.

In one implementation, the encoding of the first image block is synchronized with the encoding of a second image block of the current frame, or the decoding of the first image block is synchronized with the decoding of the second image block, where the second image block is a spatial neighboring block of the first image block.

In one implementation, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the processor 1001 does not update the HMVP list with the motion information used to encode or decode the first image block.

In one implementation, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the processor 1001 is further configured to perform the following operation:

Based on the prediction mode of the first image block, an operation is performed on the HMVP list.

In one implementation, the prediction mode includes inter prediction or IBC;

The processor 1001 is specifically configured to perform the following operations when operating the HMVP list based on the prediction mode of the first image block:

When the prediction mode is the inter-frame prediction, use the motion information used during encoding or decoding of the first image block to update the HMVP list;

When the prediction mode is the IBC, keep the HMVP list unchanged.
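The mode-dependent HMVP handling above can be sketched as follows. The only behavior taken from the text is that inter-predicted blocks update the list and IBC blocks leave it unchanged; the table size of 5 and the first-in-first-out update with removal of an identical entry are assumptions borrowed from common HMVP designs.

```python
from collections import deque
from typing import Deque, Tuple

MV = Tuple[int, int]  # hypothetical (x, y) motion vector


def update_hmvp(table: Deque[MV], mode: str, used_mv: MV,
                max_size: int = 5) -> None:
    """Operate on the HMVP list after coding a block, based on its mode."""
    if mode == "ibc":
        return  # IBC: keep the HMVP list unchanged
    # Inter prediction: add the motion information used for this block.
    if used_mv in table:
        table.remove(used_mv)  # an identical entry moves to the newest slot
    elif len(table) == max_size:
        table.popleft()        # table full: discard the oldest entry
    table.append(used_mv)


hmvp: Deque[MV] = deque([(1, 1), (2, 2)])
update_hmvp(hmvp, "ibc", (9, 9))
print(list(hmvp))  # [(1, 1), (2, 2)]
update_hmvp(hmvp, "inter", (1, 1))
print(list(hmvp))  # [(2, 2), (1, 1)]
```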

In a specific implementation, the processor 1001 described in this embodiment of the application may perform the implementations described for the video processing method provided in FIG. 5 of the embodiments of the application; details are not repeated here.

In another embodiment, the processor 1001 calls the program instructions to perform the following steps:

In one implementation, the first image block satisfying the preset condition includes: the size of the first image block satisfies a preset size.

In one implementation, the size of the first image block satisfying a preset size includes: the size of the first image block is less than or equal to the preset size; or the preset sizes include the size of the first image block.

In one implementation, there is at least one preset size, each preset size is M*N, and M and N are both greater than or equal to 4.

In one implementation, M and N are not equal.

In one implementation, when the first image block satisfies the preset condition, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.

When the prediction mode is the inter prediction, the motion information for constructing the first motion information candidate list for the first image block also includes motion information of temporal neighboring blocks;

When the prediction mode is the IBC, the motion information for constructing the first motion information candidate list for the first image block includes the HMVP.

When the processor 1001 operates the HMVP list based on the prediction mode of the first image block, it specifically performs the following operations:

When the prediction mode is the inter-frame prediction, update the HMVP list by using the motion information used during encoding or decoding of the first image block;

When the prediction mode is the IBC, keep the HMVP list unchanged.

In one implementation, when the prediction mode of the first image block is IBC, the size of the first image block is K*L, where one of K and L is greater than or equal to 4 and the other is greater than 4.
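Read literally, the K*L condition is equivalent to min(K, L) ≥ 4 and max(K, L) > 4, so among block sizes whose sides are at least 4 it excludes only 4*4. A direct transcription follows; the function name is hypothetical.

```python
def ibc_size_allowed(size_k: int, size_l: int) -> bool:
    """True if one of K, L is >= 4 and the other is > 4 (so 4x4 is excluded)."""
    return (size_k >= 4 and size_l > 4) or (size_l >= 4 and size_k > 4)


for size in [(4, 4), (4, 8), (8, 4), (8, 8), (2, 16)]:
    print(size, ibc_size_allowed(*size))
# (4, 4) False
# (4, 8) True
# (8, 4) True
# (8, 8) True
# (2, 16) False
```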

In a specific implementation, the processor 1001 described in this embodiment of the application may perform the implementations described for the video processing method provided in FIG. 7 of the embodiments of the application; details are not repeated here.

An embodiment of the present application also provides a computer-readable storage medium storing program instructions that, when executed, perform some or all of the steps of the video processing method in the embodiments corresponding to FIG. 5 and FIG. 7 to FIG. 9.

An embodiment of the present application also provides a computer program product which, when run by a computer device, performs some or all of the steps of the video processing method in the embodiments corresponding to FIG. 5 and FIG. 7 to FIG. 9.

It can be understood that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that this application is not limited by the described order of actions, because according to this application certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are only some of the embodiments of the application, and the actions and modules involved are not necessarily required by the application.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may perform the procedures of the foregoing method embodiments. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

The video processing method, video processing device, and computer-readable storage medium provided by the embodiments of the application are described in detail above. Specific examples are used herein to explain the principles and implementations of the application, and the description of the above embodiments is only intended to help understand the method and core ideas of the application. Meanwhile, those skilled in the art may, according to the ideas of the application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on the application.

Claims

1. A video processing method, characterized in that the method comprises:

when a first image block of a current frame satisfies a preset condition identified in a high-level syntax, using history-based motion vector prediction (HMVP) to construct a first motion information candidate list for the first image block; and

when the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, using motion information of spatial neighboring blocks and the HMVP to construct a second motion information candidate list for the first image block.
2. The method according to claim 1, wherein the first image block satisfying the preset condition identified in the high-level syntax comprises:

the image block sizes identified in the high-level syntax include the size of the first image block.
3. The method according to claim 1, wherein the first image block satisfying the preset condition identified in the high-level syntax comprises:

the size of the first image block is smaller than or equal to the image block size identified in the high-level syntax.
4. The method according to claim 2 or 3, wherein at least one image block size is identified in the high-level syntax, each identified image block size is M*N, and M and N are both greater than or equal to 4.
5. The method according to claim 4, wherein M and N are not equal.
6. The method according to claim 1, wherein, when the first image block satisfies the preset condition identified in the high-level syntax, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.
7. The method according to claim 6, wherein the prediction mode comprises inter prediction or intra block copy (IBC);

when the prediction mode is inter prediction, the motion information used to construct the first motion information candidate list for the first image block further includes motion information of temporal neighboring blocks; and

when the prediction mode is IBC, the motion information used to construct the first motion information candidate list for the first image block includes the HMVP.
8. The method according to claim 1, wherein the encoding of the first image block is synchronized with the encoding of a second image block of the current frame, or the decoding of the first image block is synchronized with the decoding of the second image block;

wherein the second image block is a spatial neighboring block of the first image block.
9. The method according to claim 1, wherein the method further comprises:

after the first image block is encoded or decoded according to the motion information in the first motion information candidate list, not updating the HMVP list with the motion information used during encoding or decoding of the first image block.
10. The method according to claim 1, wherein the method further comprises:

after encoding or decoding the first image block according to the motion information in the first motion information candidate list, operating on the HMVP list based on the prediction mode of the first image block.
11. The method according to claim 10, wherein the prediction mode comprises inter prediction or IBC;

the operating on the HMVP list based on the prediction mode of the first image block comprises:

when the prediction mode is inter prediction, updating the HMVP list with the motion information used during encoding or decoding of the first image block; and

when the prediction mode is IBC, keeping the HMVP list unchanged.
12. A video processing method, characterized in that the method comprises:

when a first image block of a current frame satisfies a preset condition, using history-based motion vector prediction (HMVP) to construct a first motion information candidate list for the first image block; and

when the first image block of the current frame does not satisfy the preset condition, using motion information of spatial neighboring blocks and the HMVP to construct a second motion information candidate list for the first image block.
13. The method according to claim 12, wherein the first image block satisfying the preset condition comprises:

the size of the first image block satisfies a preset size.
14. The method according to claim 13, wherein the size of the first image block satisfying a preset size comprises:

the size of the first image block is less than or equal to the preset size; or

the preset sizes include the size of the first image block.
15. The method according to claim 14, wherein there is at least one preset size, each preset size is M*N, and M and N are both greater than or equal to 4.
16. The method according to claim 15, wherein M and N are not equal.
17. The method according to claim 12, wherein, when the first image block satisfies the preset condition, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.
18. The method according to claim 17, wherein the prediction mode comprises inter prediction or intra block copy (IBC);

when the prediction mode is inter prediction, the motion information used to construct the first motion information candidate list for the first image block further includes motion information of temporal neighboring blocks; and

when the prediction mode is IBC, the motion information used to construct the first motion information candidate list for the first image block includes the HMVP.
19. The method according to claim 12, wherein the encoding of the first image block is synchronized with the encoding of a second image block of the current frame, or the decoding of the first image block is synchronized with the decoding of the second image block;

wherein the second image block is a spatial neighboring block of the first image block.
20. The method according to claim 12, wherein the method further comprises:

after the first image block is encoded or decoded according to the motion information in the first motion information candidate list, not updating the HMVP list with the motion information used during encoding or decoding of the first image block.
21. The method according to claim 12, wherein the method further comprises:

after encoding or decoding the first image block according to the motion information in the first motion information candidate list, operating on the HMVP list based on the prediction mode of the first image block.
22. The method according to claim 21, wherein the prediction mode comprises inter prediction or IBC;

the operating on the HMVP list based on the prediction mode of the first image block comprises:

when the prediction mode is inter prediction, updating the HMVP list with the motion information used during encoding or decoding of the first image block; and

when the prediction mode is IBC, keeping the HMVP list unchanged.
23. The method according to claim 12, wherein, when the prediction mode of the first image block is IBC, the size of the first image block is K*L, one of K and L is greater than or equal to 4, and the other of K and L is greater than 4.
24. A video processing device, characterized in that the video processing device comprises:

a memory, configured to store a computer program, the computer program comprising program instructions; and

a processor, configured to call the program instructions to perform the following steps:

when a first image block of a current frame satisfies a preset condition identified in a high-level syntax, using history-based motion vector prediction (HMVP) to construct a first motion information candidate list for the first image block; and

when the first image block of the current frame does not satisfy the preset condition identified in the high-level syntax, using motion information of spatial neighboring blocks and the HMVP to construct a second motion information candidate list for the first image block.
25. The video processing device according to claim 24, wherein the first image block satisfying the preset condition identified in the high-level syntax comprises:

the image block sizes identified in the high-level syntax include the size of the first image block.
26. The video processing device according to claim 24, wherein the first image block satisfying the preset condition identified in the high-level syntax comprises:

the size of the first image block is smaller than or equal to the image block size identified in the high-level syntax.
27. The video processing device according to claim 25 or 26, wherein at least one image block size is identified in the high-level syntax, each identified image block size is M*N, and M and N are both greater than or equal to 4.
28. The video processing device according to claim 27, wherein M and N are not equal.
29. The video processing device according to claim 24, wherein, when the first image block satisfies the preset condition identified in the high-level syntax, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.
30. The video processing device according to claim 29, wherein the prediction mode comprises inter prediction or intra block copy (IBC);

when the prediction mode is inter prediction, the motion information used to construct the first motion information candidate list for the first image block further includes motion information of temporal neighboring blocks; and

when the prediction mode is IBC, the motion information used to construct the first motion information candidate list for the first image block includes the HMVP.
31. The video processing device according to claim 24, wherein the encoding of the first image block is synchronized with the encoding of a second image block of the current frame, or the decoding of the first image block is synchronized with the decoding of the second image block;

wherein the second image block is a spatial neighboring block of the first image block.
32. The video processing device according to claim 24, wherein, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the processor does not update the HMVP list with the motion information used during encoding or decoding of the first image block.
33. The video processing device according to claim 24, wherein the processor is further configured to perform the following operation after encoding or decoding the first image block according to the motion information in the first motion information candidate list:

operating on the HMVP list based on the prediction mode of the first image block.
34. The video processing device according to claim 33, wherein the prediction mode comprises inter prediction or IBC;

when operating on the HMVP list based on the prediction mode of the first image block, the processor is specifically configured to perform the following operations:

when the prediction mode is inter prediction, updating the HMVP list with the motion information used during encoding or decoding of the first image block; and

when the prediction mode is IBC, keeping the HMVP list unchanged.
35. A video processing device, characterized in that the video processing device comprises:

a memory, configured to store a computer program, the computer program comprising program instructions; and

a processor, configured to call the program instructions to perform the following steps:

when a first image block of a current frame satisfies a preset condition, using history-based motion vector prediction (HMVP) to construct a first motion information candidate list for the first image block; and

when the first image block of the current frame does not satisfy the preset condition, using motion information of spatial neighboring blocks and the HMVP to construct a second motion information candidate list for the first image block.
36. The video processing device according to claim 35, wherein the first image block satisfying the preset condition comprises:

the size of the first image block satisfies a preset size.
37. The video processing device according to claim 36, wherein the size of the first image block satisfying a preset size comprises:

the size of the first image block is less than or equal to the preset size; or

the preset sizes include the size of the first image block.
38. The video processing device according to claim 37, wherein there is at least one preset size, each preset size is M*N, and M and N are both greater than or equal to 4.
39. The video processing device according to claim 38, wherein M and N are not equal.
40. The video processing device according to claim 35, wherein, when the first image block satisfies the preset condition, the type of motion information used to construct the first motion information candidate list for the first image block is determined according to the prediction mode of the first image block.
41. The video processing device according to claim 40, wherein the prediction mode comprises inter prediction or intra block copy (IBC);

when the prediction mode is inter prediction, the motion information used to construct the first motion information candidate list for the first image block further includes motion information of temporal neighboring blocks; and

when the prediction mode is IBC, the motion information used to construct the first motion information candidate list for the first image block includes the HMVP.
42. The video processing device according to claim 35, wherein the encoding of the first image block is synchronized with the encoding of a second image block of the current frame, or the decoding of the first image block is synchronized with the decoding of the second image block;

wherein the second image block is a spatial neighboring block of the first image block.
43. The video processing device according to claim 35, wherein, after encoding or decoding the first image block according to the motion information in the first motion information candidate list, the processor does not update the HMVP list with the motion information used during encoding or decoding of the first image block.
44. The video processing device according to claim 35, wherein the processor is further configured to perform the following operation after encoding or decoding the first image block according to the motion information in the first motion information candidate list:

operating on the HMVP list based on the prediction mode of the first image block.
45. The video processing device according to claim 44, wherein the prediction mode comprises inter prediction or IBC;

when operating on the HMVP list based on the prediction mode of the first image block, the processor specifically performs the following operations:

when the prediction mode is inter prediction, updating the HMVP list with the motion information used during encoding or decoding of the first image block; and

when the prediction mode is IBC, keeping the HMVP list unchanged.
46. The video processing device according to claim 35, wherein, when the prediction mode of the first image block is IBC, the size of the first image block is K*L, one of K and L is greater than or equal to 4, and the other of K and L is greater than 4.
47. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the video processing method according to any one of claims 1 to 11.
48. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the video processing method according to any one of claims 12 to 23.