CN110337810B - Method and apparatus for video processing - Google Patents

Method and apparatus for video processing

Info

Publication number
CN110337810B
Authority
CN
China
Prior art keywords
reconstructed image
image block
motion vector
matching
image data
Prior art date
Legal status
Expired - Fee Related
Application number
CN201880012518.3A
Other languages
Chinese (zh)
Other versions
CN110337810A (en)
Inventor
马思伟
傅天亮
王苫社
郑萧桢
Current Assignee
Peking University
SZ DJI Technology Co Ltd
Original Assignee
Peking University
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Peking University, SZ DJI Technology Co Ltd filed Critical Peking University
Publication of CN110337810A publication Critical patent/CN110337810A/en
Application granted granted Critical
Publication of CN110337810B publication Critical patent/CN110337810B/en

Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/50 Predictive coding
    • H04N19/513 Processing of motion vectors
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Abstract

Embodiments of the present application provide a method and device for video processing that can reduce the hardware resource consumption and the storage space occupied in the process of obtaining a motion vector. The method comprises the following steps: in the process of obtaining the motion vector of the current image block, downsampling the reconstructed image data before the reconstructed image block used for matching is matched; performing matching using the downsampled reconstructed image data of the reconstructed image block to obtain a matching result; and obtaining the motion vector of the current image block based on the matching result.

Description

Method and apparatus for video processing
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records.
Technical Field
The present application relates to the field of video processing, and more particularly, to a method and apparatus for video processing.
Background
Prediction is an important module of mainstream video coding frameworks, in which inter prediction is achieved by means of motion compensation. A frame of a video can be divided into equally sized Coding Tree Units (CTUs), for example of size 64×64 or 128×128. Each CTU may be further divided into square or rectangular Coding Units (CUs), and for each CU the most similar block in a reference frame may be found as the prediction block of the current CU. The relative displacement between the current block and the similar block is the Motion Vector (MV). The process of finding a similar block in the reference frame as the predictor of the current block is motion compensation.
Decoder-side derivation of motion information is a recently emerged technique. It is mainly used to refine decoded motion vectors at the decoding end, and can improve coding quality, and thus encoder performance, without increasing the bit rate.
However, obtaining the motion vector involves a large amount of matching cost computation, and a large amount of hardware resources are consumed to store the reconstructed blocks required for computing the matching cost, which occupies a large amount of storage space.
Disclosure of Invention
Embodiments of the present application provide a method and device for video processing that can reduce the hardware resource consumption and the storage space occupied in the process of obtaining a motion vector.
In a first aspect, a method for video processing is provided, including:
in the process of obtaining the motion vector of a current image block, downsampling reconstructed image data before the reconstructed image block used for matching is matched;
performing matching using the downsampled reconstructed image data of the reconstructed image block to obtain a matching result;
and obtaining the motion vector of the current image block based on the matching result.
In a second aspect, there is provided an apparatus for video processing, comprising:
a downsampling unit, configured to downsample reconstructed image data before the reconstructed image block used for matching is matched, in the process of obtaining the motion vector of a current image block;
a matching unit, configured to perform matching using the downsampled reconstructed image data of the reconstructed image block to obtain a matching result;
and an obtaining unit, configured to obtain the motion vector of the current image block based on the matching result.
In a third aspect, there is provided a computer system comprising: a memory for storing computer executable instructions; a processor for accessing the memory and executing the computer-executable instructions to perform the operations in the method of the first aspect described above.
In a fourth aspect, a computer storage medium is provided, in which program code is stored, the program code being operable to instruct execution of the method of the first aspect.
In a fifth aspect, a computer program product is provided, which comprises program code that may be used to instruct the execution of the method of the first aspect.
Therefore, in the embodiments of the present application, in the process of obtaining the motion vector (MV) of the current image block, the reconstructed image data is downsampled before the reconstructed image block used for matching is matched, and the matching cost is then computed on the downsampled data. This reduces the amount of data to be processed, and thus reduces the hardware resource consumption and the storage space occupied during data processing.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a codec system according to an embodiment of the application.
Fig. 2 is a schematic flow diagram of a method for video processing according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of a method for video processing according to an embodiment of the present application.
FIG. 4 is a schematic diagram of obtaining a bidirectional template according to an embodiment of the present application.
Fig. 5 is a schematic diagram of obtaining a motion vector based on a two-way template matching method according to an embodiment of the present application.
Fig. 6 is a schematic diagram of obtaining a motion vector based on a template matching method according to an embodiment of the present application.
Fig. 7 is a schematic diagram of obtaining a motion vector based on a two-way matching method according to an embodiment of the present application.
Fig. 8 is a schematic flow chart diagram of a method for video processing according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus for video processing according to an embodiment of the present application.
FIG. 10 is a schematic block diagram of a computer system according to an embodiment of the present application.
Detailed description of the invention
The technical solutions in the embodiments of the present application are described below with reference to the drawings. Evidently, the described embodiments are some, but not all, embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
Fig. 1 is an architecture diagram of a solution to which an embodiment of the present application is applied.
As shown in Fig. 1, the system 100 receives data 102 to be processed, processes it, and generates processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or it may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components in the system 100 may be implemented by one or more processors, which may be processors in a computing device or in a mobile device (e.g., a drone). The processor may be any kind of processor, which is not limited in the embodiments of the present invention. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. The system 100 may also include one or more memories. The memory may be used to store instructions and data, such as computer-executable instructions implementing aspects of the embodiments of the invention, the data 102 to be processed, the processed data 108, and the like. The memory may be any kind of memory, which is not limited in the embodiments of the present invention.
The data to be encoded may include text, images, graphical objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and the like. In some cases, the data to be encoded may include information from the user, e.g., biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.
When encoding each image, the image may first be divided into a plurality of image blocks. In some embodiments, an image may be divided into a plurality of tiles, which are referred to as macroblocks or Largest Coding Units (LCUs) in some coding standards. The image blocks may or may not overlap. The image may be divided into any number of image blocks; for example, it may be divided into an m × n array of blocks. An image block may have a rectangular, square, circular, or any other shape, and any size, such as p × q pixels. In modern video coding standards, images of different resolutions are encoded by first dividing the image into a plurality of small blocks. In H.264 an image block is called a macroblock, which may be 16 × 16 pixels in size; in HEVC an image block is called a largest coding unit, which may be 64 × 64 pixels in size. Each image block may have the same size and/or shape; alternatively, two or more image blocks may have different sizes and/or shapes. In some embodiments, an image block may also not be a macroblock or largest coding unit, but may instead comprise a part of a macroblock or largest coding unit, at least two complete macroblocks (or largest coding units), at least one complete macroblock (or largest coding unit) plus part of another, or at least two complete macroblocks (or largest coding units) plus parts of others. After an image is divided into a plurality of image blocks in this way, the image blocks in the image data may be encoded separately.
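As an illustration of the tiling just described, here is a minimal Python sketch (the 64×64 LCU case; the NumPy representation, the helper name, and the frame size are assumptions for illustration, not part of the claimed method):

    import numpy as np

    def split_into_blocks(frame: np.ndarray, block: int = 64):
        """Divide a frame (an H x W array of luma samples) into block x block
        tiles; edge tiles are smaller when H or W is not a multiple of block."""
        h, w = frame.shape
        return [frame[r:r + block, c:c + block]
                for r in range(0, h, block)
                for c in range(0, w, block)]

    # A 1920x1080 luma plane yields ceil(1080/64) * ceil(1920/64) = 17 * 30 tiles.
    frame = np.zeros((1080, 1920), dtype=np.uint8)
    assert len(split_into_blocks(frame)) == 17 * 30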
In the encoding process, prediction may be applied to an image in order to remove redundancy. Different images in a video may use different prediction modes. According to the prediction mode adopted, images can be divided into intra-predicted images and inter-predicted images, where inter-predicted images include forward-predicted images and bi-directionally predicted images. An I picture is an intra-predicted picture, also called a key frame; a P picture is a forward-predicted picture, i.e. a previously encoded P picture or I picture is used as its reference picture; a B picture is a bi-directionally predicted picture, i.e. preceding and following pictures are used as its reference pictures. One implementation is that the encoding end encodes a plurality of images to generate a group of pictures (GOP), which consists of one I picture and a plurality of B pictures (or bi-directionally predicted pictures) and/or P pictures (or forward-predicted pictures). During playback, the decoding end reads a GOP, decodes it, and then reads out the pictures for rendering and display.
When performing inter-frame prediction, the most similar block in a reference frame (generally a temporally nearby reconstructed frame) may be found for each image block and used as the prediction block of the current image block. The relative displacement between the current block and the prediction block is the Motion Vector (MV).
To reduce the bit rate between the encoding end and the decoding end, the motion information may be omitted from the bitstream, in which case the decoding end is required to derive the motion information, i.e. the motion vector. When the decoding end derives the motion information, the data throughput may be excessive, causing the decoding end to occupy a large amount of hardware resources and storage space.
Therefore, the embodiments of the present application provide a method for video processing that can reduce the amount of data to be processed when the decoding end derives motion information, so as to avoid the decoding end occupying a large amount of hardware resources and space. Likewise, when the method of the embodiments of the application is used at the encoding end, the hardware resources and space occupied by the encoding end can be reduced.
Fig. 2 is a schematic flow diagram of a method for video processing according to an embodiment of the present application. The following method may optionally be implemented by the decoding side, or may also be implemented by the encoding side.
Wherein, when the method is implemented by a decoding side, the current image block mentioned below may be an image block to be decoded (which may also be referred to as an image block to be reconstructed). Alternatively, when the method is implemented by an encoding side, the current image block mentioned below may be an image block to be encoded.
In 210, in the process of obtaining the motion vector (MV) of the current image block, the processing device downsamples the reconstructed image data before the reconstructed image block used for matching is matched.
The processing device may be a device at an encoding end or a device at a decoding end.
The MV of the current image block may be understood as the MV between the current image block and the selected prediction block.
Alternatively, in the embodiment of the present application, the reconstructed image block may also be referred to as a reference block.
Optionally, in the embodiments of the present application, the downsampling of the reconstructed image data may be carried out in the following two ways.
In one implementation, the reconstructed image data is downsampled at an interval of a certain number of pixels, meaning that samples are taken at a certain interval in the horizontal direction and in the vertical direction respectively.
For example, assuming the object to be downsampled is a 128×128 reconstructed image block, certain columns or rows of pixels may be taken as the downsampled reconstructed image block.
Alternatively, the reconstructed image data may be downsampled with the same pixel interval in each direction, i.e. samples are taken at the same interval in the horizontal and/or vertical direction respectively.
For example, assuming the object to be downsampled is a reconstructed image block, the block may be downsampled at an interval of 2 in both the horizontal and vertical directions, and the top-left pixel of each group of four pixels may be taken as the downsampling result; of course, any of the other three pixels of the group may be taken instead.
For example, assuming the object to be downsampled is a reconstructed image block, the block may be downsampled at an interval of 2 in the horizontal direction and not downsampled in the vertical direction.
For example, assuming the object to be downsampled is a reconstructed image block, the block may be downsampled at an interval of 2 in the vertical direction and not downsampled in the horizontal direction.
In another implementation, the reconstructed image data is downsampled by averaging a plurality of pixels, where the plurality of pixels may be adjacent pixels.
For example, assuming the object to be downsampled is a 12×12 reconstructed image block, the values of four pixels may be averaged to downsample the block, where the four pixels may be adjacent pixels, e.g. the pixels of a 2×2 sub-block.
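As a minimal Python sketch of the two implementations above, assuming blocks stored as NumPy arrays (the interval of 2 and the 2×2 averaging window follow the examples in the text; the helper names are illustrative):

    import numpy as np

    def downsample_by_interval(block: np.ndarray, step_h: int = 2, step_v: int = 2,
                               offset: tuple = (0, 0)) -> np.ndarray:
        """Keep one pixel per step_v x step_h group; offset (0, 0) keeps the
        top-left pixel of each group, as in the example above."""
        r0, c0 = offset
        return block[r0::step_v, c0::step_h]

    def downsample_by_average(block: np.ndarray) -> np.ndarray:
        """Average each 2x2 group of adjacent pixels (assumes even dimensions)."""
        h, w = block.shape
        return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    blk = np.arange(16, dtype=np.float64).reshape(4, 4)
    print(downsample_by_interval(blk))  # keeps rows/columns 0 and 2
    print(downsample_by_average(blk))   # means of the four 2x2 quadrants

Setting step_h = 2 and step_v = 1 (or vice versa) gives the horizontal-only (or vertical-only) variants of the examples above.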
Optionally, the downsampled reconstructed image data may include downsampled reconstructed image data for the matched reconstructed image block.
In one implementation, the entire image frame to which the reconstructed image block used for matching belongs may be downsampled; that is, individual reconstructed image blocks are not distinguished during downsampling. In this case, the downsampled reconstructed image data includes the reconstructed image data of the reconstructed image block used for matching.
In another implementation, a reconstructed image block for matching may be determined and the determined reconstructed image block may be downsampled.
How to downsample the reconstructed image block for matching will be described in detail below.
Optionally, in the embodiments of the present application, the reconstructed image data of the reconstructed image block is downsampled according to the content of the reconstructed image block. Downsampling the reconstructed image data of a reconstructed image block may also be referred to as downsampling the reconstructed image block.
In particular, the processing device may determine the downsampling ratio according to the content of the reconstructed image block, and downsample the reconstructed image data of the reconstructed image block using that ratio.
The downsampling ratio mentioned in the embodiments of the present application refers to the ratio between the number of pixels in the image block after downsampling and the number of pixels in the image block before downsampling.
A small sampling interval (i.e. a large downsampling ratio) is used when the complexity of the reconstructed image block is high, and a large sampling interval (i.e. a small downsampling ratio) is used when the complexity is low. Downsampling is thus adapted to the image content, which reduces the performance loss caused by data sampling.
Optionally, the content of the reconstructed image block mentioned in the embodiments of the present application may include at least one of the number of pixels, the pixel gray scale, and the edge features of the reconstructed image block.
Specifically, the processing device may determine the downsampling ratio according to at least one of the number of pixels, the pixel gray scale, and the edge features of the reconstructed image block, and downsample the reconstructed image block using that ratio.
Optionally, the pixel gray scale of the reconstructed image block may be characterized by the variance of the gray-level histogram of the reconstructed image block.
Optionally, the edge features of the reconstructed image block may be characterized by the number of pixels in the reconstructed image block that are texture edge points.
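These content measures can be sketched as follows; the gradient-based edge detector, the thresholds, and the two candidate ratios are purely illustrative assumptions, not values from the patent:

    import numpy as np

    def complexity_features(block: np.ndarray):
        """Variance of the gray-level histogram, and count of texture edge points
        estimated from the gradient magnitude (threshold 30 is an assumption)."""
        hist, _ = np.histogram(block, bins=256, range=(0, 256))
        gy, gx = np.gradient(block.astype(np.float64))
        edge_count = int(np.count_nonzero(np.hypot(gx, gy) > 30))
        return float(np.var(hist)), edge_count

    def choose_ratio(block: np.ndarray) -> float:
        """Complex block -> small sampling interval (large ratio);
        smooth block -> large interval (small ratio)."""
        hist_var, edge_count = complexity_features(block)
        if hist_var >= 1000 or edge_count >= block.size // 8:
            return 1 / 4   # interval 2 in both directions
        return 1 / 16      # interval 4 in both directions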
Optionally, in this embodiment of the present application, when the reconstructed image blocks used for matching include at least two reconstructed image blocks, the reconstructed image data of the at least two reconstructed image blocks are downsampled according to the same downsampling ratio.
Specifically, if at least two reconstructed image blocks need to be used in the matching process of a single MV determination, the reconstructed image data of the at least two reconstructed image blocks may be downsampled with the same downsampling ratio.
For example, when the pixel gray scales of the at least two reconstructed image blocks and/or the numbers of texture edge points among their pixels indicate that different downsampling ratios should be used, the different ratios may be averaged and the average used to downsample the at least two reconstructed image blocks; alternatively, the reconstructed image data of the at least two blocks may be downsampled with the highest, or the lowest, of the ratios.
For example, when the values characterizing the pixel gray scale and/or the values characterizing the edge features of the at least two reconstructed image blocks differ, the values may be averaged (if both kinds of values are used, each kind may be averaged separately), a single downsampling ratio computed from the averages, and the reconstructed image data of the at least two blocks downsampled with that ratio. Alternatively, the maximum (or minimum) of the values may be taken (again separately for the gray-scale values and the edge-feature values if both are used), a single downsampling ratio computed from it, and the reconstructed image data of the at least two blocks downsampled with that ratio.
It should be understood that the reconstructed image block used for matching may contain the same number of pixels as the current image block; in that case, determining the downsampling ratio according to the number of pixels in the reconstructed image block may be implemented as determining it according to the number of pixels in the current image block.
Optionally, in this embodiment of the present application, the processing device determines to downsample the reconstructed image block in the matching process when at least one of the following conditions is satisfied:
the number of pixels in the reconstructed image block is greater than or equal to a first preset value;
the variance of the gray-level histogram of the reconstructed image block is greater than or equal to a second preset value;
the number of texture edge pixels among the pixels of the reconstructed image block is greater than or equal to a third preset value.
That is to say, the reconstructed image block is downsampled when the above conditions are met, and not downsampled otherwise, which avoids the loss of coding and decoding performance that blind downsampling would cause.
When the reconstructed image blocks used for matching include at least two reconstructed image blocks, the number of pixels, the variance of the gray-level histogram, and the number of texture edge pixels of each block may all satisfy the above conditions; alternatively, the averages over the at least two blocks of the number of pixels, the variance of the gray-level histogram, and the number of texture edge pixels may satisfy them.
It should be understood that the reconstructed image block used for matching may contain the same number of pixels as the current image block; in that case, deciding whether to downsample according to the number of pixels in the reconstructed image block may be implemented as deciding according to the number of pixels in the current image block.
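The gate itself follows directly from the three listed conditions; a hedged sketch, reusing complexity_features from the previous sketch (the three preset values are hypothetical placeholders):

    def should_downsample(block, min_pixels=64 * 64, min_hist_var=500,
                          min_edge_pixels=256) -> bool:
        """Downsample only when at least one of the listed conditions holds;
        the three preset values are placeholders, not values from the patent."""
        hist_var, edge_count = complexity_features(block)
        return (block.size >= min_pixels
                or hist_var >= min_hist_var
                or edge_count >= min_edge_pixels)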
The above determines, from the content of a reconstructed image block, whether to downsample it and at what ratio. It should be understood, however, that the embodiments of the present application are not limited to this: when the processing device downsamples a reconstructed image frame, it may likewise decide whether to downsample, and/or the downsampling ratio, according to the content of the reconstructed image frame.
Specifically, the downsampling ratio may be determined according to at least one of the number of pixels, the pixel gray scale, and the edge features of the reconstructed image frame, and the reconstructed image frame downsampled using that ratio.
Alternatively, before the reconstructed image frame is downsampled, the following conditions need to be satisfied:
the number of pixels in the reconstructed image frame is greater than or equal to a specific value;
the variance of the gray-level histogram of the reconstructed image frame is greater than or equal to a specific value;
the number of texture edge pixels among the pixels of the reconstructed image frame is greater than or equal to a specific value.
At 220, the processing device performs matching using the down-sampled reconstructed image data of the reconstructed image block for matching to obtain a matching result.
Optionally, in this embodiment of the present application, matching may also be referred to as distortion matching, and the matching result may be a matching cost obtained by performing distortion matching between the reconstructed image blocks.
At 230, the processing device obtains the MV for the current image block based on the matching result.
Optionally, in this embodiment of the present application, when the processing device is a device at an encoding end, the MV may be used to encode or reconstruct the current image block.
The encoding end may use the reconstructed image block corresponding to the MV as a prediction block, and encode or reconstruct the current image block based on the prediction block.
In one implementation, the encoding end may directly use the pixels of the prediction block as the reconstructed pixels of the current image block. This mode may be referred to as skip mode; it is characterized in that the reconstructed pixel values of the current image block equal the pixel values of the prediction block. When the encoding end adopts skip mode, an identifier may be transmitted in the bitstream to indicate to the decoding end that skip mode is used.
In another implementation, the encoding end may subtract the pixels of the prediction block from the pixels of the current image block to obtain a pixel residual, and transmit the pixel residual to the decoding end in the bitstream.
It should be understood that, after obtaining the MV, the encoding end may encode and reconstruct the current image block in other ways, which is not specifically limited in the embodiments of the present application.
Optionally, the embodiments of the present application may be used in Advanced Motion Vector Prediction (AMVP) mode; that is, the result obtained by matching may be a motion vector predictor (MVP). After obtaining the MVP, the encoding end may determine the starting point of motion estimation from the MVP and perform a motion search near that starting point, obtaining the optimal MV when the search completes. The MV determines the position of the reference block in the reference image; the reference block is subtracted from the current block to obtain a residual block, the MVP is subtracted from the MV to obtain a motion vector difference (MVD), and the MVD is transmitted to the decoding end in the bitstream.
Optionally, the embodiments of the present application may also be used in merge mode; that is, the result obtained by matching may be an MVP, and the encoding end may directly take the MVP as the MV. In other words, the result obtained by matching is the MV. In this case the encoding end does not need to transmit an MVD after obtaining the MVP (i.e. the MV), since the MVD defaults to 0.
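In both modes, the decoder-side reconstruction of the vector reduces to MV = MVP + MVD, with the MVD implicitly zero in merge mode; a one-line sketch (the (vertical, horizontal) tuple representation is an assumption):

    def reconstruct_mv(mvp: tuple, mvd: tuple = (0, 0)) -> tuple:
        """AMVP: MV = MVP + MVD; merge mode: MVD defaults to (0, 0)."""
        return (mvp[0] + mvd[0], mvp[1] + mvd[1])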
Optionally, in this embodiment of the present application, when the processing device is a device at a decoding end, the MV may be used to decode the current image block.
The decoding end may use the reconstructed image block corresponding to the MV as a prediction block, and decode the current image block based on the prediction block.
In one implementation, the decoding end may directly use the pixels of the prediction block as the pixels of the current image block. This mode may be referred to as skip mode; it is characterized in that the reconstructed pixel values of the current image block equal the pixel values of the prediction block. When the encoding end adopts skip mode, an identifier may be transmitted in the bitstream to indicate to the decoding end that skip mode is used.
In another implementation, the decoding end may obtain a pixel residual from the code stream transmitted by the encoding end, and add the pixel of the prediction block and the pixel residual to obtain a pixel of the current image block.
It should be understood that after obtaining the MV, the current image block may be decoded in other manners, which is not specifically limited in this embodiment of the present application.
Optionally, the embodiments of the present application may be used in AMVP mode; that is, the result obtained by matching may be an MVP, and the decoding end may obtain the MV of the current image block by combining the MVP with the MVD carried in the bitstream transmitted by the encoding end.
Optionally, in this embodiment of the present application, the present application may be implemented in a Merge mode, that is, a result obtained by performing matching may be MVP, and a decoding end may directly determine the MVP as an MV, in other words, a result obtained by performing matching is an MV.
Optionally, in the embodiments of the present application, the initial MV of the current image block is refined based on the matching result to obtain the MV of the current image block.
That is, the processing device may obtain an initial MV which may not be the optimal MV or MVP, and refine it to obtain the MV of the current image block.
For the encoding end, the index of the initial MV may be encoded and transmitted to the decoding end; the index enables the decoding end to select the initial MV from an initial MV list, and points to the following information: the index of the reference frame and the spatial offset of the reference block with respect to the current image block. On this basis the decoding end can select the initial MV.
For the decoding end, the initial MV may be obtained from the bitstream sent by the encoding end; the bitstream may include an index, and the decoding end may obtain the initial MV based on that index.
Alternatively, there may be a plurality of initial MVs, which may belong to different frames. The frame to which an initial MV belongs is the frame to which the reconstructed image block corresponding to that MV belongs.
Assuming the plurality of initial MVs include a first MV and a second MV, the frame to which the first MV belongs and the frame to which the second MV belongs are different frames.
For example, the reconstructed image block corresponding to the first MV may belong to a forward frame of the current image block, and the reconstructed image block corresponding to the second MV may belong to a backward frame of the current image block.
Alternatively, the reconstructed image blocks corresponding to the first MV and to the second MV may both belong to forward frames of the current image block.
Of course, the reconstructed image blocks corresponding to the first MV and to the second MV may also belong to different backward frames of the current image block, which is not specifically limited in the embodiments of the application.
For a more clear understanding of the present application, how the initial MV is modified will be described below with reference to implementation a.
Implementation A
Specifically, the processing device may generate a template (e.g., by averaging pixels) based on the downsampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial MVs, and refine each of the plurality of initial MVs using the generated template.
It should be understood that, besides generating the template from the downsampled reconstructed image data of the plurality of reconstructed image blocks, the template may also be generated from the non-downsampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial MVs and then downsampled, which is not specifically limited in the embodiments of the application.
Specifically, assume the initial MVs include a first MV and a second MV, the reconstructed image block corresponding to the first MV is a first reconstructed image block belonging to a first frame, and the reconstructed image block corresponding to the second MV is a second reconstructed image block belonging to a second frame. The template is generated based on the downsampled reconstructed image data of the first reconstructed image block and of the second reconstructed image block, and may be referred to as a bidirectional template.
Then, the downsampled reconstructed image data of N third reconstructed image blocks (which may be called the N downsampled third reconstructed image blocks) may be matched against the template, where the N third reconstructed image blocks correspond to N third MVs; the downsampled reconstructed image data of M fourth reconstructed image blocks (the M downsampled fourth reconstructed image blocks) may likewise be matched against the template, where the M fourth reconstructed image blocks correspond to M fourth MVs. Based on the matching results, one third MV is selected from the N third MVs, and one fourth MV is selected from the M fourth MVs.
Alternatively, the selected third MV may be the MV with the smallest distortion cost, or an MV whose distortion cost is smaller than a certain value.
Alternatively, the selected fourth MV may be the MV with the smallest distortion cost, or an MV whose distortion cost is smaller than a certain value.
The selected third MV and fourth MV may be used as MVs of the current image block; in this case, the reconstructed image blocks corresponding to them may be weighted-averaged to obtain the prediction block.
Alternatively, the one third MV and the one fourth MV may be used to determine an MV of the current image block, that is, the one third MV and the one fourth MV may be MVPs, respectively. At this time, the motion search and motion compensation processes may be performed based on the third MVP and the fourth MVP, respectively, to obtain the final MV.
Optionally, in this embodiment of the present application, the N third reconstructed image blocks may belong to the first frame, and the M fourth reconstructed image blocks may belong to the second frame.
Alternatively, N and M may be equal.
Optionally, the third MV includes the first MV, and the fourth MV includes the second MV, that is, the reconstructed image block corresponding to the first MV and the reconstructed image block corresponding to the second MV used for generating the template also need to be matched with the template respectively.
Optionally, in this embodiment of the application, at least a part of the N third MVs are obtained by performing a shift based on the first MV, and at least a part of the M fourth MVs are obtained by performing a shift based on the second MV.
For example, the MVs other than the first MV among the N third MVs may be obtained by shifting the first MV. For example, N may equal 9, and 8 of the N third MVs may be obtained by shifting the first MV, e.g. in eight directions, or by different numbers of pixels in the vertical or horizontal direction.
Similarly, the MVs other than the second MV among the M fourth MVs may be obtained by shifting the second MV. For example, M may equal 9, and 8 of the M fourth MVs may be obtained by shifting the second MV, e.g. in eight directions, or by different numbers of pixels in the vertical or horizontal direction.
Alternatively, the method in implementation A may be referred to as MV selection by the bidirectional template matching method.
In order to more clearly understand the present application, the following describes the implementation a in detail with reference to fig. 3 to 5.
At 310, it is determined whether the width and the height of the current image block are each smaller than 8 pixels (other pixel counts are of course possible). At 321, if yes, the reconstructed image blocks corresponding to MV0 in reference list 0 and MV1 in reference list 1 are downsampled and averaged to obtain the bidirectional template. An MV in reference list 0 may be a motion vector between the current image block and a reconstructed image block in a forward reference frame, and an MV in reference list 1 may be a motion vector between the current image block and a reconstructed image block in a backward reference frame.
Specifically, as shown in fig. 4, for a current image block, down-sampling is performed on a reference block 0 (reconstructed image block) corresponding to MV0 and a reference block 1 (reconstructed image block) corresponding to MV1, and then two reference blocks after down-sampling are averaged to obtain a bi-directional template after down-sampling.
At 322, the downsampled reconstructed image block corresponding to MV0 in list 0 is matched against the template. At 323, MV0 is shifted to obtain a plurality of MV0' candidates. At 324, the reconstructed image blocks corresponding to the plurality of MV0' candidates are downsampled and each matched against the template.
For example, as shown in Fig. 5, the surrounding pixels of the reference block corresponding to MV0 (specifically, the pixels included in the reference block corresponding to MV0') may be downsampled. In particular, the pixel values around the reference block corresponding to MV0 may be padded to obtain the reference block corresponding to MV0' (an offset reference block), and the offset reference block may then be downsampled. When the matching cost is computed, the downsampled bidirectional template and the downsampled reference block are used.
At 325, the MV0' with the minimum matching cost is obtained; note that this MV0' may turn out to be MV0 itself.
At 331, the downsampled reconstructed image block corresponding to MV1 in list 1 is matched against the template.
At 332, MV1 is shifted to obtain a plurality of MV1' candidates. At 333, the reconstructed image blocks corresponding to the MV1' candidates are downsampled and each matched against the template. At 334, the MV1' with the minimum matching cost is obtained; note that this MV1' may turn out to be MV1 itself.
For example, as shown in Fig. 5, the surrounding pixels of the reference block corresponding to MV1 (specifically, the pixels included in the reference block corresponding to MV1') may be downsampled. In particular, the pixel values around the reference block corresponding to MV1 may be padded to obtain the reference block corresponding to MV1' (an offset reference block), and the offset reference block may then be downsampled. When the matching cost is computed, the downsampled bidirectional template and the downsampled reference block are used.
At 335, the prediction block is generated from the reconstructed image blocks corresponding to the MV0' and MV1' with the smallest matching costs.
At 336, the current image block is decoded based on the prediction block.
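Putting the steps of Fig. 3 to Fig. 5 together, the following hedged Python sketch outlines the downsampled bidirectional template matching; the SAD cost, the ±1-pixel search pattern, the assumption of a padded reference frame, and all helper names are illustrative, not taken from the patent:

    import numpy as np

    def sad(a, b):
        return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())

    def downsample(block, step=2):
        return block[::step, ::step]

    def refine_mv(ref_frame, mv, template_ds, block_pos, block_size, radius=1):
        """Search offsets around mv, matching downsampled candidate reference
        blocks against the downsampled bidirectional template."""
        y0, x0 = block_pos
        h, w = block_size
        best_mv, best_cost = mv, float("inf")
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = y0 + mv[0] + dy, x0 + mv[1] + dx
                cand = ref_frame[y:y + h, x:x + w]   # assumes padded frame
                cost = sad(downsample(cand), template_ds)
                if cost < best_cost:
                    best_cost, best_mv = cost, (mv[0] + dy, mv[1] + dx)
        return best_mv

    def bidirectional_template_refine(ref0, ref1, mv0, mv1, block_pos, block_size):
        """Template = average of the two downsampled initial reference blocks
        (Fig. 4); each MV is then refined against it (Fig. 5)."""
        y0, x0 = block_pos
        h, w = block_size
        blk0 = ref0[y0 + mv0[0]:y0 + mv0[0] + h, x0 + mv0[1]:x0 + mv0[1] + w]
        blk1 = ref1[y0 + mv1[0]:y0 + mv1[0] + h, x0 + mv1[1]:x0 + mv1[1] + w]
        template_ds = (downsample(blk0).astype(np.float64) + downsample(blk1)) / 2
        return (refine_mv(ref0, mv0, template_ds, block_pos, block_size),
                refine_mv(ref1, mv1, template_ds, block_pos, block_size))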
The implementation of the bidirectional template matching method according to the embodiment of the present application should not be limited to the above description.
Optionally, implementation A above and its optional variants may be implemented with the DMVR (decoder-side motion vector refinement) technique.
Optionally, in the embodiments of the present application, the processing device obtains the initial motion vectors (MVs) corresponding to the current image block, and determines, for the initial MVs, the reconstructed image blocks to be used for matching.
The initial MVs may be candidate MVs; the set of candidate MVs may be referred to as an MV candidate list.
How to select an MV from the MVs to be selected will be described below in conjunction with implementation B and implementation C.
Implementation B
Specifically, the initial MVs include K fifth MVs. The downsampled reconstructed image data of an adjacent reconstructed image block of each of K fifth reconstructed image blocks is matched with the downsampled reconstructed image data of an adjacent reconstructed image block of the current image block to obtain the matching result, where the K fifth reconstructed image blocks correspond one-to-one to the K fifth MVs and K is an integer greater than or equal to 1. One fifth MV is then selected from the K fifth MVs based on the matching result.
Alternatively, the selected fifth MV may be the MV with the smallest distortion cost, or an MV whose distortion cost is smaller than a certain value.
The selected fifth MV can be used as the MV of the current image block; in this case, the reconstructed image block corresponding to the fifth MV may be used as the prediction block of the current image block.
Alternatively, the selected fifth MV may be used to determine the MV of the current image block.
For example, the fifth MV may be an MVP. In this case, motion search and motion compensation may be further performed according to the MVP to obtain the final MV, and the reconstructed image block corresponding to the refined MV is taken as the prediction block.
As another example, the selected fifth MV may be a Coding Unit (CU) level MV as mentioned below, which may be used to determine a sub-CU level MV.
Alternatively, the K fifth MVs may be referred to as an MV candidate list.
Alternatively, the adjacent reconstructed image block of the current image block may be referred to as the template of the current image block; implementation B may thus be referred to as MV selection based on the template matching method.
Alternatively, as shown in fig. 6, the neighboring reconstructed block of the fifth reconstructed block may include an upper neighboring block and/or a left neighboring block, and the neighboring reconstructed block of the current image block may include an upper neighboring block and/or a left neighboring block.
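A hedged sketch of implementation B, reusing the sad and downsample helpers from the previous sketch: the template is the current block's upper and left reconstructed neighbours, and each candidate MV's cost compares the downsampled neighbour template of its reference block with that of the current block. The L-shape thickness and the helper names are assumptions:

    def l_template(frame, pos, size, thickness=4):
        """Upper and left reconstructed neighbours of the block at pos (row, col);
        assumes the block is far enough from the frame border."""
        y, x = pos
        h, w = size
        return frame[y - thickness:y, x:x + w], frame[y:y + h, x - thickness:x]

    def template_match_cost(cur_frame, ref_frame, pos, size, mv):
        cur_top, cur_left = l_template(cur_frame, pos, size)
        ref_top, ref_left = l_template(ref_frame, (pos[0] + mv[0], pos[1] + mv[1]), size)
        return (sad(downsample(ref_top), downsample(cur_top))
                + sad(downsample(ref_left), downsample(cur_left)))

    def select_mv_by_template(cur_frame, ref_frame, pos, size, candidate_mvs):
        """Implementation B: the candidate with the smallest matching cost."""
        return min(candidate_mvs,
                   key=lambda mv: template_match_cost(cur_frame, ref_frame,
                                                      pos, size, mv))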
Implementation C
Specifically, the initial MVs include W sixth MVs, where W is an integer greater than or equal to 1. For the two reconstructed image blocks corresponding to each of W MV pairs, the downsampled reconstructed image data of one reconstructed image block is matched with the downsampled reconstructed image data of the other to obtain the matching result, where each MV pair includes a sixth MV and a seventh MV determined from that sixth MV. One MV pair is then selected based on the matching results of the W MV pairs.
The sixth MV of the selected MV pair may be determined as the MV of the current image block; in this case, the reconstructed image block corresponding to that sixth MV may be taken as the prediction block of the current image block.
Alternatively, the sixth MV of the selected MV pair may be used to determine the MV of the current image block.
For example, the sixth MV may be an MVP. In this case, motion search and motion compensation may be further performed according to the MVP to obtain the final MV, and the reconstructed image block corresponding to the final MV is taken as the prediction block.
As another example, the selected sixth MV may be a CU level MV as mentioned below, which may be used to determine a sub-CU level MV.
Alternatively, in the embodiment of the present application, the seventh MV is determined based on the sixth MV on the assumption that the motion trajectory is continuous.
Alternatively, the W sixth MVs may be MV candidate lists.
Optionally, in this embodiment of the present application, the sixth reconstructed block belongs to a forward frame of a frame to which the current image block belongs, and the seventh reconstructed block belongs to a backward frame of the frame to which the current image block belongs.
Optionally, in an embodiment of the present application, a temporal distance between the sixth reconstructed image block and the current image block may be equal to a temporal distance between the current image block and the seventh reconstructed image block.
Alternatively, for implementation C, each of the W sixth MVs may be used as an input, and an MV pair is obtained under the assumption of the bidirectional matching method. For example, if the reference block corresponding to a valid MVa in the MV candidate list belongs to reference frame a in reference list A, then the reference frame b corresponding to MVb is found in reference list B such that reference frame a and reference frame b lie on the two sides of the current frame in the temporal domain. If no such reference frame b exists in reference list B, reference frame b is taken as the reference frame in reference list B that differs from reference frame a and has the smallest temporal distance to the current frame. Once reference frame b is determined, MVb is obtained by scaling MVa according to the temporal distances between the current frame and reference frame a and between the current frame and reference frame b.
For example, as shown in Fig. 7, for the bidirectional matching method, an MV pair may be generated from each candidate MV, and the distortion between the two reference blocks corresponding to the two MVs (MV0 and MV1) of each pair may be calculated. In the embodiments of the present application, both reference blocks may be downsampled and the distortion calculated on the downsampled blocks. The candidate MV (MV0) whose pair gives the minimum distortion is the final MV.
Implementation C may thus be referred to as MV selection based on the bidirectional matching method.
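A hedged sketch of implementation C, again reusing sad and downsample: each candidate MVa is scaled by the ratio of signed temporal distances to obtain MVb (references on opposite sides of the current frame thus give opposite directions, consistent with a continuous motion trajectory), and the two downsampled reference blocks are compared directly. The helper names and the rounding are assumptions:

    def scale_mv(mva, td_a, td_b):
        """Scale MVa by td_b / td_a, with temporal distances measured signed
        from the current frame (so opposite sides flip the direction)."""
        return (round(mva[0] * td_b / td_a), round(mva[1] * td_b / td_a))

    def bilateral_cost(ref_a, ref_b, pos, size, mva, mvb):
        y, x = pos
        h, w = size
        blk_a = ref_a[y + mva[0]:y + mva[0] + h, x + mva[1]:x + mva[1] + w]
        blk_b = ref_b[y + mvb[0]:y + mvb[0] + h, x + mvb[1]:x + mvb[1] + w]
        return sad(downsample(blk_a), downsample(blk_b))

    def select_mv_by_bilateral(ref_a, ref_b, pos, size, candidates, td_a, td_b):
        """Implementation C: the candidate whose MV pair gives minimum distortion."""
        return min(candidates,
                   key=lambda mva: bilateral_cost(ref_a, ref_b, pos, size, mva,
                                                  scale_mv(mva, td_a, td_b)))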
Optionally, implementations B and C above may be used in AMVP mode. They may also be used in merge mode; specifically, a pattern matched motion vector derivation (PMMVD) technique may be adopted. PMMVD is a special merge mode based on the Frame Rate Up Conversion (FRUC) technique; in this mode, the motion information of a block is not encoded in the bitstream but is derived directly at the decoding end.
The encoding end may select among several coding modes. Specifically, it may first perform ordinary merge mode encoding to obtain the minimum rate-distortion cost (RD-Cost), namely cost0. It then encodes with the PMMVD mode and obtains the RD-Cost, where the RD-Cost corresponding to the MV obtained by the bidirectional matching method is cost1, the RD-Cost corresponding to the MV obtained by the template matching method is cost2, and cost3 = min(cost1, cost2).
If cost0 < cost3, the FRUC flag bit is false; otherwise, the FRUC flag is true, and an additional FRUC mode flag is used to indicate which method (bidirectional matching or template matching) is used.
The RD-Cost is the criterion used in the encoder to decide which mode to use, and takes both video quality and coding rate into account: RD-Cost = Cost + λ · bitrate, where Cost represents the loss of video quality, computed as the similarity (by an index such as SAD or SSD) between the original and reconstructed pixel blocks, and bitrate represents the number of bits consumed by the mode.
Since computing the RD-Cost requires the original pixel values, which are not available at the decoding end, the extra FRUC mode flag bit is required to indicate which method is used to obtain the motion information.
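The mode decision just described reduces to a comparison of the three costs; a minimal sketch (the tie-breaking and the string mode labels are assumptions):

    def fruc_mode_decision(cost0, cost1, cost2):
        """cost0: ordinary merge; cost1: bidirectional matching; cost2: template
        matching. Returns (fruc_flag, fruc_mode_flag or None)."""
        cost3 = min(cost1, cost2)
        if cost0 < cost3:
            return False, None  # FRUC flag false: ordinary merge mode is used
        return True, "bidirectional" if cost1 <= cost2 else "template"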
Alternatively, in the embodiment of the present application, the motion information derivation process in the FRUC merge mode may be divided into two steps. Wherein the first step is a CU level based motion information derivation process and the second step is a Sub-CU level based motion information derivation process.
In the CU-level motion information derivation process, an initial MV for the entire CU may be derived; to this end an MV candidate list at the CU level is built, which may include the following (a short assembly sketch follows the list):
1) The original AMVP candidate MVs, if the current CU uses the AMVP mode; specifically, they may be added to the CU-level MV candidate list in this case.
2) If the current CU uses the merge mode, all merge candidate MVs are included.
3) MVs from the interpolated motion vector field; the number of such MVs may be 4, the four interpolated MVs optionally being located at positions (0,0), (W/2,0), (0,H/2) and (W/2,H/2) of the current CU.
4) Adjacent MVs above and to the left.
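A sketch of how these four sources might be collected into one list follows; the data structures (a dict for the interpolated field, plain MV tuples) are assumptions made for illustration:

def cu_level_candidates(mode_mvs, interp_field, neighbor_mvs, cu_w, cu_h):
    """Assemble the CU-level MV candidate list from the four sources above.
    `mode_mvs` holds the AMVP candidates (AMVP mode) or all merge candidates
    (merge mode); `interp_field` maps (x, y) positions inside the CU to MVs."""
    candidates = list(mode_mvs)  # sources 1) / 2)
    for pos in [(0, 0), (cu_w // 2, 0), (0, cu_h // 2), (cu_w // 2, cu_h // 2)]:
        if pos in interp_field:  # source 3): interpolated motion vector field
            candidates.append(interp_field[pos])
    candidates.extend(neighbor_mvs)  # source 4): neighbors above and to the left
    return candidates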
Optionally, in the candidate list of AMVP mode (the length of the list is optionally 2), the establishing procedure may include establishing a spatial domain list and establishing a temporal domain list.
In the building of the spatial domain list of AMVP, it is assumed that the position at the bottom left corner of the current PU is A0, on the left side is A1, at the top left corner is B2, on the top side is B1, and at the top right corner is B0. The left side and the top side of the current PU may each contribute one candidate MV. For the screening of the left-side candidate MV, the processing order is A0 -> A1 -> scaled A0 -> scaled A1, where scaled A0 denotes scaling the MV at A0 and scaled A1 denotes scaling the MV at A1. For the screening of the top-side candidate MV, the processing order is B0 -> B1 -> B2 (and, if none of these exists, processing continues with scaled B0 -> scaled B2), where scaled B0 denotes scaling the MV at B0 and scaled B2 denotes scaling the MV at B2. For either side, once one candidate MV is found, the remaining candidates are not processed. In the building of the temporal domain list of AMVP, the temporal candidate may not directly use the motion information of the candidate block; instead, it may be scaled according to the temporal position relationship between the current frame and the reference frame. The temporal domain provides at most one candidate MV. If the candidate list then contains fewer than 2 candidate MVs, zero vectors may be used for padding. A sketch of this screening follows.
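In the sketch below, modelling a neighbor as an (MV, same_ref) pair and the scaling as a caller-supplied function is an assumption made to keep the example short; it is not the full standardized procedure:

def pick_side_candidate(neighbors, order, scale_mv):
    """Screen one side of the PU in the stated order, e.g.
    ["A0", "A1", "scaled A0", "scaled A1"] for the left side.
    `neighbors` maps a position to (mv, same_ref) or None; an MV whose
    reference frame differs from the current one is only usable in a
    'scaled' step, where `scale_mv` adjusts it."""
    for name in order:
        entry = neighbors.get(name.split()[-1])
        if entry is None:
            continue
        mv, same_ref = entry
        if name.startswith("scaled"):
            return scale_mv(mv)  # scaled step: accept and rescale any MV
        if same_ref:
            return mv  # unscaled step: accept only a same-reference MV
    return None

# After the left, top and temporal candidates are screened, the AMVP list
# is padded with zero vectors up to its length of 2, for example:
# candidates = [mv for mv in (left, top, temporal) if mv is not None][:2]
# while len(candidates) < 2:
#     candidates.append((0, 0))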
Optionally, in the candidate list of the merge mode (the length of the list is optionally 5), the establishing procedure may include establishing a spatial domain list and establishing a temporal domain list.
In the building of the spatial domain list in merge mode, it is likewise assumed that the position at the lower left corner of the current PU is A0, on the left side is A1, at the upper left corner is B2, on the upper side is B1, and at the upper right corner is B0. The spatial domain provides at most 4 candidate MVs, the order of candidates being A1 -> B1 -> B0 -> A0 -> B2, where the first four are processed preferentially and B2 is processed only if one or more of the first four is absent. In the building of the temporal domain list in merge mode, the temporal candidate cannot directly use the motion information of the candidate block; it may be scaled according to the positional relationship between the current frame and the reference frame. The temporal domain provides at most one candidate MV; if, after the spatial and temporal candidates have been processed, the number of MVs in the list has not reached five, zero vectors may be used for padding.
In other words, the merge candidate MVP may be selected by traversing the MVs of spatially neighboring CUs in the order left -> above -> above-right -> below-left -> above-left, then processing the temporally referenced predicted MVs, and finally sorting and combining.
In the sub-CU-level motion information derivation process, the MV obtained at the CU level is used as a starting point and the motion information is further refined at the sub-CU level. The MV to be refined at the sub-CU level is the MV of the whole CU, and the sub-CU-level MV candidate list may include the following (a short assembly sketch follows the list):
1) The MV obtained at the CU level.
2) The MVs above, to the left, at the top left, and at the top right of the MV obtained at the CU level.
3) Scaled MVs of the corresponding temporally neighboring CUs in the reference frames, which may be obtained as follows: all reference frames in the two reference lists are traversed once, and the MVs of the CUs temporally adjacent to the sub-CU in each reference frame are scaled to the reference frame of the MV obtained at the CU level.
4) Up to 4 alternative temporal motion vector prediction (ATMVP) candidate MVs, where ATMVP allows each CU to derive multiple sets of motion information from multiple blocks smaller than the current CU in the reference frame.
5) Up to 4 spatial-temporal motion vector prediction (STMVP) candidate MVs, where in STMVP the motion vector of a sub-CU is obtained by combining the temporal motion vector predictor with the spatially neighboring motion vectors.
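A sketch of assembling these five sources might look as follows; the per-source caps of 4 for ATMVP and STMVP follow items 4) and 5) above, while everything else is illustrative:

def sub_cu_candidates(cu_mv, neighbor_mvs, scaled_temporal_mvs, atmvp_mvs, stmvp_mvs):
    """Assemble the sub-CU-level MV candidate list from the five sources above."""
    return ([cu_mv]                      # 1) the CU-level MV as starting point
            + list(neighbor_mvs)         # 2) above / left / top-left / top-right
            + list(scaled_temporal_mvs)  # 3) scaled temporally neighboring MVs
            + list(atmvp_mvs)[:4]        # 4) up to 4 ATMVP candidates
            + list(stmvp_mvs)[:4])       # 5) up to 4 STMVP candidates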
Alternatively, implementation B and implementation C above may be used for obtaining the MV at the CU level, and may also be used for obtaining the MV at the sub-CU level.
For a clearer understanding of the PMMVD technique, reference is made to fig. 8.
At 410, it is determined whether the current CU employs the Merge mode, and if not, the AMVP mode (not shown) is employed.
At 420, it is determined whether the current CU uses the bidirectional matching method; if yes, 431 is performed, and if no, 441 is performed.
At 431, an MV candidate list is generated.
At 432, an optimal MV is selected from the candidate list, where the bidirectional matching method may be used for the selection; reference may be made to the description of implementation C above.
At 433, a local search is performed around the optimal MV to further refine it. Specifically, the optimal MV may be offset to obtain a plurality of initial MVs, and one MV is selected from the plurality of initial MVs, where the bidirectional matching method may again be used for the selection, as described in implementation C.
At 434, once the CU-level MV is obtained, it can be further refined at the sub-CU level using the bidirectional matching method of implementation C.
At 441, an MV candidate list is generated.
At 442, an optimal MV is selected from the candidate list, where the template matching method may be used for the selection; reference may be made to the description of implementation B above.
At 443, a local search is performed around the optimal MV to further refine it. Specifically, the optimal MV may be offset to obtain a plurality of initial MVs, and one MV is selected from the plurality of initial MVs, where the template matching method may again be used for the selection, as described in implementation B.
At 444, once the CU-level MV is obtained, it can be further refined at the sub-CU level using the template matching method of implementation B.
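Steps 432/442 and 433/443 share one shape: pick the best candidate by matching cost, then search locally around it. A toy sketch under that assumption follows; the offset pattern and the cost function are placeholders, not the codec's actual search:

def select_and_refine(candidates, match_cost,
                      offsets=((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))):
    """Pick the candidate MV with minimum matching cost, then refine it by
    re-evaluating the cost at small offsets around it (the local search)."""
    best = min(candidates, key=match_cost)                      # 432 / 442
    shifted = [(best[0] + dx, best[1] + dy) for dx, dy in offsets]
    return min(shifted, key=match_cost)                         # 433 / 443

# Toy cost: distance to a "true" motion of (3, -1); the refinement recovers it.
cost = lambda mv: abs(mv[0] - 3) + abs(mv[1] + 1)
print(select_and_refine([(0, 0), (2, -1), (5, 2)], cost))  # (3, -1)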
It can be seen that the down-sampling method according to the embodiment of the present application, applied to the decoder motion vector refinement (DMVR) and pattern-matched motion vector derivation (PMMVD) techniques, can greatly reduce hardware resource consumption and memory footprint in a decoder while incurring only a small coding performance loss.
Therefore, in the embodiment of the present application, in the process of obtaining the motion vector MV of the current image block, the reconstructed image data is down-sampled before the reconstructed image block for matching is matched, and the matching cost is then calculated on the down-sampled data, so that the amount of data processed is reduced and the hardware resource consumption and memory footprint are greatly reduced.
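Pulling the pieces together, a minimal sketch of the down-sample-then-match loop, assuming uniform interval sampling and SAD as the matching cost:

import numpy as np

def matching_cost(block_a, block_b, step=2):
    """Down-sample both reconstructed blocks, then compute SAD over the
    retained pixels only: for step=2, a quarter of the differences."""
    a = block_a[::step, ::step].astype(np.int64)
    b = block_b[::step, ::step].astype(np.int64)
    return int(np.abs(a - b).sum())

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, (16, 16))
candidates = [rng.integers(0, 256, (16, 16)) for _ in range(5)]
best_index = min(range(5), key=lambda i: matching_cost(reference, candidates[i]))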
Fig. 9 is a schematic block diagram of an apparatus 500 for video processing according to an embodiment of the present application. The apparatus 500 comprises:
a down-sampling unit 510, configured to down-sample reconstructed image data before matching a reconstructed image block for matching in a process of obtaining a motion vector of a current image block;
a matching unit 520, configured to perform matching using the down-sampled reconstructed image data of the reconstructed image block to obtain a matching result;
an obtaining unit 530, configured to obtain a motion vector of the current image block based on the matching result.
Optionally, in this embodiment of the present application, the apparatus 500 is used at a decoding end, and the apparatus 500 further includes:
a decoding unit, configured to decode the current image block based on the motion vector of the current image block.
Optionally, the apparatus 500 is used for an encoding end, and the apparatus 500 further includes:
and an encoding unit for encoding the current image block based on the motion vector of the current image block.
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
determining the reconstructed image block for matching;
downsampling the reconstructed image data for the reconstructed image block.
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
down-sampling the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block.
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
and performing down-sampling on the reconstructed image data of the reconstructed image block according to at least one of the number of pixels, the gray scale of the pixels and the edge characteristics of the reconstructed image block.
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
determining a down-sampling ratio according to at least one of the number of pixels, the pixel gray scale, and the edge features of the reconstructed image block;
down-sampling the reconstructed image data of the reconstructed image block using the down-sampling ratio.
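A sketch of such a content-dependent choice follows; the thresholds and the step values 1/2/4 are illustrative assumptions, not values given in the text:

import numpy as np

def choose_downsampling_step(block, min_pixels=256, var_thresh=500.0):
    """Choose a down-sampling step from block content: a larger block with a
    higher gray-level-histogram variance tolerates coarser sampling."""
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    if block.size >= 4 * min_pixels and hist.var() >= var_thresh:
        return 4  # aggressive down-sampling
    if block.size >= min_pixels:
        return 2  # moderate down-sampling
    return 1      # small block: keep every pixel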
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
determining that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
determining that the variance of the gray level histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or,
determining that the number of pixels, among those included in the reconstructed image block, that belong to texture edge points is greater than or equal to a third predetermined value.
Optionally, in this embodiment of the present application, the down-sampling unit 510 is further configured to:
down-sampling the reconstructed image data by sampling pixels at equal intervals; or,
down-sampling the reconstructed image data by averaging a plurality of pixels.
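The two sampling modes can be sketched as follows, assuming 2-D NumPy blocks; trimming to a multiple of the step is an implementation convenience, not part of the text:

import numpy as np

def sample_interval(block, step=2):
    """Mode 1: keep pixels at equal intervals in both dimensions."""
    return block[::step, ::step]

def sample_average(block, step=2):
    """Mode 2: replace each step-by-step neighborhood with its mean."""
    h, w = block.shape
    trimmed = block[: h - h % step, : w - w % step].astype(np.float64)
    return trimmed.reshape(h // step, step, w // step, step).mean(axis=(1, 3))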
Optionally, in an embodiment of the present application, the reconstructed image block for matching includes at least two reconstructed image blocks;
the down-sampling unit 510 is further configured to:
the reconstructed image data of the at least two reconstructed image blocks is down-sampled at the same sampling scale.
Optionally, in this embodiment of the application, the obtaining unit 530 is further configured to:
and correcting the initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
Optionally, in this embodiment of the application, the obtaining unit 530 is further configured to:
acquiring an initial motion vector corresponding to a current image block;
for the initial motion vector, the reconstructed image block for matching is determined.
Optionally, in this embodiment of the present application, the initial motion vector includes a first motion vector and a second motion vector;
the matching unit 520 is further configured to:
generating a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, wherein the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
and matching based on the template and the reconstructed image data after the down sampling to obtain a matching result.
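One plausible realization of this template generation and matching, assuming the combination is a rounded average of the two down-sampled blocks (the text does not fix the exact combination):

import numpy as np

def make_template(ds_ref0, ds_ref1):
    """Combine the down-sampled data of the first and second reconstructed
    image blocks into the matching template (rounded average assumed)."""
    return (ds_ref0.astype(np.int64) + ds_ref1.astype(np.int64) + 1) // 2

def template_cost(template, ds_candidate):
    """SAD between the template and a down-sampled candidate block."""
    return int(np.abs(template - ds_candidate.astype(np.int64)).sum())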
Optionally, in this embodiment of the present application, the matching unit 520 is further configured to:
matching the reconstructed image data after down-sampling of N third reconstructed image blocks with the template respectively, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
matching the reconstructed image data after the down-sampling of M fourth reconstructed image blocks with the template respectively, wherein the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
the obtaining unit 530 is further configured to:
based on the matching result, selecting a third motion vector from the N third motion vectors and selecting a fourth motion vector from the M fourth motion vectors, the third motion vector and the fourth motion vector being used as motion vectors of the current image block or for determining motion vectors of the current image block.
Optionally, in this embodiment of the present application, the third motion vector includes the first motion vector, and the fourth motion vector includes the second motion vector.
Optionally, in this embodiment of the present application, at least a part of the N third motion vectors is obtained by shifting based on the first motion vector, and at least a part of the M fourth motion vectors is obtained by shifting based on the second motion vector.
Optionally, in this embodiment of the present application, N is equal to M.
Optionally, in this embodiment of the present application, the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
the first frame is a forward frame of the current image block, and the second frame is also a forward frame of the current image block.
Optionally, in this embodiment of the present application, the initial motion vector includes K fifth motion vectors, and the matching unit 520 is further configured to:
matching the down-sampled reconstructed image data of the adjacent reconstructed image block of the K fifth reconstructed image blocks with the down-sampled reconstructed image data of the adjacent reconstructed image block of the current image block respectively to obtain a matching result, wherein the K fifth reconstructed image blocks correspond to the K fifth motion vectors one by one;
the obtaining unit 530 is further configured to:
and based on the matching result, selecting one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block or used for determining the motion vector of the current image block.
Optionally, in this embodiment of the present application, the initial motion vector includes W sixth motion vectors;
the matching unit 520 is further configured to:
for two reconstructed image blocks corresponding to each motion vector pair of the W motion vector pairs, matching the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, wherein each motion vector pair includes a sixth motion vector and a seventh motion vector determined based on the sixth motion vector;
the obtaining unit 530 is further configured to:
and selecting one motion vector pair based on the matching results corresponding to the W motion vector pairs, wherein the sixth motion vector in the selected motion vector pair is used as the motion vector of the current image block or is used for determining the motion vector of the current image block.
Alternatively, in the embodiment of the present application, the seventh motion vector is determined based on the sixth motion vector on the assumption that the motion trajectory is continuous.
Optionally, in this embodiment of the present application, the sixth reconstructed block belongs to a forward frame of a frame to which the current image block belongs, and the seventh reconstructed block belongs to a backward frame of the frame to which the current image block belongs.
Optionally, the apparatus 500 may implement the operations of the processing device in the above method; for brevity, details are not described herein again.
It should be understood that the apparatus for video processing in the embodiment of the present application may be a chip, which may be specifically implemented by a circuit, but the embodiment of the present application does not limit a specific implementation form.
The embodiment of the present application further provides an encoder, where the encoder is configured to implement the function of an encoding end in the embodiment of the present application, and the encoder may include the module for the encoding end in the apparatus for video processing in the embodiment of the present application.
The embodiment of the present application further provides a decoder, where the decoder is configured to implement the function of a decoding end in the embodiment of the present application, and the decoder may include the module for the decoding end in the apparatus for video processing in the embodiment of the present application.
The embodiment of the present application further provides a codec, which includes the above-mentioned apparatus for video processing according to the embodiment of the present application.
FIG. 10 shows a schematic block diagram of a computer system 600 of an embodiment of the present application.
As shown in fig. 10, the computer system 600 may include a processor 610 and a memory 620.
It should be understood that the computer system 600 may also include other components commonly included in computer systems, such as input/output devices, communication interfaces, etc., which are not limited in this application.
The memory 620 is used to store computer executable instructions.
The memory 620 may be any of various types of memories; it may include a Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory, which is not limited in this embodiment of the present application.
The processor 610 is configured to access the memory 620 and execute the computer-executable instructions to perform the operations of the method for video processing of the embodiments of the present application described above.
The processor 610 may include a microprocessor, a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and the like, which is not limited in the embodiments.
The apparatus and the computer system for video processing in the embodiments of the present application may correspond to the entity that executes the method for video processing in the embodiments of the present application, and the above and other operations and/or functions of the modules in the apparatus and the computer system are intended to implement the corresponding flows of the foregoing methods; for brevity, details are not described herein again.
The embodiment of the present application also provides an electronic device, which may include the device or the computer system for video processing according to the various embodiments of the present application.
The embodiment of the present application further provides a computer storage medium in which program code is stored, where the program code may be used to instruct execution of the method for video processing according to the embodiment of the present application.
It should be understood that, in the embodiment of the present application, the term "and/or" is only one kind of association relation describing an associated object, and means that three kinds of relations may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
While the application has been described with reference to specific embodiments, the scope of protection is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed herein. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (42)

1. A method for video processing, comprising:
in the process of obtaining the motion vector of the current image block, before matching the reconstructed image block for matching, performing down-sampling on the reconstructed image data;
matching the reconstructed image data after the downsampling of the reconstructed image block to obtain a matching result;
based on the matching result, obtaining a motion vector of the current image block;
wherein the method further comprises: acquiring an initial motion vector corresponding to the current image block; and determining, for the initial motion vectors, the reconstructed image block for matching, wherein the number of the initial motion vectors is multiple, and the multiple initial motion vectors belong to different frames respectively;
the matching using the down-sampled reconstructed image data of the reconstructed image block comprises: generating a template based on the down-sampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial motion vectors; or the down-sampling of the reconstructed image data comprises: generating a template based on the non-down-sampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial motion vectors, and down-sampling the template.
2. The method of claim 1, wherein the method is used at a decoding end, and wherein the method further comprises:
and decoding the current image block based on the motion vector of the current image block.
3. The method of claim 1, wherein the method is used at an encoding end, and wherein the method further comprises:
and encoding the current image block based on the motion vector of the current image block.
4. The method of any of claims 1 to 3, wherein the downsampling the reconstructed image data comprises:
determining the reconstructed image block for matching;
downsampling the reconstructed image data of the reconstructed image block.
5. The method of claim 4, wherein the downsampling the reconstructed image data of the reconstructed image block comprises:
and according to the content of the reconstructed image block, performing down-sampling on the reconstructed image data of the reconstructed image block.
6. The method of claim 5, wherein the downsampling the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block comprises:
and performing down-sampling on the reconstructed image data of the reconstructed image block according to at least one of the number of pixels, the gray level of the pixels and the edge characteristics of the reconstructed image block.
7. The method of claim 6, wherein the down-sampling the reconstructed image data of the reconstructed image block according to at least one of a number of pixels, a pixel gray scale, and an edge feature included in the reconstructed image block comprises:
determining a down-sampling ratio according to at least one of the number of pixels, the pixel gray scale, and the edge features of the reconstructed image block;
down-sampling the reconstructed image data of the reconstructed image block using the down-sampling ratio.
8. The method of any of claims 1-3, 5-7, wherein prior to said downsampling said reconstructed image data of said reconstructed image block, said method further comprises:
determining that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
determining that the variance of the gray level histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or,
determining that the number of pixels, among those included in the reconstructed image block, that belong to texture edge points is greater than or equal to a third predetermined value.
9. The method of any of claims 1 to 3, 5 to 7, wherein the downsampling the reconstructed image data comprises:
down-sampling the reconstructed image data by sampling pixels at equal intervals; or,
down-sampling the reconstructed image data by averaging a plurality of pixels.
10. The method according to any of claims 1 to 3, 5 to 7, wherein the reconstructed image blocks for matching comprise at least two reconstructed image blocks;
the downsampling the reconstructed image data, comprising:
down-sampling the reconstructed image data of the at least two reconstructed image blocks at the same sampling scale.
11. The method according to any of claims 1 to 3 and 5 to 7, wherein the obtaining a motion vector of the current image block based on the matching result comprises:
and correcting the initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
12. The method of claim 1, wherein the initial motion vector comprises a first motion vector and a second motion vector;
the matching by using the down-sampled reconstructed image data of the reconstructed image block includes:
generating a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, wherein the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
and matching based on the template and the reconstructed image data after the down sampling to obtain a matching result.
13. The method of claim 12, wherein the matching based on the template and the downsampled reconstructed image data to obtain a matching result comprises:
matching the reconstructed image data after down-sampling of N third reconstructed image blocks with the template respectively, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
matching the reconstructed image data after the down-sampling of M fourth reconstructed image blocks with the template respectively, wherein the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
the modifying the initial motion vector based on the matching result includes:
and based on the matching result, selecting a third motion vector from the N third motion vectors and selecting a fourth motion vector from the M fourth motion vectors, wherein the third motion vector and the fourth motion vector are used as motion vectors of the current image block or used for determining the motion vector of the current image block.
14. The method of claim 13, wherein the third motion vector comprises the first motion vector and the fourth motion vector comprises the second motion vector.
15. The method according to claim 13 or 14, wherein at least some of the N third motion vectors are offset based on the first motion vector, and wherein at least some of the M fourth motion vectors are offset based on the second motion vector.
16. The method of claim 15, wherein N is equal to M.
17. The method according to claim 13 or 14, wherein the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
the first frame is a forward frame of the current image block, and the second frame is also a forward frame of the current image block.
18. The method of claim 11, wherein the initial motion vectors comprise K fifth motion vectors, and wherein the matching using the downsampled reconstructed image data for the reconstructed image block comprises:
matching the reconstructed image data subjected to downsampling of an adjacent reconstructed image block of K fifth reconstructed image blocks with the reconstructed image data subjected to downsampling of the adjacent reconstructed image block of the current image block respectively to obtain a matching result, wherein the K fifth reconstructed image blocks correspond to the K fifth motion vectors in a one-to-one correspondence;
the obtaining of the motion vector of the current image block based on the matching result includes:
and based on the matching result, selecting one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block or determining the motion vector of the current image block.
19. The method of claim 11, wherein the initial motion vector comprises W sixth motion vectors;
the matching using the down-sampled reconstructed image data of the reconstructed image block comprises:
for two reconstructed image blocks corresponding to each motion vector pair of the W motion vector pairs, matching the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain a matching result, where each motion vector pair includes a sixth motion vector and a seventh motion vector determined based on the sixth motion vector;
the obtaining of the motion vector of the current image block based on the matching result includes:
and selecting one motion vector pair based on the matching results corresponding to the W motion vector pairs, wherein the sixth motion vector in the selected motion vector pair is used as the motion vector of the current image block or is used for determining the motion vector of the current image block.
20. The method according to claim 19, wherein the seventh motion vector is determined based on the sixth motion vector on the assumption that the motion trajectory is continuous.
21. The method according to claim 19 or 20, wherein a sixth reconstructed image block belongs to a forward frame of a frame to which said current image block belongs, said sixth reconstructed image block corresponding to said sixth motion vector, a seventh reconstructed image block belongs to a backward frame of a frame to which said current image block belongs, said seventh reconstructed image block corresponding to said seventh motion vector.
22. An apparatus for video processing, comprising:
the down-sampling unit is used for down-sampling the reconstructed image data before the reconstructed image block used for matching is matched in the process of acquiring the motion vector of the current image block;
a matching unit, configured to perform matching using the down-sampled reconstructed image data of the reconstructed image block to obtain a matching result;
the obtaining unit is used for obtaining a motion vector of the current image block based on the matching result;
wherein the obtaining unit is further configured to: acquire an initial motion vector corresponding to a current image block, and determine, for the initial motion vectors, the reconstructed image block for matching, wherein the number of the initial motion vectors is multiple, and the multiple initial motion vectors belong to different frames respectively;
the matching unit is further configured to: generating a template based on the down-sampled reconstructed image data of the reconstructed image block corresponding to the plurality of initial motion vectors; or, the down-sampling unit is further configured to: and generating a template based on the non-downsampling reconstructed image data of the reconstructed image block corresponding to the plurality of initial motion vectors, and downsampling the template.
23. The apparatus of claim 22, wherein the apparatus is for a decoding side, and wherein the apparatus further comprises:
and the decoding unit is used for decoding the current image block based on the motion vector of the current image block.
24. The apparatus of claim 22, wherein the apparatus is for an encoding side, the apparatus further comprising:
and the encoding unit is used for encoding the current image block based on the motion vector of the current image block.
25. The apparatus of any of claims 22 to 24, wherein the down-sampling unit is further configured to:
determining the reconstructed image block for matching;
downsampling the reconstructed image data of the reconstructed image block.
26. The apparatus of claim 25, wherein the downsampling unit is further configured to:
and according to the content of the reconstructed image block, performing down-sampling on the reconstructed image data of the reconstructed image block.
27. The apparatus of claim 26, wherein the downsampling unit is further configured to:
and performing down-sampling on the reconstructed image data of the reconstructed image block according to at least one of the number of pixels, the gray level of the pixels and the edge characteristics of the reconstructed image block.
28. The apparatus of claim 27, wherein the downsampling unit is further configured to:
determining a down-sampling proportion according to at least one of the number of pixels, the gray level of the pixels and the edge characteristics of the reconstructed image block;
and utilizing the down-sampling proportion to down-sample the reconstructed image data of the reconstructed image block.
29. The apparatus of any of claims 22 to 24, 26 to 28, wherein the down-sampling unit is further configured to:
determining that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
determining that the variance of the gray level histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or,
determining that the number of pixels, among those included in the reconstructed image block, that belong to texture edge points is greater than or equal to a third predetermined value.
30. The apparatus of any of claims 22 to 24, 26 to 28, wherein the down-sampling unit is further configured to:
down-sampling the reconstructed image data by sampling pixels at equal intervals; or,
down-sampling the reconstructed image data by averaging a plurality of pixels.
31. The apparatus according to any of claims 22 to 24, 26 to 28, wherein said reconstructed image blocks for matching comprise at least two reconstructed image blocks;
the down-sampling unit is further configured to:
down-sampling the reconstructed image data of the at least two reconstructed image blocks at the same sampling scale.
32. The apparatus according to any one of claims 22 to 24, 26 to 28, wherein the obtaining unit is further configured to:
and correcting the initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
33. The apparatus of claim 22, wherein the initial motion vector comprises a first motion vector and a second motion vector;
the matching unit is further configured to:
generating a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, wherein the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
and matching based on the template and the reconstructed image data after the down sampling to obtain a matching result.
34. The apparatus of claim 33, wherein the matching unit is further configured to:
matching the reconstructed image data after down-sampling of N third reconstructed image blocks with the template respectively, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
matching the reconstructed image data after the down-sampling of M fourth reconstructed image blocks with the template respectively, wherein the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
the acquisition unit is further configured to:
and based on the matching result, selecting a third motion vector from the N third motion vectors and selecting a fourth motion vector from the M fourth motion vectors, wherein the third motion vector and the fourth motion vector are used as motion vectors of the current image block or used for determining the motion vector of the current image block.
35. The apparatus of claim 34, wherein the third motion vector comprises the first motion vector and the fourth motion vector comprises the second motion vector.
36. The apparatus according to claim 34 or 35, wherein at least some of said N third motion vectors are offset based on said first motion vector, and wherein at least some of said M fourth motion vectors are offset based on said second motion vector.
37. The apparatus of claim 36, wherein N is equal to M.
38. The apparatus according to any of claims 33 to 35, wherein the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
the first frame is a forward frame of the current image block, and the second frame is also a forward frame of the current image block.
39. The apparatus of claim 32, wherein the initial motion vectors comprise K fifth motion vectors, and wherein the matching unit is further configured to:
matching the reconstructed image data subjected to downsampling of an adjacent reconstructed image block of K fifth reconstructed image blocks with the reconstructed image data subjected to downsampling of the adjacent reconstructed image block of the current image block respectively to obtain a matching result, wherein the K fifth reconstructed image blocks correspond to the K fifth motion vectors in a one-to-one correspondence;
the acquisition unit is further configured to:
and based on the matching result, selecting one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block or determining the motion vector of the current image block.
40. The apparatus of claim 32, wherein the initial motion vector comprises W sixth motion vectors;
the matching unit is further configured to:
for two reconstructed image blocks corresponding to each motion vector pair of the W motion vector pairs, matching the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain a matching result, where each motion vector pair includes a sixth motion vector and a seventh motion vector determined based on the sixth motion vector;
the acquisition unit is further configured to:
and selecting one motion vector pair based on the matching results corresponding to the W motion vector pairs, wherein the sixth motion vector in the selected motion vector pair is used as the motion vector of the current image block or is used for determining the motion vector of the current image block.
41. The apparatus according to claim 40, wherein the seventh motion vector is determined based on the sixth motion vector on the assumption that the motion trajectory is continuous.
42. The apparatus of claim 40 or 41, wherein a sixth reconstructed image block belongs to a forward frame of a frame to which the current image block belongs, the sixth reconstructed image block corresponds to the sixth motion vector, a seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs, and the seventh reconstructed image block corresponds to the seventh motion vector.
CN201880012518.3A 2018-04-02 2018-04-02 Method and apparatus for video processing Expired - Fee Related CN110337810B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/081651 WO2019191889A1 (en) 2018-04-02 2018-04-02 Method and device for video processing

Publications (2)

Publication Number Publication Date
CN110337810A CN110337810A (en) 2019-10-15
CN110337810B true CN110337810B (en) 2022-01-14

Family

ID=68099798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880012518.3A Expired - Fee Related CN110337810B (en) 2018-04-02 2018-04-02 Method and apparatus for video processing

Country Status (2)

Country Link
CN (1) CN110337810B (en)
WO (1) WO2019191889A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462190B (en) * 2020-04-20 2023-11-17 海信集团有限公司 Intelligent refrigerator and food material input method
CN113329228A (en) * 2021-05-27 2021-08-31 杭州朗和科技有限公司 Video encoding method, decoding method, device, electronic device and storage medium


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010016010A1 (en) * 2000-01-27 2001-08-23 Lg Electronics Inc. Apparatus for receiving digital moving picture
EP1662800A1 (en) * 2004-11-30 2006-05-31 Humax Co., Ltd. Image down-sampling transcoding method and device
CN101459842B (en) * 2008-12-17 2011-05-11 浙江大学 Decoding method and apparatus for space desampling
CN102647594B (en) * 2012-04-18 2014-08-20 北京大学 Integer pixel precision motion estimation method and system for same
CN102790884B (en) * 2012-07-27 2016-05-04 上海交通大学 A kind of searching method based on hierarchical motion estimation and realize system
KR101783990B1 (en) * 2012-12-21 2017-10-10 한화테크윈 주식회사 Digital image processing apparatus and, method for estimating global motion of image
CN106210449B (en) * 2016-08-11 2020-01-07 上海交通大学 Multi-information fusion frame rate up-conversion motion estimation method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6968011B2 (en) * 2001-11-08 2005-11-22 Renesas Technology Corp. Motion vector detecting device improved in detection speed of motion vectors and system employing the same devices
CN102067601A (en) * 2008-04-11 2011-05-18 汤姆森特许公司 Methods and apparatus for template matching prediction (TMP) in video encoding and decoding
CN101605262A (en) * 2009-07-09 2009-12-16 杭州士兰微电子股份有限公司 The predicting size motion of variable block method and apparatus
WO2015009132A1 (en) * 2013-07-19 2015-01-22 Samsung Electronics Co., Ltd. Hierarchical motion estimation method and apparatus based on adaptive sampling
CN107431820A (en) * 2015-03-27 2017-12-01 高通股份有限公司 Motion vector derives in video coding
CN106454349A (en) * 2016-10-18 2017-02-22 哈尔滨工业大学 Motion estimation block matching method based on H.265 video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Wang; Juncheng Ma; Falei Luo; Siwei Ma. Adaptive motion vector resolution prediction in block-based video coding. 2015 Visual Communication and Image Processing, 2015. *

Also Published As

Publication number Publication date
CN110337810A (en) 2019-10-15
WO2019191889A1 (en) 2019-10-10

Similar Documents

Publication Publication Date Title
US11375226B2 (en) Method and apparatus of video coding with affine motion compensation
TWI617185B (en) Method and apparatus of video coding with affine motion compensation
US11463726B2 (en) Apparatus and method for motion vector refinement for multi-reference prediction
US11095898B2 (en) Inter-prediction mode based image processing method, and apparatus therefor
KR101718886B1 (en) Method and apparratus of video decoding
WO2017148345A1 (en) Method and apparatus of video coding with affine motion compensation
KR102642784B1 (en) Limited memory access window for motion vector refinement
CN111213381B (en) Video processing method and device
CN111279701B (en) Video processing method and device
TWI738236B (en) Methods and apparatuses of video processing for bi-directional prediction with motion refinement in video coding systems
CN110710212A (en) Method and apparatus for encoding or decoding video data with sub-pixel motion vector refinement
CN110337810B (en) Method and apparatus for video processing
CN112154666A (en) Video coding and decoding method and device
US20200336747A1 (en) Inter prediction mode-based image processing method and device therefor
CN111713109B (en) Video processing method, device and equipment
CN112204986A (en) Video coding and decoding method and device
NZ760521B2 (en) Motion vector refinement for multi-reference prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220114