WO2019191889A1 - Method and Apparatus for Video Processing - Google Patents

Method and Apparatus for Video Processing

Info

Publication number
WO2019191889A1
Authority
WO
WIPO (PCT)
Prior art keywords
reconstructed image
motion vector
image block
downsampled
image data
Prior art date
Application number
PCT/CN2018/081651
Other languages
English (en)
French (fr)
Inventor
马思伟
傅天亮
王苫社
郑萧桢
Original Assignee
北京大学
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京大学, 深圳市大疆创新科技有限公司
Priority to PCT/CN2018/081651
Priority to CN201880012518.3A (CN110337810B)
Publication of WO2019191889A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: ... using adaptive coding
    • H04N19/102: ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134: ... characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169: ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: ... the unit being an image region, e.g. an object
    • H04N19/176: ... the region being a block, e.g. a macroblock
    • H04N19/50: ... using predictive coding
    • H04N19/503: ... involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/513: Processing of motion vectors
    • H04N19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures

Definitions

  • the present application relates to the field of video processing and, more particularly, to a method and apparatus for video processing.
  • Prediction is an important module of the mainstream video coding framework, in which inter prediction is implemented by motion compensation.
  • A frame of video can be divided into equal-sized Coding Tree Units (CTUs), such as 64x64 or 128x128.
  • Each CTU may be further divided into square or rectangular coding units (CUs), and the most similar block may be found in the reference frame for each CU as the prediction block of the current CU.
  • the relative displacement between the current block and the similar block is a motion vector (Motion Vector, MV).
  • the process of finding a similar block in the reference frame as the predicted value of the current block is motion compensation.
  • Decoder-side motion information derivation is a recently introduced technique, used mainly to refine the decoded motion vector at the decoding end; it improves coding quality, and hence encoder performance, without increasing the bit rate.
  • the embodiment of the present application provides a method and device for video processing, which can reduce hardware resource consumption and occupied storage space in the process of acquiring a motion vector.
  • a method for video processing comprising:
  • in the process of acquiring the motion vector of the current image block, the reconstructed image data is downsampled before the reconstructed image blocks used for matching are matched; matching is performed using the downsampled data to obtain a matching result; and the motion vector of the current image block is acquired based on the matching result;
  • an apparatus for video processing comprising:
  • a downsampling unit configured to downsample the reconstructed image data before performing matching on the reconstructed image block for matching in the process of acquiring the motion vector of the current image block;
  • a matching unit configured to perform matching by using the downsampled image data of the reconstructed image block to obtain a matching result
  • an acquiring unit configured to acquire a motion vector of the current image block based on the matching result.
  • a computer system comprising: a memory for storing computer executable instructions; and a processor for accessing the memory and executing the computer executable instructions to perform the method of the first aspect above.
  • a computer storage medium having program code stored therein, the program code being operative to perform the method of the first aspect described above.
  • a computer program product comprising program code, the program code being operative to perform the method of the first aspect described above.
  • In the technical solution of the present application, the reconstructed image is downsampled and the matching cost is then calculated on the downsampled data, which reduces the amount of data processed, thereby reducing hardware resource consumption and occupied storage space during data processing.
  • FIG. 1 is a schematic diagram of a codec system in accordance with an embodiment of the present application.
  • FIG. 2 is a schematic flow chart of a method for video processing according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for video processing according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of acquiring a bidirectional template according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of acquiring a motion vector based on a bidirectional template matching method according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of acquiring a motion vector based on a template matching method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of acquiring a motion vector based on a bidirectional matching method according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for video processing according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an apparatus for video processing in accordance with an embodiment of the present application.
  • Figure 10 is a schematic block diagram of a computer system in accordance with an embodiment of the present application.
  • FIG. 1 is an architectural diagram of a technical solution to which an embodiment of the present application is applied.
  • system 100 can receive data to be processed 102 and process data to be processed 102 to produce processed data 108.
  • system 100 can receive data to be encoded, encode the data to be encoded to produce encoded data, or system 100 can receive the data to be decoded and decode the data to be decoded to produce decoded data.
  • components in system 100 may be implemented by one or more processors, which may be processors in a computing device or processors in a mobile device (eg, a drone).
  • the processor may be any type of processor, which is not limited in this embodiment of the present invention.
  • the processor may include an encoder, a decoder or a codec, and the like.
  • One or more memories may also be included in system 100.
  • the memory can be used to store instructions and data, such as computer executable instructions to implement the technical solution of the embodiments of the present invention, data to be processed 102, processed data 108, and the like.
  • the memory may be any kind of memory, which is not limited in this embodiment of the present invention.
  • the data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded.
  • the data to be encoded may include sensory data from sensors, which may be vision sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and so on.
  • the data to be encoded may include information from a user, such as biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA sampling, and the like.
  • When encoding each image, the image may first be divided into a plurality of image blocks.
  • an image may be divided into a plurality of image blocks, which are referred to as macroblocks or Largest Coding Units (LCUs) in some coding standards.
  • the image blocks may or may not overlap one another.
  • the image can be divided into any number of image blocks.
  • the image can be divided into an array of m x n image blocks.
  • the image block may have a rectangular shape, a square shape, a circular shape, or any other shape.
  • the image block can have any size, such as p x q pixels.
  • images of different resolutions can be encoded by first dividing the image into small pieces.
  • For H.264, an image block is referred to as a macroblock, which may be 16 x 16 pixels in size; for HEVC, an image block is referred to as a largest coding unit, which may be 64 x 64 in size.
  • Each image block can have the same size and/or shape. Alternatively, two or more image blocks may have different sizes and/or shapes.
  • An image block need not be exactly one macroblock or maximum coding unit; it may instead be a portion of one macroblock or maximum coding unit, at least two complete macroblocks (or maximum coding units), a portion including at least one complete macroblock (or maximum coding unit) plus part of another, or at least two complete macroblocks (or maximum coding units) plus parts of others.
  • the image blocks in the image data can be separately encoded.
  • In order to remove redundancy, an image can be predicted, and different images in the video can be predicted differently.
  • the image may be classified into an intra predicted image and an inter predicted image according to a prediction manner employed by the image, wherein the inter predicted image includes a forward predicted image and a bidirectional predicted image.
  • the I image is an intra prediction image, also referred to as a key frame;
  • the P image is a forward predicted image, that is, a P image or an I image that has been previously encoded is used as a reference image;
  • the B image is a bidirectional predicted image, that is, both preceding and following images are used as reference images.
  • The encoding end encodes a plurality of images to generate a group of pictures (GOP); a GOP is a group of images composed of one I image and a plurality of B images (bidirectional predicted images) and/or P images (forward predicted images).
  • During playback, the decoding end reads the GOP segment by segment, decodes the pictures, and renders them for display.
  • the most similar block when performing inter prediction, the most similar block may be found as a prediction block of the current image block in the reference frame (generally the reconstructed frame near the time domain) for each image block.
  • the relative displacement between the current block and the predicted block is a motion vector (Motion Vector, MV).
  • In some techniques, motion information may not be transmitted in the code stream, requiring the decoding end to derive the motion information, that is, the motion vector.
  • In deriving the motion information, the data throughput may be very large, which will cause the decoding end to occupy a large amount of hardware resources and storage space.
  • the embodiment of the present application provides a method for video processing, which can reduce the amount of data that needs to be processed when the decoding end derives motion information, thereby avoiding the problem that the decoding end occupies a large amount of hardware resources and space.
  • When the method in the embodiment of the present application is used at the encoding end, the hardware resources and space occupied by the encoding end can likewise be reduced.
  • FIG. 2 is a schematic flow chart of a method for video processing according to an embodiment of the present application.
  • the following method may be implemented by either the decoding side or the encoding side.
  • the current image block mentioned below may be an image block to be decoded (also referred to as an image block to be reconstructed).
  • the current image block mentioned below may be an image block to be encoded.
  • the processing device downsamples the reconstructed image data before acquiring the reconstructed image block for matching in the process of acquiring the motion vector MV of the current image block.
  • the processing device may be a device at the encoding end or a device at the decoding end.
  • the MV of the current image block can be understood as the MV between the current image block and the selected prediction block.
  • the reconstructed image block may also be referred to as a reference block.
  • Downsampling the reconstructed image data may be implemented in either of the following two manners.
  • In the first manner, the reconstructed image data is downsampled by sampling pixels at intervals.
  • Specifically, the pixels may be sampled at a certain interval in the horizontal direction and in the vertical direction.
  • For example, when the object to be downsampled is a 128x128 reconstructed image block, the pixels of some columns or some rows may be taken as the downsampled reconstructed image block.
  • Alternatively, the reconstructed image data may be downsampled using sampling spaced by the same number of pixels.
  • Sampling spaced by the same number of pixels means using the same pixel interval in the horizontal direction and/or the vertical direction.
  • For example, when the object to be downsampled is a reconstructed image block:
  • the reconstructed image block may be downsampled with an interval of 2 in both the horizontal and vertical directions, and the pixel in the upper-left corner of each 2x2 group may be taken as the downsampling result;
  • alternatively, any one of the remaining three pixels of each 2x2 group may be taken as the downsampling result;
  • the reconstructed image block may be downsampled with an interval of 2 in the horizontal direction only, with no downsampling in the vertical direction;
  • or the reconstructed image block may be downsampled with an interval of 2 in the vertical direction only, with no downsampling in the horizontal direction. A minimal sketch of this interval-based sampling is given below.
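  • As a concrete illustration, the following Python sketch implements the interval-based sampling just described. It is an illustrative assumption of this description, not code from the patent: `sy`/`sx` are the vertical/horizontal intervals and `(oy, ox)` selects which pixel of each group survives.

```python
import numpy as np

def downsample_by_interval(block: np.ndarray, sy: int = 2, sx: int = 2,
                           oy: int = 0, ox: int = 0) -> np.ndarray:
    """Keep one pixel per sy x sx group, starting at offset (oy, ox).

    (oy, ox) = (0, 0) keeps the upper-left pixel of each 2x2 group;
    (0, 1), (1, 0) or (1, 1) keep one of the remaining three pixels.
    sy = 1 or sx = 1 disables downsampling in that direction.
    """
    return block[oy::sy, ox::sx]

block = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # 128x128 reconstructed block
print(downsample_by_interval(block).shape)        # (64, 64): interval 2 in both directions
print(downsample_by_interval(block, sy=1).shape)  # (128, 64): horizontal interval 2 only
print(downsample_by_interval(block, sx=1).shape)  # (64, 128): vertical interval 2 only
```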
  • In the second manner, the reconstructed image data is downsampled by averaging a plurality of pixels.
  • the plurality of pixels may be adjacent pixels.
  • For example, the reconstructed image block may be downsampled by averaging groups of four pixels.
  • The four pixels may be adjacent pixels, for example the pixels in a 2x2 sub-block; a sketch follows.
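  • A corresponding sketch for the averaging manner, again an illustrative assumption rather than the patent's implementation: each non-overlapping 2x2 group of adjacent pixels is replaced by its mean.

```python
import numpy as np

def downsample_by_average(block: np.ndarray, f: int = 2) -> np.ndarray:
    """Replace each non-overlapping f x f group of adjacent pixels by its mean.

    Assumes the block dimensions are multiples of f, which holds for the
    usual power-of-two block sizes (16x16, 64x64, 128x128, ...).
    """
    h, w = block.shape
    return block.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

block = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(downsample_by_average(block).shape)  # (32, 32): each sample is a 2x2 average
```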
  • the downsampled reconstructed image data may include downsampled reconstructed image data for the matched reconstructed image block.
  • the entire frame image to which the reconstructed image block for matching belongs may be downsampled, that is, when the downsampling is performed, the reconstructed image blocks are not distinguished.
  • the downsampled reconstructed image data may include reconstructed image data for the matched reconstructed image block.
  • the reconstructed image block for matching may be determined and the determined reconstructed image block may be downsampled.
  • the reconstructed image data of the reconstructed image block is downsampled according to the content of the reconstructed image block.
  • downsampling the reconstructed image data of the reconstructed image block may be referred to as downsampling the reconstructed image block.
  • the processing device may determine a ratio of downsampling according to the content of the reconstructed image block; and use the downsampling ratio to downsample the reconstructed image data of the reconstructed image block.
  • the downsampling ratio mentioned in the embodiment of the present application may refer to a ratio between the number of pixels included in the downsampled image block and the number of pixels included in the image block before sampling.
  • When the complexity of the reconstructed image block is high, the sampling interval is small (that is, the downsampling ratio is large); when the complexity is low, the sampling interval is large (that is, the downsampling ratio is small). Adapting the downsampling to the image content in this way reduces the performance penalty caused by data sampling.
  • the content of the reconstructed image block mentioned in the embodiment of the present application may include at least one of a number of pixels, a pixel gray level, and an edge feature included in the reconstructed image block.
  • Specifically, the processing device may determine a downsampling ratio according to at least one of the number of pixels, the pixel gray levels, and the edge features of the reconstructed image block, and downsample the reconstructed image block using that ratio.
  • the pixel gray level of the reconstructed image block may be represented by the variance of the gray histogram of the reconstructed image block.
  • the edge feature of the reconstructed image block may be represented by the number of pixels of the edge points belonging to the texture among the pixels included in the reconstructed image block.
  • When the reconstructed image blocks used for matching include at least two reconstructed image blocks, the reconstructed image data of the at least two blocks is downsampled at the same downsampling ratio. That is, the same downsampling ratio may be used to downsample the reconstructed image data of all of the at least two reconstructed image blocks.
  • If the blocks individually suggest different downsampling ratios, the ratios may be averaged and the average used to downsample the at least two reconstructed image blocks, or the highest or the lowest downsampling ratio may be used to downsample the reconstructed image data of the at least two reconstructed image blocks.
  • When the values characterizing the pixel gray levels and/or the values characterizing the edge features of the at least two reconstructed image blocks differ, the values may be averaged (if both gray-level values and edge-feature values are used, each set of values is averaged separately), a single downsampling ratio calculated from the averaged values, and that ratio used to downsample the reconstructed image data of the at least two reconstructed image blocks. Alternatively, the maximum of the values may be taken (if both kinds of values are used, the maximum gray-level value and the maximum edge-feature value), or the minimum of the values (likewise the minimum of each kind), a downsampling ratio calculated from the values so obtained, and that one ratio used to downsample the reconstructed image data of the at least two reconstructed image blocks.
  • Since the reconstructed image block used for matching may contain the same number of pixels as the current image block, determining the downsampling ratio according to the number of pixels in the reconstructed image block may equivalently be implemented by determining the downsampling ratio based on the number of pixels in the current image block.
  • Optionally, the processing device determines to downsample the reconstructed image block in the matching process when at least one of the following conditions is satisfied:
  • the reconstructed image block includes a number of pixels greater than or equal to a first predetermined value
  • the variance of the gray histogram of the reconstructed image block is greater than or equal to a second predetermined value
  • the number of edge pixels belonging to the texture among the pixels included in the reconstructed image block is greater than or equal to a third predetermined value.
  • In these cases the reconstructed image block is downsampled; otherwise downsampling is not performed, thereby avoiding the degraded coding and decoding performance that blind downsampling would cause. A sketch of such a decision rule follows this list of conditions.
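  • The following sketch shows one possible form of this decision rule. The three threshold values and the gradient-based texture-edge detector are assumptions for illustration, since the description leaves the specific predetermined values and the edge criterion open.

```python
import numpy as np

def should_downsample(block: np.ndarray,
                      min_pixels: int = 64 * 64,    # first predetermined value (assumed)
                      min_variance: float = 1e4,    # second predetermined value (assumed)
                      min_edge_pixels: int = 256,   # third predetermined value (assumed)
                      edge_threshold: float = 32.0) -> bool:
    """Return True when at least one of the three conditions is satisfied."""
    if block.size >= min_pixels:                    # enough pixels in the block
        return True
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    if np.var(hist) >= min_variance:                # variance of the gray histogram
        return True
    gy, gx = np.gradient(block.astype(np.float64))  # crude texture-edge detector
    edge_pixels = int(np.count_nonzero(np.hypot(gx, gy) >= edge_threshold))
    return edge_pixels >= min_edge_pixels
```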
  • When there are at least two reconstructed image blocks, the conditions may be evaluated per block, that is, the number of pixels, the variance of the grayscale histogram, and the number of texture edge pixels of each reconstructed image block satisfy the above conditions; or they may be evaluated on averages, that is, the average number of pixels of the at least two reconstructed image blocks, the average variance of their grayscale histograms, and the average number of texture edge pixels among their included pixels satisfy the above conditions.
  • Likewise, since the reconstructed image block used for matching may contain the same number of pixels as the current image block, deciding whether to downsample the reconstructed image block according to its number of pixels may be implemented by deciding according to the number of pixels in the current image block.
  • The above describes determining whether to downsample, and downsampling, according to the content of the reconstructed image block, but it should be understood that the embodiment of the present application is not limited thereto: when the processing device downsamples a reconstructed image frame, it may likewise determine whether to downsample and/or how to downsample the reconstructed image frame based on the content of that frame.
  • a downsampling ratio may be determined according to at least one of a number of pixels included in the reconstructed image frame, a pixel gray level, and an edge feature; and the reconstructed image frame is downsampled by using the downsampling ratio .
  • the reconstructed image frame includes a number of pixels greater than or equal to a specific value
  • the variance of the gray histogram of the reconstructed image frame is greater than or equal to a specific value
  • the number of edge pixels belonging to the texture among the pixels included in the reconstructed image frame is greater than or equal to a specific value.
  • the processing device performs matching using the downsampled reconstructed image data for the matched reconstructed image block to obtain a matching result.
  • the matching may also be referred to as distortion matching, and the matching result may be a matching cost obtained by performing distortion matching between the reconstructed image blocks.
  • the processing device acquires the MV of the current image block based on the matching result.
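  • The exact matching cost is not pinned down here; the sum of absolute differences (SAD), which is mentioned later alongside SSD for RD-Cost, is a natural choice. The sketch below assumes SAD and operates entirely on already-downsampled blocks, which is where the data reduction comes from.

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences: a common distortion / matching cost."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def select_mv(template: np.ndarray, candidates: dict) -> tuple:
    """Return the (mv, cost) pair with the minimum matching cost.

    `candidates` maps a candidate MV, e.g. (dy, dx), to the downsampled
    reference block it points at; `template` is downsampled the same way,
    so only a fraction of the original pixels enter the cost computation.
    """
    best_mv = min(candidates, key=lambda mv: sad(template, candidates[mv]))
    return best_mv, sad(template, candidates[best_mv])
```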
  • the processing device when the processing device is a device at the encoding end, the current image block may be encoded or reconstructed by using the MV.
  • the encoding end may use the reconstructed image block corresponding to the MV as a prediction block, and encode or reconstruct the current image block based on the prediction block.
  • the encoding end may directly use the pixel of the prediction block as the reconstructed pixel of the current image block.
  • This mode may be referred to as a skip mode; it is characterized in that the reconstructed pixel values of the current image block are equal to the pixel values of the prediction block. When the encoding side adopts the skip mode, an indication may be transmitted in the code stream to inform the decoding end that the skip mode is adopted.
  • the encoding end may subtract the pixels of the current image block from the pixels of the prediction block to obtain a pixel residual, and transmit the pixel residual to the decoding end in the code stream.
  • Of course, the encoding end may encode and reconstruct the current image block in other manners, which is not specifically limited in this embodiment of the present application.
  • The embodiment of the present application may be used in the Advanced Motion Vector Prediction (AMVP) mode; that is, the result obtained by performing the matching may be a motion vector prediction value (Motion Vector Prediction, MVP).
  • The encoder can determine the starting point of motion estimation according to the MVP and perform a motion search near that starting point. After the search is completed, the optimal MV is obtained; the MV determines the position of the reference block in the reference image; the reference block is subtracted from the current block to obtain a residual block; the MVP is subtracted from the MV to obtain a Motion Vector Difference (MVD); and the MVD is transmitted to the decoding end through the code stream.
  • The implementation of the present application may also be used in a Merge mode; that is, the result obtained by matching may be an MVP, and the encoding end may directly determine the MVP as the MV. In other words, the result of the matching is the MV.
  • the encoding end does not need to transmit the MVD after obtaining the MVP (ie, MV) because the MVD defaults to 0.
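  • The MV/MVP/MVD arithmetic shared by the AMVP and merge descriptions above reduces to a few lines; the sketch below just makes the bookkeeping explicit.

```python
def encode_mvd(mv: tuple, mvp: tuple) -> tuple:
    """AMVP mode: the encoder transmits MVD = MV - MVP in the code stream."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

def decode_mv(mvp: tuple, mvd: tuple = (0, 0)) -> tuple:
    """MV = MVP + MVD; in merge mode no MVD is transmitted, so it defaults to (0, 0)."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mvp = (5, -3)                          # motion vector prediction value
mvd = encode_mvd((7, -2), mvp)         # (2, 1) is what AMVP mode transmits
assert decode_mv(mvp, mvd) == (7, -2)  # AMVP decoding
assert decode_mv(mvp) == (5, -3)       # merge mode: the MVP is used directly as the MV
```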
  • the current image block may be decoded by using the MV.
  • the decoding end may use the reconstructed image block corresponding to the MV as a prediction block, and decode the current image block based on the prediction block.
  • the decoding end may directly use the pixel of the prediction block as the pixel of the current image block.
  • This mode may be referred to as a skip mode; it is characterized in that the reconstructed pixel values of the current image block are equal to the pixel values of the prediction block. When the encoding side adopts the skip mode, an indication may be transmitted in the code stream to inform the decoding end that the skip mode is adopted.
  • the decoding end may obtain the pixel residuals in the code stream transmitted from the encoding end, and add the pixels of the prediction block to the pixel residuals to obtain pixels of the current image block.
  • the current image block may be decoded in other manners, which is not specifically limited in this embodiment of the present application.
  • The embodiment of the present application may be used in the AMVP mode; that is, the result obtained by the matching may be an MVP, and the decoding end may combine it with the MVD in the code stream transmitted by the encoding end to obtain the MV of the current image block.
  • The implementation of the present application may also be used in a Merge mode; that is, the result obtained by matching may be an MVP, and the decoding end may directly determine the MVP as the MV. In other words, the result of the matching is the MV.
  • the initial MV of the current image block is corrected based on the matching result, and the MV of the current image block is obtained.
  • the processing device can obtain the initial MV, but the initial MV may not be the optimal MV or MVP, and the processing device may modify the initial MV to obtain the MV of the current image block.
  • At the encoding end, the index of the initial MV may be encoded and passed to the decoding end; the index causes the decoding end to select an initial MV from the initial MV list. The index points to the following information: the reference frame index and the spatial offset of the reference block relative to the current image block; the decoding end can select the initial MV based on this information.
  • the initial MV may be obtained based on the code stream sent by the encoding end, and the code stream may include an index, and based on the index, the decoding end may obtain the initial MV.
  • the initial MV may include multiple initial MVs, which may belong to different frames respectively.
  • the frame to which the initial MV belongs refers to the frame to which the reconstructed image block corresponding to the MV belongs.
  • the frame to which the first MV belongs and the frame to which the second MV belongs are different frames.
  • the reconstructed image block corresponding to the first MV belongs to a forward frame of the current image block
  • the reconstructed image block corresponding to the second MV belongs to a backward frame of the current image block
  • the reconstructed image block corresponding to the first MV belongs to a forward frame of the current image block
  • the reconstructed image block corresponding to the second MV belongs to a forward frame of the current image block
  • the reconstructed image block corresponding to the first MV and the reconstructed image block corresponding to the second MV respectively belong to different backward frames of the current image block, which is not specifically limited in this embodiment of the present application.
  • The processing device may generate a template (for example, by averaging pixels) based on the downsampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial MVs, and use the generated template to correct each of the plurality of initial MVs.
  • Alternatively, the template may be generated using the un-downsampled reconstructed image data of the reconstructed image blocks corresponding to the plurality of initial MVs, and the template itself then downsampled; this is not specifically limited in this embodiment of the present application.
  • the initial MV includes a first MV and a second MV
  • the reconstructed image block corresponding to the first MV is a first reconstructed image block belonging to the first frame
  • and the reconstructed image block corresponding to the second MV is a second reconstructed image block belonging to the second frame, a template is generated based on the downsampled reconstructed image data of the first reconstructed image block and the downsampled reconstructed image data of the second reconstructed image block.
  • the template can be called a two-way template.
  • Specifically, the downsampled reconstructed image data of N third reconstructed image blocks (which may be referred to as the N downsampled third reconstructed image blocks) may each be matched with the template, where the N third reconstructed image blocks correspond to N third MVs; the downsampled reconstructed image data of M fourth reconstructed image blocks (which may be referred to as the M downsampled fourth reconstructed image blocks) may each be matched with the template, where the M fourth reconstructed image blocks correspond to M fourth MVs; based on the matching results, one third MV is selected from the N third MVs, and one fourth MV is selected from the M fourth MVs.
  • the selected third MV may be the MV corresponding to the minimum distortion cost.
  • the selected third MV may be an MV corresponding to a distortion cost less than a certain value.
  • the selected fourth MV may be the MV corresponding to the smallest distortion cost.
  • the selected fourth MV may be an MV corresponding to a distortion cost less than a certain value.
  • In one case, the one third MV and the one fourth MV are used as the MVs of the current image block.
  • For example, the reconstructed image blocks corresponding to the one third MV and the one fourth MV may be weighted-averaged to obtain the prediction block.
  • the one third MV and the one fourth MV may be used to determine an MV of the current image block, that is, the one third MV and the one fourth MV may be respectively used as MVPs.
  • the final MV can be obtained by performing motion search and motion compensation processes based on the third MVP and the fourth MVP, respectively.
  • the N third reconstructed image blocks may belong to the first frame
  • the M fourth reconstructed image blocks may belong to the second frame.
  • the N and M may be equal.
  • The N third MVs include the first MV and the M fourth MVs include the second MV; that is, the reconstructed image blocks corresponding to the first MV and the second MV that were used to generate the template are also each matched with the template.
  • At least part of the MVs of the N third MVs are obtained by performing offset based on the first MV, and at least part of the MVs of the M fourth MVs are based on the second MV. Obtained by the offset.
  • the MVs other than the first MV in the N third MVs may be obtained by performing offset based on the first MV.
  • For example, N may be equal to 9, and 8 of the MVs may be obtained by offsetting the first MV, for example by shifting in eight directions, or by shifting by different numbers of pixels in the vertical or horizontal direction.
  • the MVs of the M fourth MVs other than the second MV may be obtained by performing offset based on the second MV.
  • Similarly, M may be equal to 9, and 8 of the MVs may be obtained by offsetting the second MV, for example by shifting in eight directions, or by shifting by different numbers of pixels in the vertical or horizontal direction.
  • The method in implementation A may be referred to as MV selection based on the bidirectional template matching method.
  • For example, the width and height of the current image block may each be less than 8 pixels (or, of course, other numbers of pixels).
  • The reconstructed image blocks corresponding to MV0 in reference list 0 and MV1 in reference list 1 are downsampled and averaged to obtain a bidirectional template.
  • the MV in the reference list 0 may be a motion vector between the current image block and the reconstructed image block in the forward reference frame
  • the MV in reference list 1 may be the motion vector between the current image block and a reconstructed image block in the backward reference frame.
  • Specifically, reference block 0 (the reconstructed image block corresponding to MV0) and reference block 1 (the reconstructed image block corresponding to MV1) are downsampled, and the two downsampled reference blocks are then averaged to obtain a downsampled bidirectional template.
  • the downsampled reconstructed image block corresponding to MV0 in list 0 is matched to the template.
  • offsetting MV0 yields a plurality of MV0's.
  • the reconstructed image blocks corresponding to the plurality of MV0' are downsampled and matched with the template respectively.
  • the surrounding pixels of the reference block corresponding to MV0 may be downsampled.
  • the pixel values around the reference block corresponding to MV0 may be padded, the reference block corresponding to MV0' (the reference block after the offset) is obtained, and the offset reference block is downsampled.
  • Matching is performed between the downsampled bidirectional template and each downsampled reference block.
  • The MV0' with the smallest matching cost is obtained; the MV0' with the smallest matching cost may be MV0 itself.
  • the downsampled reconstructed image block corresponding to MV1 in Listing 1 is matched with the template.
  • offsetting MV1 yields a plurality of MV1's.
  • the reconstructed image blocks corresponding to the plurality of MV1's are downsampled and matched with the template respectively.
  • MV1' with the smallest matching cost is obtained, wherein MV1' with the smallest matching cost may be MV1.
  • the surrounding pixels of the reference block corresponding to MV1 may be downsampled.
  • the pixel values around the reference block corresponding to MV1 may be padded, the reference block corresponding to MV1' (the reference block after the offset) is obtained, and the offset reference block is downsampled.
  • Matching is performed between the downsampled bidirectional template and each downsampled reference block.
  • a prediction block is generated based on the reconstructed image blocks corresponding to MV0' and MV1' with the smallest matching cost.
  • the current image block is decoded based on the predicted block.
  • The above implementation A and its optional variants may be implemented by the DMVR technique; an end-to-end sketch follows.
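  • A compact sketch of the whole implementation-A flow on downsampled data. It is a simplified assumption of this description: `fetch_block` is a hypothetical integer-pel reference fetch (no sub-pel interpolation or border padding), the search pattern is the 8-neighbourhood plus the unshifted position, and SAD is the matching cost.

```python
import numpy as np

def fetch_block(pic: np.ndarray, mv, y0: int = 8, x0: int = 8, size: int = 16):
    """Hypothetical reference fetch: the size x size block at the current-block
    position (y0, x0) displaced by mv = (dy, dx)."""
    y, x = y0 + mv[0], x0 + mv[1]
    return pic[y:y + size, x:x + size]

def downsample(block: np.ndarray) -> np.ndarray:
    return block[::2, ::2]                       # interval-2 sampling, as above

def refine(ref: np.ndarray, center_mv, template, offsets):
    """Return the offset MV whose downsampled reference block has minimum SAD
    against the downsampled bidirectional template."""
    def cost(mv):
        cand = downsample(fetch_block(ref, mv)).astype(np.int32)
        return np.abs(template - cand).sum()
    return min(((center_mv[0] + dy, center_mv[1] + dx) for dy, dx in offsets), key=cost)

def dmvr_style(ref0: np.ndarray, ref1: np.ndarray, mv0, mv1):
    # Offset (0, 0) keeps MV0 / MV1 themselves as candidates, so the refined
    # MV may turn out to be the original one, as noted above.
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    b0 = downsample(fetch_block(ref0, mv0)).astype(np.int32)
    b1 = downsample(fetch_block(ref1, mv1)).astype(np.int32)
    template = (b0 + b1 + 1) // 2                # downsampled bidirectional template
    return refine(ref0, mv0, template, offsets), refine(ref1, mv1, template, offsets)
```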
  • the processing device acquires an initial motion vector MV corresponding to the current image block; and for the initial MV, determines the reconstructed image block used for matching.
  • The initial MVs may be candidate MVs.
  • The set of candidate MVs may be referred to as an MV candidate list.
  • In implementation B, the initial MVs include K fifth MVs. The downsampled reconstructed image data of the adjacent reconstructed image blocks of the K fifth reconstructed image blocks is matched with the downsampled reconstructed image data of the adjacent reconstructed image blocks of the current image block to obtain the matching result, where the K fifth reconstructed image blocks are in one-to-one correspondence with the K fifth MVs and K is an integer greater than or equal to 1; based on the matching result, one fifth MV is selected from the K fifth MVs.
  • the selected fifth MV may be the MV corresponding to the minimum distortion cost.
  • the selected fifth MV may be an MV corresponding to a distortion cost less than a particular value.
  • the selected one fifth MV can be used as the MV of the current image block.
  • the reconstructed image block corresponding to the one fifth MV may be used as the prediction block of the current image block.
  • the selected one of the fifth MVs can be used to determine the MV of the current image block.
  • the one fifth MV can be used as an MVP.
  • the motion search and motion compensation can be further performed according to the MVP to obtain the final MV.
  • the reconstructed image block corresponding to the optimized MV is used as a prediction block.
  • For example, the selected fifth MV is a Coding Unit (CU)-level MV as mentioned below, and that MV can be used to determine sub-CU (Sub-CU)-level MVs.
  • the K fifth MVs may be referred to as an MV candidate list.
  • the adjacent reconstructed image block of the current image block may be referred to as a template of the current image block.
  • the implementation B can be referred to as MV selection based on the template matching method.
  • the adjacent reconstructed image block of the fifth reconstructed image block may include an upper neighboring block and/or a left neighboring block
  • the adjacent reconstructed image blocks of the current image block may likewise include upper neighboring blocks and/or left neighboring blocks; a sketch of this template matching follows.
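  • A sketch of implementation B under the same assumptions (integer-pel positions, SAD cost, hypothetical geometry): the template is the current block's upper and left reconstructed neighbours, downsampled, and each candidate fifth MV is scored by how well the corresponding neighbours of its reference block match it.

```python
import numpy as np

def neighbour_template(pic: np.ndarray, y: int, x: int, size: int, t: int = 4):
    """Flattened upper (t rows) and left (t columns) neighbours of the
    size x size block at (y, x): the block's matching template."""
    top = pic[y - t:y, x:x + size]
    left = pic[y:y + size, x - t:x]
    return np.concatenate([top.ravel(), left.ravel()])

def template_match(cur_pic, ref_pic, y, x, size, candidate_mvs, step: int = 2):
    """Return the candidate MV whose reference-side template best matches the
    current block's template; `[::step]` downsamples both templates."""
    cur_t = neighbour_template(cur_pic, y, x, size)[::step].astype(np.int32)
    def cost(mv):
        ref_t = neighbour_template(ref_pic, y + mv[0], x + mv[1], size)[::step]
        return np.abs(cur_t - ref_t.astype(np.int32)).sum()
    return min(candidate_mvs, key=cost)
```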
  • In implementation C, the initial MVs include W sixth MVs, where W is an integer greater than or equal to 1. For the two reconstructed image blocks corresponding to each of the W MV pairs, the downsampled reconstructed image data of one reconstructed image block is matched with the downsampled reconstructed image data of the other to obtain the matching result, where each MV pair includes a sixth MV and a seventh MV determined based on that sixth MV; based on the matching results of the W MV pairs, one MV pair is selected.
  • the sixth MV of the selected MV pair is determined as the MV of the current image block.
  • the reconstructed image block corresponding to the sixth MV of the selected MV pair may be used as the prediction block of the current image block.
  • the sixth MV of the selected MV pair can be used to determine the MV of the current image block.
  • the sixth MV can be used as an MVP.
  • the motion search and motion compensation can be further performed according to the MVP to obtain the final MV.
  • the reconstructed image block corresponding to the final MV is used as a prediction block.
  • For example, the selected sixth MV is the CU-level MV mentioned below, and that MV can be used to determine the MVs of the sub-CU level.
  • the seventh MV is determined based on the sixth MV under the assumption that the motion trajectory is continuous.
  • the W sixth MVs may be the MV candidate list.
  • The sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs, and the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
  • a time domain distance between the sixth reconstructed image block and the current image block may be equal to a time domain distance between the current image block and the seventh reconstructed image block.
  • Each of the W sixth MVs may be used as an input, and based on the assumption of the bidirectional matching method, an MV pair is obtained.
  • Specifically, assume the reference block corresponding to one valid MVa in the MV candidate list belongs to reference frame a in reference list A, and the reference block corresponding to its paired MVb belongs to reference frame b in reference list B; then reference frame a and reference frame b are located on opposite sides of the current frame in the time domain. If no such reference frame b exists in reference list B, reference frame b is taken as the reference frame in reference list B that differs from reference frame a and whose temporal distance from the current frame is the smallest.
  • MVb is obtained by scaling MVa according to the temporal distances from the current frame to reference frame a and to reference frame b, respectively.
  • MV pairs may be separately generated according to each candidate MV, and distortion between two reference blocks corresponding to two MVs (MV0 and MV1) in each MV pair may be calculated.
  • In the embodiment of the present application, both reference blocks may be downsampled, and the distortion is then calculated between the two downsampled reference blocks.
  • The candidate MV (MV0) whose MV pair yields the smallest distortion is taken as the final MV.
  • Implementation C may be referred to as MV selection based on the bidirectional matching method; a sketch is given below.
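  • The core of implementation C is the temporal scaling that turns each sixth MV (MVa) into its paired seventh MV (MVb), plus the distortion between the two downsampled reference blocks. The sketch below reuses the hypothetical `fetch_block` and `downsample` helpers from the implementation-A sketch and assumes integer temporal distances.

```python
import numpy as np

def scale_mv(mva, td_a: int, td_b: int):
    """Mirror MVa across the current frame: MVb = -MVa * (td_b / td_a), where
    td_a / td_b are the temporal distances from the current frame to reference
    frames a and b. With td_a == td_b the pair is symmetric, matching the
    continuous-motion-trajectory assumption."""
    return (-mva[0] * td_b // td_a, -mva[1] * td_b // td_a)

def bilateral_cost(ref_a, ref_b, mva, td_a: int, td_b: int) -> int:
    """Distortion (SAD here) between the two downsampled reference blocks of
    the MV pair (MVa, MVb); the candidate with the smallest cost wins."""
    mvb = scale_mv(mva, td_a, td_b)
    ba = downsample(fetch_block(ref_a, mva)).astype(np.int32)
    bb = downsample(fetch_block(ref_b, mvb)).astype(np.int32)
    return int(np.abs(ba - bb).sum())
```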
  • the foregoing implementation manners B and C may be used in the AMVP mode, and may also be used in the merge mode.
  • A Pattern Matched Motion Vector Derivation (PMMVD) technique may be adopted; the PMMVD technology is based on Frame Rate Up Conversion (FRUC) techniques.
  • The encoding end can choose among multiple encoding modes. Specifically, normal merge mode encoding is performed to obtain a minimum Rate Distortion Cost (RD-Cost), cost0; encoding in the PMMVD mode is then performed to obtain its minimum RD-Cost, cost1.
  • If cost0 is smaller, the FRUC flag is set to false; otherwise, the FRUC flag is set to true, and an additional FRUC mode flag is used to indicate which method is used (bidirectional matching or template matching).
  • RD-Cost is a criterion used in the encoder to decide which mode to use, considering both video quality and coding rate.
  • RD-Cost = cost + lambda * bitrate, where cost represents the loss of video quality, obtained by calculating the similarity between the original pixel block and the reconstructed pixel block (using SAD, SSD, etc.), and bitrate indicates the number of bits consumed by using the mode. A sketch follows.
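  • Written out as code, with SAD as the distortion term and a hypothetical Lagrange multiplier value, the mode decision looks as follows.

```python
import numpy as np

def rd_cost(orig: np.ndarray, recon: np.ndarray, bits: int, lam: float = 0.85) -> float:
    """RD-Cost = cost + lambda * bitrate: SAD distortion (SSD is also common)
    plus the rate term weighted by a Lagrange multiplier (value assumed)."""
    cost = float(np.abs(orig.astype(np.int32) - recon.astype(np.int32)).sum())
    return cost + lam * bits

# Mode decision as described above: encode in normal merge mode (cost0) and in
# PMMVD mode (cost1), then set the FRUC flag to the cheaper of the two, e.g.:
#   fruc_flag = rd_cost(o, r_pmmvd, b_pmmvd) < rd_cost(o, r_merge, b_merge)
```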
  • The process of deriving the motion information in the FRUC merge mode may be divided into two steps.
  • The first step is the CU-level motion information derivation process;
  • the second step is the Sub-CU-level motion information derivation process.
  • the initial MV of the entire CU may be derived, that is, a CU-level MV candidate list, where the MV candidate list may include:
  • the original AMVP candidate MV is included. Specifically, if the current CU uses the AMVP mode, the original AMVP candidate MV may be added to the CU-level MV candidate list.
  • MVs of the interpolated motion vector field: there can be four, optionally located at the (0, 0), (W/2, 0), (0, H/2) and (W/2, H/2) positions of the current CU, respectively.
  • For the AMVP candidate list, the establishment process may include establishment of a spatial candidate list and a temporal candidate list.
  • a candidate MV can be generated for each of the left and top of the current PU.
  • the processing order is A0->A1->scaled A0->scaled A1, where scaled A0 represents the scaling of the MV of A0, and scaled A1 represents the scaling of the MV of A1.
  • For the top side, the processing order is B0->B1->B2 (if these are not present, processing continues with scaled B0->scaled B2), where scaled B0 means scaling the MV of B0 and scaled B2 means scaling the MV of B2.
  • The temporal candidate list does not directly use the motion information of the candidate block; corresponding scaling adjustment is performed according to the temporal position relationship between the current frame and the reference frame. The temporal domain can provide at most one candidate MV. If the number of candidate MVs in the list is still less than two at this point, zero vectors are used as filler.
  • For the merge candidate list, the establishment process likewise includes establishment of a spatial candidate list and a temporal candidate list.
  • In the establishment of the spatial list of the merge mode, it is assumed that the block at the lower-left corner of the current PU is A0, on the left side is A1, at the upper-left corner is B2, on the upper side is B1, and at the upper-right corner is B0.
  • the spatial domain can provide up to 4 candidate MVs.
  • the candidate order is A1->B1->B0->A0->B2; the first four are processed first, and B2 is processed only if one or more of the first four do not exist.
  • the temporal candidate list cannot directly use the motion information of the candidate block; corresponding scaling adjustment is performed according to the positional relationship between the current frame and the reference frame.
  • the temporal domain can provide at most one candidate MV; if the number of MVs in the list has not reached five after processing the spatial and temporal candidates, zero vectors are used as filler.
  • The selection of the merge candidate MVP traverses the MVs of spatially adjacent CUs in the order left -> upper -> upper right -> lower left -> upper left corner, then processes the temporally referenced predicted MV, and finally assembles the merge candidate list.
  • the MV based on the CU level is used as a starting point, and the motion information is further refined at the Sub-CU level.
  • The refinement at the Sub-CU level starts from the MV of the entire CU, and the Sub-CU-level MV candidate list may include:
  • MVs obtained by scaling the MVs of the corresponding temporally neighboring CUs in the reference frames, where these scaled MVs can be obtained as follows: all reference frames in both reference lists are traversed, and the MV of the CU temporally adjacent to the Sub-CU in each reference frame is scaled to the reference frame in which the CU-level MV is located.
  • ATMVP (temporal motion vector prediction) candidate MVs.
  • the foregoing implementation method B and implementation C may be used for acquiring the MV of the CU level, and may also be used for acquiring the MV of the sub-CU level.
  • an MV candidate list is generated.
  • The optimal MV is selected from the candidate list; the bidirectional matching method may be used for this selection.
  • a local search is performed around the optimal MV, and the optimal MV is further refined.
  • The optimal MV may be offset to obtain a plurality of initial MVs, and one of them selected, where the bidirectional matching method may again be used.
  • the bidirectional matching method in the above implementation C can be used, and the MV is further refined at the sub-CU level.
  • an MV candidate list is generated.
  • The optimal MV is selected from the candidate list; the template matching method may be used for this selection.
  • a local search is performed around the optimal MV, and the optimal MV is further refined.
  • The optimal MV may be offset to obtain a plurality of initial MVs, and one of them selected, where the template matching method may again be used.
  • MV is further refined at the sub-CU level.
  • Applying the data downsampling method of the embodiment of the present application to the Decoder-side Motion Vector Refinement (DMVR) technique and the Pattern Matched Motion Vector Derivation (PMMVD) technique can greatly reduce the amount of data they must process at the decoding end.
  • That is, the reconstructed image is downsampled and the matching cost is then calculated on the downsampled data, which reduces the amount of data processed and greatly reduces the hardware resource consumption and space occupied.
  • FIG. 9 is a schematic block diagram of an apparatus 500 for video processing in accordance with an embodiment of the present application.
  • the device 500 includes:
  • the downsampling unit 510 is configured to perform downsampling on the reconstructed image data before performing matching on the reconstructed image block for matching in the process of acquiring the motion vector of the current image block;
  • the matching unit 520 is configured to perform matching by using the downsampled reconstructed image data of the reconstructed image block to obtain a matching result
  • the obtaining unit 530 is configured to acquire a motion vector of the current image block based on the matching result.
  • the device 500 is used in a decoding end, and the device 500 further includes:
  • a decoding unit configured to decode the current image block based on a motion vector of the current image block.
  • the device 500 is used in an encoding end, and the device 500 further includes:
  • a coding unit configured to encode the current image block based on a motion vector of the current image block.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data of the reconstructed image block is downsampled.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data of the reconstructed image block is downsampled according to the content of the reconstructed image block.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data of the reconstructed image block is downsampled according to at least one of a number of pixels, a pixel grayscale, and an edge feature included in the reconstructed image block.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data of the reconstructed image block is downsampled using the downsampling ratio.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data is downsampled by sampling pixels at intervals.
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data is downsampled by averaging a plurality of pixels.
  • the reconstructed image block for matching includes at least two reconstructed image blocks
  • the downsampling unit 510 is further configured to:
  • the reconstructed image data of the at least two reconstructed image blocks are downsampled according to the same sampling ratio.
  • the obtaining unit 530 is further configured to:
  • the initial motion vector of the current image block is corrected based on the matching result to obtain a motion vector of the current image block.
  • the obtaining unit 530 is further configured to:
  • the reconstructed image block for matching is determined for the initial motion vector.
  • the initial motion vector includes a first motion vector and a second motion vector
  • the matching unit 520 is further configured to:
  • a template is generated based on the downsampled reconstructed image data of the reconstructed image blocks corresponding to the first motion vector and the second motion vector, and matching is performed based on the template and the downsampled reconstructed image data to obtain a matching result.
  • the matching unit 520 is further configured to:
  • the downsampled reconstructed image data of the N third reconstructed image blocks are respectively matched with the template, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
  • the obtaining unit 530 is further configured to:
  • the third motion vector includes the first motion vector
  • the fourth motion vector includes the second motion vector
  • At least part of the motion vectors of the N third motion vectors are obtained by performing offset based on the first motion vector, and at least part of the motion vectors of the M fourth motion vectors are Obtained based on the second motion vector.
  • the N is equal to the M.
  • The first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
  • the first frame is a forward frame of the current image block, and the second frame is a forward frame of the current image block.
  • The initial motion vector includes K fifth motion vectors;
  • the matching unit 520 is further configured to:
  • match the downsampled reconstructed image data of the neighboring reconstructed image blocks of K fifth reconstructed image blocks respectively with the downsampled reconstructed image data of the neighboring reconstructed image blocks of the current image block to obtain the matching result, where the K fifth reconstructed image blocks correspond one-to-one to the K fifth motion vectors;
  • the obtaining unit 530 is further configured to:
  • select, based on the matching result, one of the K fifth motion vectors as the motion vector of the current image block, or to be used to determine the motion vector of the current image block.
  • The initial motion vector includes W sixth motion vectors;
  • the matching unit 520 is further configured to:
  • for the two reconstructed image blocks corresponding to each of W motion vector pairs, match the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, where each motion vector pair includes a sixth motion vector and a seventh motion vector determined based on the sixth motion vector;
  • the obtaining unit 530 is further configured to:
  • select one motion vector pair based on the matching results corresponding to the W motion vector pairs, where the sixth motion vector in the selected pair serves as, or is used to determine, the motion vector of the current image block.
  • The seventh motion vector is determined based on the sixth motion vector under the assumption that the motion trajectory is continuous.
  • The sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs;
  • the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
  • The device 500 can implement the operations of the processing device in the foregoing methods; for brevity, details are not described herein again.
  • The device for video processing in the foregoing embodiments of the present application may be a chip, which may specifically be implemented by a circuit, but the specific implementation manner is not limited in the embodiments of the present application.
  • The embodiments of the present application further provide an encoder, which is used to implement the functions of the encoding end in the embodiments of the present application and may include the modules for the encoding end in the device for video processing of the foregoing embodiments of the present application.
  • The embodiments of the present application further provide a decoder, which is used to implement the functions of the decoding end in the embodiments of the present application and may include the modules for the decoding end in the device for video processing of the foregoing embodiments of the present application.
  • The embodiments of the present application further provide a codec, which includes the device for video processing of the foregoing embodiments of the present application.
  • FIG. 10 shows a schematic block diagram of a computer system 600 of an embodiment of the present application.
  • The computer system 600 can include a processor 610 and a memory 620.
  • The computer system 600 may also include components generally included in other computer systems, such as input and output devices, communication interfaces, and the like, which are not limited by the embodiments of the present application.
  • The memory 620 is used to store computer-executable instructions.
  • The memory 620 may be any of various types of memory; for example, it may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory, which is not limited by the embodiments of the present application.
  • The processor 610 is configured to access the memory 620 and execute the computer-executable instructions to perform the operations in the method for video processing of the embodiments of the present application described above.
  • The processor 610 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and the like, which are not limited by the embodiments of the present application.
  • The apparatus and computer system for video processing of the embodiments of the present application may correspond to the execution body of the method for video processing of the embodiments of the present application, and the above and other operations and/or functions of the various modules in the video processing apparatus and the computer system respectively implement the corresponding processes of the foregoing methods; for brevity, they are not described herein.
  • The embodiments of the present application further provide an electronic device, which may include the apparatus or computer system for video processing of the various embodiments of the present application described above.
  • The embodiments of the present application further provide a computer storage medium in which program code is stored; the program code may be used to indicate a method for performing loop filtering in the embodiments of the present application.
  • The term "and/or" is merely an association relationship describing associated objects, indicating that three relationships may exist.
  • A and/or B may indicate three cases: A exists alone, A and B exist simultaneously, and B exists alone.
  • The character "/" in this document generally indicates that the associated objects before and after it are in an "or" relationship.
  • The disclosed systems, devices, and methods may be implemented in other manners.
  • The device embodiments described above are merely illustrative.
  • The division of the units is only a division by logical function.
  • In actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, or an electrical, mechanical, or other form of connection.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present application.
  • Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • A number of instructions are included to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application provide a method and a device for video processing, which can reduce the hardware resource consumption and the occupied storage space in the process of obtaining a motion vector. The method includes: in the process of obtaining a motion vector of a current image block, downsampling reconstructed image data before performing matching on the reconstructed image blocks used for matching; performing matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result; and obtaining the motion vector of the current image block based on the matching result.

Description

Method and device for video processing
Copyright Notice
The disclosure of this patent document contains material subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner does not object to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
The present application relates to the field of video processing, and more specifically, to a method and a device for video processing.
Background
Prediction is an important module of mainstream video coding frameworks, and inter prediction is implemented by means of motion compensation. A frame of a video may first be divided into coding tree units (CTUs) of equal size, for example 64x64 or 128x128. Each CTU may be further divided into square or rectangular coding units (CUs), and for each CU the most similar block may be searched for in a reference frame as the prediction block of the current CU. The relative displacement between the current block and the similar block is the motion vector (MV). The process of finding the similar block in the reference frame as the predicted value of the current block is motion compensation.
Decoder-side motion information derivation is a recently emerged technique. It is mainly used to refine the decoded motion vector at the decoding end, and can improve coding quality, and thus encoder performance, without increasing the bit rate.
However, when the motion vector is obtained, a large amount of matching cost calculation is performed, and a large amount of hardware resources is consumed to store the reconstructed blocks needed for calculating the matching cost, which occupies a large amount of storage space.
Summary
Embodiments of the present application provide a method and a device for video processing, which can reduce the hardware resource consumption and the occupied storage space in the process of obtaining a motion vector.
In a first aspect, a method for video processing is provided, including:
in the process of obtaining a motion vector of a current image block, downsampling reconstructed image data before performing matching on reconstructed image blocks used for matching;
performing matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result;
obtaining the motion vector of the current image block based on the matching result.
In a second aspect, a device for video processing is provided, including:
a downsampling unit, configured to downsample reconstructed image data before matching is performed on reconstructed image blocks used for matching, in the process of obtaining a motion vector of a current image block;
a matching unit, configured to perform matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result;
an obtaining unit, configured to obtain the motion vector of the current image block based on the matching result.
In a third aspect, a computer system is provided, including: a memory for storing computer-executable instructions; and a processor for accessing the memory and executing the computer-executable instructions to perform the operations in the method of the first aspect.
In a fourth aspect, a computer storage medium is provided, in which program code is stored, the program code being usable to instruct execution of the method of the first aspect.
In a fifth aspect, a computer program product is provided, including program code usable to instruct execution of the method of the first aspect.
Therefore, in the embodiments of the present application, in the process of obtaining the motion vector MV of the current image block, the reconstructed image is downsampled before the reconstructed image blocks used for matching are matched, and the matching cost is calculated only after the downsampling. This reduces the amount of data processed, and thus reduces the hardware resource consumption and the occupied storage space during data processing.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an encoding/decoding system according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of a method for video processing according to an embodiment of the present application.
FIG. 3 is a schematic flowchart of a method for video processing according to an embodiment of the present application.
FIG. 4 is a schematic diagram of obtaining a bidirectional template according to an embodiment of the present application.
FIG. 5 is a schematic diagram of obtaining a motion vector based on the bidirectional template matching method according to an embodiment of the present application.
FIG. 6 is a schematic diagram of obtaining a motion vector based on the template matching method according to an embodiment of the present application.
FIG. 7 is a schematic diagram of obtaining a motion vector based on the bilateral matching method according to an embodiment of the present application.
FIG. 8 is a schematic flowchart of a method for video processing according to an embodiment of the present application.
FIG. 9 is a schematic block diagram of a device for video processing according to an embodiment of the present application.
FIG. 10 is a schematic block diagram of a computer system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. Apparently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the present application are only for the purpose of describing specific embodiments and are not intended to limit the scope of the present application.
FIG. 1 is an architecture diagram to which the technical solutions of the embodiments of the present application are applied.
As shown in FIG. 1, a system 100 may receive data 102 to be processed, process the data 102, and produce processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or the system 100 may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components of the system 100 may be implemented by one or more processors, which may be processors in a computing device or in a mobile device (for example, an unmanned aerial vehicle). The processor may be any kind of processor, which is not limited in the embodiments of the present invention. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. The system 100 may also include one or more memories. The memory may be used to store instructions and data, for example, computer-executable instructions implementing the technical solutions of the embodiments of the present invention, the data 102 to be processed, the processed data 108, and so on. The memory may be any kind of memory, which is also not limited in the embodiments of the present invention.
The data to be encoded may include text, images, graphic objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensing data from sensors, which may be vision sensors (for example, cameras or infrared sensors), microphones, near-field sensors (for example, ultrasonic sensors or radar), position sensors, temperature sensors, touch sensors, and so on. In some cases, the data to be encoded may include information from a user, for example, biological information, which may include facial features, fingerprint scans, retina scans, voice recordings, DNA sampling, and the like.
When each image is encoded, the image may initially be divided into multiple image blocks. In some embodiments, the image may be divided into multiple image blocks, which in some coding standards are called macroblocks or largest coding units (LCUs). The image blocks may or may not have any overlapping parts. The image may be divided into any number of image blocks; for example, it may be divided into an m×n array of image blocks. An image block may have a rectangular, square, circular, or any other shape, and may have any size, such as p×q pixels. In modern video coding standards, images of different resolutions may be encoded by first dividing the image into multiple small blocks. For H.264, an image block is called a macroblock and its size may be 16×16 pixels; for HEVC, an image block is called a largest coding unit and its size may be 64×64. Each image block may have the same size and/or shape; alternatively, two or more image blocks may have different sizes and/or shapes. In some embodiments, an image block may also not be a macroblock or a largest coding unit, but may instead contain a part of a macroblock or largest coding unit, or contain at least two complete macroblocks (or largest coding units), or contain at least one complete macroblock (or largest coding unit) and a part of one, or contain at least two complete macroblocks (or largest coding units) and parts of some others. After the image is divided into multiple image blocks, these image blocks of the image data may be encoded separately.
In the encoding process, in order to remove redundancy, an image may be predicted. Different images in a video may use different prediction manners. According to the prediction manner used, images can be classified into intra-predicted images and inter-predicted images, where inter-predicted images include forward-predicted images and bidirectionally-predicted images. An I image is an intra-predicted image, also called a key frame; a P image is a forward-predicted image, that is, a previously encoded P image or I image is used as its reference image; a B image is a bidirectionally-predicted image, that is, preceding and following images are used as reference images. One implementation is that the encoding end encodes multiple images to produce groups of pictures (GOPs), a GOP being a group of pictures consisting of one I image and multiple B images (or bidirectionally-predicted images) and/or P images (or forward-predicted images). During playback, the decoding end reads the GOPs one by one, decodes them, reads the pictures, and then renders and displays them.
In inter prediction, for each image block, the most similar block may be searched for in a reference frame (generally a reconstructed frame nearby in the time domain) as the prediction block of the current image block. The relative displacement between the current block and the prediction block is the motion vector (MV).
To reduce the bit rate between the encoding end and the decoding end, motion information may not be transmitted in the bitstream, in which case the decoding end needs to derive the motion information, that is, the motion vector. When deriving the motion information, the decoding end may face an excessive data throughput, which causes the problem that the decoding end occupies a large amount of hardware resources and space.
Therefore, the embodiments of the present application propose a method for video processing that can reduce the amount of data that the decoding end needs to process when deriving motion information, thereby avoiding the problem of the decoding end occupying a large amount of hardware resources and space. Likewise, when the method of the embodiments of the present application is used at the encoding end, the hardware resources and space occupied by the encoding end can be reduced.
FIG. 2 is a schematic flowchart of a method for video processing according to an embodiment of the present application. The following method may optionally be implemented by a decoding end, or may also be implemented by an encoding end.
When the method is implemented by the decoding end, the current image block mentioned below may be an image block to be decoded (which may also be called an image block to be reconstructed). Alternatively, when the method is implemented by the encoding end, the current image block mentioned below may be an image block to be encoded.
In 210, in the process of obtaining the motion vector MV of the current image block, the processing device downsamples reconstructed image data before performing matching on the reconstructed image blocks used for matching.
The processing device may be a device at the encoding end or a device at the decoding end.
The MV of the current image block may be understood as the MV between the current image block and the selected prediction block.
Optionally, in the embodiments of the present application, a reconstructed image block may also be called a reference block.
Optionally, in the embodiments of the present application, the downsampling of the reconstructed image data may be achieved in the following two implementations.
In one implementation, the reconstructed image data is downsampled by sampling at intervals of a certain number of pixels. Sampling at intervals of a certain number of pixels may mean sampling at certain intervals in the horizontal direction and in the vertical direction, respectively.
For example, assuming that the object of downsampling is a 128×128 reconstructed image block, some columns or some rows of its pixels may be taken as the downsampled reconstructed image block.
Optionally, the reconstructed image data may be downsampled by sampling at intervals of the same number of pixels. Sampling at intervals of the same number of pixels may mean sampling at intervals of the same number of pixels in the horizontal direction and/or the vertical direction, respectively.
For example, assuming that the object of downsampling is a reconstructed image block and the reconstructed image block is downsampled with an interval of 2 in both the horizontal and vertical directions, the top-left pixel of each group may be taken as the downsampling result; of course, any of the other three of the four pixels may also be taken as the downsampling result.
For example, assuming that the object of downsampling is a reconstructed image block, the reconstructed image block is downsampled with an interval of 2 in the horizontal direction, with no downsampling in the vertical direction.
For example, assuming that the object of downsampling is a reconstructed image block, the reconstructed image block is downsampled with an interval of 2 in the vertical direction, with no downsampling in the horizontal direction.
In another implementation, the reconstructed image data is downsampled by averaging multiple pixels, where the multiple pixels may be neighboring pixels.
For example, assuming that the object of downsampling is a reconstructed image block, for a 12×12 reconstructed image block, the block may be downsampled by averaging groups of four pixels, where the four pixels may be neighboring pixels, for example, the pixels of a 2×2 sub-block.
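By way of illustration only, the following is a minimal sketch of the two downsampling manners described above (interval sampling and pixel averaging); the numpy implementation, the interval of 2, and the 2×2 averaging group are example choices of this sketch, not a definitive implementation of the embodiments.

```python
import numpy as np

def downsample_by_interval(block, step_x=2, step_y=2):
    # Keep one pixel out of every step_y x step_x group; here the
    # top-left pixel of each group is kept (other choices are possible).
    return block[::step_y, ::step_x]

def downsample_by_averaging(block, factor=2):
    # Replace each factor x factor group of neighboring pixels by its mean.
    h, w = block.shape
    assert h % factor == 0 and w % factor == 0
    return block.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

block = np.arange(16, dtype=np.float64).reshape(4, 4)
print(downsample_by_interval(block))   # 2x2 block of sampled pixels
print(downsample_by_averaging(block))  # 2x2 block of 2x2 averages
```

Setting step_y=1 (or step_x=1) reproduces the variants above in which only the horizontal (or only the vertical) direction is downsampled.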
Optionally, the downsampled reconstructed image data may include the downsampled reconstructed image data of the reconstructed image blocks used for matching.
In one implementation, the entire frame to which the reconstructed image blocks used for matching belong may be downsampled; that is, when downsampling is performed, no distinction is made between individual reconstructed image blocks. In this case, the downsampled reconstructed image data may include the reconstructed image data of the reconstructed image blocks used for matching.
In another implementation, the reconstructed image blocks used for matching may be determined first, and the determined reconstructed image blocks are then downsampled.
The following specifically describes how the reconstructed image blocks used for matching are downsampled.
Optionally, in the embodiments of the present application, the reconstructed image data of a reconstructed image block is downsampled according to the content of the reconstructed image block. Downsampling the reconstructed image data of a reconstructed image block may be referred to as downsampling the reconstructed image block.
Specifically, the processing device may determine a downsampling ratio according to the content of the reconstructed image block, and downsample the reconstructed image data of the reconstructed image block using this downsampling ratio.
The downsampling ratio mentioned in the embodiments of the present application may refer to the ratio between the number of pixels included in the image block after downsampling and the number of pixels included in the image block before sampling.
When the complexity of the reconstructed image block is high, the sampling interval is small (that is, the downsampling ratio is large); when the complexity of the image block is low, the sampling interval is large (that is, the downsampling ratio is small). Adaptive downsampling according to the image content in this way can reduce the performance loss caused by data sampling.
Optionally, the content of the reconstructed image block mentioned in the embodiments of the present application may include at least one of: the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features.
Specifically, the processing device may determine the downsampling ratio according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and the edge features, and downsample the reconstructed image block using this downsampling ratio.
Optionally, in the embodiments of the present application, the pixel grayscale of a reconstructed image block may be characterized by the variance of the grayscale histogram of the reconstructed image block.
Optionally, in the embodiments of the present application, the edge features of a reconstructed image block may be characterized by the number of pixels, among the pixels included in the reconstructed image block, that are edge points belonging to texture.
Optionally, in the embodiments of the present application, when the reconstructed image blocks used for matching include at least two reconstructed image blocks, the reconstructed image data of the at least two reconstructed image blocks is downsampled at the same downsampling ratio.
Specifically, in one MV determination process, if at least two reconstructed image blocks need to be used in the matching process, the same downsampling ratio may be used to downsample the reconstructed image data of the at least two reconstructed image blocks.
For example, when it is determined, based on the pixel grayscale of the at least two reconstructed image blocks and/or the number of their pixels that are edge points belonging to texture, that the at least two reconstructed image blocks would require different downsampling ratios, the different downsampling ratios may be averaged and the average used to downsample the at least two reconstructed image blocks; alternatively, the highest downsampling ratio or the lowest downsampling ratio may be used to downsample the reconstructed image data of the at least two reconstructed image blocks.
For example, when the values characterizing the pixel grayscale and/or the values characterizing the edge features of the at least two reconstructed image blocks differ, these values may be averaged (if values characterizing the pixel grayscale and values characterizing the edge features are used at the same time, the grayscale values and the edge-feature values may each be averaged separately), one downsampling ratio may be computed from the averaged values, and this one downsampling ratio may be used to downsample the reconstructed image data of each of the at least two reconstructed image blocks. Alternatively, the maximum of these values may be taken (if grayscale values and edge-feature values are used at the same time, the maximum grayscale value and the maximum edge-feature value may be taken) or the minimum (likewise, the minimum grayscale value and the minimum edge-feature value) to compute one downsampling ratio, which is then used to downsample the reconstructed image data of each of the at least two reconstructed image blocks.
It should be understood that, in the embodiments of the present application, the reconstructed image blocks used for matching may include the same number of pixels as the current image block; in this case, determining the downsampling ratio according to the number of pixels included in the reconstructed image blocks used for matching may be realized by determining the downsampling ratio according to the number of pixels included in the current image block.
Optionally, in the embodiments of the present application, the processing device determines to downsample the reconstructed image blocks in the matching process when at least one of the following conditions is met:
the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value;
the variance of the grayscale histogram of the reconstructed image block is greater than or equal to a second predetermined value;
the number of edge pixels belonging to texture among the pixels included in the reconstructed image block is greater than or equal to a third predetermined value.
That is, the reconstructed image block is downsampled when the above conditions are met, and otherwise is not downsampled, which avoids the poor encoding/decoding performance caused by blind downsampling (a sketch of such a gating check is given below).
When the reconstructed image blocks used for matching include at least two reconstructed image blocks, the above conditions may be satisfied by the number of pixels, the variance of the grayscale histogram, and the number of included edge pixels belonging to texture of every reconstructed image block; or they may be satisfied by the averages, over the at least two reconstructed image blocks, of the number of included pixels, the variance of the grayscale histogram, and the number of included edge pixels belonging to texture.
It should be understood that, in the embodiments of the present application, the reconstructed image blocks used for matching may include the same number of pixels as the current image block; in this case, determining whether to downsample the reconstructed image blocks according to the number of pixels they include may be realized by determining whether to downsample them according to the number of pixels included in the current image block.
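By way of illustration only, the following sketch gates the downsampling on the conditions above and picks a content-adaptive ratio; the threshold values, the histogram parameters, and the mapping from edge density to ratio are hypothetical choices of this sketch, since the embodiments only require that such predetermined values exist.

```python
import numpy as np

# Hypothetical first/second/third predetermined values, for illustration.
MIN_PIXELS, MIN_HIST_VAR, MIN_EDGE_PIXELS = 64, 50.0, 16

def should_downsample(block, edge_mask):
    # edge_mask marks pixels that are edge points belonging to texture; how
    # it is obtained (e.g. an edge detector) is outside this sketch.
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    # Downsample when at least one of the conditions above is met.
    return (block.size >= MIN_PIXELS
            or hist.var() >= MIN_HIST_VAR
            or np.count_nonzero(edge_mask) >= MIN_EDGE_PIXELS)

def pick_downsampling_ratio(block, edge_mask):
    # Higher complexity -> smaller sampling interval (larger ratio);
    # the concrete mapping below is a hypothetical example.
    edge_density = np.count_nonzero(edge_mask) / block.size
    return 1 / 2 if edge_density > 0.25 else 1 / 4
```

Here the ratio is the number of pixels after downsampling divided by the number before, matching the definition above.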
The above determines whether to downsample a reconstructed image block, and the downsampling ratio, according to the content of the reconstructed image block; however, it should be understood that the embodiments of the present application are not limited thereto. When downsampling a reconstructed image frame, the processing device may also determine, according to the content of the reconstructed image frame, whether to downsample the reconstructed image frame and/or the downsampling ratio.
Specifically, the downsampling ratio may be determined according to at least one of the number of pixels included in the reconstructed image frame, the pixel grayscale, and the edge features, and the reconstructed image frame is downsampled using this downsampling ratio.
Alternatively, before the reconstructed image frame is downsampled, the following conditions need to be met:
the number of pixels included in the reconstructed image frame is greater than or equal to a specific value;
the variance of the grayscale histogram of the reconstructed image frame is greater than or equal to a specific value;
the number of edge pixels belonging to texture among the pixels included in the reconstructed image frame is greater than or equal to a specific value.
In 220, the processing device performs matching by using the downsampled reconstructed image data of the reconstructed image blocks used for matching, to obtain a matching result.
Optionally, in the embodiments of the present application, matching may also be called distortion matching, and the matching result may be the matching cost obtained by performing distortion matching between reconstructed image blocks.
In 230, the processing device obtains the MV of the current image block based on the matching result.
Optionally, in the embodiments of the present application, when the processing device is a device at the encoding end, the MV may be used to encode or reconstruct the current image block.
The encoding end may take the reconstructed image block corresponding to the MV as the prediction block and encode or reconstruct the current image block based on this prediction block.
In one implementation, the encoding end may directly take the pixels of the prediction block as the reconstructed pixels of the current image block. This mode may be called skip mode; its characteristic is that the reconstructed pixel values of the current image block may be equal to the pixel values of the prediction block. When the encoding end adopts skip mode, a flag may be transmitted in the bitstream to indicate to the decoding end that the adopted mode is skip mode.
In another implementation, the encoding end may subtract the pixels of the prediction block from the pixels of the current image block to obtain a pixel residual, and transmit the pixel residual to the decoding end in the bitstream.
It should be understood that, after the MV is obtained, the encoding end may encode and reconstruct the current image block in other manners, which is not specifically limited in the embodiments of the present application.
Optionally, the embodiments of the present application may be used in the advanced motion vector prediction (AMVP) mode; that is, the result obtained by matching may be a motion vector prediction (MVP). After obtaining the MVP, the encoding end may determine the starting point of motion estimation according to the MVP, perform a motion search near the starting point, and obtain the optimal MV after the search. The MV determines the position of the reference block in the reference image; subtracting the current block from the reference block yields the residual block, and subtracting the MVP from the MV yields the motion vector difference (MVD), which is transmitted to the decoding end in the bitstream.
Optionally, the embodiments of the present application may be used in the merge mode; that is, the result obtained by matching may be an MVP, and the encoding end may directly determine this MVP as the MV; in other words, the result obtained by matching is the MV. For the encoding end, after obtaining the MVP (that is, the MV), there is no need to transmit an MVD, because the MVD defaults to 0.
Optionally, in the embodiments of the present application, when the processing device is a device at the decoding end, the MV may be used to decode the current image block.
The decoding end may take the reconstructed image block corresponding to the MV as the prediction block and decode the current image block based on this prediction block.
In one implementation, the decoding end may directly take the pixels of the prediction block as the pixels of the current image block. This mode may be called skip mode; its characteristic is that the reconstructed pixel values of the current image block may be equal to the pixel values of the prediction block. When the encoding end adopts skip mode, a flag may be transmitted in the bitstream to indicate to the decoding end that the adopted mode is skip mode.
In another implementation, the decoding end may obtain the pixel residual from the bitstream transmitted by the encoding end and add the pixels of the prediction block to the pixel residual to obtain the pixels of the current image block.
It should be understood that, after the MV is obtained, the current image block may be decoded in other manners, which is not specifically limited in the embodiments of the present application.
Optionally, the embodiments of the present application may be used in the AMVP mode; that is, the result obtained by matching may be an MVP, and the decoding end may combine it with the MVD in the bitstream transmitted by the encoding end to obtain the MV of the current image block.
Optionally, the embodiments of the present application may be used in the merge mode; that is, the result obtained by matching may be an MVP, and the decoding end may directly determine this MVP as the MV; in other words, the result obtained by matching is the MV.
Optionally, in the embodiments of the present application, the initial MV of the current image block is corrected based on the matching result to obtain the MV of the current image block.
That is, the processing device may obtain an initial MV, but the initial MV may not be the optimal MV or MVP; the processing device may correct the initial MV to obtain the MV of the current image block.
For the encoding end, the index of the initial MV may be encoded and transmitted to the decoding end. This index enables the decoding end to select the initial MV from an initial MV list, where the index points to the following information: the index of the reference frame and the spatial offset of the reference block relative to the current image block; based on this information, the decoding end can select the initial MV.
For the decoding end, the initial MV may be obtained based on the bitstream sent by the encoding end; the bitstream may include an index, and based on the index, the decoding end can obtain the initial MV.
Optionally, the initial MV may include multiple initial MVs, which may belong to different frames. The frame to which an initial MV belongs refers to the frame to which the reconstructed image block corresponding to that MV belongs.
Assuming that the multiple initial MVs include a first MV and a second MV, the frame to which the first MV belongs and the frame to which the second MV belongs are different frames.
For example, the reconstructed image block corresponding to the first MV belongs to a forward frame of the current image block, and the reconstructed image block corresponding to the second MV belongs to a backward frame of the current image block.
Alternatively, the reconstructed image block corresponding to the first MV belongs to a forward frame of the current image block, and the reconstructed image block corresponding to the second MV also belongs to a forward frame of the current image block.
Of course, the reconstructed image block corresponding to the first MV and the reconstructed image block corresponding to the second MV may also belong to different backward frames of the current image block, which is not specifically limited in the embodiments of the present application.
For a clearer understanding of the present application, how the initial MV is corrected is described below with reference to Implementation A.
Implementation A
Specifically, the processing device may generate a template (for example, by averaging pixels) based on the downsampled reconstructed image data of the reconstructed image blocks corresponding to the multiple initial MVs, and use the generated template to correct each of the multiple initial MVs.
It should be understood that, in addition to generating the template from the downsampled reconstructed image data of multiple reconstructed image blocks, the template may also be generated from the non-downsampled reconstructed image data of the reconstructed image blocks corresponding to the multiple initial MVs and then downsampled, which is not specifically limited in the embodiments of the present application.
Specifically, assume that the initial MVs include a first MV and a second MV, the reconstructed image block corresponding to the first MV is a first reconstructed image block belonging to a first frame, and the reconstructed image block corresponding to the second MV is a second reconstructed image block belonging to a second frame. A template is generated based on the downsampled reconstructed image data of the first reconstructed image block and the downsampled reconstructed image data of the second reconstructed image block. This template may be called a bidirectional template.
Then the downsampled reconstructed image data of N third reconstructed image blocks (which may be called N downsampled third reconstructed image blocks) may each be matched with the template, where the N third reconstructed image blocks correspond to N third MVs; the downsampled reconstructed image data of M fourth reconstructed image blocks (which may be called M downsampled fourth reconstructed image blocks) may each be matched with the template, where the M fourth reconstructed image blocks correspond to M fourth MVs; and based on the matching results, one third MV is selected from the N third MVs and one fourth MV is selected from the M fourth MVs.
Optionally, the selected third MV may be the MV corresponding to the minimal distortion cost; alternatively, it may be an MV corresponding to a distortion cost smaller than a specific value.
Optionally, the selected fourth MV may be the MV corresponding to the minimal distortion cost; alternatively, it may be an MV corresponding to a distortion cost smaller than a specific value.
The one third MV and the one fourth MV may serve as the MVs of the current image block; in this case, the reconstructed image blocks corresponding to the one third MV and the one fourth MV may be weighted-averaged to obtain the prediction block.
Alternatively, the one third MV and the one fourth MV may be used to determine the MV of the current image block; that is, the one third MV and the one fourth MV may each serve as an MVP. In this case, motion search and motion compensation processes may be performed based on the third MVP and the fourth MVP, respectively, to obtain the final MV.
Optionally, in the embodiments of the present application, the N third reconstructed image blocks may belong to the first frame, and the M fourth reconstructed image blocks may belong to the second frame.
Optionally, N and M may be equal.
Optionally, the third MVs include the first MV, and the fourth MVs include the second MV; that is, the reconstructed image block corresponding to the first MV and the reconstructed image block corresponding to the second MV, which are used to generate the template, also each need to be matched with the template.
Optionally, in the embodiments of the present application, at least some of the N third MVs are obtained by offsetting based on the first MV, and at least some of the M fourth MVs are obtained by offsetting based on the second MV.
For example, the MVs among the N third MVs other than the first MV may be obtained by offsetting based on the first MV; for example, N may be equal to 9, and 8 of the MVs may be obtained by offsetting the first MV, for example by offsets in eight directions, or by offsets of different numbers of pixels in the vertical or horizontal direction.
For another example, the MVs among the M fourth MVs other than the second MV may be obtained by offsetting based on the second MV; for example, M may be equal to 9, and 8 of the MVs may be obtained by offsetting the second MV, for example by offsets in eight directions, or by offsets of different numbers of pixels in the vertical or horizontal direction.
Optionally, the method in Implementation A may be called MV selection by the bidirectional template matching method.
For a clearer understanding of the present application, Implementation A is described in detail below with reference to FIG. 3 to FIG. 5.
In 310, it is determined whether the width and the height of the current image block are each smaller than 8 pixels (of course, another number of pixels is also possible). In 321, if so, the reconstructed image blocks corresponding to MV0 in reference list 0 and MV1 in reference list 1 are downsampled and averaged to obtain the bidirectional template. An MV in reference list 0 may be a motion vector between the current image block and a reconstructed image block in a forward reference frame, and an MV in reference list 1 may be a motion vector between the current image block and a reconstructed image block in a backward reference frame.
Specifically, as shown in FIG. 4, for the current image block, reference block 0 (a reconstructed image block) corresponding to MV0 and reference block 1 (a reconstructed image block) corresponding to MV1 are downsampled, and the two downsampled reference blocks are then averaged to obtain the downsampled bidirectional template.
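By way of illustration only, a minimal sketch of generating the downsampled bidirectional template of FIG. 4, assuming 2×2 averaging as the downsampling manner (any of the downsampling manners described above could be substituted):

```python
import numpy as np

def downsample(block, factor=2):
    h, w = block.shape
    return block.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def bidirectional_template(ref_block0, ref_block1):
    # Downsample reference block 0 (from MV0) and reference block 1 (from MV1)
    # first, then average them pixel-wise, so the template is already in the
    # downsampled domain used by the later matching-cost calculations.
    return (downsample(ref_block0.astype(np.float64))
            + downsample(ref_block1.astype(np.float64))) / 2
```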
In 322, the downsampled reconstructed image block corresponding to MV0 in list 0 is matched with the template. In 323, MV0 is offset to obtain multiple MV0′. In 324, the reconstructed image blocks corresponding to the multiple MV0′ are downsampled and each matched with the template.
For example, as shown in FIG. 5, the pixels around the reference block corresponding to MV0 (which may specifically include the pixels of the reference blocks corresponding to MV0′) may be downsampled. Specifically, as shown in FIG. 5, the pixel values around the reference block corresponding to MV0 may be padded, the reference blocks corresponding to MV0′ (the offset reference blocks) obtained, and the offset reference blocks downsampled. When the matching cost is finally calculated, the downsampled bidirectional template and the downsampled reference blocks are used.
In 325, the MV0′ with the minimal matching cost is obtained, where the MV0′ with the minimal matching cost may be MV0 itself.
In 331, the downsampled reconstructed image block corresponding to MV1 in list 1 is matched with the template.
In 332, MV1 is offset to obtain multiple MV1′. In 333, the reconstructed image blocks corresponding to the multiple MV1′ are downsampled and each matched with the template. In 334, the MV1′ with the minimal matching cost is obtained, where the MV1′ with the minimal matching cost may be MV1 itself.
For example, as shown in FIG. 5, the pixels around the reference block corresponding to MV1 (which may specifically include the pixels of the reference blocks corresponding to MV1′) may be downsampled. Specifically, as shown in FIG. 5, the pixel values around the reference block corresponding to MV1 may be padded, the reference blocks corresponding to MV1′ (the offset reference blocks) obtained, and the offset reference blocks downsampled. When the matching cost is finally calculated, the downsampled bidirectional template and the downsampled reference blocks are used.
In 335, a prediction block is generated based on the reconstructed image blocks corresponding to the MV0′ and MV1′ with the minimal matching costs.
In 336, the current image block is decoded based on the prediction block.
The implementation of the bidirectional template matching method of the embodiments of the present application should not be limited to the above description.
Optionally, the above Implementation A and its optional implementations may be realized by the DMVR technology.
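By way of illustration only, a minimal sketch of the refinement loop of steps 322 to 334 for one reference list, assuming SAD as the matching cost; fetch_block and downsample are hypothetical helpers standing in for the padding/fetch of offset reference blocks and for whichever downsampling manner is in use.

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences, one possible distortion/matching cost.
    return np.abs(a - b).sum()

def refine_mv(mv, template_ds, fetch_block, downsample, offsets):
    # mv: initial MV (MV0 or MV1); template_ds: the downsampled bidirectional
    # template; offsets: e.g. unit offsets in the eight surrounding directions.
    best_mv = mv
    best_cost = sad(downsample(fetch_block(mv)), template_ds)
    for dx, dy in offsets:
        cand = (mv[0] + dx, mv[1] + dy)
        cost = sad(downsample(fetch_block(cand)), template_ds)
        if cost < best_cost:
            best_mv, best_cost = cand, cost
    return best_mv  # the MV0' (or MV1') with the minimal matching cost
```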
Optionally, in the embodiments of the present application, the processing device obtains the initial motion vector MV corresponding to the current image block, and determines, for the initial MV, the reconstructed image blocks used for matching.
The initial MVs may be MVs to be selected. Optionally, the MVs to be selected may be called an MV candidate list.
How an MV is selected from the MVs to be selected is described below with reference to Implementation B and Implementation C.
Implementation B
Specifically, the initial MVs include K fifth MVs. The downsampled reconstructed image data of the neighboring reconstructed image blocks of K fifth reconstructed image blocks is matched with the downsampled reconstructed image data of the neighboring reconstructed image blocks of the current image block to obtain the matching result, where the K fifth reconstructed image blocks correspond one-to-one to the K fifth MVs, and K is an integer greater than or equal to 1; based on the matching result, one fifth MV is selected from the K fifth MVs.
Optionally, the selected fifth MV may be the MV corresponding to the minimal distortion cost; alternatively, it may be an MV corresponding to a distortion cost smaller than a specific value.
The selected fifth MV may serve as the MV of the current image block. In this case, the reconstructed image block corresponding to this fifth MV may be taken as the prediction block of the current image block.
Alternatively, the selected fifth MV may be used to determine the MV of the current image block.
For example, the fifth MV may serve as an MVP. In this case, a motion search and motion compensation may be further performed according to the MVP to obtain the final MV, and the reconstructed image block corresponding to the refined MV is taken as the prediction block.
For another example, if the fifth MV is the coding unit (CU)-level MV mentioned below, the MV may be used to determine the sub-CU (Sub-CU)-level MV.
Optionally, the K fifth MVs may be called an MV candidate list.
Optionally, the neighboring reconstructed image blocks of the current image block may be called the template of the current image block. Implementation B may be called MV selection based on the template matching method.
Optionally, as shown in FIG. 6, the neighboring reconstructed image blocks of a fifth reconstructed image block may include an upper neighboring block and/or a left neighboring block, and the neighboring reconstructed image blocks of the current image block may include an upper neighboring block and/or a left neighboring block.
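By way of illustration only, a minimal sketch of the template-matching selection of Implementation B, assuming the downsampled upper/left neighboring templates have already been assembled by hypothetical helpers:

```python
import numpy as np

def sad(a, b):
    return np.abs(a - b).sum()

def select_fifth_mv(candidate_mvs, current_template_ds, template_of):
    # current_template_ds: downsampled neighboring (upper/left) blocks of the
    # current image block; template_of(mv): the downsampled neighboring blocks
    # of the fifth reconstructed image block that mv points at (hypothetical).
    costs = [sad(template_of(mv), current_template_ds) for mv in candidate_mvs]
    return candidate_mvs[int(np.argmin(costs))]  # minimal distortion cost
```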
Implementation C
Specifically, the initial MVs include W sixth MVs, where W is an integer greater than or equal to 1. For the two reconstructed image blocks corresponding to each of W MV pairs, the downsampled reconstructed image data of one of the reconstructed image blocks is matched with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, where each MV pair includes one sixth MV and one seventh MV determined based on the sixth MV; based on the matching results corresponding to the W MV pairs, one MV pair is selected.
The sixth MV in the selected MV pair is determined as the MV of the current image block. In this case, the reconstructed image block corresponding to the sixth MV in the selected MV pair may be taken as the prediction block of the current image block.
Alternatively, the sixth MV in the selected MV pair may be used to determine the MV of the current image block.
For example, the sixth MV may serve as an MVP. In this case, a motion search and motion compensation may be further performed according to the MVP to obtain the final MV, and the reconstructed image block corresponding to the final MV is taken as the prediction block.
For another example, if the sixth MV is the CU-level MV mentioned below, the MV may be used to determine the sub-CU-level MV.
Optionally, in the embodiments of the present application, the seventh MV is determined based on the sixth MV under the assumption that the motion trajectory is continuous.
Optionally, the W sixth MVs may be called an MV candidate list.
Optionally, in the embodiments of the present application, the sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs, and the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
Optionally, in the embodiments of the present application, the temporal distance between the sixth reconstructed image block and the current image block may be equal to the temporal distance between the current image block and the seventh reconstructed image block.
Optionally, for Implementation C, each of the W sixth MVs may be taken as input, and an MV pair is obtained based on the assumption of the bilateral matching method. For example, the reference block corresponding to a valid MVa in the MV candidate list belongs to reference frame a in reference list A, and the reference frame b containing the reference block corresponding to its paired MVb is in reference list B; then reference frame a and reference frame b are located on the two sides of the current frame in the time domain. If no such reference frame b exists in reference list B, reference frame b is a reference frame different from reference frame a whose temporal distance to the current frame is the smallest in reference list B. After reference frame b is determined, MVb is obtained by scaling MVa based on the temporal distances of the current frame to reference frame a and to reference frame b, respectively.
For example, as shown in FIG. 7, for the bilateral matching method, an MV pair may be generated from each candidate MV, and the distortion between the two reference blocks corresponding to the two MVs (MV0 and MV1) in each MV pair is computed. In the embodiments of the present application, both reference blocks may be downsampled, and the distortion computed on the two downsampled reference blocks. When the distortion is minimal, the corresponding candidate MV (MV0) is the final MV.
Implementation C may be called MV selection based on the bilateral matching method.
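By way of illustration only, a minimal sketch of Implementation C, with the continuous-trajectory scaling of MVa to MVb and the distortion computed on downsampled reference blocks; fetch_ref_a/fetch_ref_b and downsample are hypothetical helpers, and td_a/td_b denote signed temporal distances from the current frame to reference frames a and b.

```python
import numpy as np

def scale_mv(mv_a, td_a, td_b):
    # Continuous-trajectory assumption: MVb lies on the same motion
    # trajectory, scaled by the ratio of the two temporal distances.
    s = td_b / td_a
    return (mv_a[0] * s, mv_a[1] * s)

def select_sixth_mv(candidate_mvs, fetch_ref_a, fetch_ref_b,
                    downsample, td_a, td_b):
    def sad(a, b):
        return np.abs(a - b).sum()
    costs = []
    for mv_a in candidate_mvs:
        mv_b = scale_mv(mv_a, td_a, td_b)
        costs.append(sad(downsample(fetch_ref_a(mv_a)),
                         downsample(fetch_ref_b(mv_b))))
    return candidate_mvs[int(np.argmin(costs))]  # minimal-distortion MV0
```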
Optionally, the above Implementations B and C may be used in the AMVP mode; they may also be used in the merge mode. Specifically, the pattern matched motion vector derivation technique may be adopted, where the PMMVD technique is a special merge mode based on the frame rate up conversion (FRUC) technique. In this mode, the motion information of a block is not encoded in the bitstream but is generated directly at the decoding end.
The encoding end may choose among multiple coding modes. Specifically, ordinary merge mode coding may be performed to obtain the minimal rate distortion cost (RD-Cost), namely cost0; then coding with the PMMVD mode is performed to obtain its RD-Cost, where the RD-Cost corresponding to the MV obtained by the bilateral matching method is cost1, the RD-Cost corresponding to the MV obtained by the template matching method is cost2, and cost3 = min(cost1, cost2).
If cost0 < cost3, the FRUC flag is false; otherwise, the FRUC flag is true, and an additional FRUC mode flag is used to indicate which method is used (the bilateral matching method or the template matching method).
RD-Cost is a criterion used in the encoder to decide which mode to use; it considers both the video quality and the coding bit rate: RD-Cost = Cost + lambda * bitrate, where Cost represents the loss of video quality, computed from the similarity between the original pixel block and the reconstructed pixel block (metrics such as SAD or SSD), and bitrate represents the number of bits consumed by using the mode.
Since computing the RD-Cost requires the original pixel values, which are not available at the decoding end, an additional FRUC mode flag needs to be transmitted to indicate which method is used to obtain the motion information.
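By way of illustration only, a minimal sketch of the RD-Cost formula and the FRUC flag decision described above; how the distortion term and the bit count are measured (SAD, SSD, entropy coding, ...) is left abstract here.

```python
def rd_cost(distortion, bitrate, lam):
    # RD-Cost = Cost + lambda * bitrate, where the distortion term measures
    # the similarity between the original and reconstructed pixel blocks.
    return distortion + lam * bitrate

def decide_fruc(cost0, cost1, cost2):
    # cost0: ordinary merge; cost1: bilateral matching; cost2: template matching.
    cost3 = min(cost1, cost2)
    if cost0 < cost3:
        return False, None                # FRUC flag is false
    mode = 'bilateral' if cost1 <= cost2 else 'template'
    return True, mode                     # FRUC flag true, plus mode flag
```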
Optionally, in the embodiments of the present application, the derivation of the motion information of the FRUC merge mode may be divided into two steps. The first step is the derivation of motion information at the CU level, and the second step is the derivation of motion information at the sub-CU level.
In the CU-level derivation of motion information, the initial MV of the entire CU, that is, a CU-level MV candidate list, may be derived, where the MV candidate list may include:
1) if the current CU uses the AMVP mode, the original AMVP candidate MVs; specifically, if the current CU uses the AMVP mode, the original AMVP candidate MVs may be added to the CU-level MV candidate list;
2) if the current CU uses the merge mode, all the merge candidate MVs;
3) MVs in the interpolated motion vector field, where there may be 4 such MVs, the four interpolated MVs optionally located at the (0,0), (W/2,0), (0,H/2), and (W/2,H/2) positions of the current CU;
4) the upper and left neighboring MVs.
Optionally, for the candidate list of the AMVP mode (the length of the list is optionally 2), the construction process may include the construction of a spatial list and the construction of a temporal list.
In the construction of the AMVP spatial list, assume that the bottom-left corner of the current PU is A0, the left side is A1, the top-left corner is B2, the top side is B1, and the top-right corner is B0. The left side and the top side of the current PU may each yield one candidate MV. For the screening of the left-side candidate MV, the processing order is A0 -> A1 -> scaled A0 -> scaled A1, where scaled A0 denotes scaling the MV of A0 and scaled A1 denotes scaling the MV of A1. For the screening of the top-side candidate MV, the processing order is B0 -> B1 -> B2 (if none of these exists, processing continues with -> scaled B0 -> scaled B2), where scaled B0 denotes scaling the MV of B0 and scaled B2 denotes scaling the MV of B2. For the left side (and likewise the top side), as soon as one candidate MV is found, the subsequent candidates are no longer processed (a sketch of this spatial screening is given after this discussion). In the construction of the AMVP temporal list, the temporal candidate list may not use the motion information of the candidate block directly, but may make a corresponding scaling adjustment according to the temporal position relationship between the current frame and the reference frame. The temporal part can provide at most one candidate MV. If the number of candidate MVs in the candidate list is still fewer than 2 at this point, zero vectors may be used for padding.
Optionally, for the candidate list of the merge mode (the length of the list is optionally 5), the construction process may include the construction of a spatial list and the construction of a temporal list.
In the construction of the spatial list of the merge mode, assume that the bottom-left corner of the current PU is A0, the left side is A1, the top-left corner is B2, the top side is B1, and the top-right corner is B0. The spatial part can provide at most 4 candidate MVs, the candidate order being A1 -> B1 -> B0 -> A0 -> B2, with the first four processed first; B2 is processed only if one or more of the first four does not exist. In the construction of the temporal list of the merge mode, the temporal candidate list cannot use the motion information of the candidate block directly, and a corresponding scaling adjustment may be made according to the positional relationship between the current frame and the reference frame. The temporal part can provide at most one candidate MV; this means that if, after the spatial and temporal parts have been processed, the number of MVs in the list has not reached five, zero vectors may be used for padding.
In other words, for the selection of merge candidate MVPs, the MVs of spatially neighboring CUs may be traversed in the order left -> top -> top-right -> bottom-left -> top-left, then the temporally referenced predicted MV is processed, and finally everything is collated and merged.
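By way of illustration only, a minimal sketch of the AMVP spatial screening order described above; candidate MVs are modeled as values or None when a neighbor is unavailable, and scale() is a hypothetical stand-in for the temporal scaling of a neighbor MV (assumed to return None when given None).

```python
def first_available(candidates):
    # Take the first existing candidate and stop, as in the screening above.
    for mv in candidates:
        if mv is not None:
            return mv
    return None

def amvp_spatial_candidates(A0, A1, B0, B1, B2, scale):
    # Left order: A0 -> A1 -> scaled A0 -> scaled A1;
    # top order: B0 -> B1 -> B2 (then scaled B0 -> scaled B2 if absent).
    left = first_available([A0, A1, scale(A0), scale(A1)])
    top = first_available([B0, B1, B2, scale(B0), scale(B2)])
    return [mv for mv in (left, top) if mv is not None]
```

A zero vector would be appended when fewer than two candidates result, per the padding rule above.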
In the sub-CU-level derivation of motion information, the MV obtained at the CU level is taken as the starting point, and the motion information is further refined at the sub-CU level. The MV refined at the sub-CU level is the MV of the entire CU. The sub-CU-level MV candidate list may include:
1) the MV obtained at the CU level;
2) the top, left, top-left, and top-right neighboring MVs of the MV obtained at the CU level;
3) MVs obtained by scaling the MVs of the corresponding temporally neighboring CUs in the reference frames, where these scaled MVs may be obtained as follows: all reference frames in the two reference lists are traversed, and the MV of the CU temporally neighboring the sub-CU in a reference frame is scaled to the reference frame of the MV obtained at the CU level;
4) at most 4 alternative temporal motion vector prediction (ATMVP) candidate MVs, where ATMVP allows each CU to obtain multiple sets of motion information from multiple blocks in the reference frame that are smaller than the current CU;
5) at most 4 spatial temporal motion vector prediction (STMVP) candidate MVs, where in STMVP the motion vector of a sub-CU is obtained by repeatedly using the temporally predicted motion vector and the spatially neighboring motion vectors.
Optionally, the above Implementation B and Implementation C may be used for obtaining the CU-level MV and may also be used for obtaining the sub-CU-level MV.
For a clearer understanding of the PMMVD technique, a description follows with reference to FIG. 8.
In 410, it is determined whether the current CU adopts the merge mode; if not, the AMVP mode is adopted (not shown).
In 420, it is determined whether the current CU uses the bilateral matching method; if so, 431 is executed; if not, 441 is executed.
In 431, an MV candidate list is generated.
In 432, the optimal MV is selected from the candidate list, where the bilateral matching method may be used for the selection; for details, refer to the description in Implementation C above.
In 433, a local search is performed around the optimal MV to further refine it. Specifically, the optimal MV may be offset to obtain multiple initial MVs, and one MV is selected from the multiple initial MVs, where the bilateral matching method may be used for the selection; for details, refer to the description in Implementation C above.
In 434, once the CU-level MV is obtained, the bilateral matching method of Implementation C above may be used to further refine the MV at the sub-CU level.
In 441, an MV candidate list is generated.
In 442, the optimal MV is selected from the candidate list, where the template matching method may be used for the selection; for details, refer to the description in Implementation B above.
In 443, a local search is performed around the optimal MV to further refine it. Specifically, the optimal MV may be offset to obtain multiple initial MVs, and one MV is selected from the multiple initial MVs, where the template matching method may be used for the selection; for details, refer to the description in Implementation B above.
In 444, once the CU-level MV is obtained, the template matching method of Implementation B above may be used to further refine the MV at the sub-CU level.
It can be seen that the data sampling method of the embodiments of the present application for deriving the Decode Motion Vector Refinement (DMVR) technology and Pattern Matched Motion Vector Derivation (PMMVD) can greatly reduce the hardware resource consumption and space occupation in the decoder, while bringing only a small coding performance loss.
Therefore, in the embodiments of the present application, in the process of obtaining the motion vector MV of the current image block, the reconstructed image is downsampled before the reconstructed image blocks used for matching are matched, and the matching cost is calculated only after the downsampling, which reduces the amount of data processed and greatly reduces the hardware resource consumption and the occupied space.
FIG. 9 is a schematic block diagram of a device 500 for video processing according to an embodiment of the present application. The device 500 includes:
a downsampling unit 510, configured to downsample reconstructed image data before matching is performed on the reconstructed image blocks used for matching, in the process of obtaining the motion vector of the current image block;
a matching unit 520, configured to perform matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result;
an obtaining unit 530, configured to obtain the motion vector of the current image block based on the matching result.
Optionally, in the embodiments of the present application, the device 500 is used at a decoding end, and the device 500 further includes:
a decoding unit, configured to decode the current image block based on the motion vector of the current image block.
Optionally, the device 500 is used at an encoding end, and the device 500 further includes:
an encoding unit, configured to encode the current image block based on the motion vector of the current image block.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
determine the reconstructed image blocks used for matching;
downsample the reconstructed image data of the reconstructed image blocks.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
downsample the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
downsample the reconstructed image data of the reconstructed image block according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
determine a downsampling ratio according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features;
downsample the reconstructed image data of the reconstructed image block using the downsampling ratio.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
determine that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
determine that the variance of the grayscale histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or
determine that the number of pixels, among the pixels included in the reconstructed image block, that are edge points belonging to texture is greater than or equal to a third predetermined value.
Optionally, in the embodiments of the present application, the downsampling unit 510 is further configured to:
downsample the reconstructed image data by sampling at intervals of the same number of pixels; or,
downsample the reconstructed image data by averaging multiple pixels.
Optionally, in the embodiments of the present application, the reconstructed image blocks used for matching include at least two reconstructed image blocks;
the downsampling unit 510 is further configured to:
downsample the reconstructed image data of the at least two reconstructed image blocks at the same sampling ratio.
Optionally, in the embodiments of the present application, the obtaining unit 530 is further configured to:
correct the initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
Optionally, in the embodiments of the present application, the obtaining unit 530 is further configured to:
obtain the initial motion vector corresponding to the current image block;
determine, for the initial motion vector, the reconstructed image blocks used for matching.
Optionally, in the embodiments of the present application, the initial motion vector includes a first motion vector and a second motion vector;
the matching unit 520 is further configured to:
generate a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, where the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
perform matching based on the template and the downsampled reconstructed image data to obtain a matching result.
Optionally, in the embodiments of the present application, the matching unit 520 is further configured to:
match the downsampled reconstructed image data of N third reconstructed image blocks respectively with the template, where the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
match the downsampled reconstructed image data of M fourth reconstructed image blocks respectively with the template, where the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
the obtaining unit 530 is further configured to:
select, based on the matching result, one third motion vector from the N third motion vectors and one fourth motion vector from the M fourth motion vectors, the one third motion vector and the one fourth motion vector serving as the motion vector of the current image block, or being used to determine the motion vector of the current image block.
Optionally, in the embodiments of the present application, the third motion vectors include the first motion vector, and the fourth motion vectors include the second motion vector.
Optionally, in the embodiments of the present application, at least some of the N third motion vectors are obtained by offsetting based on the first motion vector, and at least some of the M fourth motion vectors are obtained by offsetting based on the second motion vector.
Optionally, in the embodiments of the present application, N is equal to M.
Optionally, in the embodiments of the present application, the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
the first frame is a forward frame of the current image block, and the second frame is a forward frame of the current image block.
Optionally, in the embodiments of the present application, the initial motion vector includes K fifth motion vectors, and the matching unit 520 is further configured to:
match the downsampled reconstructed image data of the neighboring reconstructed image blocks of K fifth reconstructed image blocks respectively with the downsampled reconstructed image data of the neighboring reconstructed image blocks of the current image block to obtain the matching result, where the K fifth reconstructed image blocks correspond one-to-one to the K fifth motion vectors;
the obtaining unit 530 is further configured to:
select, based on the matching result, one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block, or to be used to determine the motion vector of the current image block.
Optionally, in the embodiments of the present application, the initial motion vector includes W sixth motion vectors;
the matching unit 520 is further configured to:
for the two reconstructed image blocks corresponding to each of W motion vector pairs, match the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, where each motion vector pair includes one sixth motion vector and one seventh motion vector determined based on the sixth motion vector;
the obtaining unit 530 is further configured to:
select one motion vector pair based on the matching results corresponding to the W motion vector pairs, where the sixth motion vector in the selected motion vector pair serves as the motion vector of the current image block, or is used to determine the motion vector of the current image block.
Optionally, in the embodiments of the present application, the seventh motion vector is determined based on the sixth motion vector under the assumption that the motion trajectory is continuous.
Optionally, in the embodiments of the present application, the sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs, and the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
Optionally, the device 500 can implement the operations of the processing device in the foregoing methods; for brevity, details are not repeated here.
It should be understood that the device for video processing of the foregoing embodiments of the present application may be a chip, which may specifically be implemented by a circuit; the specific implementation form is not limited in the embodiments of the present application.
The embodiments of the present application further provide an encoder, which is used to implement the functions of the encoding end in the embodiments of the present application and may include the modules for the encoding end in the device for video processing of the foregoing embodiments of the present application.
The embodiments of the present application further provide a decoder, which is used to implement the functions of the decoding end in the embodiments of the present application and may include the modules for the decoding end in the device for video processing of the foregoing embodiments of the present application.
The embodiments of the present application further provide a codec, which includes the device for video processing of the foregoing embodiments of the present application.
FIG. 10 shows a schematic block diagram of a computer system 600 according to an embodiment of the present application.
As shown in FIG. 10, the computer system 600 may include a processor 610 and a memory 620.
It should be understood that the computer system 600 may also include components usually included in other computer systems, for example, input/output devices, communication interfaces, and the like, which is not limited in the embodiments of the present application.
The memory 620 is used to store computer-executable instructions.
The memory 620 may be any of various kinds of memory; for example, it may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory, which is not limited in the embodiments of the present application.
The processor 610 is used to access the memory 620 and execute the computer-executable instructions to perform the operations in the method for video processing of the embodiments of the present application described above.
The processor 610 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and the like, which is not limited in the embodiments of the present application.
The device and the computer system for video processing of the embodiments of the present application may correspond to the execution body of the method for video processing of the embodiments of the present application, and the above and other operations and/or functions of the individual modules in the device for video processing and the computer system are respectively for implementing the corresponding procedures of the foregoing methods; for brevity, details are not repeated here.
An embodiment of the present application further provides an electronic device, which may include the device or computer system for video processing of the various embodiments of the present application described above.
An embodiment of the present application further provides a computer storage medium in which program code is stored, the program code being usable to instruct execution of the loop filtering method of the embodiments of the present application described above.
It should be understood that, in the embodiments of the present application, the term "and/or" is merely an association relationship describing associated objects, indicating that three relationships may exist. For example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described generally above in terms of functions. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is only the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (44)

  1. A method for video processing, characterized by comprising:
    in the process of obtaining a motion vector of a current image block, downsampling reconstructed image data before performing matching on reconstructed image blocks used for matching;
    performing matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result;
    obtaining the motion vector of the current image block based on the matching result.
  2. The method according to claim 1, wherein the method is used at a decoding end, and the method further comprises:
    decoding the current image block based on the motion vector of the current image block.
  3. The method according to claim 1, wherein the method is used at an encoding end, and the method further comprises:
    encoding the current image block based on the motion vector of the current image block.
  4. The method according to any one of claims 1 to 3, wherein the downsampling reconstructed image data comprises:
    determining the reconstructed image blocks used for matching;
    downsampling the reconstructed image data of the reconstructed image blocks.
  5. The method according to claim 4, wherein the downsampling the reconstructed image data of the reconstructed image blocks comprises:
    downsampling the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block.
  6. The method according to claim 5, wherein the downsampling the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block comprises:
    downsampling the reconstructed image data of the reconstructed image block according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features.
  7. The method according to claim 6, wherein the downsampling the reconstructed image data of the reconstructed image block according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features comprises:
    determining a downsampling ratio according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features;
    downsampling the reconstructed image data of the reconstructed image block using the downsampling ratio.
  8. The method according to any one of claims 1 to 7, wherein, before the downsampling the reconstructed image data of the reconstructed image block, the method further comprises:
    determining that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
    determining that the variance of the grayscale histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or
    determining that the number of pixels, among the pixels included in the reconstructed image block, that are edge points belonging to texture is greater than or equal to a third predetermined value.
  9. The method according to any one of claims 1 to 8, wherein the downsampling reconstructed image data comprises:
    downsampling the reconstructed image data by sampling at intervals of the same number of pixels; or,
    downsampling the reconstructed image data by averaging a plurality of pixels.
  10. The method according to any one of claims 1 to 9, wherein the reconstructed image blocks used for matching include at least two reconstructed image blocks;
    the downsampling the reconstructed image data comprises:
    downsampling the reconstructed image data of the at least two reconstructed image blocks at the same sampling ratio.
  11. The method according to any one of claims 1 to 10, wherein the obtaining the motion vector of the current image block based on the matching result comprises:
    correcting an initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
  12. The method according to any one of claims 1 to 10, wherein the obtaining a motion vector of a current image block further comprises:
    obtaining an initial motion vector corresponding to the current image block;
    determining, for the initial motion vector, the reconstructed image blocks used for matching.
  13. The method according to claim 11, wherein the initial motion vector includes a first motion vector and a second motion vector;
    the performing matching by using the downsampled reconstructed image data of the reconstructed image blocks comprises:
    generating a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, wherein the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
    performing matching based on the template and the downsampled reconstructed image data to obtain a matching result.
  14. The method according to claim 13, wherein the performing matching based on the template and the downsampled reconstructed image data to obtain a matching result comprises:
    matching the downsampled reconstructed image data of N third reconstructed image blocks respectively with the template, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
    matching the downsampled reconstructed image data of M fourth reconstructed image blocks respectively with the template, wherein the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
    the correcting the initial motion vector based on the matching result comprises:
    selecting, based on the matching result, one third motion vector from the N third motion vectors and one fourth motion vector from the M fourth motion vectors, the one third motion vector and the one fourth motion vector serving as the motion vector of the current image block, or being used to determine the motion vector of the current image block.
  15. The method according to claim 14, wherein the third motion vectors include the first motion vector, and the fourth motion vectors include the second motion vector.
  16. The method according to claim 14 or 15, wherein at least some of the N third motion vectors are obtained by offsetting based on the first motion vector, and at least some of the M fourth motion vectors are obtained by offsetting based on the second motion vector.
  17. The method according to any one of claims 14 to 16, wherein N is equal to M.
  18. The method according to any one of claims 13 to 17, wherein the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
    the first frame is a forward frame of the current image block, and the second frame is a forward frame of the current image block.
  19. The method according to claim 12, wherein the initial motion vector includes K fifth motion vectors, and the performing matching by using the downsampled reconstructed image data of the reconstructed image blocks comprises:
    matching the downsampled reconstructed image data of the neighboring reconstructed image blocks of K fifth reconstructed image blocks respectively with the downsampled reconstructed image data of the neighboring reconstructed image blocks of the current image block to obtain the matching result, wherein the K fifth reconstructed image blocks correspond one-to-one to the K fifth motion vectors;
    the obtaining the motion vector of the current image block based on the matching result comprises:
    selecting, based on the matching result, one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block, or to be used to determine the motion vector of the current image block.
  20. The method according to claim 12, wherein the initial motion vector includes W sixth motion vectors;
    the performing matching by using the downsampled reconstructed image data of the reconstructed image comprises:
    for the two reconstructed image blocks corresponding to each of W motion vector pairs, matching the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, wherein each motion vector pair includes one sixth motion vector and one seventh motion vector determined based on the sixth motion vector;
    the obtaining the motion vector of the current image block based on the matching result comprises:
    selecting one motion vector pair based on the matching results corresponding to the W motion vector pairs, wherein the sixth motion vector in the selected motion vector pair serves as the motion vector of the current image block, or is used to determine the motion vector of the current image block.
  21. The method according to claim 20, wherein the seventh motion vector is determined based on the sixth motion vector under the assumption that the motion trajectory is continuous.
  22. The method according to claim 20 or 21, wherein the sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs, and the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
  23. A device for video processing, characterized by comprising:
    a downsampling unit, configured to downsample reconstructed image data before matching is performed on reconstructed image blocks used for matching, in the process of obtaining a motion vector of a current image block;
    a matching unit, configured to perform matching by using the downsampled reconstructed image data of the reconstructed image blocks to obtain a matching result;
    an obtaining unit, configured to obtain the motion vector of the current image block based on the matching result.
  24. The device according to claim 23, wherein the device is used at a decoding end, and the device further comprises:
    a decoding unit, configured to decode the current image block based on the motion vector of the current image block.
  25. The device according to claim 23, wherein the device is used at an encoding end, and the device further comprises:
    an encoding unit, configured to encode the current image block based on the motion vector of the current image block.
  26. The device according to any one of claims 23 to 25, wherein the downsampling unit is further configured to:
    determine the reconstructed image blocks used for matching;
    downsample the reconstructed image data of the reconstructed image blocks.
  27. The device according to claim 26, wherein the downsampling unit is further configured to:
    downsample the reconstructed image data of the reconstructed image block according to the content of the reconstructed image block.
  28. The device according to claim 27, wherein the downsampling unit is further configured to:
    downsample the reconstructed image data of the reconstructed image block according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features.
  29. The device according to claim 28, wherein the downsampling unit is further configured to:
    determine a downsampling ratio according to at least one of the number of pixels included in the reconstructed image block, the pixel grayscale, and edge features;
    downsample the reconstructed image data of the reconstructed image block using the downsampling ratio.
  30. The device according to any one of claims 23 to 29, wherein the downsampling unit is further configured to:
    determine that the number of pixels included in the reconstructed image block is greater than or equal to a first predetermined value; and/or,
    determine that the variance of the grayscale histogram of the reconstructed image block is greater than or equal to a second predetermined value; and/or
    determine that the number of pixels, among the pixels included in the reconstructed image block, that are edge points belonging to texture is greater than or equal to a third predetermined value.
  31. The device according to any one of claims 23 to 30, wherein the downsampling unit is further configured to:
    downsample the reconstructed image data by sampling at intervals of the same number of pixels; or,
    downsample the reconstructed image data by averaging a plurality of pixels.
  32. The device according to any one of claims 23 to 31, wherein the reconstructed image blocks used for matching include at least two reconstructed image blocks;
    the downsampling unit is further configured to:
    downsample the reconstructed image data of the at least two reconstructed image blocks at the same sampling ratio.
  33. The device according to any one of claims 23 to 32, wherein the obtaining unit is further configured to:
    correct an initial motion vector of the current image block based on the matching result to obtain the motion vector of the current image block.
  34. The device according to any one of claims 23 to 32, wherein the obtaining unit is further configured to:
    obtain an initial motion vector corresponding to the current image block;
    determine, for the initial motion vector, the reconstructed image blocks used for matching.
  35. The device according to claim 33, wherein the initial motion vector includes a first motion vector and a second motion vector;
    the matching unit is further configured to:
    generate a template based on the downsampled reconstructed image data of a first reconstructed image block and the downsampled reconstructed image data of a second reconstructed image block, wherein the first reconstructed image block corresponds to the first motion vector and belongs to a first frame, and the second reconstructed image block corresponds to the second motion vector and belongs to a second frame;
    perform matching based on the template and the downsampled reconstructed image data to obtain a matching result.
  36. The device according to claim 35, wherein the matching unit is further configured to:
    match the downsampled reconstructed image data of N third reconstructed image blocks respectively with the template, wherein the N third reconstructed image blocks correspond to N third motion vectors and belong to the first frame;
    match the downsampled reconstructed image data of M fourth reconstructed image blocks respectively with the template, wherein the M fourth reconstructed image blocks correspond to M fourth motion vectors and belong to the second frame;
    the obtaining unit is further configured to:
    select, based on the matching result, one third motion vector from the N third motion vectors and one fourth motion vector from the M fourth motion vectors, the one third motion vector and the one fourth motion vector serving as the motion vector of the current image block, or being used to determine the motion vector of the current image block.
  37. The device according to claim 36, wherein the third motion vectors include the first motion vector, and the fourth motion vectors include the second motion vector.
  38. The device according to claim 36 or 37, wherein at least some of the N third motion vectors are obtained by offsetting based on the first motion vector, and at least some of the M fourth motion vectors are obtained by offsetting based on the second motion vector.
  39. The device according to any one of claims 36 to 38, wherein N is equal to M.
  40. The device according to any one of claims 35 to 39, wherein the first frame is a forward frame of the current image block, and the second frame is a backward frame of the current image block; or,
    the first frame is a forward frame of the current image block, and the second frame is a forward frame of the current image block.
  41. The device according to claim 34, wherein the initial motion vector includes K fifth motion vectors, and the matching unit is further configured to:
    match the downsampled reconstructed image data of the neighboring reconstructed image blocks of K fifth reconstructed image blocks respectively with the downsampled reconstructed image data of the neighboring reconstructed image blocks of the current image block to obtain the matching result, wherein the K fifth reconstructed image blocks correspond one-to-one to the K fifth motion vectors;
    the obtaining unit is further configured to:
    select, based on the matching result, one fifth motion vector from the K fifth motion vectors as the motion vector of the current image block, or to be used to determine the motion vector of the current image block.
  42. The device according to claim 34, wherein the initial motion vector includes W sixth motion vectors;
    the matching unit is further configured to:
    for the two reconstructed image blocks corresponding to each of W motion vector pairs, match the downsampled reconstructed image data of one of the reconstructed image blocks with the downsampled reconstructed image data of the other reconstructed image block to obtain the matching result, wherein each motion vector pair includes one sixth motion vector and one seventh motion vector determined based on the sixth motion vector;
    the obtaining unit is further configured to:
    select one motion vector pair based on the matching results corresponding to the W motion vector pairs, wherein the sixth motion vector in the selected motion vector pair serves as the motion vector of the current image block, or is used to determine the motion vector of the current image block.
  43. The device according to claim 42, wherein the seventh motion vector is determined based on the sixth motion vector under the assumption that the motion trajectory is continuous.
  44. The device according to claim 42 or 43, wherein the sixth reconstructed image block belongs to a forward frame of the frame to which the current image block belongs, and the seventh reconstructed image block belongs to a backward frame of the frame to which the current image block belongs.
PCT/CN2018/081651 2018-04-02 2018-04-02 Method and device for video processing WO2019191889A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/081651 WO2019191889A1 (zh) 2018-04-02 2018-04-02 Method and device for video processing
CN201880012518.3A CN110337810B (zh) 2018-04-02 2018-04-02 Method and device for video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/081651 WO2019191889A1 (zh) 2018-04-02 2018-04-02 Method and device for video processing

Publications (1)

Publication Number Publication Date
WO2019191889A1 true WO2019191889A1 (zh) 2019-10-10

Family

ID=68099798

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081651 WO2019191889A1 (zh) 2018-04-02 2018-04-02 Method and device for video processing

Country Status (2)

Country Link
CN (1) CN110337810B (zh)
WO (1) WO2019191889A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462190A (zh) * 2020-04-20 2020-07-28 海信集团有限公司 Smart refrigerator and food material entry method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113329228B (zh) * 2021-05-27 2024-04-26 杭州网易智企科技有限公司 Video encoding method, decoding method, apparatus, electronic device, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010016010A1 (en) * 2000-01-27 2001-08-23 Lg Electronics Inc. Apparatus for receiving digital moving picture
EP1662800A1 (en) * 2004-11-30 2006-05-31 Humax Co., Ltd. Image down-sampling transcoding method and device
CN101459842A (zh) * 2008-12-17 2009-06-17 浙江大学 Spatial downsampling decoding method and device
CN101605262A (zh) * 2009-07-09 2009-12-16 杭州士兰微电子股份有限公司 Variable block size motion prediction method and device
CN102647594A (zh) * 2012-04-18 2012-08-22 北京大学 Integer-pixel precision motion estimation method and system
CN102790884A (zh) * 2012-07-27 2012-11-21 上海交通大学 Search method based on hierarchical motion estimation and implementation system thereof
CN106210449A (zh) * 2016-08-11 2016-12-07 上海交通大学 Frame rate up-conversion motion estimation method and system based on multi-information fusion

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003153269A (ja) * 2001-11-08 2003-05-23 Mitsubishi Electric Corp Motion vector detection device, motion vector detection system using a plurality of such devices, and motion vector detection method
BRPI0910477A2 (pt) * 2008-04-11 2015-09-29 Thomson Licensing Method and apparatus for template matching prediction (TMP) in video encoding and decoding
KR101783990B1 (ko) * 2012-12-21 2017-10-10 한화테크윈 주식회사 Digital image processing apparatus and method for predicting representative motion of an image
KR102138368B1 (ko) * 2013-07-19 2020-07-27 삼성전자주식회사 Hierarchical motion prediction method and motion prediction apparatus based on adaptive sampling
US10958927B2 (en) * 2015-03-27 2021-03-23 Qualcomm Incorporated Motion information derivation mode determination in video coding
CN106454349B (zh) * 2016-10-18 2019-07-16 哈尔滨工业大学 Motion estimation block matching method based on H.265 video coding

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010016010A1 (en) * 2000-01-27 2001-08-23 Lg Electronics Inc. Apparatus for receiving digital moving picture
EP1662800A1 (en) * 2004-11-30 2006-05-31 Humax Co., Ltd. Image down-sampling transcoding method and device
CN101459842A (zh) * 2008-12-17 2009-06-17 浙江大学 Spatial downsampling decoding method and device
CN101605262A (zh) * 2009-07-09 2009-12-16 杭州士兰微电子股份有限公司 Variable block size motion prediction method and device
CN102647594A (zh) * 2012-04-18 2012-08-22 北京大学 Integer-pixel precision motion estimation method and system
CN102790884A (zh) * 2012-07-27 2012-11-21 上海交通大学 Search method based on hierarchical motion estimation and implementation system thereof
CN106210449A (zh) * 2016-08-11 2016-12-07 上海交通大学 Frame rate up-conversion motion estimation method and system based on multi-information fusion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462190A (zh) * 2020-04-20 2020-07-28 海信集团有限公司 Smart refrigerator and food material entry method
CN111462190B (zh) * 2020-04-20 2023-11-17 海信集团有限公司 Smart refrigerator and food material entry method

Also Published As

Publication number Publication date
CN110337810A (zh) 2019-10-15
CN110337810B (zh) 2022-01-14

Similar Documents

Publication Publication Date Title
US11375226B2 (en) Method and apparatus of video coding with affine motion compensation
US11750818B2 (en) Inter-prediction mode based image processing method, and apparatus therefor
TWI617185B (zh) 具有仿射運動補償的視訊編碼的方法以及裝置
CN110115032B (zh) 用于视频编解码的运动细化的方法以及装置
US20190058896A1 (en) Method and apparatus of video coding with affine motion compensation
JP2022022228A (ja) マルチリファレンス予測のための動きベクトルの精密化
TW202005383A (zh) 部分成本計算
KR102642784B1 (ko) 모션 벡터 리파인먼트를 위한 제한된 메모리 액세스 윈도우
US11997254B2 (en) Video processing method and device thereof
CN111279701B (zh) 视频处理方法和设备
US9473787B2 (en) Video coding apparatus and video coding method
TW202044840A (zh) 視訊編碼中具有運動精化的雙向預測視訊處理方法和裝置
US20190349589A1 (en) Image processing method based on inter prediction mode, and apparatus therefor
WO2019191889A1 (zh) 用于视频处理的方法和设备
US9706221B2 (en) Motion search with scaled and unscaled pictures
WO2020252707A1 (zh) 视频处理方法和设备
WO2021056205A1 (zh) 一种视频处理方法、设备及存储介质
TW202005388A (zh) 交織預測的應用

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18913682

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18913682

Country of ref document: EP

Kind code of ref document: A1