WO2020006690A1 - Video processing method and device - Google Patents

Video processing method and device

Info

Publication number
WO2020006690A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
image
reference image
image block
reference data
Prior art date
Application number
PCT/CN2018/094387
Other languages
English (en)
Chinese (zh)
Inventor
李蔚然
郑萧桢
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2018/094387 priority Critical patent/WO2020006690A1/fr
Priority to CN201880039240.9A priority patent/CN110832861A/zh
Publication of WO2020006690A1 publication Critical patent/WO2020006690A1/fr

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding

Definitions

  • Embodiments of the present application relate to the field of video encoding and decoding, and more particularly, to a video processing method and device.
  • In inter-prediction for image encoding and decoding, the more similar the selected reference image is to the current image to be coded, the smaller the residuals generated by inter-prediction, which improves the coding efficiency of inter-prediction.
  • Some existing technologies can use each image in the video to construct a high-quality specific reference image containing the background content of the scene, such as a long-term reference frame. And thus it is possible to perform inter prediction using the specific reference image.
  • In some existing technologies, all motion vectors are set to zero: the background in the constructed long-term reference frame is considered to have no motion, so the zero motion vector is directly used as the motion vector of the current coding block with respect to the constructed long-term reference frame, and no motion search is performed.
  • the embodiments of the present application provide an image processing method and device, which can avoid loss of video coding performance, and can reduce bandwidth pressure and improve coding efficiency.
  • In a first aspect, a video processing method is provided, which includes: determining a sub-image block from a current image block of a current image; acquiring, from a first region that is located in a reference image and corresponds to the position of the current image block, reference data for inter-prediction of the sub-image block; performing inter-prediction on the sub-image block using the reference data; and updating the pixels in the first region using the pixels of the reconstructed current image block.
  • In a second aspect, a video processing device is provided, including: a determining unit configured to determine a sub-image block from a current image block of a current image; an acquiring unit configured to acquire, from a first region that is located in a reference image and corresponds to the position of the current image block, reference data for inter-prediction of the sub-image block; a prediction unit configured to perform inter-prediction on the sub-image block using the reference data; and an update processing unit configured to update the pixels in the first region using the pixels of the reconstructed current image block.
  • a computer system including: a memory for storing computer-executable instructions; a processor for accessing the memory and executing the computer-executable instructions to perform the method of the first aspect.
  • a computer storage medium stores program code, where the program code may be used to instruct execution of the method of the first aspect.
  • a computer program product includes program code, and the program code may be used to instruct to execute the method of the first aspect.
  • In the technical solution of the embodiments of the present application, the reference data used for inter prediction is obtained from the reference image, with the acquired range not exceeding the region of the reference image corresponding to the position of the current image block, and the reconstructed current image block is used to update the corresponding region of the reference image. This avoids the loss of video coding performance caused by using only zero motion vectors, allows the pixels of the reference image to be updated block by block to reduce bandwidth pressure, and avoids the low coding efficiency caused by an excessive search range.
  • FIG. 1 is a schematic diagram of a coding process of a plurality of coding units.
  • FIG. 2 is a schematic diagram of a video processing method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of image block division of an image according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of sub-image block division of an image block according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an area for acquiring reference data according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a filtering completion sequence of an image block according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a positional relationship between an image block and a sub-image block according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a search area for acquiring reference data according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a video processing device according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a computer system according to an embodiment of the present application.
  • a video is made up of multiple images.
  • different images in the video can use different prediction methods.
  • the image can be divided into an intra-prediction image and an inter-prediction image.
  • the inter-prediction image may include a forward prediction image and a bi-directional prediction image.
  • I picture is an intra prediction picture, also called a key frame
  • P picture is a forward prediction picture, that is, a previously encoded/decoded P picture or I picture is used as the reference picture.
  • B picture is a bidirectional prediction picture, that is, the preceding and following images are used as reference images.
  • One implementation encodes/decodes multiple pictures to generate a group of pictures (GOP).
  • A GOP consists of one I picture and multiple B pictures (or bidirectional prediction pictures) and/or P pictures (or forward prediction pictures).
  • When playing, the decoder reads the GOPs one by one, decodes them, and then reads and renders the pictures.
  • images of different resolutions can be encoded / decoded by dividing the image into multiple small blocks, that is, the image can be divided into multiple image blocks.
  • the image can be divided into any number of image blocks.
  • the image can be divided into an array of m ⁇ n image blocks.
  • the image block may have a rectangular shape, a square shape, a circular shape, or any other shape.
  • An image block can have any size, such as p ⁇ q pixels.
  • Each image block can have the same size and / or shape.
  • two or more image blocks may have different sizes and / or shapes.
  • Image blocks may or may not overlap one another.
  • In some encoding/decoding standards, an image block is referred to as a macroblock or a largest coding unit (LCU).
  • For the H.264 standard, an image block is called a macroblock, and its size can be 16 × 16 pixels.
  • For the high efficiency video coding (HEVC) standard, an image block is called a largest coding unit, and its size can be 64 × 64 pixels.
  • Alternatively, an image block may not be a macroblock or a largest coding unit, but may be a portion of a macroblock or largest coding unit; or it may contain at least two complete macroblocks (or largest coding units); or at least one complete macroblock (or largest coding unit) plus a portion of another; or at least two complete macroblocks (or largest coding units) plus portions of several others. In this way, after the image is divided into a plurality of image blocks, the image blocks of the image data can be encoded/decoded separately.
  • the encoding process includes prediction, transformation, quantization, and entropy encoding.
  • Prediction includes two types, intra prediction and inter prediction, whose purpose is to remove redundant information of the current image block to be encoded by using prediction block data.
  • Intra prediction obtains the reference data (for example, prediction block data) using information within the current frame, whereas inter prediction obtains the reference data using information of a reference image.
  • Taking inter prediction as an example, the process includes dividing the current image to be encoded into several image blocks to be encoded, and then dividing each image block into several sub-image blocks. For each sub-image block, the reference image is searched for the image block that best matches it, which serves as the predicted image block; the relative displacement between the predicted image block and the current sub-image block is the motion vector. The corresponding pixel values of the sub-image block and the predicted image block are then subtracted to obtain the residual.
  • The residuals of the sub-image blocks are combined to obtain the residual of the image block to be encoded.
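  • As an illustration of this block-matching step, the following is a minimal sketch (not taken from the patent; all names are illustrative) that finds the best-matching predicted block in a reference region by sum of absolute differences and computes the residual:

```python
import numpy as np

def block_match(ref_region, cur_block):
    """Exhaustively search ref_region for the block that best matches
    cur_block (by sum of absolute differences); return the motion
    vector, the predicted block, and the residual."""
    h, w = cur_block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(ref_region.shape[0] - h + 1):
        for dx in range(ref_region.shape[1] - w + 1):
            cand = ref_region[dy:dy + h, dx:dx + w].astype(int)
            sad = np.abs(cand - cur_block.astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    dx, dy = best_mv
    pred = ref_region[dy:dy + h, dx:dx + w].astype(int)
    residual = cur_block.astype(int) - pred  # transformed and quantized later
    return best_mv, pred, residual
```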
  • A transformation matrix can be used to remove the correlation of the residuals of image blocks or sub-image blocks, that is, to remove their redundant information, in order to improve coding efficiency.
  • The transformation of a data block in an image block or sub-image block usually uses a two-dimensional transformation: at the encoding end, the residual information of the data block is multiplied by an N × M transformation matrix and then by its transpose, yielding the transform coefficients. The transform coefficients are quantized to obtain quantized coefficients, and finally the quantized coefficients are entropy-coded to obtain an entropy-coded bitstream.
  • The entropy-coded bitstream, together with the encoded coding mode information, such as the intra prediction mode and the motion vector (or motion vector residual), is stored or sent to the decoding end.
  • At the decoding end, the entropy-coded bitstream is obtained and entropy decoding is performed to obtain the corresponding residuals; the predicted image block corresponding to each sub-image block is found based on the decoded motion vector, intra prediction, and other information; and the value of each pixel in the current sub-image block is obtained from the predicted image block and the residual.
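  • The two-dimensional transform just described can be sketched as follows; an orthonormal DCT-II matrix and a single uniform quantization step are assumed here purely for illustration, since the text does not fix either choice:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)
    mat = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    mat[0] /= np.sqrt(2)
    return mat * np.sqrt(2.0 / n)

T = dct_matrix(4)
residual = np.array([[5, -3,  2, 0],
                     [4, -2,  1, 0],
                     [3, -1,  0, 1],
                     [2,  0, -1, 2]], dtype=float)
coeff = T @ residual @ T.T            # forward 2-D transform: T * R * T^T
q_step = 4.0
quantized = np.round(coeff / q_step)  # quantization
# decoding side: dequantize, then invert the transform with the transpose
recon = T.T @ (quantized * q_step) @ T
```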
  • a reference image may be constructed to improve the similarity between the reference image and the current image to be encoded / decoded.
  • Consider a specific type of encoding/decoding scene in which the background of the video content does not change, and only the foreground changes or moves.
  • video surveillance belongs to this type of scene.
  • In a video surveillance scene, the surveillance camera is usually stationary or moves only slowly, so the background can be considered basically unchanged.
  • Objects such as people or cars captured by the surveillance lens, however, often move or change, so the foreground can be considered to change frequently.
  • a specific reference image can be constructed, and the specific reference image can optionally only contain high-quality background information.
  • the specific reference image may include multiple image blocks, and any one image block may be taken from a decoded image.
  • Different image blocks in the long-term reference image may be obtained from different decoded images.
  • the background portion of the current image to be encoded / decoded can be referred to the long-term reference image, thereby reducing the residual information of inter prediction, thereby improving encoding / decoding efficiency.
  • the specific reference image may be referred to as a composite reference frame or a composite frame (composite reference).
  • the short-term reference image is a concept corresponding to the long-term reference image.
  • A short-term reference image exists in the reference image buffer only for a period of time: as decoded reference images that follow it are moved into and out of the reference image buffer, the short-term reference image is eventually moved out of the reference image buffer.
  • the reference image buffer may also be referred to as a reference image list buffer, a reference image list, a reference frame list buffer, or a reference frame list, etc., which are collectively referred to herein as a reference image buffer.
  • the long-term reference image (or a part of the data in the long-term reference image) can always exist in the reference image buffer.
  • The long-term reference image (or a part of its data) is not affected by the movement of decoded reference images into and out of the reference image buffer.
  • The long-term reference image (or a part of its data) is removed from the reference image buffer only when the decoding end issues an update instruction.
  • Short-term reference pictures and long-term reference pictures may be called differently in different standards.
  • For example, in H.264/advanced video coding (AVC) or H.265/HEVC, short-term reference images are called short-term reference frames (short-term references), and long-term reference images are called long-term reference frames (long-term references).
  • In some standards, the long-term reference image is called a background frame (background picture).
  • In standards such as VP8 and VP9, long-term reference images are called golden frames.
  • It should be noted that calling a long-term reference image a long-term reference frame herein does not mean that the technology must be applied in technologies corresponding to standards such as H.264/AVC or H.265/HEVC.
  • the above-mentioned constructed specific reference image may be a long-term reference image. That is, the long-term reference image may be obtained by constructing image blocks taken from multiple decoded images, or may be obtained by updating an existing reference frame (for example, a pre-stored reference frame) by using multiple decoded images.
  • Of course, this constructed specific reference image may also be a short-term reference image.
  • Similarly, the long-term reference image may not be a constructed reference image.
  • the long-term reference image or the constructed reference image mentioned above may be an image that is not output.
  • If the long-term reference image (or constructed reference frame) were to use the same motion search as the short-term reference image (or non-constructed reference frame), which is assumed here to search a larger area, there would be at least two disadvantages.
  • For example, if image block 1 to be encoded is selected for updating the long-term reference image, it can be used for the update only after its reconstruction is completed; as can be seen from FIG. 1, by that time its adjacent image block 2 to be encoded has already completed its inter-frame search.
  • The pipeline structure and pipeline stages of the encoding end are properties of the hardware itself, and the decoding end cannot know anything that is not specified by the standard.
  • Therefore, if the encoding end updates the long-term reference image with image block 1, and image block 2 to be coded then references the region of the long-term reference image corresponding to image block 1, the decoding end cannot decode correctly.
  • The mode decision in FIG. 1 may be the selection of a prediction mode, for example selecting between inter-frame coding and intra-frame coding, or specifically selecting which inter-frame or intra-frame coding method is used; the other coding steps may be transformation, quantization, and entropy coding; and reconstruction refers to pixel reconstruction.
  • In one implementation, after the entire frame has been encoded, each image block selected for updating the long-term reference image is used to update it.
  • The specific implementation of this method may be as follows: for the image frame to be encoded, the prediction process is completed coding unit by coding unit, yielding an unfiltered reconstructed image formed by adding the inverse-transformed residual to the predicted value; the entire frame is then filtered to obtain the final reconstructed image of the whole frame; and from this final reconstructed image, the coding units selected for updating the long-term reference image are used to update it.
  • this implementation manner may be referred to as frame-level long-term reference image refresh.
  • In this case, the constructed long-term reference image that is used is obtained before the current frame is encoded, while the step of updating the long-term reference image with the current frame takes place after the current frame has been encoded.
  • FIG. 2 is a schematic flowchart of a video processing method 200 according to an embodiment of the present application.
  • the method 200 may be implemented by an encoding end or a decoding end.
  • The video processing device that implements the method 200 mentioned below may be an encoder or a part of an encoder, or may be a decoder or a part of a decoder.
  • the method 200 includes at least part of the following content.
  • the video processing device determines a sub-image block from a current image block of the current image.
  • the current image may be divided into one or more image blocks.
  • the image block may be a Coding Tree Unit (CTU).
  • the CTU is the encoding carrier of the image block, and contains encoding mode information, or residual information, or transform coefficient information, or intra prediction information, or inter prediction information.
  • the coding tree unit is called differently in different standards.
  • the coding tree unit may also be called a macroblock.
  • the current image may be divided into several image blocks. Although the sizes and shapes of the image blocks shown in FIG. 3 are consistent, it should be understood that the embodiments of the present application are not limited thereto.
  • The sub-image block may also be called a coding unit (CU). It should be noted that the coding unit is called differently in different standards; for example, in the H.264/AVC standard, the coding unit may also be called a sub-block.
  • the image block may also be a coding unit, and the sub-image block may also be a prediction unit.
  • the number of image blocks, the size of the image blocks, and / or the shape of the image blocks divided by the images of different frames may be different or the same.
  • one image block may be divided into one or several sub-image blocks
  • FIG. 4 illustrates multiple division manners from the image block to the sub-image block.
  • an image block may be divided into one or more sub-image blocks.
  • the size and / or shape of the multiple sub-image blocks may be the same or different.
  • FIG. 4 only shows that one image block is divided into one or two or four sub-image blocks, it should be understood that the embodiments of the present application are not limited thereto, and one image block may be divided into other numbers of sub-image blocks.
  • the number of sub-image blocks divided by different image blocks, the size of the sub-image blocks, and / or the shape of the sub-image blocks may be different or the same.
  • The video processing device then obtains, from a first region located in the reference image and corresponding to the position of the current image block, reference data for inter prediction of the sub-image block.
  • the size and / or shape of the first region in the reference image may be equal to the size and / or shape of the current image block.
  • The first region mentioned in the embodiments of the present application may be a region whose position in the reference image is the same as the position of the current image block in the current image; or the position of the first region in the reference image may have a certain offset relative to the position of the current image block in the current image, where the size of the offset can be determined according to specific conditions and is not specifically limited in the embodiments of this application; or the position of the first region in the reference image may have a certain scaling relationship relative to the position of the current image block in the current image, which is likewise not specifically limited in the embodiments of the present application.
  • the position mentioned in the embodiment of the present application may be pixel coordinates.
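  • As a hypothetical sketch of how the first region's position might be derived from the current image block's position (the offset and scaling parameters are illustrative; the text leaves both open):

```python
def first_region(block_x, block_y, block_w, block_h, offset=(0, 0), scale=1.0):
    """Return (x, y, w, h) of the first region in the reference image.
    With offset=(0, 0) and scale=1.0 the first region coincides with the
    position of the current image block, the default case in the text."""
    ox, oy = offset
    return (int(block_x * scale) + ox, int(block_y * scale) + oy,
            int(block_w * scale), int(block_h * scale))
```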
  • The reference image mentioned in the embodiments of the present application may be a long-term reference image, and/or a constructed frame, and/or a frame that is not to be output.
  • Before obtaining the reference data, the video processing device may first determine whether the reference image is a specific type of reference image; if so, it may acquire the reference data and/or perform the subsequent pixel update processing according to the method of the embodiments of the present application.
  • the specific type of reference image has at least one of the following properties: a reference image that is not to be output, a long-term reference image, and a construction frame.
  • For example, the specific type of reference picture may be a constructed long-term reference picture, or a frame that is both non-output and constructed.
  • the type of the reference frame may be identified by a special field in the code stream structure.
  • When the reference image is determined not to be a specific type of reference image, the method of the embodiments of the present application need not be used to obtain the reference data; that is, the reference data need not be obtained from the area located in the reference image and corresponding to the position of the current image block in the current image.
  • Reference data may be obtained for an image block consisting of one coding unit, or of two or more coding units.
  • Whether the corresponding reference image is a specific type of reference image may be judged separately for each sub-image block in a single image block; in this case, the types of the reference images corresponding to different sub-image blocks may differ.
  • Alternatively, the judgment may be made per image block, that is, whether the reference image corresponding to each image block is a specific type of reference image; the types of reference images corresponding to different image blocks may then differ, but the reference image, or reference image type, corresponding to every sub-image block within the same image block is the same.
  • Alternatively, the judgment may be made per image frame, that is, whether the reference image corresponding to each image frame is a specific type of reference image; the types of reference images corresponding to different image frames may then differ, but the reference image, or reference image type, corresponding to every image block within the same image frame is the same.
  • When the reference image is determined to be a specific type of reference image, the reference data is obtained from the first region of the reference image.
  • various types of reference images may have corresponding identifiers.
  • Optionally, when it is determined that the reference image has the identifier of a long-term reference image, the video processing device obtains the reference data from the first region of the reference image.
  • Optionally, when it is determined that the reference image has the identifier of a frame that is not to be output, the video processing device obtains the reference data from the first region of the reference image.
  • Optionally, when it is determined that the reference image has the identifier of a constructed frame, the video processing device obtains the reference data from the first region of the reference image.
  • Optionally, when it is determined that the reference image has the identifier of a frame that is not to be output, and it is further determined that the reference image has the identifier of a constructed frame, the video processing device obtains the reference data from within the first region of the reference image.
  • Specifically, an image may carry a flag indicating whether it is an output frame. When an image is indicated as not being output, the frame serves as a reference image, and it is further determined whether the frame carries the flag of a constructed frame: if it does, the pixels of the frame need to be updated, and the method 200 may be used for prediction and/or pixel updating. If an image is indicated as being output, there is no need to determine whether it is a constructed frame; it is directly determined that the frame does not need pixel updating and/or that the method 200 is not used. Alternatively, if an image is indicated as not being output but does not carry the identifier of a constructed frame, it may likewise be determined that the frame does not need pixel updating and/or that the method 200 is not used.
  • In other words, the method of the embodiments of the present application is adopted to obtain the reference data when it is determined that: the reference image is a long-term reference image; or the reference image is a constructed reference image; or the reference image is a non-output image; or the reference image is a non-output image and it is further determined that the reference image is a constructed reference image.
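  • A minimal sketch of this decision logic follows (the flag names are hypothetical; the actual syntax elements depend on the codec standard):

```python
def uses_method_200(is_output_frame: bool, is_constructed_frame: bool) -> bool:
    """Decide whether prediction and pixel updating follow method 200.
    Per the text: only a non-output frame carrying the constructed-frame
    flag has its pixels updated via method 200."""
    if is_output_frame:
        return False              # output frames: no need to check further
    return is_constructed_frame   # non-output but not constructed: no update
```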
  • That the video processing device obtains the reference data used for inter prediction of the sub-image block from a first region located in the reference image and corresponding to the position of the current image block means that the region from which the reference data is actually obtained may be equal to the first region, or may be a partial region of the first region (hereinafter referred to as the second region).
  • In FIG. 5, the portion filled with vertical stripes (a rectangle partially covered by a black fill) may be the current image block, and the black-filled portion within it may be the current sub-image block.
  • The portion filled with diagonal stripes (a rectangle partially covered by a gray fill) may be the first region mentioned above.
  • the size and shape of the first area may be equal to the size and shape of the current image block.
  • Obtaining the reference data from the first region may mean searching the whole first region (the diagonally striped portion), or searching only a partial region within the first region (the gray-filled portion), for the reference data.
  • For different sub-image blocks, the size and/or shape of the area in the reference image used for obtaining reference data may be the same or different; similarly, the sizes and/or shapes of the areas used for obtaining reference data corresponding to different image blocks may be the same or different.
  • the center point of the second region for acquiring the reference data in the first region may coincide with the center point of the first region.
  • For a technology whose search could otherwise reach outside the region, the search range may be narrowed inward from the first region according to that technology's requirements, or the technology may be disabled (for example, sub-pixel search is abandoned in favor of full-pixel search), to ensure that the final reference data is generated entirely from the area of the reference image corresponding to the current image block position.
  • When the reference data is obtained from a second region within the first region, at least some pixels of the part of the first region outside the second region may be used to obtain the pixel value of at least one pixel in the second region.
  • Specifically, at least some of those surrounding pixels may be used to perform an interpolation operation with at least some of the pixels in the second region.
  • That is, an interpolation operation may be performed between pixels in the area surrounding the second region (the part of the first region other than the second region) and pixels in the second region, to obtain the pixel value of at least one pixel of the second region.
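  • To illustrate why the surrounding pixels are needed, here is a sketch of horizontal half-pel interpolation using a 2-tap (bilinear) filter; the short filter is assumed purely for brevity, whereas real codecs typically use longer filters such as the 8-tap filter discussed later:

```python
import numpy as np

def half_pel_row(first_region, x0, x1, y):
    """Interpolate horizontal half-pel samples at row y for integer
    columns x0..x1-1 of the second region. The sample at x + 0.5 needs
    the pixel at x + 1, so the rightmost half-pel sample draws on a
    pixel that may lie outside the second region but inside the first."""
    row = first_region[y].astype(float)
    return (row[x0:x1] + row[x0 + 1:x1 + 1]) / 2.0
```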
  • the video processing device may perform a sub-pixel search on the second region to obtain a first search result; and obtain the reference data based on the first search result.
  • the pixel positions of the sub-pixels mentioned here are located in the second region.
  • Optionally, the video processing device may perform a whole-pixel search on the first region to obtain a second search result, and obtain the reference data based on the second search result.
  • The whole-pixel search and the sub-pixel search may coexist, and the sub-pixel search area may be smaller than the whole-pixel search area; the sub-pixel search area may be the second region described above.
  • A whole-pixel search may also be performed in a region smaller than the first region; for example, the whole-pixel search area may equal the sub-pixel search area, that is, the second region mentioned above.
  • In some embodiments, the search range used for acquiring the reference data in the first region (which may be equal to the second region or to the first region) is smaller than or equal to the search range of a specific region.
  • Specifically, an initial range value may be set for the area of the reference image from which reference data is obtained. If the range of the first region, or of the inwardly contracted region (contracted, for example, for sub-pixel search), is less than or equal to the initial value, the first region or the contracted region is searched to obtain the reference data; if the range of the first region or the contracted region is greater than the initial value, the first region or the contracted region is contracted further, to obtain an area of the same size as the initial value for obtaining the reference data. Using a smaller search area in this way can further improve encoding or decoding efficiency.
  • For example, the specific region may be the region used for obtaining reference data in a non-constructed frame, or the region used for obtaining reference data in a short-term reference image, or another area, such as a preset area of a specific size.
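  • A sketch of this range-limiting rule, assuming a search region is given as (left, top, right, bottom) offsets and that any necessary contraction is split symmetrically (an assumption; the text does not fix the contraction rule):

```python
def limit_search_range(region, init_value):
    """Contract region (l, t, r, b) so its width and height do not
    exceed init_value; a region already small enough is kept as is."""
    l, t, r, b = region
    extra_w = max(0, (r - l) - init_value)
    extra_h = max(0, (b - t) - init_value)
    return (l + extra_w // 2, t + extra_h // 2,
            r - (extra_w - extra_w // 2), b - (extra_h - extra_h // 2))
```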
  • the method 200 in the embodiment of the present application can be used at the encoding end and the decoding end.
  • For the encoding end, a flag bit may be carried in the code stream to indicate to the decoding end that the reference data for performing inter prediction on the sub-image block is obtained from the first region located in the reference image and corresponding to the position of the current image block.
  • For the decoding end, a flag bit is obtained by decoding; the flag bit indicates to the decoding end that the reference data for the sub-image block is acquired from the first region in the reference image according to the method of the embodiments of the present application.
  • the video processing device may determine the first region from the reference image according to the position of the current image block in the current image.
  • the reference data is obtained from the first area.
  • the first area may be searched to obtain reference data, or a partial area of the first area may be searched to obtain reference data.
  • For the decoding end, the encoding end may transmit the motion vector information corresponding to the first region to the decoding end through the code stream, and the decoding end may determine the first region according to the motion vector information in the code stream.
  • The encoder can pass the motion vector information of the first region to the decoder, and the decoder can determine the first region based on that motion vector information and then determine the second region within the first region (for example, according to a preset rule, or according to the number of pixels required for the interpolation operation).
  • Alternatively, the encoding end may pass the motion vector information of the first region to the decoding end, and the decoding end may directly determine the second region (that is, the motion vector corresponding to the second region) from that motion vector information together with other information, such as a preset rule, the number of pixels required for the interpolation operation, or information in the code stream indicating how far the first region needs to be contracted inward (in which case the information in the code stream is equal to the motion vector information corresponding to the first region).
  • Alternatively, the encoding end may pass the motion vector information of the second region directly to the decoding end, and the decoding end may determine the second region based on that motion vector information.
  • If the area determined for reference data acquisition is not within the area of the reference image corresponding to the current image block, the code stream does not meet the standard specification.
  • Specifically, the decoding end may check any motion vector pointing to the constructed long-term reference frame: the prediction value of the region pointed to by the motion vector should not be generated from pixels outside the area of the reference frame corresponding to the image block where the current sub-image block is located.
  • More generally, the decoding end checks each motion vector pointing to the reference image: the reference data in the area pointed to by this motion vector should not be generated from pixels outside the corresponding position area, in the reference image, of the image block where the current sub-image block is located. That is, the following conditions must be met:
  • the area pointed to by the motion vector should not contain any pixels outside the corresponding position area of the image block in the reference image; and
  • the reference data corresponding to the area pointed to by the motion vector should be generated entirely from pixels inside the corresponding position area of the image block in the reference image.
  • Otherwise, the decoding end can consider that the code stream does not meet the standard specification.
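  • A sketch of this decoder-side conformance check (the rectangle representation and the margin parameter are assumptions made for illustration):

```python
def mv_is_conforming(mv, sub_block_rect, block_region_rect, margin=0):
    """Check that the reference area addressed by mv for the sub-block
    lies entirely inside the reference-image region corresponding to
    the image block containing the sub-block; margin accounts for the
    extra pixels sub-pixel interpolation needs (e.g. 4 for an 8-tap
    filter)."""
    x, y, w, h = sub_block_rect
    dx, dy = mv
    rx0, ry0 = x + dx - margin, y + dy - margin
    rx1, ry1 = x + dx + w + margin, y + dy + h + margin
    bx0, by0, bx1, by1 = block_region_rect
    return bx0 <= rx0 and by0 <= ry0 and rx1 <= bx1 and ry1 <= by1
```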
  • a video processing device uses the reference data to perform inter prediction on the sub-image block.
  • the reference data mentioned in the embodiment of the present application may be a predicted image block.
  • the video processing device may subtract the corresponding pixel values of the sub-image block and the predicted image block to obtain a residual.
  • the video processing device may combine the residuals corresponding to the obtained sub-image blocks to obtain the residuals of the image block unit to be encoded.
  • transformation, quantization, and entropy coding can be performed in units of sub-image blocks or image blocks to obtain an entropy-coded bit stream.
  • The entropy-coded bitstream, together with the encoded coding mode information, such as the inter prediction mode and motion vector (or motion vector residual), is stored or sent to the decoding end.
  • At the decoding end, after the entropy-coded bit stream is obtained, entropy decoding, inverse quantization, and inverse transformation are performed to obtain the corresponding residuals; the predicted image block corresponding to the sub-image block is found based on the decoded motion vector and inter prediction information; and the value of each pixel in the current sub-image block is obtained from the predicted image block and the residual.
  • The video processing device then performs update processing on the pixels in the first region using the pixels of the reconstructed current image block.
  • Before performing this update processing with the pixels of the reconstructed image block (and possibly before obtaining the reference data), the video processing device may determine whether the reference image belongs to a specific type of reference image (a long-term reference image and/or a constructed frame); if so, the pixels of the reconstructed image block may be used to update the pixels in the first region.
  • After the reconstruction of a single image block, the pixels of the corresponding region in the reference image may be updated; or, after the reconstruction of multiple image blocks, the pixels in the areas of the reference image corresponding to those multiple image blocks may be updated; or, after the entire frame has been reconstructed, the reference image may be updated using the entire frame.
  • the update processing performed by the video processing device on the pixels in the first area does not necessarily mean that the pixels in the first area must be changed.
  • the update process may or may not change the pixels in the first area.
  • the update process mentioned here may include a step of determining whether to perform pixel change.
  • the pixel update may be performed without performing the determination here.
  • the video processing device may determine whether pixels in the first region need to be updated.
  • a flag bit in the image block indicating whether to refresh pixel information in the first region may be decoded, and whether the first region needs to be updated with the pixels of the current image block is determined according to the flag bit.
  • Note that the result of changing the pixels of the first region using the pixels of the current image block may even be that the pixel values of all or some of the pixels in the first region remain unchanged.
  • Specifically, the pixel value of the pixel at the corresponding position in the first region may be directly replaced with the pixel value of the pixel of the current image block.
  • The replacement may be performed point by point; or one pixel of the current image block may replace several pixels in the first region; or several pixels of the current image block may be weighted-averaged to replace one pixel in the first region.
  • Alternatively, the pixel value of the pixel of the current image block and the pixel value of the pixel at the corresponding position in the first region may be weighted and combined to obtain the updated pixel value in the first region.
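  • A sketch covering both update variants just described, direct replacement and weighted combination (the single blending weight is an assumption; the text allows other weighting schemes):

```python
import numpy as np

def update_region(ref_region, recon_block, weight=1.0):
    """Update the first region in place with the reconstructed block.
    weight=1.0 is direct replacement; 0 < weight < 1 blends the new
    pixels with the existing reference pixels."""
    ref_region[:] = (weight * recon_block
                     + (1.0 - weight) * ref_region).astype(ref_region.dtype)
```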
  • the pixels in the first region are updated by using the filtered pixels of the current image block.
  • If filtering of some pixels of the current image block has been completed, the video processing device may first use those filtered pixels to update the portion of the first region in the reference image corresponding to those pixel positions.
  • Specifically, for some pixels of the current image block, a filtering operation may be performed together with adjacent image blocks that have already been reconstructed and filtered, to obtain the final reconstructed values of those pixels, and those pixels may first be used to update the pixels at the corresponding positions in the reference image.
  • The as yet unfiltered part can first be stored in a buffer; after the adjacent image blocks have been reconstructed and filtered, the remaining pixels of the current image block are filtered, and once this filtering is completed, the pixels at the corresponding positions in the reference image are updated using those pixels.
  • As shown in FIG. 6, suppose image block 3 is selected for updating the constructed long-term reference image, and image blocks 1 and 2 have already been reconstructed and filtered. The color-filled part of image block 3 requires the reconstructed values of image block 4 and image block 7 before it can be filtered, so that part is first placed in the buffer while the white, unfilled part is filtered; after this filtering is completed, its result is used to update the long-term reference image. Then, when image blocks 4 and 7 have been encoded and their reconstruction and filtering are completed, the filtered data of the color-filled portion of image block 3 is used to update the long-term reference image.
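  • The deferred, two-stage block-level refresh in this example can be sketched as follows (the data structures are hypothetical, and real deblocking dependencies are more involved):

```python
# hypothetical model: the reference image as a dict of pixel arrays,
# plus a buffer for pixels whose filtering must wait for neighbors
ref_image = {}
pending = {}

def on_block_reconstructed(block_id, filtered_part, unfiltered_part):
    """Stage 1: refresh the reference image immediately with the pixels
    that could already be filtered; buffer the remainder (the
    color-filled part of image block 3 in FIG. 6)."""
    ref_image[(block_id, "early")] = filtered_part
    pending[block_id] = unfiltered_part

def on_neighbors_ready(block_id, deblock):
    """Stage 2: once the required neighbors (blocks 4 and 7 in the
    example) are reconstructed and filtered, filter the buffered
    pixels and refresh the reference image with the result."""
    ref_image[(block_id, "late")] = deblock(pending.pop(block_id))
```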
  • all image blocks of the current image may obtain reference data according to the video processing method in the embodiment of the present application and subsequently update pixels of corresponding regions in the reference image.
  • part of the image blocks of the current image may also be used to obtain reference data and update pixels of corresponding regions in the reference image according to the video processing method in the embodiment of the present application.
  • That is, one part of the image blocks may use the video processing method of the embodiments of the present application to obtain reference data and subsequently update the pixels of the corresponding regions in the reference image, while the other part of the image blocks may neither use this method to obtain reference data nor update the pixels of the corresponding areas in the reference image.
  • The former image blocks may be those corresponding to the background part, and the latter may be those corresponding to the foreground part.
  • Whether an image block belongs to the foreground or the background can be determined from the pixel changes of the image block in the current image compared with the previous frame or several previous frames; of course, other determination methods may also be used, which the embodiments of this application do not specifically limit.
  • all the sub-image blocks can obtain reference data according to the video processing method in the embodiment of the present application.
  • some sub-image blocks of the image block may also obtain reference data according to the video processing method in the embodiment of the present application.
  • As an example, assume an 8-tap filter is used for sub-pixel interpolation; that is, when sub-pixel interpolation is used, each of the two interpolation directions (vertical and horizontal) requires 4 integer pixels on both sides of the position being interpolated. Then, when encoding a sub-image block in an image block, if the corresponding reference frame is a constructed long-term reference image, the search range can be determined according to the following steps:
  • First, the initial search range (SR_LTx, SR_LTy) to (SR_RBx, SR_RBy) is obtained; here an area is characterized by two pixel positions, namely its upper-left and lower-right corners. The usable search interval is then the image block area contracted inward by 4 pixels in each of the four directions: the maximum offset of the search area corresponding to the upper-left corner is (LTx + 4, LTy + 4), and the maximum offset corresponding to the lower-right corner is (RBx - 4, RBy - 4).
  • This step 7 can be an optional operation.
  • The reconstructed pixels of the currently encoded coding unit may then be used to update the pixels at the corresponding positions of the long-term reference image.
  • As a more concrete example, suppose the search range in a short-term reference image of the current image is plus or minus 64 pixels. If the reference frame currently being searched is a constructed long-term reference image, the search range can be determined as follows:
  • Suppose the initial search range of the current sub-image block is (-64, -64) to (64, 64). Limited by the region corresponding to the current image block, the whole-pixel search range becomes (-32, 0) to (0, 32), and the final sub-pixel search interval in the constructed long-term reference image, obtained by contracting this range inward by 4 pixels, is (-28, 4) to (-4, 28).
  • In step 7, a whole-pixel search is performed in the range (-32, 0) to (0, 32); this step can be optional.
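  • These numbers can be checked with a short calculation: the whole-pixel range, already limited by the first region, is contracted inward by the 4-pixel interpolation margin (a sketch using the figures from the example above):

```python
def shrink(range_lt, range_rb, margin=4):
    """Contract a search range, given by its upper-left and lower-right
    corner offsets, inward by margin pixels in all four directions."""
    (ltx, lty), (rbx, rby) = range_lt, range_rb
    return (ltx + margin, lty + margin), (rbx - margin, rby - margin)

# whole-pixel range limited by the current image block's region:
whole = ((-32, 0), (0, 32))
assert shrink(*whole) == ((-28, 4), (-4, 28))  # the sub-pixel interval above
```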
  • Finally, the reconstructed pixels of the currently encoded coding unit may be used to update the pixels at the corresponding positions of the long-term reference image.
  • In the above solution, the search area of the current sub-image block in the long-term reference frame is limited to the region of the reference image corresponding to the current image block position, ensuring that the current sub-image block does not use pixel values at positions corresponding to other image blocks during encoding/decoding reconstruction; as a result, both the encoding end and the decoding end can use a block-level refresh mechanism to update the long-term reference frame.
  • This method is particularly applicable to motion search in a specific type of reference frame (for example, a constructed long-term reference image).
  • In hardware, the way the reference image (for example, a constructed long-term reference frame) is updated is to update it directly with each image block once that block's reconstruction and filtering are completed; this is referred to here as block-level reference frame refresh.
  • Because each image block uses only the data within its own corresponding area of the reference image, it is not affected by the updates that preceding image blocks make to the long-term reference frame, and block-level refresh therefore yields the same result as frame-level long-term reference frame refresh.
  • In summary, in the technical solutions of the embodiments of the present application, the reference data used for inter prediction is obtained from the reference image, with the acquired range not exceeding the region of the reference image corresponding to the position of the current image block, and the reconstructed current image block is used to update (or refresh) the corresponding region of the reference image. This avoids the loss of video coding performance caused by using only zero motion vectors, enables a block-by-block update of the pixels of the reference image to reduce bandwidth pressure, and avoids the low encoding efficiency caused by an excessive search range.
  • FIG. 9 is a schematic block diagram of a video processing device 300 according to an embodiment of the present application.
  • the video processing device 300 may include a determination unit 310, an acquisition unit 320, a prediction unit 330, and an update processing unit 340.
  • a determining unit 310 configured to determine a sub-image block from a current image block of the current image
  • An obtaining unit 320 configured to obtain reference data for performing inter prediction on the sub-image block from a first region located in a reference image and corresponding to the current image block position;
  • a prediction unit 330 configured to perform inter prediction on the sub-image block by using the reference data
  • An update processing unit 340 is configured to perform an update process on pixels in the first region by using pixels of the current image block after reconstruction.
  • Optionally, the reference image is a long-term reference image, a constructed frame, and/or a frame that is not to be output.
  • Optionally, the obtaining unit 320 is further configured to obtain the reference data from the first region of the reference image.
  • Optionally, the obtaining unit 320 is further configured to obtain the reference data from a second region, where the second region is a partial region of the first region.
  • Optionally, at least some pixels in the part of the first region other than the second region are used to obtain the pixel value of at least one pixel in the second region.
  • Optionally, at least some pixels in the part of the first region other than the second region are used to perform an interpolation operation with at least some of the pixels in the second region.
  • Optionally, the obtaining unit 320 is further configured to perform a sub-pixel search on the second region to obtain a first search result, and to obtain the reference data based on the first search result.
  • Optionally, the obtaining unit 320 is further configured to perform a whole-pixel search on the first region to obtain a second search result, and to obtain the reference data based on the second search result.
  • Optionally, the search range used for searching in the first region is less than or equal to the search range of a specific region.
  • Optionally, the specific region is the region used for obtaining reference data in a non-constructed frame, or the region used for obtaining reference data in a short-term reference image.
  • Optionally, the update processing unit 340 is further configured to update the pixels in the first region using the filtered pixels of the current image block.
  • Optionally, when filtering of some pixels of the current image block has been completed, the update processing unit 340 is further configured to update, using those filtered pixels, the part of the first region corresponding to those pixel positions.
  • a position of the first region in the reference image is the same as a position of the current image block in the current image.
  • Optionally, the current image block is a coding tree unit (CTU) and the sub-image block is a coding unit; or the current image block is a coding unit and the sub-image block is a prediction unit.
  • the device 300 is used for an encoding end.
  • Optionally, the determining unit 310 is further configured to determine, from the reference image according to the position of the current image block in the current image, the first region and/or an area within the first region for acquiring the reference data.
  • the device 300 further includes:
  • a transmitting unit 350, configured to carry a flag bit in the code stream to indicate to the decoding end that the reference data for performing inter prediction on the sub-image block is obtained from the first region located in the reference image and corresponding to the current image block position.
  • the device 300 is used for a decoding end.
  • Optionally, the determining unit 310 is further configured to determine, from the reference image according to the motion vector information in the code stream transmitted by the encoding end, the first region and/or an area within the first region for acquiring the reference data.
  • the device 300 further includes:
  • a flag bit decoding unit 360, configured to decode the code stream to obtain a flag bit, the flag bit indicating to the decoding end that the reference data for performing inter prediction on the sub-image block is obtained from the first region located in the reference image and corresponding to the current image block position.
  • the video processing device in the embodiment of the present application may be a chip, which may be implemented by a circuit, but the embodiment of the present application does not limit the specific implementation form.
  • FIG. 10 shows a schematic block diagram of a computer system 400 according to an embodiment of the present application.
  • the computer system 400 may include a processor 410 and further may include a memory 420.
  • the computer system 400 may also include components generally included in other computer systems, such as input-output devices, communication interfaces, and the like, which is not limited in the embodiments of the present application.
  • the memory 420 is configured to store computer-executable instructions.
  • The memory 420 may be any of various types of memory; for example, it may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one magnetic disk memory. The embodiments of the present application are not limited to this.
  • the processor 410 is configured to access the memory 420 and execute the computer-executable instructions to perform operations in the foregoing method for video processing in the embodiment of the present application.
  • The processor 410 may include a microprocessor, a field-programmable gate array (FPGA), a central processing unit (CPU), a graphics processing unit (GPU), and the like. The embodiments of the present application are not limited to this.
  • The video processing device 300 and the computer system 400 in the embodiments of the present application may correspond to the execution subject of the video processing method in the embodiments of the present application, and the above and other operations and/or functions of the respective modules in the video processing device 300 and the computer system 400 implement the corresponding processes of the foregoing methods; for brevity, they are not repeated here.
  • The embodiments of the present application further provide an encoder, which is used to implement the function of the encoding end in the embodiments of the present application, and which may include the modules for the encoding end in the above video processing device or the above computer system.
  • the embodiment of the present application further provides a decoder, which is configured to implement the function of the decoding end in the embodiment of the present application, and may include a module for the decoding end in the video processing device of the embodiment of the present application or the above-mentioned computer system.
  • An embodiment of the present application further provides a codec, which includes the video processing device in the foregoing embodiment of the present application or includes the foregoing computer system.
  • An embodiment of the present application further provides an electronic device, and the electronic device may include a video processing device or a computer system of the foregoing various embodiments of the present application.
  • the electronic device can be an encoder, decoder, codec or video surveillance product.
  • the video processing device, computer system, and electronic device in the embodiments of the present application can be used in an unmanned aerial vehicle.
  • An embodiment of the present application further provides a computer storage medium, the computer storage medium storing program code, where the program code may be used to instruct execution of the video processing method in the foregoing embodiments of the present application.
  • It should be understood that, in this text, the term “and/or” merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B can indicate three cases: A alone, both A and B, and B alone. In addition, the character “/” in this text generally indicates that the associated objects are in an “or” relationship.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • For example, the division into units is only a division by logical function; in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, or may be electrical, mechanical, or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions in the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The foregoing storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to an image processing method and device, which can prevent a loss of video coding performance, relieve bandwidth pressure, and increase coding efficiency. The method comprises: determining a sub-image block from a current image block of a current image; acquiring, from a first region that is located in a reference image and corresponds to the position of the current image block, reference data used for inter-frame prediction of the sub-image block; performing inter-frame prediction on the sub-image block using the reference data; and updating a pixel in the first region using a pixel of the reconstructed current image block.
PCT/CN2018/094387 2018-07-03 2018-07-03 Procédé et dispositif de traitement vidéo WO2020006690A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/094387 WO2020006690A1 (fr) 2018-07-03 2018-07-03 Procédé et dispositif de traitement vidéo
CN201880039240.9A CN110832861A (zh) 2018-07-03 2018-07-03 视频处理方法和设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/094387 WO2020006690A1 (fr) 2018-07-03 2018-07-03 Procédé et dispositif de traitement vidéo

Publications (1)

Publication Number Publication Date
WO2020006690A1 true WO2020006690A1 (fr) 2020-01-09

Family

ID=69059442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094387 WO2020006690A1 (fr) 2018-07-03 2018-07-03 Procédé et dispositif de traitement vidéo

Country Status (2)

Country Link
CN (1) CN110832861A (fr)
WO (1) WO2020006690A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113556551B (zh) * 2020-04-23 2023-06-23 上海高德威智能交通系统有限公司 一种编码、解码方法、装置及设备
CN112565753B (zh) * 2020-12-06 2022-08-16 浙江大华技术股份有限公司 运动矢量差的确定方法和装置、存储介质及电子装置
CN116684610A (zh) * 2023-05-17 2023-09-01 北京百度网讯科技有限公司 确定长期参考帧的参考状态的方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101272494A (zh) * 2008-01-25 2008-09-24 浙江大学 利用合成参考帧的视频编解码方法及装置
WO2015124110A1 (fr) * 2014-02-21 2015-08-27 Mediatek Singapore Pte. Ltd. Procédé de codage vidéo utilisant une prédiction basée sur une copie intra bloc d'image
CN105578196A (zh) * 2015-12-25 2016-05-11 广东中星电子有限公司 视频图像处理方法及设备
CN105847871A (zh) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 视频编解码方法及其装置
CN106331700A (zh) * 2015-07-03 2017-01-11 华为技术有限公司 参考图像编码和解码的方法、编码设备和解码设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8385404B2 (en) * 2008-09-11 2013-02-26 Google Inc. System and method for video encoding using constructed reference frame
CN101431675B (zh) * 2008-12-09 2010-12-08 青岛海信电子产业控股股份有限公司 一种像素运动估计方法和装置
CN101795409B (zh) * 2010-03-03 2011-12-28 北京航空航天大学 内容自适应分数像素运动估计方法
CN103167283B (zh) * 2011-12-19 2016-03-02 华为技术有限公司 一种视频编码方法及设备
CN106878737B (zh) * 2017-03-02 2019-10-08 西安电子科技大学 高效视频编码中的运动估计加速方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101272494A (zh) * 2008-01-25 2008-09-24 浙江大学 利用合成参考帧的视频编解码方法及装置
WO2015124110A1 (fr) * 2014-02-21 2015-08-27 Mediatek Singapore Pte. Ltd. Procédé de codage vidéo utilisant une prédiction basée sur une copie intra bloc d'image
CN105847871A (zh) * 2015-01-16 2016-08-10 杭州海康威视数字技术股份有限公司 视频编解码方法及其装置
CN106331700A (zh) * 2015-07-03 2017-01-11 华为技术有限公司 参考图像编码和解码的方法、编码设备和解码设备
CN105578196A (zh) * 2015-12-25 2016-05-11 广东中星电子有限公司 视频图像处理方法及设备

Also Published As

Publication number Publication date
CN110832861A (zh) 2020-02-21

Similar Documents

Publication Publication Date Title
US11601640B2 (en) Image coding method using history-based motion information and apparatus for the same
TWI755376B (zh) 用於視訊寫碼之濾波器之幾何轉換
EP3780618A1 (fr) Procédé et dispositif d'obtention de vecteur de mouvement d'image vidéo
KR102606330B1 (ko) Aps 시그널링 기반 비디오 또는 영상 코딩
TWI639330B (zh) 具有內插參考圖像的視訊編解碼裝置及方法
TW201830963A (zh) 具有用於視頻寫碼之樣本存取之線性模型預測模式
TW201743619A (zh) 在視訊寫碼中適應性迴路濾波中之多個濾波器之混淆
CN111837396A (zh) 基于子图像码流视角相关视频编码中的误差抑制
US11671613B2 (en) Methods for signaling virtual boundaries and wrap-around motion compensation
CN112005551A (zh) 一种视频图像预测方法及装置
CN112385234A (zh) 图像和视频译码的设备和方法
US20200021850A1 (en) Video data decoding method, decoding apparatus, encoding method, and encoding apparatus
KR20190020083A (ko) 인코딩 방법 및 장치 및 디코딩 방법 및 장치
WO2020006690A1 (fr) Procédé et dispositif de traitement vidéo
JP2023521295A (ja) 映像符号化データをシグナリングするための方法
JP2023507259A (ja) ラップアラウンド動き補償を実行する方法
KR20230162989A (ko) 멀티미디어 데이터 프로세싱 방법, 장치, 디바이스, 컴퓨터-판독가능 저장 매체, 및 컴퓨터 프로그램 제품
CN112822498B (zh) 图像处理设备和执行有效去块效应的方法
CN114788284B (zh) 用于在调色板模式下对视频数据进行编码的方法和装置
US20200351493A1 (en) Method and apparatus for restricted long-distance motion vector prediction
CN115486074A (zh) 视频处理中的砖块和条带分割
JP2023504407A (ja) パレットモードを使用するための映像処理方法及び機器
WO2020182194A1 (fr) Procédé de prédiction inter-trames et dispositif associé
CN109672889B (zh) 约束的序列数据头的方法及装置
CN114902670A (zh) 用信号通知子图像划分信息的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18925306

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18925306

Country of ref document: EP

Kind code of ref document: A1