WO2022104678A1 - Video encoding and decoding method, apparatus, movable platform and storage medium - Google Patents

Video encoding and decoding method, apparatus, movable platform and storage medium

Info

Publication number
WO2022104678A1
WO2022104678A1 (PCT/CN2020/130367)
Authority
WO
WIPO (PCT)
Prior art keywords
image
image block
area
inter
reference area
Application number
PCT/CN2020/130367
Other languages
English (en)
French (fr)
Inventor
周焰
郑萧桢
Original Assignee
SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Application filed by SZ DJI Technology Co., Ltd. (深圳市大疆创新科技有限公司)
Priority to PCT/CN2020/130367 (WO2022104678A1)
Priority to CN202080070713.9A (CN114762331A)
Publication of WO2022104678A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation

Definitions

  • the present invention relates to the technical field of video encoding and decoding, and in particular, to a video encoding and decoding method, device, movable platform and storage medium.
  • Video encoding and decoding technology includes compression at the encoding end and decompression at the decoding end.
  • the compression at the encoding end is to compress and encode the original video file through some encoding techniques to form a code stream, and then the decompression at the decoding end is to decode and reconstruct the code stream to form a video file.
  • the decoding process can be regarded as the reverse process of the encoding process.
  • the reference image is generally stored in memory, and inter-frame prediction must use the reference image stored in that memory; because access bandwidth is limited, how to use reference images efficiently in video encoding and decoding is a problem that needs to be solved.
  • Embodiments of the present invention provide a video encoding and decoding method, apparatus, device, and storage medium, so as to efficiently use reference images to perform video encoding and decoding.
  • an embodiment of the present invention provides a video encoding method, which includes:
  • a reference area corresponding to the target coding area is determined in a reference image; wherein the reference image is stored in a first memory;
  • the size of the reference area is larger than the size of the target coding area
  • An encoding process is performed on the image of the target encoding region based on the result of the inter prediction operation.
  • the determining a reference area corresponding to the target coding area in the reference image based on the global motion vector includes:
  • a reference area corresponding to the target coding area is determined in the reference image.
  • the determining a reference area corresponding to the target coding area in the reference image based on the moving position includes:
  • an image area whose size is equal to the size of the reference area and covers the moving position is determined as a reference area corresponding to the target coding area.
  • the inter-frame prediction operation includes integer-pixel motion estimation (Integer Motion Estimation, IME), and the size of the reference area is determined based on the range of the motion search performed during integer-pixel motion estimation and the size of the pixels reserved for sub-pixel motion estimation (Fractional Motion Estimation, FME).
  • the inter prediction operation includes whole-pixel motion estimation and sub-pixel motion estimation
  • reading the image block of the reference area includes:
  • the storage unit of the IME and the storage unit of the FME are different from the first memory and the second memory;
  • an integer-pixel motion vector and a sub-pixel motion vector corresponding to the target coding region are respectively determined.
  • reading the image block of the reference area includes:
  • the performing an inter-frame prediction operation on the image of the target coding area based on the read image block of the reference area including:
  • an integer-pixel motion vector corresponding to the target coding area calculated in the process of integer-pixel motion estimation is obtained.
  • the first image block of the reference region is further used for sub-pixel motion estimation, so as to determine the optimal sub-pixel motion vector of the target coding region.
  • the inter prediction operation includes sub-pixel motion estimation, and the size of the first image block of the reference area is determined based on the range of the motion search performed during integer-pixel motion estimation and the size of the pixels reserved during sub-pixel motion estimation.
  • the inter-frame prediction operation includes:
  • based on the motion vector obtained by integer-pixel motion estimation and the image block of the reference area used in the integer-pixel motion estimation process, the sub-pixel motion vector corresponding to the target coding area is calculated in the sub-pixel motion estimation process;
  • the image block of the reference area is the image block corresponding to the luminance component.
  • the sub-pixel motion vector corresponding to the target coding region is used in a coding unit decision-making operation, so as to determine the predicted value of the image region corresponding to the chrominance component of the to-be-coded image.
  • the global motion vector is determined based on the motion vector corresponding to the image block in the previous frame of the image to be encoded;
  • the global motion vector is obtained from an image signal processor
  • the global motion vector reflects the direction and distance in which the object in the image to be encoded is shifted in the reference image as a whole.
  • the inter-frame prediction operation includes a coding unit decision-making operation, the number of target coding regions is two, and the two target coding regions are respectively a first image region corresponding to the luminance component of the to-be-coded image and a second image region corresponding to the chrominance component of the to-be-coded image; and
  • the predicted value of the second image area is determined according to the motion vector corresponding to the luminance component and the image block of the reference area corresponding to the chrominance component of the image to be encoded.
  • the pixel data of the image block of the reference area corresponding to the luminance component of the image to be encoded is different from the pixel data of the image block of the reference area corresponding to the chrominance component, and the two image blocks also have different sizes.
  • the inter-frame prediction operation includes integer-pixel motion estimation, sub-pixel motion estimation, a coding unit decision operation, and a mode decision operation, wherein the integer-pixel motion estimation and the sub-pixel motion estimation use a first image block of a first size, the coding unit decision operation uses a second image block of a second size, and the mode decision operation uses a third image block of a third size, the third size being larger than the first size and the second size.
  • the first hardware structure and the first storage mode corresponding to the inter-frame prediction operation are the same as the second hardware structure and the second storage mode corresponding to the inter-frame prediction operation in the decoding method corresponding to the encoding method.
  • an embodiment of the present invention provides a video decoding method, the method comprising:
  • the size of the reference area is larger than the size of the target decoding area
  • decoding processing is performed on the image of the target decoding area.
  • the second hardware structure and the second storage mode corresponding to the inter-frame prediction operation are the same as the first hardware structure and the first storage mode corresponding to the inter-frame prediction operation in the encoding method corresponding to the decoding method.
  • the decision modes supported by the mode decision operation include skip, merge, and AMVP.
  • an embodiment of the present invention provides a video encoding apparatus, including a memory and a processor; wherein executable code is stored on the memory, and the executable code, when executed by the processor, causes the processor to implement:
  • a reference area corresponding to the target coding area is determined in a reference image; wherein the reference image is stored in the first memory;
  • the size of the reference area is larger than the size of the target coding area
  • An encoding process is performed on the image of the target encoding region based on the result of the inter prediction operation.
  • the processor is used for:
  • a reference area corresponding to the target coding area is determined.
  • the processor is used for:
  • an image area whose size is equal to the size of the reference area and covers the moving position is determined as a reference area corresponding to the target coding area.
  • the inter prediction operation includes integer-pixel motion estimation (IME), and the size of the reference area is determined based on the range of the motion search performed during integer-pixel motion estimation and the size of the pixels reserved during sub-pixel motion estimation (FME).
  • the inter-frame prediction operation includes whole-pixel motion estimation and sub-pixel motion estimation; the processor is configured to:
  • the storage unit of the IME and the storage unit of the FME are different from the first memory and the second memory;
  • an integer-pixel motion vector and a sub-pixel motion vector corresponding to the target coding region are respectively determined.
  • the processor is used for:
  • an integer-pixel motion vector corresponding to the target coding area calculated in the process of integer-pixel motion estimation is obtained.
  • the first image block of the reference region is further used for sub-pixel motion estimation, so as to determine the optimal sub-pixel motion vector of the target coding region.
  • the inter prediction operation includes sub-pixel motion estimation, and the size of the first image block of the reference area is determined based on the range of the motion search performed during integer-pixel motion estimation and the size of the pixels reserved during sub-pixel motion estimation.
  • the processor is used for:
  • based on the motion vector obtained by integer-pixel motion estimation and the image block of the reference area used in the integer-pixel motion estimation process, the sub-pixel motion vector corresponding to the target coding area is calculated in the sub-pixel motion estimation process;
  • the image block of the reference area is the image block corresponding to the luminance component.
  • the sub-pixel motion vector corresponding to the target coding region is used in a coding unit decision-making operation, so as to determine the predicted value of the image region corresponding to the chrominance component of the to-be-coded image.
  • the global motion vector is determined based on the motion vector corresponding to the image block in the previous frame of the image to be encoded;
  • the global motion vector is obtained from an image signal processor
  • the global motion vector reflects the direction and distance in which the object in the image to be encoded is shifted in the reference image as a whole.
  • the inter-frame prediction operation includes a coding unit decision-making operation, the number of target coding regions is two, and the two target coding regions are respectively a first image region corresponding to the luminance component of the to-be-coded image and a second image region corresponding to the chrominance component of the to-be-coded image; the processor is configured to:
  • the predicted value of the second image area is determined according to the motion vector corresponding to the luminance component and the image block of the reference area corresponding to the chrominance component of the image to be encoded.
  • the pixel data of the image block of the reference area corresponding to the luminance component of the image to be encoded is different from the pixel data of the image block of the reference area corresponding to the chrominance component, and the two image blocks also have different sizes.
  • the inter-frame prediction operation includes integer-pixel motion estimation, sub-pixel motion estimation, a coding unit decision operation, and a mode decision operation, wherein the integer-pixel motion estimation and the sub-pixel motion estimation use a first image block of a first size, the coding unit decision operation uses a second image block of a second size, and the mode decision operation uses a third image block of a third size, the third size being larger than the first size and the second size.
  • the first hardware structure and the first storage manner corresponding to the inter-frame prediction operation are the same as the second hardware structure and the second storage manner corresponding to the inter-frame prediction operation performed by the decoding apparatus corresponding to the encoding apparatus.
  • the video encoding device and the video decoding device are included in the same chip or the same IP core;
  • the first hardware structure corresponding to the inter-frame prediction operation shares the same set of logic circuits with the second hardware structure corresponding to the inter-frame prediction operation performed by the video decoding apparatus, and the first storage mode corresponding to the inter-frame prediction operation shares the same storage resource with the second storage mode corresponding to the inter-frame prediction operation performed by the video decoding apparatus.
  • an embodiment of the present invention provides a video decoding apparatus, including a memory and a processor; wherein executable code is stored on the memory, and the executable code, when executed by the processor, causes the processor to implement:
  • the size of the reference area is larger than the size of the target decoding area
  • decoding processing is performed on the image of the target decoding area.
  • the second hardware structure and the second storage manner corresponding to the inter-frame prediction operation are the same as the first hardware structure and the first storage manner corresponding to the inter-frame prediction operation performed by the encoding apparatus corresponding to the decoding apparatus.
  • the video decoding device and the video encoding device are included in the same chip or the same IP core;
  • the second hardware structure corresponding to the inter-frame prediction operation shares the same set of logic circuits with the first hardware structure corresponding to the inter-frame prediction operation implemented by the video encoding apparatus, and the second storage mode corresponding to the inter-frame prediction operation shares the same storage resource with the first storage mode corresponding to the inter-frame prediction operation implemented by the video encoding apparatus.
  • the decision modes supported by the mode decision operation include skip, merge, and AMVP.
  • an embodiment of the present invention provides a movable platform, including the video encoding apparatus in the third aspect.
  • an embodiment of the present invention provides a remote controller, including the video decoding apparatus in the fourth aspect.
  • an embodiment of the present invention provides a computer-readable storage medium, where executable code is stored on the computer-readable storage medium; when the executable code is executed by a processor of a movable platform, the processor can at least implement the video coding and decoding method in the first aspect.
  • With the video encoding and decoding method, apparatus, movable platform, and storage medium provided by the embodiments of the present invention, reference images can be used efficiently to perform video encoding and decoding.
  • FIG. 1 is a schematic structural diagram of a coding end provided in an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a video encoding and decoding method according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a video encoding and decoding method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of determining a reference area according to an embodiment of the present invention.
  • FIG. 5a is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention.
  • FIG. 5b is a schematic structural diagram of a video decoding apparatus according to an embodiment of the present invention.
  • FIG. 6a is a schematic diagram of determining an image block according to an embodiment of the present invention.
  • FIG. 6b is a schematic diagram of another image block determination provided by an embodiment of the present invention.
  • FIG. 7a is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present invention.
  • FIG. 7b is a schematic structural diagram of a video decoding apparatus according to an embodiment of the present invention.
  • FIG. 8a is a schematic structural diagram of a movable platform according to an embodiment of the present invention.
  • FIG. 8b is a schematic structural diagram of another remote controller according to an embodiment of the present invention.
  • as used herein, the word “if” may be interpreted as “when”, “upon”, “in response to determining”, or “in response to detecting”; similarly, the phrases “if it is determined” or “if (a stated condition or event) is detected” may, depending on the context, be interpreted as “when it is determined”, “in response to determining”, “when (the stated condition or event) is detected”, or “in response to detecting (the stated condition or event)”.
  • the method provided by the embodiment of the present invention may be implemented in the encoding end or the decoding end.
  • the structure of the encoding end is briefly introduced below.
  • at the encoding end, the original video frames are subjected to the following processing: prediction, transformation, quantization, entropy coding, inverse quantization, inverse transformation, reconstruction, filtering, etc.
  • the encoding end may include an encoding intra-frame prediction module, an encoding inter-frame prediction module, a transformation module, a quantization module, an entropy encoding module, an inverse quantization module, an inverse transformation module, a reconstruction module, a filtering module, and a reference image cache module.
  • the encoding intra prediction module and the encoding inter prediction module may respectively determine intra prediction data, intra prediction related information, inter prediction data, and inter prediction related information based on the reconstructed frame.
  • a switch connected to the encoding intra prediction module and the encoding inter prediction module selects which of the two modules to use, and the selected module provides the adder with the intra prediction data or the inter prediction data.
  • by subtracting the prediction data from the original frame, the prediction residual is obtained.
  • the prediction residual is transformed and quantized to obtain quantized coefficients.
  • the quantized coefficients, intra-frame prediction related information, inter-frame prediction related information, etc. are input into the entropy encoder for entropy encoding, and finally encoded data for sending to the decoding end is obtained.
  • when determining intra-frame prediction data and inter-frame prediction data, a reference image needs to be acquired; the reference image can be stored in the reference image cache module and read from it when used.
  • to obtain the reference image, inverse quantization and inverse transformation are first performed on the quantized coefficients to restore the prediction residual.
  • in the reconstruction module, the prediction residuals are added back to the corresponding intra-frame prediction data or inter-frame prediction data to obtain a reconstructed frame.
  • the reconstructed frame is a distorted video frame.
  • during encoding, some information of the original video frame is lost, such as high-frequency component information, resulting in distortion between the reconstructed frame and the original video frame.
  • the reconstructed frame needs to be processed accordingly to reduce the distortion phenomenon between the reconstructed frame and the original video frame.
  • specifically, the reconstructed frame may be filtered, and the filtering may include deblocking filtering, compensation processing, and the like; after the distorted video frame is filtered, the reference image is obtained.
  • the present invention mainly provides a method for reading reference images in the process of determining inter-frame prediction data, and the data reading efficiency can be improved by the data reading method provided by the present invention.
  • FIG. 2 is a flowchart of a video encoding method provided by an embodiment of the present invention. As shown in FIG. 2 , the method includes the following steps:
  • Step 201 Obtain a global motion vector corresponding to the image to be encoded.
  • Step 202 Determine a target coding region in the image to be coded.
  • Step 203 Determine a reference area corresponding to the target coding area in the reference image based on the global motion vector. Wherein, the reference image is stored in the first memory.
  • Step 204 Read the reference area in the reference image, and store the reference area in the second memory, where the size of the reference area is larger than the size of the target coding area.
  • Step 205 In the second memory, read the image block of the reference area.
  • Step 206 Based on the read image blocks of the reference area, perform an inter-frame prediction operation on the image of the target coding area.
  • Step 207 Based on the result of the inter-frame prediction operation, perform coding processing on the image of the target coding region.
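Steps 201 to 207 can be sketched as a small software model. Everything here is an illustrative assumption, not the patent's hardware design: regions are (x, y, w, h) tuples, the margin value is arbitrary, and the prediction/encoding stages are stubbed.

```python
# Toy model of the step 201-207 flow. Regions are (x, y, w, h) tuples;
# all names and the fixed margin are illustrative assumptions.

def locate_reference_area(target, gmv, img_w, img_h, margin=8):
    """Step 203: shift the target coding region by the global motion
    vector, enlarge it by a margin on every side (so the reference area
    is larger than the target area), and clamp it to the image."""
    x, y, w, h = target
    rw, rh = w + 2 * margin, h + 2 * margin
    rx = max(0, min(x + gmv[0] - margin, img_w - rw))
    ry = max(0, min(y + gmv[1] - margin, img_h - rh))
    return (rx, ry, rw, rh)

def encode_region(target, gmv, img_w, img_h):
    # Step 204: the reference area would be copied from the first memory
    # (e.g. DDR) into the second memory (line buffer); modelled as a dict.
    area = locate_reference_area(target, gmv, img_w, img_h)
    second_memory = {"reference_area": area}
    # Steps 205-207: prediction and encoding are stubbed; a real encoder
    # would read image blocks from second_memory and run IME/FME/CUD/MD.
    return {"target": target, "reference_area": second_memory["reference_area"]}
```

For example, a 64×64 CTU at (64, 64) in a 1920×1080 frame with GMV (5, -3) yields an 80×80 reference area at (61, 53).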
  • each inter-frame prediction operation needs to use a reference image and processes a part of the image to be coded; that part of the image can be the target coding area.
  • the entire to-be-encoded image may be divided to obtain a plurality of Coding Tree Units (CTUs), and then each CTU is encoded separately.
  • the coding operation may actually include several processes such as intra-frame prediction, inter-frame prediction, transform processing, quantization processing, and entropy coding.
  • the CTU can be further divided, and the above process can be performed in smaller division units.
  • the CTU may be divided according to a quadtree division manner to obtain multiple coding units (Coding Unit, CU).
  • the target coding region in this embodiment of the present invention may be a CTU or a CU.
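The CTU-to-CU quadtree division mentioned above can be illustrated with a short recursive helper. The split-decision callback and the 64/32 sizes are illustrative; in a real encoder the split decision comes from rate-distortion optimisation.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a CTU at (x, y) into CUs via quadtree division.
    `should_split` is a caller-supplied decision function."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    for oy in (0, half):
        for ox in (0, half):
            cus.extend(quadtree_split(x + ox, y + oy, half, min_size, should_split))
    return cus

# Example: split a 64x64 CTU once, yielding four 32x32 CUs.
cus = quadtree_split(0, 0, 64, 32, lambda x, y, s: s > 32)
```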
  • during inter-frame prediction, the entire reference image may not be used; only a related part of the reference image may be used.
  • the reference image is stored in the first memory
  • the first memory may be a Double Data Rate Synchronous Dynamic Random Access Memory (DDR).
  • the first memory may be an external memory, and the data stored in the second memory needs to be read during the inter-frame prediction operation on the target coding region, and the second memory may be an internal memory (for example, a line buffer (line buffer)) .
  • the line buffer may be implemented with static random access memory (SRAM). If an inter-frame prediction operation needs to be performed on the target coding area, a related part of the reference image must be copied from the first memory into the second memory, and the inter-frame prediction operation is then performed based on the images stored in the second memory.
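The two-level memory arrangement (a large external DDR holding the whole reference image, a small on-chip line buffer holding only the reference area) can be modelled as follows. The classes are illustrative stand-ins for the hardware, not the patent's design.

```python
class ExternalMemory:
    """Models the first memory (e.g. DDR): holds the full reference image."""
    def __init__(self, image):
        self.image = image  # 2D list of pixel rows

    def read_area(self, x, y, w, h):
        # Each call stands in for a bandwidth-limited external read.
        return [row[x:x + w] for row in self.image[y:y + h]]

class LineBuffer:
    """Models the second memory (e.g. an SRAM line buffer): holds only the
    reference area, from which prediction modules read image blocks."""
    def __init__(self, area_pixels, origin):
        self.pixels = area_pixels
        self.origin = origin  # top-left of the area in reference-image coords

    def read_block(self, x, y, w, h):
        ox, oy = self.origin
        lx, ly = x - ox, y - oy  # translate to buffer-local coordinates
        return [row[lx:lx + w] for row in self.pixels[ly:ly + h]]

# Usage: load a 4x4 reference area once, then serve 2x2 blocks from it.
image = [[10 * r + c for c in range(8)] for r in range(8)]
ddr = ExternalMemory(image)
buf = LineBuffer(ddr.read_area(2, 2, 4, 4), origin=(2, 2))
block = buf.read_block(3, 3, 2, 2)  # same pixels as image rows 3-4, cols 3-4
```

The point of the split is that every `read_block` hits only the small fast buffer; the slow external memory is touched once per reference area.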
  • a reference area corresponding to the target encoding area in the reference image may be determined based on a global motion vector (Global Motion Vector, GMV) corresponding to the image to be encoded.
  • the global motion vector reflects the direction and distance in which the object in the image to be encoded is shifted in the reference image as a whole.
  • the process of determining the global motion vector may be implemented as: calculating the global motion vector corresponding to the to-be-coded image through an Image Signal Processor (ISP), and sending the global motion vector corresponding to the to-be-coded image to the encoding end.
  • the encoding end may also automatically calculate the global motion vector corresponding to the image to be encoded.
  • the global motion vector corresponding to the to-be-coded image may be calculated based on N frames of images preceding the to-be-coded image, where N may be 1 or 2.
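The patent does not fix how the GMV is computed from earlier frames; one plausible, robust choice is the component-wise median of the previous frame's block motion vectors, sketched below with hypothetical names.

```python
from statistics import median

def global_motion_vector(block_mvs):
    """Hypothetical GMV estimate: component-wise median of the motion
    vectors of the previous frame's blocks (robust to a moving foreground
    object that disagrees with the global camera motion)."""
    xs = [mv[0] for mv in block_mvs]
    ys = [mv[1] for mv in block_mvs]
    return (median(xs), median(ys))

# Camera panning: most blocks shift consistently; one outlier block moves.
mvs = [(-6, 0), (-5, 1), (-6, -1), (-5, 0), (30, 4)]
gmv = global_motion_vector(mvs)
```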
  • the following introduces a specific implementation manner of determining a reference area corresponding to a target encoding area in a reference image based on a global motion vector corresponding to the image to be encoded.
  • the process of determining the reference region corresponding to the target coding region in the reference image may be implemented as: determining the initial position of a preset pixel point in the target coding region; superimposing the global motion vector corresponding to the image to be encoded on the initial position to obtain the moved position of the preset pixel point; and determining, based on the moved position, the reference area corresponding to the target coding area in the reference image.
  • FIG. 4 is used as an example to illustrate the process of determining the reference area.
  • the left image represents the image to be encoded
  • the right image represents the reference image.
  • the CTU in the reference image located at the same position as the CTU marked with the letter "A" in the image to be encoded is marked with the letter "B".
  • starting from pixel X at the upper-left corner of the CTU marked with the letter "B" and offsetting it by the direction and distance indicated by the global motion vector, another pixel Y can be found; taking pixel Y as the upper-left pixel of another CTU, the CTU marked with the letter "C" in FIG. 4 can be determined.
  • from the upper and lower boundaries of the CTU row containing the letter "C", a first distance m is extended outward along the vertical direction of the reference image, and from the left and right boundaries of that CTU row, a second distance x is extended along the horizontal direction; the region obtained in this way is the reference area.
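One reading of the FIG. 4 construction can be sketched as follows. The parameter `ext` stands for the "second distance x" in the text (renamed to avoid clashing with the x coordinate); all concrete values are illustrative, and the extension here is applied around CTU "C" rather than a full CTU row.

```python
def reference_area_for_ctu(ctu_x, ctu_y, ctu_size, gmv, img_w, img_h, m, ext):
    """Sketch of the FIG. 4 construction: pixel Y = pixel X (top-left of the
    co-located CTU "B") offset by the GMV; the CTU "C" anchored at Y is then
    extended by the first distance m vertically and the second distance
    `ext` horizontally, clamped to the reference image."""
    yx, yy = ctu_x + gmv[0], ctu_y + gmv[1]  # pixel Y
    top = max(0, yy - m)
    bottom = min(img_h, yy + ctu_size + m)
    left = max(0, yx - ext)
    right = min(img_w, yx + ctu_size + ext)
    return (left, top, right - left, bottom - top)

# A 64x64 CTU at (128, 64), GMV (10, -6), m = 16, ext = 32 (all illustrative).
area = reference_area_for_ctu(128, 64, 64, (10, -6), 1920, 1080, m=16, ext=32)
```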
  • the video coding apparatus includes an integer pixel search module, a sub-pixel search module, a coding unit decision module, a mode decision module, a sample adaptive offset estimation module, a deblocking filter module, and a sample adaptive offset filter module, And the entropy coding module.
  • the video encoding apparatus reads data from the reference pixel line buffer through the line buffer controller.
  • the reference area may also be referred to as a range of a line buffer (line buffer), and the range defines a range in which the target coding area obtains reference data during an inter-frame prediction operation. After the target reference area is determined, the reference area in the reference image can be read and stored in the second memory.
  • the inter-frame prediction operation can be implemented by several modules with different processing functions, including the integer pixel search module (hereinafter referred to as the IME module), the sub-pixel search module (hereinafter referred to as the FME module), and the coding unit decision-making module. module (hereinafter referred to as CUD module) and mode decision module (hereinafter referred to as MD module).
  • the IME module can perform integer-pixel motion estimation on the target coding region;
  • the FME module can perform sub-pixel motion estimation on the target coding region;
  • the CUD module can perform coding unit decision-making operations on the target coding region;
  • and the MD module can perform mode decision operations on the target coding region.
  • Different modules need to use the same or different image blocks in the reference area to perform inter-frame prediction operations.
  • Different modules have their own corresponding storage units, and the image blocks to be used can be read into their corresponding storage units for the inter-frame prediction operation. As shown in FIG. 5a, the image blocks can be read from the external memory into the storage units corresponding to each module of inter-frame prediction.
  • The process of determining the image block and using it to perform inter-frame prediction in each of the above four modules is introduced in turn below. It should be noted that the IME module and the FME module can share the same first image block to perform inter-frame prediction operations. The following describes how the IME module and the FME module determine the first image block and use it to perform inter-frame prediction.
  • the first image block of the image block may be read, and the first image block may be stored in the storage unit of the IME and the storage unit of the FME.
  • the first image block is a part of the image block, and the storage unit of the IME and the storage unit of the FME are different from the first memory and the second memory. Then, in the whole-pixel motion estimation process and the sub-pixel motion estimation process, based on the first image block, the integer-pixel motion vector and the sub-pixel motion vector corresponding to the target coding region are determined respectively.
  • the storage unit of the IME, the storage unit of the FME, and the first memory and the second memory are all located in different devices.
  • the process of determining the first image block may be implemented as: acquiring a preset size of the image block; in the reference image, determining the first image block whose size is equal to the size of the image block and covers the moving position.
  • the size of the image block can be set first.
  • the target coding region is a CU
  • the size of the CU is 16 ⁇ 16.
  • In the reference image, the moving position corresponding to the target coding area can be determined based on the motion vector, and then a first image block with a size of 32×32 covering the moving position can be selected.
  • the motion vector may be a global motion vector.
  • FIG. 6a and FIG. 6b are used as examples to illustrate the process of determining the first image block.
  • the initial position of the upper left pixel in the target coding region is determined.
  • the initial position is superimposed with the motion vector corresponding to the target coding region to obtain the moving position of the upper left pixel.
  • In the reference image, determine the first image block whose size is equal to the image block size of 32×32 and whose upper-left pixel is the moving position.
  • The first two steps are the same as in the embodiment corresponding to FIG. 6a: first determine the initial position of the upper-left pixel in the target coding region, then superimpose the initial position with the motion vector corresponding to the target coding region to obtain the moving position of the upper-left pixel. It is assumed that the size of the target coding region is 16×16 and the size of the image block is 32×32. In the last step, in the reference image, an image block A of size 16×16 can be determined with the moving position as the upper-left pixel point.
  • The upper and lower boundaries of the image block A are expanded in the vertical direction of the reference image, and the left and right boundaries of the image block A are expanded in the horizontal direction of the reference image, by 16 pixels in total in each direction, so that an image block B of size 32×32 can be obtained.
  • The image block A is in the middle of the image block B, and the image block B is the first image block that the IME module and the FME module need to use when performing the inter-frame prediction operation.
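As a sketch of the FIG. 6b procedure (all names hypothetical, and assuming the expansion is symmetric so that block A ends up centered in block B):

```python
def first_image_block(init_pos, mv, cu_size=16, block_size=32):
    """Locate the first image block (FIG. 6b style).

    The upper-left pixel of the target coding region is offset by the
    motion vector to obtain the moving position; a cu_size x cu_size
    block A is placed there and expanded symmetrically on every side so
    that A sits in the middle of the block_size x block_size block B
    shared by the IME and FME modules.
    """
    mx, my = init_pos[0] + mv[0], init_pos[1] + mv[1]
    pad = (block_size - cu_size) // 2          # 8 pixels per side here
    # (left, top, right, bottom) of image block B in the reference image
    return mx - pad, my - pad, mx + cu_size + pad, my + cu_size + pad
```

For a 16×16 target coding region at (32, 32) with motion vector (4, -4), block B is the 32×32 rectangle centered on block A at the moving position (36, 28).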
  • the size of the image block needs to be known first, and the size of the image block can be determined according to a predetermined rule.
  • The specific process of determining the size of the image block may be: determining the size of the image block according to the range of motion search in the integer-pixel motion estimation process and the number of pixels reserved in the sub-pixel motion estimation process. That is, the larger the motion search range in integer-pixel motion estimation and the more pixels reserved for sub-pixel motion estimation, the larger the image block; the smaller the search range and the fewer the reserved pixels, the smaller the image block.
  • the range of the above motion search can be set according to actual requirements, for example, the range of motion search can be set to be four whole pixels around.
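One plausible sizing rule consistent with the example above (a 16×16 CU, a search range of four whole pixels, and four pixels reserved per side for sub-pixel interpolation yield the 32×32 block used earlier); the exact formula is an assumption:

```python
def image_block_size(cu_size, search_range, reserved):
    """Image-block size grows with the integer-pel search range and the
    pixels reserved for sub-pel interpolation, applied on each side of
    the coding unit."""
    return cu_size + 2 * (search_range + reserved)
```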
  • The first image block used by the IME module can be passed to the FME module; in other words, the first image block can be copied from the storage unit of the IME module to the storage unit of the FME module, so that the first image block can be shared between the IME module and the FME module.
  • the factor of reserving pixels in the process of sub-pixel motion estimation is also taken into account.
  • the IME module can request a larger first image block from the line buffer at one time, and this larger first image block can meet the needs of both the IME module and the FME module in the inter-frame prediction process.
  • The first image block determined by the method provided by the embodiment of the present invention needs to be within the reference area. If it is found that the first image block is not within the reference area, the search starting point can be modified to ensure that the first image block is within the reference area.
  • the first image block of the image block may be read and stored in the third memory.
  • the first image block is a part of the image block, and the third memory is different from the first memory and the second memory. Then, according to the first image block of the reference region, the integer-pixel motion vector corresponding to the target coding region calculated in the integer-pixel motion estimation process is obtained.
  • the first memory is a double data rate synchronous dynamic random access memory (DDR SDRAM)
  • the second memory is a line buffer
  • the third memory is a register or a storage unit in the integer pixel search module.
  • an integer-pixel motion estimation process may be performed based on the first image block to obtain the optimal integer-pixel motion vector corresponding to the current CU. Assuming that the CTU size is 32×32 and the CU size supported by the encoder is 16×16, a CTU can be divided into 4 CUs. An optimal integer-pixel motion vector can be determined for each CU, so four optimal integer-pixel motion vectors can be determined. After calculating the optimal integer-pixel motion vectors corresponding to the four CUs, these motion vectors and the first image block used by the IME module are sent to the FME module together.
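A minimal sketch of exhaustive integer-pixel motion estimation by sum of absolute differences (SAD); the SAD criterion and the helper names are assumptions, since the patent does not specify the matching cost:

```python
def ime_search(cu, block, search_range):
    """Exhaustive integer-pel search: return the motion vector (dx, dy)
    inside the first image block that minimizes the SAD against the CU.
    cu and block are square 2-D lists of pixel values; the CU is tested
    at every integer offset within +/- search_range of the block centre.
    """
    n = len(cu)
    best, best_mv = float("inf"), (0, 0)
    centre = (len(block) - n) // 2
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            oy, ox = centre + dy, centre + dx
            if oy < 0 or ox < 0 or oy + n > len(block) or ox + n > len(block[0]):
                continue  # candidate falls outside the first image block
            sad = sum(abs(cu[r][c] - block[oy + r][ox + c])
                      for r in range(n) for c in range(n))
            if sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv
```

Running this once per CU yields the four optimal integer-pixel motion vectors that are handed to the FME module together with the shared first image block.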
  • the first image block of the reference area may further be used for sub-pixel motion estimation to determine the optimal sub-pixel motion vector of the target coding area.
  • the sub-pixel motion vector corresponding to the target coding region calculated in the sub-pixel motion estimation process can be determined according to the motion vector obtained by the whole-pixel motion estimation and the first image block related to the reference region used in the whole-pixel motion estimation process .
  • the image block of the reference area is the image block corresponding to the luminance component.
  • the FME module may perform sub-pixel motion estimation according to the integer-pixel motion vector corresponding to each CU and the first image block to obtain an optimal sub-pixel motion vector, such as an optimal 1/4-pixel motion vector.
  • the inter-frame prediction value of the luminance component can also be obtained.
  • the FME module can send the optimal sub-pixel motion vector corresponding to each CU and the inter-frame prediction value of the luminance component to the CUD module.
  • the optimal sub-pixel motion vector is a sub-pixel motion vector with respect to the luminance component.
  • Encoding unit decision operations can be performed in the CUD module.
  • the CUD module may obtain the image data of the chrominance components from the line buffer according to the optimal sub-pixel motion vector sent from the FME module and the position of the current CU in the to-be-coded image. Then, based on the image data of the chrominance components, the predicted values of the image regions corresponding to the chrominance components of the image to be encoded are determined.
  • the CUD module can calculate the rate-distortion cost (RD cost) of each CU.
  • the CUD module will perform prediction on the chrominance component to obtain a predicted value, then take the difference between the predicted values of the luma and chroma components and the original pixel values to obtain residuals, and then perform transformation, quantization, inverse quantization, and inverse transformation on the residuals to obtain an estimated distortion value; it also performs bit estimation on the coding mode information and coding coefficients to obtain an estimated bit value.
  • the CUD module makes a decision on different CU division methods after obtaining the rate-distortion cost of each CU.
  • a coding tree unit with a size of 32 ⁇ 32 can be divided into 4 CUs with a size of 16 ⁇ 16.
  • a coding tree unit with a size of 32×32 can be divided into 16 CUs with a size of 8×8. Then, it is necessary to compare the sum of the rate-distortion costs of the 4 CUs of size 16×16 in the first division method with the sum of the rate-distortion costs of the 16 CUs of size 8×8 in the second division method.
  • If the rate-distortion cost of the first division method is smaller than that of the second division method, the coding tree unit with a size of 32×32 is divided into four CUs with a size of 16×16. Conversely, if the rate-distortion cost of the first division method is greater than that of the second division method, the coding tree unit with a size of 32×32 is divided into 16 CUs with a size of 8×8. It should be noted that the rate-distortion cost finally obtained in the coding unit decision stage is the rate-distortion cost of the amvp mode.
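The division decision above can be sketched as a comparison of summed rate-distortion costs; the function and partition-label names are hypothetical:

```python
def choose_split(rd_16x16, rd_8x8):
    """Coding-unit decision: compare the summed rate-distortion cost of
    splitting a 32x32 CTU into four 16x16 CUs against splitting it into
    sixteen 8x8 CUs, and return the cheaper partition with its cost."""
    cost_a, cost_b = sum(rd_16x16), sum(rd_8x8)
    return ("4x16x16", cost_a) if cost_a <= cost_b else ("16x8x8", cost_b)
```

Each per-CU cost would itself be an estimated distortion plus a Lagrange multiplier times the estimated bits, as described in the preceding paragraph.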
  • the pixel data of the reference-area image block corresponding to the luminance component of the image to be encoded is different from the pixel data of the reference-area image block corresponding to the chrominance component, and the size of the reference-area image block corresponding to the luminance component is different from that of the reference-area image block corresponding to the chrominance component.
  • the width and height of the chroma component are only half the width and height of the luma component, respectively.
  • the size of the current CU is 16 ⁇ 16
  • the size of the corresponding chroma component image block is 8 ⁇ 8.
  • the size of the image data obtained from the line buffer for chrominance components can be set to 16 ⁇ 16.
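A small sketch of the chroma sizing described above, assuming 4:2:0 subsampling (implied by the half-width/half-height statement) and a padding amount chosen to reproduce the 16×16 line-buffer fetch:

```python
def chroma_block_size(luma_cu_size, reserve=0):
    """In 4:2:0 video the chroma plane is half the luma width and
    height, so a 16x16 luma CU maps to an 8x8 chroma block; the fetch
    from the line buffer may be padded on each side by `reserve`
    pixels to keep room for interpolation."""
    core = luma_cu_size // 2
    return core, core + 2 * reserve
```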
  • the CUD module can perform coding unit decision operations.
  • two target coding regions are required.
  • the two target encoding areas are respectively a first image area corresponding to the luminance component of the image to be encoded and a second image area corresponding to the chrominance component of the image to be encoded.
  • the predicted value of the second image area may be determined according to the motion vector corresponding to the luminance component and the image block of the reference area corresponding to the chrominance component of the image to be encoded.
  • the CUD module can decide the division method of the CU and the rate-distortion cost corresponding to the CU. That is, the rate-distortion cost of the amvp mode. Further, the MD module can determine the prediction blocks of the corresponding CU in skip and merge modes and calculate the rate-distortion cost of the corresponding CU. After that, the MD module compares the rate-distortion cost of the CU in amvp mode, skip mode and merge mode, and decides the inter-coding mode of the CU.
  • prediction in the skip and merge decision modes may correspond to multiple motion vectors, so that image blocks would be requested from the line buffer multiple times.
  • Requesting image blocks from the line buffer multiple times can be avoided by acquiring from the line buffer a single larger image block that covers the positions indicated by the multiple motion vectors. Based on this, optionally, the integer-pixel motion estimation and the sub-pixel motion estimation use the same first image block of a first size, the coding unit decision operation uses a second image block of a second size, and the mode decision operation uses a third image block of a third size, the third size being larger than the first size and the second size.
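One way to realize the single larger fetch is to take the bounding box of the positions indicated by all candidate motion vectors, padded by the interpolation reserve; this is a sketch under assumed names, not the patent's exact scheme:

```python
def prefetch_region(cu_pos, cu_size, mvs, reserve):
    """Mode decision may probe several candidate motion vectors
    (skip / merge / amvp); instead of issuing one line-buffer request
    per candidate, fetch a single block covering every candidate
    position plus the interpolation reserve on each side."""
    xs = [cu_pos[0] + mv[0] for mv in mvs]
    ys = [cu_pos[1] + mv[1] for mv in mvs]
    left, top = min(xs) - reserve, min(ys) - reserve
    right = max(xs) + cu_size + reserve
    bottom = max(ys) + cu_size + reserve
    return left, top, right, bottom
```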
  • the first hardware structure and first storage mode corresponding to the inter-frame prediction operation in the encoding method are the same as the second hardware structure and second storage mode corresponding to the inter-frame prediction operation in the decoding method corresponding to the encoding method. That is, the first hardware structure and the first storage manner corresponding to the inter-frame prediction operation are the same as the second hardware structure and the second storage manner corresponding to the inter-frame prediction operation performed by the decoding apparatus corresponding to the encoding apparatus.
  • the hardware structure and storage method corresponding to the inter-frame prediction operation performed by the encoding device in the UAV are the same as the hardware structure and storage method corresponding to the inter-frame prediction operation performed by the decoding device in the remote controller.
  • the video encoding device and the video decoding device are included in the same chip or the same IP core; wherein the first hardware structure corresponding to the video encoding device performing the inter-frame prediction operation and the second hardware structure corresponding to the video decoding device performing the inter-frame prediction operation share the same set of logic circuits, and the first storage mode corresponding to the inter-frame prediction operation in the video encoding device and the second storage mode corresponding to the inter-frame prediction operation in the video decoding device share the same storage resource.
  • a video encoding device and a video decoding device may be simultaneously included in one chip.
  • when the chip is applied to the UAV, the hardware circuit corresponding to the video encoding device in the chip is enabled, and the hardware circuit corresponding to the video decoding device in the chip is disabled.
  • when the chip is applied to the remote controller, the hardware circuit corresponding to the video decoding device in the chip is enabled, and the hardware circuit corresponding to the video encoding device in the chip is disabled. Since the video encoding device and the video decoding device can be included in the same chip or IP core, and they can share the same logic circuit and the same storage resource (for example, the same memory or the same storage unit), chip area and resources can be saved in the design and development of the chip, saving development and usage costs.
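The enable/disable behavior can be summarized as a tiny configuration sketch; the class and role names are hypothetical:

```python
class CodecChip:
    """Sketch: one chip holds both the encoder and the decoder, sharing
    logic circuits and storage; only the role needed by the host
    device (UAV encodes, remote controller decodes) is enabled."""
    def __init__(self, role):
        assert role in ("uav", "remote")
        self.encoder_enabled = role == "uav"
        self.decoder_enabled = role == "remote"
```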
  • FIG. 3 is a flowchart of a video decoding method provided by an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
  • Step 301: obtain the global motion vector corresponding to the encoded image;
  • Step 302: determine the target decoding area in the encoded image;
  • Step 303: based on the global motion vector, determine the reference area corresponding to the target decoding area in the reference image; wherein the reference image is stored in the first memory;
  • Step 304: read the reference area in the reference image and store the reference area in the second memory, the size of the reference area being larger than the size of the target decoding area;
  • Step 305: in the second memory, read the image block of the reference area;
  • Step 306: based on the read image block of the reference area, perform an inter-frame prediction operation on the image of the target decoding area;
  • Step 307: perform decoding processing on the image of the target decoding area based on the result of the inter-frame prediction operation.
  • The inter-frame prediction operation is correlation processing performed on a portion of the image to be decoded.
  • This portion of the image to be decoded may be the target decoding area.
  • FIG. 5b is a schematic structural diagram of a video decoding apparatus according to an embodiment of the present invention.
  • the video decoding apparatus includes an entropy decoding module, a mode decision module, an adaptive parameter estimation module, a deblocking filter module, a sampling adaptive offset filter module, and a pixel buffer.
  • the mode decision module includes an advanced motion vector prediction (amvp) module, an intra frame (intra) module, a skip (skip) module, and a merge (merge) module.
  • the video decoding apparatus reads data from the reference pixel line buffer through the line buffer controller.
  • the line buffer range corresponding to the current CTU can be determined according to the position of the current CTU and the global motion vector.
  • supported mode decisions can include skip, merge, or amvp. That is to say, the decoding end needs to perform the decoding and reconstruction process of the inter-frame prediction, including the decoding and reconstruction of the amvp, skip, and merge decision modes. Since the coding end also has the interpolation prediction process of skip and merge decision mode, the skip module and merge module at the decoding end and the skip module and merge module at the coding end have the same circuit structure. For example, the skip and merge decision mode at the decoding end and the skip and merge decision mode at the encoding end obtain reference image blocks based on the same hardware structure and/or the same storage method.
  • the acquisition methods of image blocks in skip and merge decision modes can reuse the acquisition methods of image blocks on the encoding side.
  • the image blocks of the amvp decision mode on the encoding side are actually obtained from the line buffer by the IME module and the CUD module, so if the MD module on the decoding side only reuses the encoding side's acquisition of image blocks, the image block corresponding to the amvp decision mode may not be read. Therefore, the image block can be requested from the line buffer directly according to the position of the current CU and the motion vector.
  • since the interpolation prediction process needs to reserve pixels, in a possible implementation, a CU with a size of 16×16 may acquire a luminance image block with a size of 24×24 and a chrominance image block with a size of 16×16.
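An assumed sizing helper that reproduces the 24×24 luma / 16×16 chroma fetches for a 16×16 CU; the uniform padding of 8 pixels is a guess chosen only to match the stated numbers:

```python
def fetch_sizes(cu_size, pad=8):
    """Assumed sizing: pad pixels are reserved around the block for
    sub-pel interpolation, giving a 24x24 luma fetch and a 16x16
    chroma fetch (half-resolution plane plus the same pad) for a
    16x16 CU."""
    luma = cu_size + pad
    chroma = cu_size // 2 + pad
    return luma, chroma
```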
  • the manner of acquiring image blocks for the MD module of the entire decoding end may be to prefetch 6 image blocks, for example, 6 image blocks of size 44×44.
  • Among the 6 image blocks of size 44×44, 2 are reference-area image blocks corresponding to the luminance block, and the other 4 are reference-area image blocks corresponding to the U-component and V-component chrominance blocks.
  • The size of each reference-area image block is 44×44, and the sizes of the U-component and V-component chrominance blocks are both 22×22.
  • the second hardware structure and the second storage mode corresponding to the inter-frame prediction operation corresponding to the decoding method are the same as the first hardware structure and the first storage mode corresponding to the inter-frame prediction operation in the encoding method corresponding to the decoding method . That is, the second hardware structure and the second storage manner corresponding to the inter-frame prediction operation are the same as the first hardware structure and the first storage manner corresponding to the inter-frame prediction operation performed by the encoding apparatus corresponding to the decoding apparatus.
  • the hardware structure and storage method corresponding to the inter-frame prediction operation performed by the encoding device in the UAV are the same as the hardware structure and storage method corresponding to the inter-frame prediction operation performed by the decoding device in the remote controller.
  • the video encoding device and the video decoding device are included in the same chip or the same IP core; wherein the second hardware structure corresponding to the video decoding device performing the inter-frame prediction operation and the first hardware structure corresponding to the video encoding device performing the inter-frame prediction operation share the same set of logic circuits, and the second storage mode corresponding to the inter-frame prediction operation in the video decoding device and the first storage mode corresponding to the inter-frame prediction operation in the video encoding device share the same storage resource.
  • a video encoding device and a video decoding device may be simultaneously included in one chip.
  • when the chip is applied to the UAV, the hardware circuit corresponding to the video encoding device in the chip is enabled, and the hardware circuit corresponding to the video decoding device in the chip is disabled.
  • when the chip is applied to the remote controller, the hardware circuit corresponding to the video decoding device in the chip is enabled, and the hardware circuit corresponding to the video encoding device in the chip is disabled. Since the video encoding device and the video decoding device can be included in the same chip or IP core, and they can share the same logic circuit and the same storage resource (for example, the same memory or the same storage unit), chip area and resources can be saved in the design and development of the chip, saving development and usage costs.
  • the method provided by the embodiment of the present invention can realize the acquisition of image blocks in the inter-frame prediction process in a highly integrated encoder and decoder.
  • The method is suited to the line-buffer architecture, has low implementation complexity, low hardware resource cost, low bandwidth consumption, and high cost-effectiveness.
  • the method provided by the embodiment of the present invention can reduce the number of interactions between different modules and the line buffer, and can reduce the risk of hardware implementation.
  • a reference area for inter-frame prediction can be determined based on the global motion vector, and inter-frame prediction is performed based on image blocks in the reference area, which avoids copying the entire reference image from the first memory to the second memory and performing inter-frame prediction based on the entire reference image. Since the amount of data that needs to be copied and read is reduced, the consumption of read bandwidth is also reduced, and the reference image can be used efficiently for inter-frame prediction.
  • Yet another exemplary embodiment of the present invention provides a video encoding apparatus, as shown in FIG. 7a, the apparatus includes:
  • memory 1910 for storing computer programs
  • the processor 1920 is used for running the computer program stored in the memory 1910 to realize:
  • a reference area corresponding to the target coding area is determined in a reference image; wherein the reference image is stored in a first memory;
  • the size of the reference area is larger than the size of the target coding area
  • An encoding process is performed on the image of the target encoding region based on the result of the inter prediction operation.
  • the processor 1920 is configured to:
  • a reference area corresponding to the target coding area is determined.
  • the processor 1920 is configured to:
  • an image area whose size is equal to the size of the reference area and covers the moving position is determined as a reference area corresponding to the target coding area.
  • the inter-frame prediction operation includes integer-pixel motion estimation (IME), and the size of the reference area is determined based on the range of motion search performed in the integer-pixel motion estimation process and the reserved pixel size in the sub-pixel motion estimation (FME) process.
  • the inter-frame prediction operation includes whole-pixel motion estimation and sub-pixel motion estimation; the processor 1920 is configured to:
  • the storage unit of the IME and the storage unit of the FME are different from the first memory and the second memory;
  • an integer-pixel motion vector and a sub-pixel motion vector corresponding to the target coding region are respectively determined.
  • the processor 1920 is configured to:
  • an integer-pixel motion vector corresponding to the target coding area calculated in the process of integer-pixel motion estimation is obtained.
  • the first image block of the reference region is further used for sub-pixel motion estimation, so as to determine the optimal sub-pixel motion vector of the target coding region.
  • the inter-frame prediction operation includes sub-pixel motion estimation, and the size of the first image block of the reference area is determined based on the range of motion search performed in the integer-pixel motion estimation process and the reserved pixel size in the sub-pixel motion estimation process.
  • the processor 1920 is configured to:
  • according to the motion vector obtained by the integer-pixel motion estimation and the reference-area image block used in the integer-pixel motion estimation process, the sub-pixel motion vector corresponding to the target coding area calculated in the sub-pixel motion estimation process is obtained.
  • the image block of the reference area is the image block corresponding to the luminance component.
  • the sub-pixel motion vector corresponding to the target coding region is used in a coding unit decision-making operation, so as to determine the predicted value of the image region corresponding to the chrominance component of the to-be-coded image.
  • the global motion vector is determined based on the motion vector corresponding to the image block in the previous frame of the image to be encoded;
  • the global motion vector is obtained from an image signal processor.
  • the global motion vector reflects the direction and distance in which the object in the image to be encoded is shifted in the reference image as a whole.
  • the inter-frame prediction operation includes a coding unit decision operation, the number of target coding regions is two, and the two target coding regions are respectively a first image area corresponding to the luminance component of the to-be-coded image and a second image area corresponding to the chrominance component of the to-be-coded image; the processor 1920 is used for:
  • the predicted value of the second image area is determined according to the motion vector corresponding to the luminance component and the image block of the reference area corresponding to the chrominance component of the image to be encoded.
  • the pixel data of the reference-area image block corresponding to the luminance component of the image to be encoded is different from the pixel data of the reference-area image block corresponding to the chrominance component, and the reference-area image block corresponding to the luminance component and the reference-area image block corresponding to the chrominance component have different sizes.
  • the inter-frame prediction operation includes integer-pixel motion estimation, sub-pixel motion estimation, a coding unit decision operation, and a mode decision operation, wherein the integer-pixel motion estimation and the sub-pixel motion estimation use the same first image block of a first size, the coding unit decision operation uses a second image block of a second size, and the mode decision operation uses a third image block of a third size, the third size being larger than the first size and the second size.
  • Yet another exemplary embodiment of the present invention provides a video decoding apparatus, as shown in FIG. 7b, the apparatus includes:
  • memory 1910' for storing computer programs
  • based on the global motion vector corresponding to the encoded image, determine, in the reference image, the reference area corresponding to the target decoding area in the encoded image; wherein the reference image is stored in the first memory;
  • the size of the reference area is larger than the size of the target decoding area
  • a second inter-frame prediction operation is performed on the image of the target decoding area
  • decoding processing is performed on the image of the target decoding area.
  • the processor 1920' is configured to: the first inter-frame prediction operation includes a first mode decision operation, and the second inter-frame prediction operation includes a second mode decision operation;
  • the first mode decision operation and the second mode decision operation obtain reference image blocks based on the same hardware structure and/or the same storage manner.
  • the decision mode supported by the second mode decision operation includes skip, merge or amvp.
  • The video encoding apparatus in FIG. 7a includes the memory 1910 and the processor 1920.
  • When the processor runs in the video encoding apparatus shown in FIG. 7a, it may execute the methods of the embodiments shown in FIGS. 1-2, 4-5a, and 6a-6b.
  • For the parts not described in detail in this embodiment, please refer to the relevant descriptions of the embodiments shown in FIGS. 1-2, 4-5a, and 6a-6b.
  • For the implementation process and technical effects of the technical solution, refer to the descriptions in the embodiments shown in FIGS. 1-2, 4-5a, and 6a-6b.
  • The video decoding apparatus in FIG. 7b includes the memory 1910' and the processor 1920'.
  • When the processor runs in the video decoding apparatus shown in FIG. 7b, it may execute the methods of the embodiments shown in FIGS. 3-4, 5b, and 6a-6b.
  • FIG. 3 - a description of the embodiments shown in Figures 4, 5b, 6a-6b.
  • an embodiment of the present invention further provides a movable platform, and the movable platform includes the video encoding and decoding apparatus 800 shown in FIG. 7a.
  • the video encoding method can be applied in a movable platform.
  • the movable platform may include at least one of an unmanned aerial vehicle, an unmanned vehicle, and a handheld gimbal.
  • the UAV may be a rotary-wing UAV, such as a quad-rotor UAV, a hexa-rotor UAV, or an octa-rotor UAV, or may be a fixed-wing UAV.
  • an embodiment of the present invention further provides a remote controller, where the remote controller includes the video codec apparatus 802 shown in FIG. 7b.
  • the video encoding and decoding method can be applied in a remote controller.
  • an embodiment of the present invention further provides a computer-readable storage medium, where executable codes are stored in the computer-readable storage medium, and the executable codes are used to implement the video encoding and decoding methods provided in the foregoing embodiments.

Abstract

本发明实施例提供一种视频编解码方法、装置、可移动平台和存储介质。其中,视频编码方法包括:获取待编码图像对应的全局运动矢量;在待编码图像中确定目标编码区域;基于全局运动矢量,在参考图像中确定与目标编码区域对应的参考区域;其中,参考图像被存储于第一存储器中;读取参考图像中的参考区域,并将参考区域存储于第二存储器中;在第二存储器中,读取参考区域的图像块;基于读取的参考区域的图像块,对目标编码区域的图像进行帧间预测操作;基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。采用本发明提供的视频编解码方法,可以降低读取带宽的占用,可以高效地利用参考图像进行视频编解码处理。

Description

视频编解码方法、装置、可移动平台和存储介质 技术领域
本发明涉及视频编解码技术领域,尤其涉及一种视频编解码方法、装置、可移动平台和存储介质。
背景技术
视频编解码技术包括编码端的压缩和解码端的解压缩，其中编码端的压缩是通过一些编码技术将原始的视频文件进行压缩编码形成码流，然后解码端的解压缩就是将码流进行解码重建形成视频文件，解码过程可以看作是编码过程的逆过程。在对视频图像进行编解码的过程中，需要基于视频图像对应的参考图像，对视频图像进行帧间预测。其中，参考图像一般是存储在存储器中的。在使用某一参考图像进行帧间预测时，就需要利用存储器中存储的参考图像来进行帧间预测。由于存取带宽的限制，如何高效地利用参考图像进行视频编解码是视频编解码技术中期望解决的问题。
发明内容
本发明实施例提供一种视频编解码方法、装置、可移动平台和存储介质，用以高效地利用参考图像进行视频编解码。
第一方面,本发明实施例提供一种视频编码方法,该方法包括:
获取待编码图像对应的全局运动矢量;
在所述待编码图像中确定目标编码区域;
基于所述全局运动矢量,在参考图像中确定与所述目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
在所述第二存储器中,读取所述参考区域的图像块;
基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作;
基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。
可选地,所述基于所述全局运动矢量,在参考图像中确定与所述目标编码区域对应的参考区域,包括:
确定所述目标编码区域中预设像素点的初始位置;
将所述初始位置叠加所述全局运动矢量,得到所述预设像素点的移动位置;
基于所述移动位置,在所述参考图像中确定与所述目标编码区域对应的参考区域。
可选地,所述基于所述移动位置,在所述参考图像中确定与所述目标编码区域对应的参考区域,包括:
获取预先设置的所述参考区域的尺寸;
在所述参考图像中,确定尺寸等于所述参考区域的尺寸且覆盖所述移动位置的图像区域,作为与所述目标编码区域对应的参考区域。
可选地，所述帧间预测操作包括整像素运动估计（Integer Motion Estimation，IME），所述参考区域的尺寸为根据在所述整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计FME的过程中预留像素的大小确定的。
可选地,所述帧间预测操作包括整像素运动估计和分像素运动估计;
所述在所述第二存储器中,读取所述参考区域的图像块,包括:
读取所述图像块的第一图像块,并将所述第一图像块存储于IME的存储单元和FME的存储单元中,其中,所述第一图像块为所述图像块的一部分,所述IME的存储单元和所述FME的存储单元不同于所述第一存储器和所述第二存储器;
在所述整像素运动估计过程中和所述分像素运动估计的过程中,基于所述第一图像块,分别确定所述目标编码区域对应的整像素运动矢量和分像素运动矢量。
可选地,所述在所述第二存储器中,读取所述参考区域的图像块,包括:
读取所述图像块的第一图像块,并将所述第一图像块存储于第三存储器中,其中,所述第一图像块为所述图像块的一部分,所述第三存储器不同于所述第一存储器和所述第二存储器;以及
所述基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作,包括:
根据所述参考区域的所述第一图像块,获取在整像素运动估计过程中计算出的所述目标编码区域对应的整像素运动矢量。
可选地,所述参考区域的所述第一图像块进一步用于分像素运动估计,以用于确定所述目标编码区域的最优分像素运动矢量。
可选地,所述帧间预测操作包括分像素运动估计,所述参考区域的第一图像块的尺寸为根据在整像素运动估计过程中进行运动搜索的范围以及在所述分像素运动估计的过程中预留像素的大小确定的。
可选地,所述帧间预测操作,包括:
根据所述整像素运动估计得到的运动矢量和所述整像素运动估计过程中使用到的关于参考区域的图像块,获取在所述分像素运动估计过程中计算出的所述目标编码区域对应的分像素运动矢量;
其中,所述参考区域的图像块为亮度分量对应的图像块。
可选地,将所述目标编码区域对应的所述分像素运动矢量用于编码单元决策操作,以用于确定所述待编码图像的色度分量对应的图像区域的预测值。
可选地,所述全局运动矢量是基于所述待编码图像的上一帧图像中的图像块对应的运动矢量而被确定的;或者
所述全局运动矢量是从图像信号处理器获取的;
其中,所述全局运动矢量反映所述待编码图像中的物体整体在参考图像中偏移的方向与距离。
可选地,所述帧间预测操作包括编码单元决策操作,所述目标编码区域的数量为两个,两个目标编码区域分别为所述待编码图像的亮度分量对应的第一图像区域和所述待编码图像的色度分量对应的第二图像区域;以及
根据亮度分量对应的运动矢量和所述待编码图像的色度分量对应的参考区域的图像块,确定所述第二图像区域的预测值。
可选地,所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块尺寸不相同。
可选地,所述帧间预测操作包括整像素运动估计、分像素运动估计、编码单元决策操作以及模式决策操作,其中,所述整像素运动估计和所述分像素运动估计使用具有相同的第一尺寸的第一图像块,所述编码单元决策操作使用第二尺寸的第二图像块,所述模式决策操作使用第三尺寸的第三图像块,所述第三尺寸大于所述第一尺寸和所述第二尺寸。
可选地,所述帧间预测操作对应的第一硬件结构和第一存储方式与所述编码方法对应的解码方法中的帧间预测操作对应的第二硬件结构和第二存储方式相同。
第二方面,本发明实施例提供一种视频解码方法,该方法包括:
获取已编码图像对应的全局运动矢量;
在已编码图像中确定目标解码区域;
基于所述全局运动矢量,在参考图像中确定与所述目标解码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
读取所述参考图像中的所述参考区域，并将所述参考区域存储于第二存储器中，所述参考区域的尺寸大于所述目标解码区域的尺寸；
在所述第二存储器中,读取所述参考区域的图像块;
基于读取的所述参考区域的所述图像块,对所述目标解码区域的图像进行帧间预测操作;
基于所述帧间预测操作的结果,对所述目标解码区域的图像进行解码处理。
可选地,所述帧间预测操作对应的第二硬件结构和第二存储方式与所述解码方法对应的编码方法中的帧间预测操作对应的第一硬件结构和第一存储方式 相同。
可选地,模式决策操作支持的决策模式包括skip、merge或者amvp。
第三方面,本发明实施例提供一种视频编码装置,包括存储器、处理器;其中,所述存储器上存储有可执行代码,当所述可执行代码被所述处理器执行时,使所述处理器实现:
获取待编码图像对应的全局运动矢量;
在所述待编码图像中确定目标编码区域;
基于所述全局运动矢量,在参考图像中,确定与所述目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
在所述第二存储器中,读取所述参考区域的图像块;
基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作;
基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。
可选地,所述处理器,用于:
确定所述目标编码区域中预设像素点的初始位置;
将所述初始位置叠加所述全局运动矢量,得到所述预设像素点的移动位置;
基于所述移动位置,在所述参考图像中,确定与所述目标编码区域对应的参考区域。
可选地,所述处理器,用于:
获取预先设置的所述参考区域的尺寸;
在所述参考图像中,确定尺寸等于所述参考区域的尺寸且覆盖所述移动位置的图像区域,作为与所述目标编码区域对应的参考区域。
可选地,所述帧间预测操作包括整像素运动估计IME,所述参考区域的尺寸为根据在所述整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计FME的过程中预留像素的大小确定的。
可选地,所述帧间预测操作包括整像素运动估计和分像素运动估计;所述处理器,用于:
读取所述图像块的第一图像块,并将所述第一图像块存储于IME的存储单元和FME的存储单元中,其中,所述第一图像块为所述图像块的一部分,所述IME的存储单元和所述FME的存储单元不同于所述第一存储器和所述第二存储器;
在所述整像素运动估计过程中和所述分像素运动估计的过程中,基于所述第一图像块,分别确定所述目标编码区域对应的整像素运动矢量和分像素运动矢量。
可选地,所述处理器,用于:
读取所述图像块的第一图像块,并将所述第一图像块存储于第三存储器中,其中,所述第一图像块为所述图像块的一部分,所述第三存储器不同于所述第一存储器和所述第二存储器;以及
根据所述参考区域的所述第一图像块,获取在整像素运动估计过程中计算出的所述目标编码区域对应的整像素运动矢量。
可选地,所述参考区域的所述第一图像块进一步用于分像素运动估计,以用于确定所述目标编码区域的最优分像素运动矢量。
可选地,所述帧间预测操作包括分像素运动估计,所述参考区域的第一图像块的尺寸为根据在整像素运动估计过程中进行运动搜索的范围以及在所述分像素运动估计的过程中预留像素大小确定的。
可选地,所述处理器,用于:
根据所述整像素运动估计得到的运动矢量和所述整像素运动估计过程中使用到的关于参考区域的图像块,获取在所述分像素运动估计过程中计算出的所述目标编码区域对应的分像素运动矢量;
其中,所述参考区域的图像块为亮度分量对应的图像块。
可选地,将所述目标编码区域对应的所述分像素运动矢量用于编码单元决策操作,以用于确定所述待编码图像的色度分量对应的图像区域的预测值。
可选地,所述全局运动矢量是基于所述待编码图像的上一帧图像中的图像 块对应的运动矢量而被确定的;或者
所述全局运动矢量是从图像信号处理器获取的;
其中,所述全局运动矢量反映所述待编码图像中的物体整体在参考图像中偏移的方向与距离。
可选地,所述帧间预测操作包括编码单元决策操作,所述目标编码区域的数量为两个,两个目标编码区域分别为所述待编码图像的亮度分量对应的第一图像区域和所述待编码图像的色度分量对应的第二图像区域;所述处理器,用于:
根据亮度分量对应的运动矢量和所述待编码图像的色度分量对应的参考区域的图像块,确定所述第二图像区域的预测值。
可选地,所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块尺寸不相同。
可选地,所述帧间预测操作包括整像素运动估计、分像素运动估计、编码单元决策操作以及模式决策操作,其中,所述整像素运动估计和所述分像素运动估计使用具有相同的第一尺寸的第一图像块,所述编码单元决策操作使用第二尺寸的第二图像块,所述模式决策操作使用第三尺寸的第三图像块,所述第三尺寸大于所述第一尺寸和所述第二尺寸。
可选地,所述帧间预测操作对应的第一硬件结构和第一存储方式与所述编码装置对应的解码装置执行帧间预测操作对应的第二硬件结构和第二存储方式相同。
可选地,所述视频编码装置与视频解码装置包含于同一芯片或同一IP核中;
其中,所述帧间预测操作对应的第一硬件结构与所述视频解码装置执行帧间预测操作对应的第二硬件结构共用同一套逻辑电路,并且,所述帧间预测操作对应的第一存储方式与所述视频解码装置实现帧间预测操作对应的第二存储方式共用同一存储资源。
第四方面,本发明实施例提供一种视频解码装置,包括存储器、处理器;其中,所述存储器上存储有可执行代码,当所述可执行代码被所述处理器执行时,使所述处理器实现:
获取已编码图像对应的全局运动矢量;
在已编码图像中确定目标解码区域;
基于所述全局运动矢量,在参考图像中确定与所述目标解码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
读取所述参考图像中的所述参考区域，并将所述参考区域存储于第二存储器中，所述参考区域的尺寸大于所述目标解码区域的尺寸；
在所述第二存储器中,读取所述参考区域的图像块;
基于读取的所述参考区域的所述图像块,对所述目标解码区域的图像进行帧间预测操作;
基于所述帧间预测操作的结果,对所述目标解码区域的图像进行解码处理。
可选地,所述帧间预测操作对应的第二硬件结构和第二存储方式与所述解码装置对应的编码装置执行帧间预测操作对应的第一硬件结构和第一存储方式相同。
可选地,所述视频解码装置与视频编码装置包含于同一芯片或同一IP核中;
其中,所述帧间预测操作对应的第二硬件结构与所述视频编码装置实现帧间预测操作对应的第一硬件结构共用同一套逻辑电路,并且,所述帧间预测操作对应的第二存储方式与所述视频编码装置实现帧间预测操作对应的第一存储方式共用同一存储资源。
可选地,模式决策操作支持的决策模式包括skip、merge或者amvp。
第五方面,本发明实施例提供一种可移动平台,包括第三方面中的视频编码装置。
第六方面,本发明实施例提供一种遥控器,包括第四方面中的视频解码装置。
第七方面，本发明实施例提供了一种计算机可读存储介质，所述计算机可读存储介质上存储有可执行代码，当所述可执行代码被可移动平台的处理器执行时，使所述处理器至少可以实现第一方面中的视频编码方法。
第八方面，本发明实施例提供了一种计算机可读存储介质，所述计算机可读存储介质上存储有可执行代码，当所述可执行代码被处理器执行时，使所述处理器至少可以实现第二方面中的视频解码方法。
通过本发明实施例提供的视频编解码方法、装置、可移动平台和存储介质,可以高效地利用参考图像进行视频编解码。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例提供的一种编码端的结构示意图;
图2为本发明实施例提供的一种视频编解码方法的流程图示意图;
图3为本发明实施例提供的一种视频编解码方法的流程图示意图;
图4为本发明实施例提供的一种参考区域确定示意图;
图5a为本发明实施例提供的一种视频编码装置的结构示意图;
图5b为本发明实施例提供的一种视频解码装置的结构示意图;
图6a为本发明实施例提供的一种图像块确定示意图;
图6b为本发明实施例提供的另一种图像块确定示意图;
图7a为本发明实施例提供的一种视频编码装置的结构示意图;
图7b为本发明实施例提供的一种视频解码装置的结构示意图;
图8a为本发明实施例提供的一种可移动平台的结构示意图。
图8b为本发明实施例提供的一种遥控器的结构示意图。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
在本发明实施例中使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本发明。在本发明实施例和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义,“多种”一般包含至少两种。
取决于语境,如在此所使用的词语“如果”、“若”可以被解释成为“在……时”或“当……时”或“响应于确定”或“响应于检测”。类似地,取决于语境,短语“如果确定”或“如果检测(陈述的条件或事件)”可以被解释成为“当确定时”或“响应于确定”或“当检测(陈述的条件或事件)时”或“响应于检测(陈述的条件或事件)”。
另外,下述各方法实施例中的步骤时序仅为一种举例,而非严格限定。
本发明实施例提供的方法可以在编码端或者解码端中实现。下面对编码端的结构进行简单的介绍。在编码端中,原始的视频帧会被进行以下处理:预测、变换、量化、熵编码、反量化、反变换、重建、滤波等。对应这些处理过程,如图1所示,编码端可以包括编码帧内预测模块、编码帧间预测模块、变换模块、量化模块、熵编码模块、反量化模块、反变换模块、重建模块、滤波模块、参考图像缓存模块。
在图1中,编码帧内预测模块、编码帧间预测模块可以基于重建帧分别确定帧内预测数据、帧内预测相关信息、帧间预测数据、帧间预测相关信息。与编码帧内预测模块和编码帧间预测模块相连的开关用于选择使用编码帧内预测模块还是编码帧间预测模块,由被选择的模块向加法器提供帧内预测数据或者帧间预测数据。帧内预测数据或者帧间预测数据经过加法器之后,得到预测残 差。预测残差经过变换、量化处理,得到量化系数。量化系数、帧内预测相关信息、帧间预测相关信息等被输入到熵编码器中进行熵编码,最终得到用于向解码端发送的编码数据。
在确定帧内预测数据、帧间预测数据时,需要获取参考图像,参考图像可以被存储在参考图像缓存模块中,在使用时可以从参考图像缓存模块中读取出。参考图像可以通过以下方式获得:将量化系数进行反量化、反变换,以恢复预测残差。在重建模块,预测残差被加回到相应的帧内预测数据、帧间预测数据上,得到重建帧。重建帧是失真的视频帧,在变换以及量化的过程中,丢失了原始的视频帧的某些信息,如原始的视频帧中的高频分量信息,导致重建帧与原始的视频帧之间存在失真现象。因此,需要对重建帧进行相应的处理,以减小重建帧和原始的视频帧之间的失真现象。具体做法可以是对重建帧进行滤波处理,滤波处理可以包括去块滤波处理、补偿处理等。在对失真的视频帧进行滤波处理之后,就可以得到参考图像。
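上文所述"变换、量化过程中丢失原始视频帧的部分信息、导致重建帧存在失真"的现象，可以用一个极简的标量量化示意来说明。以下Python代码并非实际的变换量化实现，函数名与量化步长均为本文为说明而作的假设：

```python
def reconstruct(orig, pred, qstep):
    """用标量量化近似"变换量化+反量化反变换"过程，演示重建值相对原始值的失真来源。"""
    residual = orig - pred            # 预测残差
    q = round(residual / qstep)       # 量化：信息在此处丢失
    recon = pred + q * qstep          # 反量化后将残差加回预测值，得到重建值
    return recon

# 残差7经步长为4的量化后只能恢复为8，重建值与原始值之间出现失真
print(reconstruct(107, 100, 4))  # 108，而非原始的107
```

当残差恰好是量化步长的整数倍时无失真，否则失真随步长增大而增大，这正是需要对重建帧做滤波处理的原因之一。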
在本发明中主要是提供在确定帧间预测数据的过程中读取参考图像的方法,通过本发明提供的数据读取方法可以提高数据读取效率。
图2为本发明实施例提供的一种视频编码方法的流程图,如图2所示,该方法包括如下步骤:
步骤201、获取待编码图像对应的全局运动矢量。
步骤202、在待编码图像中确定目标编码区域。
步骤203、基于全局运动矢量,在参考图像中确定与目标编码区域对应的参考区域。其中,参考图像被存储于第一存储器中。
步骤204、读取参考图像中的参考区域,并将参考区域存储于第二存储器中,参考区域的尺寸大于目标编码区域的尺寸。
步骤205、在第二存储器中,读取参考区域的图像块。
步骤206、基于读取的参考区域的图像块,对目标编码区域的图像进行帧间预测操作。
步骤207、基于所述帧间预测操作的结果,对所述目标编码区域的图像进行 编码处理。
在关于视频编码的实际应用中,进行帧间预测操作需要使用参考图像,每次帧间预测操作是对待编码图像中的一部分图像进行相关处理,该待编码图像中的一部分图像可以是目标编码区域。
需要说明的是,在对待编码图像进行编码之前,可以先对整个的待编码图像进行划分,得到多个编码树块(Coding Tree Unit,CTU),然后分别对每个CTU进行编码。编码操作实际可以包括帧内预测、帧间预测、变换处理、量化处理、熵编码等几个过程,在不同过程中还可以继续对CTU进行划分,以更小的划分单位进行上述过程。例如,可以按照四叉树划分方式对CTU进行划分,得到多个编码块(Coding Unit,CU)。本发明实施例中的目标编码区域可以是CTU或者CU。
在对目标编码区域进行帧间预测操作时,可以不使用整个参考图像,而是可以使用参考图像的一部分相关图像即可。
需要说明的是，参考图像存储在第一存储器中，第一存储器可以是双倍速率同步动态随机存储器（Double Data Rate Synchronous Dynamic Random Access Memory，DDR）。第一存储器可以是外部存储器，而对目标编码区域进行帧间预测操作的过程中需要读取第二存储器中存储的数据，第二存储器可以是内部存储器（例如，线缓存器（line buffer））。该线缓存器可以用静态随机存取存储器（SRAM）来实现。如果需要对目标编码区域进行帧间预测操作，则首先需要将参考图像的一部分相关图像从第一存储器存储到第二存储器中，再基于第二存储器中存储的图像进行帧间预测操作。
在从第一存储器将参考图像的一部分相关图像读取到第二存储器之前,首先可以确定需要读取参考图像的哪部分图像区域。在本发明实施例中,可以基于待编码图像对应的全局运动矢量(Global Motion Vector,GMV)确定参考图像中与目标编码区域对应的参考区域。其中,全局运动矢量反映了待编码图像中的物体整体在参考图像中偏移的方向与距离。
确定全局运动矢量的过程可以实现为:通过图像信号处理器(Image Signal  Processor,ISP)计算出待编码图像对应的全局运动矢量,将待编码图像对应的全局运动矢量发送给编码端。或者,编码端也可以自动计算待编码图像对应的全局运动矢量。可以基于待编码图像之前N帧图像计算待编码图像对应的全局运动矢量,其中,N可以是1或者2。
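本文未限定由前N帧图像块运动矢量计算全局运动矢量的具体方式。作为一个假设性示意，下面的Python代码取上一帧各图像块运动矢量的逐分量中值作为全局运动矢量（函数名与取中值的策略均为示例假设，对离群运动矢量具有一定鲁棒性）：

```python
def estimate_gmv(prev_frame_mvs):
    """以上一帧各图像块运动矢量的逐分量中值作为全局运动矢量（示意实现）。"""
    xs = sorted(mv[0] for mv in prev_frame_mvs)
    ys = sorted(mv[1] for mv in prev_frame_mvs)
    mid = len(xs) // 2
    return (xs[mid], ys[mid])

# 多数块整体向右下偏移(2, 1)，个别块(100, -50)为局部运动，中值不受其影响
print(estimate_gmv([(2, 1), (3, 1), (2, 2), (100, -50), (2, 1)]))  # (2, 1)
```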
下面介绍基于待编码图像对应的全局运动矢量确定参考图像中与目标编码区域对应的参考区域的具体实施方式。
可选地,基于全局运动矢量,在参考图像中确定与目标编码区域对应的参考区域的过程可以实现为:确定目标编码区域中预设像素点的初始位置;将初始位置叠加目标编码区域对应的全局运动矢量,得到预设像素点的移动位置;基于移动位置,在参考图像中,确定与目标编码区域对应的参考区域。
为了便于理解,以图4为例说明确定参考区域的过程。图4中左图表示待编码图像,右图表示参考图像。参考图像中与字母“A”标注的方框所表示的CTU位于相同位置的CTU通过字母“B”标注出。从字母“B”标注的方框所表示的CTU左上角位置上的像素X起始,偏移全局运动矢量所指示的方向与距离,能够找到另外一个像素Y。以该像素Y作为另一个CTU左上角位置上的像素,可以确定出图4中字母“C”标注的方框所表示的另一个CTU。然后,从字母“C”标注的CTU的上下边界沿参考图像的竖直方向分别向外扩展第一距离m,且从字母“C”标注的CTU行的左右边界沿参考图像的水平方向分别向外扩展第二距离x,就可以得到参考区域。
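上述"由像素X沿全局运动矢量偏移得到像素Y、再在竖直方向外扩第一距离m、水平方向外扩第二距离x"的过程，可用如下Python代码示意（函数名、参数名为本文示例假设，并额外将结果裁剪到参考图像边界内）：

```python
def reference_region(ctu_x, ctu_y, ctu_w, ctu_h, gmv, m, x_margin, ref_w, ref_h):
    """基于CTU位置与全局运动矢量确定参考区域（示意实现）。
    返回参考区域的 (left, top, right, bottom)。"""
    # 先将CTU位置叠加全局运动矢量，再向四周外扩
    left = ctu_x + gmv[0] - x_margin
    top = ctu_y + gmv[1] - m
    right = ctu_x + gmv[0] + ctu_w + x_margin
    bottom = ctu_y + gmv[1] + ctu_h + m
    # 裁剪到参考图像范围内
    left, top = max(0, left), max(0, top)
    right, bottom = min(ref_w, right), min(ref_h, bottom)
    return left, top, right, bottom

# 32×32的CTU位于(64, 64)，全局运动矢量为(8, -4)，竖直外扩8、水平外扩16
print(reference_region(64, 64, 32, 32, (8, -4), 8, 16, 1920, 1080))  # (56, 52, 120, 100)
```

由此得到的参考区域尺寸大于目标编码区域，从而为后续运动搜索预留了范围。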
请参见图5a。如图5a所示,视频编码装置包括整像素搜索模块,分像素搜索模块,编码单元决策模块,模式决策模块,采样自适应偏移估计模块,去块滤波模块,采样自适应偏移滤波模块,以及熵编码模块。其中,视频编码装置通过线缓冲器控制器从参考像素线缓冲器读取数据。在本发明实施例中,参考区域也可以称为是线缓存器(line buffer)的范围,该范围限定了目标编码区域在帧间预测操作的过程中获取参考数据的范围。在确定了目标参考区域之后,可以读取参考图像中的参考区域,并将参考区域存储于第二存储器中。需要说明的是,帧间预测操作又可以由几个具有不同处理功能的模块实现,包括整像 素搜索模块(以下简称为IME模块)、分像素搜索模块(以下简称为FME模块)、编码单元决策模块(以下简称为CUD模块)以及模式决策模块(以下简称为MD模块)。其中,IME模块中可以对目标编码区域进行整像素运动估计处理,FME模块中可以对目标编码区域进行分像素运动估计处理,CUD模块中可以对目标编码区域进行编码单元决策操作,MD模块中可以进行模式决策操作。在不同模块中需要使用参考区域的相同或者不同图像块进行帧间预测操作,不同模块有各自对应的存储单元,可以将需要使用的图像块读取到各自对应的存储单元中再进行帧间预测操作。如图5a所示,图像块可以从外部存储器读取到帧间预测的各个模块分别对应的存储单元中。
下面将依次介绍上述4个不同模块在执行帧间预测操作时确定图像块以及使用图像块进行帧间预测的过程。需要说明的是,IME模块和FME模块可以共用相同的第一图像块进行帧间预测操作,下面将介绍IME模块和FME模块在执行帧间预测操作时确定第一图像块以及使用第一图像块进行帧间预测的过程。
在实际应用中,可以读取图像块的第一图像块,并将第一图像块存储于IME的存储单元和FME的存储单元中。其中,第一图像块为图像块的一部分,IME的存储单元和FME的存储单元不同于第一存储器和第二存储器。然后在整像素运动估计过程中和分像素运动估计的过程中,基于第一图像块,分别确定目标编码区域对应的整像素运动矢量和分像素运动矢量。
上述IME的存储单元、FME的存储单元与第一存储器和第二存储器都位于不同的装置中。
可选地,确定第一图像块的过程可以实现为:获取预先设置的图像块尺寸;在参考图像中,确定尺寸等于该图像块尺寸且覆盖移动位置的第一图像块。
在实际应用中,可以先设定图像块尺寸,例如目标编码区域为CU,该CU的大小为16×16,假设设置该CU对应的图像块尺寸为32×32,那么可以在参考图像中基于运动矢量确定目标编码区域对应的移动位置,然后覆盖该移动位置选取一块大小为32×32的第一图像块。其中,运动矢量可以是全局运动矢量。
为了便于理解,以图6a和图6b为例说明确定第一图像块的过程。在图6a 中,确定目标编码区域中左上角像素点的初始位置。然后将初始位置叠加目标编码区域对应的运动矢量,得到左上角像素点的移动位置。最后在参考图像中,确定尺寸等于图像块尺寸32×32且以移动位置为左上角像素点的第一图像块。
在图6b中，前两步与图6a对应的实施方式相同，即先确定目标编码区域中左上角像素点的初始位置，然后将初始位置叠加目标编码区域对应的运动矢量，得到左上角像素点的移动位置。假设目标编码区域的大小是16×16，图像块尺寸为32×32。在最后一步中，可以在参考图像中，确定以移动位置为左上角像素点且大小为16×16的图像块A。然后再将图像块A的上下边界沿参考图像的竖直方向分别向外扩展8行像素，并将图像块A的左右边界沿参考图像的水平方向分别向外扩展8列像素，这样可以得到大小为32×32的图像块B。图像块A在图像块B的中间位置，图像块B即为IME模块和FME模块在执行帧间预测操作时需要使用的第一图像块。
在上述确定第一图像块的过程中,需要先知道图像块尺寸,图像块尺寸可以根据预定的规则进行确定。确定图像块尺寸的具体实施过程可以是:根据在整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计的过程中预留像素大小,确定图像块尺寸。也就是说,当在整像素运动估计过程中进行运动搜索的范围越大且在分像素运动估计的过程中预留像素越多时,图像块尺寸越大;反之,当在整像素运动估计过程中进行运动搜索的范围越小且在分像素运动估计的过程中预留像素越少时,图像块尺寸越小。
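按上述规则，第一图像块的边长可以理解为在CU边长的基础上，两侧各加上整像素运动搜索范围与分像素插值预留像素。以下Python代码为该关系的一个假设性示意（搜索范围4、预留4个像素均为示例取值，与前文16×16的CU对应32×32图像块的例子一致）：

```python
def first_block_size(cu_size, ime_search_range, fme_margin):
    """第一图像块边长 = CU边长 + 2 × (整像素搜索范围 + 分像素插值预留像素)。
    仅为示意公式，具体取值由实际编码器配置决定。"""
    return cu_size + 2 * (ime_search_range + fme_margin)

print(first_block_size(16, 4, 4))  # 32，对应正文中16×16 CU取32×32第一图像块的例子
```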
上述运动搜索的范围可以根据实际需求进行设定,例如可以设定运动搜索范围为周围4个整像素点。
可以理解的是,为了减少各模块向线缓冲器请求数据的次数,可以将IME模块使用的第一图像块传给FME模块,换句话说可以将第一图像块从IME模块的存储单元中拷贝到FME模块的存储单元,这样IME模块和FME模块之间就可以共享第一图像块。基于此,在上述确定第一图像块尺寸的过程中,除了可以考虑在整像素运动估计过程中进行运动搜索的范围之外,还将在分像素运动估计的过程中预留像素的因素纳入到考虑范围之内,使得IME模块能一次性向线 缓冲器请求一个较大的第一图像块,该较大的第一图像块既可以满足IME模块也可以FME模块在帧间预测过程中的使用需求。
另外,通过本发明实施例提供的方法确定出的第一图像块需要在参考区域内,如果最终发现第一图像块不在参考区域内,那么就可以通过修正搜索起始点的方式保证第一图像块在参考区域内。
可选地,可以读取图像块的第一图像块,并将第一图像块存储于第三存储器中。其中,第一图像块为图像块的一部分,第三存储器不同于第一存储器和第二存储器。然后根据参考区域的第一图像块,获取在整像素运动估计过程中计算出的目标编码区域对应的整像素运动矢量。在一个实施方式中,第一存储器为双倍速率同步动态随机存储器,第二存储器为线缓存器,以及第三存储器为整像素搜索模块中的寄存器或存储单元。
实际应用中,在确定出IME模块需要使用的第一图像块之后,可以基于该第一图像块进行整像素运动估计过程,以得到当前CU对应的最优整像素运动矢量。假设一个CTU大小为32×32,编码器支持的CU大小为16×16,那么可以将一个CTU划分为4个CU。针对每个CU都能确定出对应的最优整像素运动矢量,那么就可以确定出4个最优整像素运动矢量,在计算出4个CU分别对应的最优整像素运动矢量之后,可以将4个CU分别对应的最优整像素运动矢量以及IME模块使用的第一图像块一同传给FME模块。
可选地,参考区域的第一图像块进一步可以用于分像素运动估计,以用于确定目标编码区域的最优分像素运动矢量。具体可以根据整像素运动估计得到的运动矢量和整像素运动估计过程中使用到的关于参考区域的第一图像块,确定在分像素运动估计过程中计算出的目标编码区域对应的分像素运动矢量。其中,所述参考区域的图像块为亮度分量对应的图像块。
实际应用中,FME模块可以根据每个CU对应的整像素运动矢量以及第一子数据块进行分像素运动估计,得到最优分像素运动矢量,例如是最优1/4像素运动矢量。在进行分像素运动估计之后,除了可以得到最优分像素运动矢量之外,还可以得到亮度分量的帧间预测值。FME模块可以将各CU分别对应的最优 分像素运动矢量以及亮度分量的帧间预测值发送到CUD模块。在一个实施方式中,所述最优分像素运动矢量为关于亮度分量的分像素运动矢量。
在CUD模块中可以执行编码单元决策操作。在一实施方式中,CUD模块可以根据FME模块传送过来的最优分像素运动矢量以及当前CU在待编码图像中的位置,向线缓冲器获取色度分量的图像数据。然后基于色度分量的图像数据,确定待编码图像的色度分量对应的图像区域的预测值。
CUD模块能够计算每个CU的率失真代价（RD cost）。首先，CUD模块会进行色度分量的预测得到预测值，然后将亮度分量和色度分量的预测值分别与原始像素值作差得到残差，再将残差进行变换量化、反量化和反变换过程得到失真估计值，同时还会对编码模式信息以及编码系数进行比特估计得到比特估计值。接下来，CUD模块根据失真估计值和比特估计值计算得到率失真代价，在得到每个CU的率失真代价之后进行不同CU划分方式的决策。例如，在第一种划分方式中，尺寸为32x32的编码树单元能够被划分为4个尺寸为16x16的CU。在第二种划分方式中，尺寸为32x32的编码树单元能够被划分为16个尺寸为8x8的CU。那么，需要将第一种划分方式中的4个尺寸为16x16的CU的率失真代价之和与第二种划分方式中的16个尺寸为8x8的CU的率失真代价之和进行比较，选择率失真代价相对较小的划分方式。也就是说，若第一种划分方式的率失真代价小于第二种划分方式的率失真代价，则选择将尺寸为32x32的编码树单元划分为4个尺寸为16x16的CU。反之，若第一种划分方式的率失真代价大于第二种划分方式的率失真代价，则选择将尺寸为32x32的编码树单元划分为16个尺寸为8x8的CU。需要说明的是，在编码单元决策阶段最后得到的率失真代价为amvp模式的率失真代价。
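上述两种划分方式之间的比较逻辑可用如下Python代码示意（率失真代价的具体计算方式依赖失真估计与比特估计，此处直接以代价数值为输入，函数名为示例假设）：

```python
def choose_partition(rd_costs_16, rd_costs_8):
    """比较两种CU划分方式的率失真代价之和，返回代价较小的划分方式（示意实现）。
    rd_costs_16: 4个16x16 CU各自的率失真代价；rd_costs_8: 16个8x8 CU各自的率失真代价。"""
    cost_a, cost_b = sum(rd_costs_16), sum(rd_costs_8)
    return ("16x16", cost_a) if cost_a <= cost_b else ("8x8", cost_b)

# 4个16x16 CU的代价之和为42，小于16个8x8 CU的代价之和48，故选16x16划分
print(choose_partition([10, 12, 11, 9], [3] * 16))  # ('16x16', 42)
```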
在本发明的一实施方式中,待编码图像的亮度分量对应的参考区域的图像块与待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及待编码图像的亮度分量对应的参考区域的图像块与待编码图像的色度分量对应的参考区域的图像块尺寸不相同。例如,在420采样格式下,色度分量的宽和高分别只有亮度分量的宽和高的一半。举例来说,假设当前CU的大小为16×16, 对应的色度分量图像块的大小为8×8。考虑到插值过程中需要预留像素,可以设定向线缓冲器获取色度分量的图像数据的大小为16×16。在获取到各CU分别对应的色度分量的图像数据之后,可以根据色度分量的图像数据进行色度插值预测,得到待编码图像的色度分量对应的图像区域的预测值。
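上述420采样格式下色度取块尺寸的关系可用如下Python代码示意（色度插值预留4个像素为示例假设，与正文中8×8色度块取16×16图像数据的例子一致）：

```python
def chroma_fetch_size(cu_size, margin):
    """420采样格式下，色度块宽高为亮度CU的一半，插值时两侧再各预留margin像素（示意）。"""
    half = cu_size // 2           # 色度分量的宽/高为亮度的一半
    return half + 2 * margin      # 两侧各预留margin像素用于插值

print(chroma_fetch_size(16, 4))  # 16，对应正文中16×16 CU向线缓冲器取16×16色度数据的例子
```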
CUD模块可以执行编码单元决策操作。在执行编码单元决策操作的过程中,需要两个目标编码区域。两个目标编码区域分别为待编码图像的亮度分量对应的第一图像区域和待编码图像的色度分量对应的第二图像区域。可以根据亮度分量对应的运动矢量和待编码图像的色度分量对应的参考区域的图像块,确定第二图像区域的预测值。
CUD模块能够决策出CU的划分方式以及该CU对应的率失真代价。即,amvp模式的率失真代价。进一步,MD模块能够确定对应的CU在skip和merge模式下的预测块以及计算对应的CU的率失真代价。之后,MD模块将amvp模式下、skip模式和merge模式下CU的率失真代价进行比较,决策出该CU的帧间编码模式。
可以理解的是,由于需要做不同大小的skip和merge决策模式,每个skip和merge决策模式预测的过程中又对应多个运动矢量,这样会多次向线缓冲器请求图像块。在本发明实施例中,可以通过向线缓冲器获取一个对应于多个运动矢量的、尺寸较大的图像块的方式,实现避免多次向线缓冲器请求图像块的目的。基于此,可选地,整像素运动估计和分像素运动估计使用具有相同的第一尺寸的第一图像块,编码单元决策操作使用第二尺寸的第二图像块,模式决策操作使用第三尺寸的第三图像块,第三尺寸大于第一尺寸和第二尺寸。
在一个实施方式中,编码方法中帧间预测操作对应的第一硬件结构和第一存储方式与编码方法对应的解码方法中的帧间预测操作对应的第二硬件结构和第二存储方式相同。即,所述帧间预测操作对应的第一硬件结构和第一存储方式与所述编码装置对应的解码装置执行帧间预测操作对应的第二硬件结构和第二存储方式相同。例如,当无人飞行器包括编码装置并且遥控器包括解码装置的时候,无人飞行器中的编码装置执行帧间预测操作对应的硬件结构和存储方 式和遥控器中解码装置执行帧间预测操作对应的硬件结构和存储方式相同。
在另一实施方式中,视频编码装置与视频解码装置包含于同一芯片或同一IP核中;其中,视频编码装置执行帧间预测操作对应的第一硬件结构与视频解码装置执行帧间预测操作对应的第二硬件结构共用同一套逻辑电路,并且,视频编码装置中的帧间预测操作对应的第一存储方式与视频解码装置中的帧间预测操作对应的第二存储方式共用同一存储资源。例如,可以在一芯片中同时包括视频编码装置和视频解码装置。当该芯片应用于无人飞行器时,使能该芯片中的视频编码装置对应的硬件电路,禁能该芯片中的视频解码装置对应的硬件电路。当该芯片应用于遥控器时,使能该芯片中的视频解码装置对应的硬件电路,禁能该芯片中视频编码装置对应的硬件电路。由于视频编码装置和视频解码装置能够包含于同一芯片或者IP核中,并且视频编码装置和视频解码装置能够共用同一逻辑电路和采用同一存储资源(例如,同一存储器或同一存储单元)的方式,因此,在芯片的设计开发过程中,能够节省芯片面积和资源,节约开发成本和使用成本。
上面介绍了编码端进行帧间预测的过程，下面将介绍解码端进行帧间预测的过程。图3为本发明实施例提供的一种视频解码方法的流程图，如图3所示，该方法包括如下步骤：
步骤301、获取已编码图像对应的全局运动矢量;
步骤302、在已编码图像中确定目标解码区域;
步骤303、基于全局运动矢量,在参考图像中确定与目标解码区域对应的参考区域;其中,参考图像被存储于第一存储器中;
步骤304、读取参考图像中的参考区域，并将参考区域存储于第二存储器中，参考区域的尺寸大于目标解码区域的尺寸；
步骤305、在第二存储器中,读取参考区域的图像块;
步骤306、基于读取的参考区域的图像块,对目标解码区域的图像进行帧间预测操作;
步骤307、基于帧间预测操作的结果,对目标解码区域的图像进行解码处理。
在关于视频解码的实际应用中，帧间预测操作需要使用参考图像。帧间预测操作是对已编码图像中的一部分图像进行相关处理，该部分图像可以是目标解码区域。
请参见图5b。图5b为本发明实施例提供的一种视频解码装置的结构示意图。如图5b所示，视频解码装置包括熵解码模块，模式决策模块，采样自适应偏移参数估计模块，去块滤波模块，采样自适应偏移滤波模块，以及像素缓冲器。其中，模式决策模块包括先进运动矢量预测（amvp）模块，帧内（intra）模块，跳过（skip）模块，以及合并（merge）模块。其中，视频解码装置通过线缓冲器控制器从参考像素线缓冲器读取数据。在解码端MD模块中，可以根据当前CTU的位置以及全局运动矢量，确定当前CTU对应的线缓冲器范围。在解码过程中，支持的模式决策可以包括skip、merge或者amvp。也就是说，解码端需要进行帧间预测的解码重建过程，包括amvp、skip、merge决策模式的解码重建。由于编码端也有skip和merge决策模式的插值预测过程，因此，解码端的跳过模块和合并模块与编码端的跳过模块和合并模块具有相同的电路结构。例如，解码端的skip和merge决策模式与编码端的skip和merge决策模式基于相同的硬件结构和/或相同的存储方式获取参考图像块。
此外,skip和merge决策模式的图像块的获取方式可以复用编码端的图像块的获取方式。对于amvp决策模式的插值预测过程,由于编码端amvp决策模式的图像块实际是由IME模块和CUD模块向线缓冲器请求获取得到的,因此解码端MD模块如果只是复用编码端的图像块的获取方式,则可能会读取不到amvp决策模式对应的图像块。因此,可以直接根据当前CU的位置以及运动矢量,向线缓冲器请求图像块。考虑到插值预测过程需要预留像素,在一种可能的实现方式中,可以设定为大小为16×16的一个CU获取的亮度分量的图像块的大小为24×24、色度分量的图像块的大小为16×16。
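上述amvp决策模式取块尺寸的设定可用如下Python代码示意（亮度、色度插值各预留4个像素为示例假设，与正文中16×16 CU取24×24亮度块、16×16色度块的数值一致）：

```python
def amvp_fetch_sizes(cu_size, luma_margin, chroma_margin):
    """解码端amvp模式按CU位置与运动矢量直接向线缓冲器请求图像块时的取块边长（示意）。
    返回 (亮度块边长, 色度块边长)。"""
    luma = cu_size + 2 * luma_margin          # 亮度块两侧各预留插值像素
    chroma = cu_size // 2 + 2 * chroma_margin # 420采样下色度为亮度的一半，再预留插值像素
    return luma, chroma

print(amvp_fetch_sizes(16, 4, 4))  # (24, 16)
```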
整个解码端MD模块的图像块的获取方式可以是先预取6个图像块。其中，2个为亮度块对应的参考区域的图像块，大小均为44×44；另外4个分别为U分量色度块对应的参考区域的图像块和V分量色度块对应的参考区域的图像块，U分量和V分量的色度块的大小均为22×22。
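上述预取方案可以用如下Python代码简单示意（块数与尺寸沿用正文示例，数据结构为本文假设）：

```python
def prefetch_blocks(luma_size=44, chroma_size=22):
    """解码端MD模块一次性预取6个参考区域图像块：2个亮度块、U/V分量色度块各2个（示意）。"""
    return ([("luma", luma_size)] * 2
            + [("U", chroma_size)] * 2
            + [("V", chroma_size)] * 2)

blocks = prefetch_blocks()
print(len(blocks))  # 6
```

一次性预取可以减少MD模块向线缓冲器请求数据的次数。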
在一个实施方式中,解码方法对应的帧间预测操作对应的第二硬件结构和第二存储方式与解码方法对应的编码方法中的帧间预测操作对应的第一硬件结构和第一存储方式相同。即,所述帧间预测操作对应的第二硬件结构和第二存储方式与所述解码装置对应的编码装置执行帧间预测操作对应的第一硬件结构和第一存储方式相同。例如,当无人飞行器包括编码装置并且遥控器包括解码装置的时候,无人飞行器中的编码装置执行帧间预测操作对应的硬件结构和存储方式和遥控器中解码装置执行帧间预测操作对应的硬件结构和存储方式相同。
在另一实施方式中，视频编码装置与视频解码装置包含于同一芯片或同一IP核中；其中，视频解码装置执行帧间预测操作对应的第二硬件结构与视频编码装置执行帧间预测操作对应的第一硬件结构共用同一套逻辑电路，并且，视频解码装置中的帧间预测操作对应的第二存储方式与视频编码装置中的帧间预测操作对应的第一存储方式共用同一存储资源。例如，可以在一芯片中同时包括视频编码装置和视频解码装置。当该芯片应用于无人飞行器时，使能该芯片中的视频编码装置对应的硬件电路，禁能该芯片中的视频解码装置对应的硬件电路。当该芯片应用于遥控器时，使能该芯片中的视频解码装置对应的硬件电路，禁能该芯片中视频编码装置对应的硬件电路。由于视频编码装置和视频解码装置能够包含于同一芯片或者IP核中，并且视频编码装置和视频解码装置能够共用同一逻辑电路和采用同一存储资源（例如，同一存储器或同一存储单元）的方式，因此，在芯片的设计开发过程中，能够节省芯片面积和资源，节约开发成本和使用成本。
通过本发明实施例提供的方法,能够实现高集成度编码器、解码器中帧间预测过程中图像块的获取,该方法适用于线缓冲器的架构,实现复杂度较低,硬件资源成本和带宽消耗较低,且性价比较高。另外,本发明实施例提供的方法能够减少不同模块与线缓冲器之间的交互次数,能够降低硬件实现风险。
通过本发明实施例提供的方法，可以基于全局运动矢量确定进行帧间预测的参考区域，基于参考区域内的图像块进行帧间预测，这样避免了从第一存储器中拷贝全部的参考图像到第二存储器中以基于整个参考图像进行帧间预测。由于需要拷贝读取的数据量减少，因此对读取带宽的消耗也随之降低，可以高效地利用参考图像进行帧间预测。
本发明又一示例性实施例提供了一种视频编码装置,如图7a所示,该装置包括:
存储器1910,用于存储计算机程序;
处理器1920,用于运行存储器1910中存储的计算机程序以实现:
获取待编码图像对应的全局运动矢量;
在待编码图像中确定目标编码区域;
基于所述全局运动矢量,在参考图像中确定与所述目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
在所述第二存储器中,读取所述参考区域的图像块;
基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作;
基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。
可选地,所述处理器1920,用于:
确定所述目标编码区域中预设像素点的初始位置;
将所述初始位置叠加所述全局运动矢量,得到所述预设像素点的移动位置;
基于所述移动位置,在所述参考图像中,确定与所述目标编码区域对应的参考区域。
可选地,所述处理器1920,用于:
获取预先设置的所述参考区域的尺寸;
在所述参考图像中,确定尺寸等于所述参考区域的尺寸且覆盖所述移动位置的图像区域,作为与所述目标编码区域对应的参考区域。
可选地,所述帧间预测操作包括整像素运动估计IME,所述参考区域的尺寸为根据在所述整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计FME的过程中预留像素大小确定的。
可选地,所述帧间预测操作包括整像素运动估计和分像素运动估计;所述处理器1920,用于:
读取所述图像块的第一图像块,并将所述第一图像块存储于IME的存储单元和FME的存储单元中,其中,所述第一图像块为所述图像块的一部分,所述IME的存储单元和所述FME的存储单元不同于所述第一存储器和所述第二存储器;
在所述整像素运动估计过程中和所述分像素运动估计的过程中,基于所述第一图像块,分别确定所述目标编码区域对应的整像素运动矢量和分像素运动矢量。
可选地,所述处理器1920,用于:
读取所述图像块的第一图像块,并将所述第一图像块存储于第三存储器中,其中,所述第一图像块为所述图像块的一部分,所述第三存储器不同于所述第一存储器和所述第二存储器;以及
根据所述参考区域的所述第一图像块,获取在整像素运动估计过程中计算出的所述目标编码区域对应的整像素运动矢量。
可选地,所述参考区域的所述第一图像块进一步用于分像素运动估计,以用于确定所述目标编码区域的最优分像素运动矢量。
可选地,所述帧间预测操作包括分像素运动估计,所述参考区域的第一图像块的尺寸为根据在整像素运动估计过程中进行运动搜索的范围以及在所述分像素运动估计的过程中预留像素大小确定的。
可选地,所述处理器1920,用于:
根据所述整像素运动估计得到的运动矢量和所述整像素运动估计过程中使用到的关于参考区域的图像块,获取在所述分像素运动估计过程中计算出的所述目标编码区域对应的分像素运动矢量。其中,所述参考区域的图像块为亮度分量对应的图像块。
可选地,将所述目标编码区域对应的所述分像素运动矢量用于编码单元决策操作,以用于确定所述待编码图像的色度分量对应的图像区域的预测值。
可选地,所述全局运动矢量是基于所述待编码图像的上一帧图像中的图像块对应的运动矢量而被确定的;或者
所述全局运动矢量是从图像信号处理器获取的。其中,所述全局运动矢量反映所述待编码图像中的物体整体在参考图像中偏移的方向与距离。
可选地,所述帧间预测操作包括编码单元决策操作,所述目标编码区域的数量为两个,两个目标编码区域分别为所述待编码图像的亮度分量对应的第一图像区域和所述待编码图像的色度分量对应的第二图像区域;所述处理器1920,用于:
根据亮度分量对应的运动矢量和所述待编码图像的色度分量对应的参考区域的图像块,确定所述第二图像区域的预测值。
可选地,所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块尺寸不相同。
可选地,所述帧间预测操作包括整像素运动估计、分像素运动估计、编码单元决策操作以及模式决策操作,其中,所述整像素运动估计和所述分像素运动估计使用具有相同的第一尺寸的第一图像块,所述编码单元决策操作使用第二尺寸的第二图像块,所述模式决策操作使用第三尺寸的第三图像块,所述第三尺寸大于所述第一尺寸和所述第二尺寸。
本发明又一示例性实施例提供了一种视频编解码装置，如图7b所示，该装置包括：
存储器1910’,用于存储计算机程序;
处理器1920’,用于运行存储器1910’中存储的计算机程序以实现:
基于待编码图像对应的全局运动矢量,在参考图像中确定与所述待编码图像中的目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储 器中;
读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
在所述第二存储器中,读取所述参考区域的第一图像块;
基于所述第一图像块,对所述目标编码区域的图像进行第一帧间预测操作;
基于所述第一帧间预测操作的结果,对所述目标编码区域的图像进行编码处理;
在已编码图像中确定目标解码区域;
在所述第二存储器中,读取所述参考区域的第二图像块;
基于所述第二图像块,对所述目标解码区域的图像进行第二帧间预测操作;
基于所述第二帧间预测操作的结果,对所述目标解码区域的图像进行解码处理。
可选地,所述处理器1920’,用于:所述第一帧间预测操作包括第一模式决策操作,以及所述第二帧间预测操作包括第二模式决策操作;
其中,所述第一模式决策操作和所述第二模式决策操作基于相同的硬件结构和/或相同的存储方式获取参考图像块。
可选地,所述第二模式决策操作支持的决策模式包括skip、merge或者amvp。
请参见图7a。图7a包括存储器1910和处理器1920。其中，图7a所示的视频编码装置中的处理器可以执行图1-图2、图4-图5a、图6a-图6b所示实施例的方法，本实施例未详细描述的部分，可参考对图1-图2、图4-图5a、图6a-图6b所示实施例的相关说明。该技术方案的执行过程和技术效果参见图1-图2、图4-图5a、图6a-图6b所示实施例中的描述，在此不再赘述。
请参见图7b。图7b包括存储器1910’和处理器1920’。其中,处理器执行图7b所示的视频编码装置可以执行图3-图4、图5b、图6a-图6b所示实施例的方法,本实施例未详细描述的部分,可参考对图3-图4、图5b、图6a-图6b所示实施例的相关说明。该技术方案的执行过程和技术效果参见图3-图4、图5b、图6a-图6b所示实施例中的描述,在此不再赘述。
如图8a所示,本发明实施例还提供了一种可移动平台,可移动平台包括图7a所示的视频编解码装置800。
所述视频编码方法可以应用在可移动平台中。
示例性的,所述可移动平台可以包括无人机、无人车、手持云台中的至少一种。
进一步而言,无人机可以为旋翼型无人机,例如四旋翼无人机、六旋翼无人机、八旋翼无人机,也可以是固定翼无人机。
如图8b所示,本发明实施例还提供了一种遥控器,遥控器包括图7b所示的视频编解码装置802。
所述视频编解码方法可以应用在遥控器中。
另外,本发明实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有可执行代码,所述可执行代码用于实现如前述各实施例提供的视频编解码方法。
以上各个实施例中的技术方案、技术特征在不相冲突的情况下均可以单独,或者进行组合,只要未超出本领域技术人员的认知范围,均属于本发明保护范围内的等同实施例。
以上所述仅为本发明的实施例,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。
最后应说明的是:以上各实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述各实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (42)

  1. 一种视频编码方法,其特征在于,包括:
    获取待编码图像对应的全局运动矢量;
    在所述待编码图像中确定目标编码区域;
    基于所述全局运动矢量,在参考图像中确定与所述目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
    读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
    在所述第二存储器中,读取所述参考区域的图像块;
    基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作;
    基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。
  2. 根据权利要求1所述的方法,其特征在于,所述基于所述全局运动矢量,在参考图像中确定与所述目标编码区域对应的参考区域,包括:
    确定所述目标编码区域中预设像素点的初始位置;
    将所述初始位置叠加所述全局运动矢量,得到所述预设像素点的移动位置;
    基于所述移动位置,在所述参考图像中确定与所述目标编码区域对应的参考区域。
  3. 根据权利要求2所述的方法,其特征在于,所述基于所述移动位置,在所述参考图像中确定与所述目标编码区域对应的参考区域,包括:
    获取预先设置的所述参考区域的尺寸;
    在所述参考图像中,确定尺寸等于所述参考区域的尺寸且覆盖所述移动位置的图像区域,作为与所述目标编码区域对应的参考区域。
  4. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作包括整像素运动估计IME,所述参考区域的尺寸为根据在所述整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计FME的过程中预留像素的大小确定的。
  5. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作包括整像素运动估计和分像素运动估计;
    所述在所述第二存储器中,读取所述参考区域的图像块,包括:
    读取所述参考区域的第一图像块,并将所述第一图像块存储于IME的存储单元和FME的存储单元中,其中,所述第一图像块为所述图像块的一部分,所述IME的存储单元和所述FME的存储单元不同于所述第一存储器和所述第二存储器;
    在所述整像素运动估计过程中和所述分像素运动估计的过程中,基于所述第一图像块,分别确定所述目标编码区域对应的整像素运动矢量和分像素运动矢量。
  6. 根据权利要求1所述的方法,其特征在于,所述在所述第二存储器中,读取所述参考区域的图像块,包括:
    读取所述参考区域的第一图像块,并将所述第一图像块存储于第三存储器中,其中,所述第一图像块为所述图像块的一部分,所述第三存储器不同于所述第一存储器和所述第二存储器;以及
    所述基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作,包括:
    根据所述参考区域的所述第一图像块,获取在整像素运动估计过程中计算出的所述目标编码区域对应的整像素运动矢量。
  7. 根据权利要求6所述的方法,其特征在于,所述参考区域的所述第一图像块进一步用于分像素运动估计,以用于确定所述目标编码区域的最优分像素运动矢量。
  8. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作包括分像素运动估计,所述参考区域的第一图像块的尺寸为根据在整像素运动估计过程中进行运动搜索的范围以及在所述分像素运动估计的过程中预留像素的大小确定的。
  9. 根据权利要求8所述的方法,其特征在于,所述帧间预测操作,包括:
    根据所述整像素运动估计得到的运动矢量和所述整像素运动估计过程中使用到的关于参考区域的图像块,获取在所述分像素运动估计过程中计算出的所述目标编码区域对应的分像素运动矢量;
    其中,所述参考区域的图像块为亮度分量对应的图像块。
  10. 根据权利要求9所述的方法,其特征在于,将所述目标编码区域对应的所述分像素运动矢量用于编码单元决策操作,以用于确定所述待编码图像的色度分量对应的图像区域的预测值。
  11. 根据权利要求1所述的方法,其特征在于,所述全局运动矢量是基于所述待编码图像的上一帧图像中的图像块对应的运动矢量而被确定的;或者
    所述全局运动矢量是从图像信号处理器获取的;
    其中,所述全局运动矢量反映所述待编码图像中的物体整体在参考图像中偏移的方向与距离。
  12. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作包括编码单元决策操作,所述目标编码区域的数量为两个,两个目标编码区域分别为所述待编码图像的亮度分量对应的第一图像区域和所述待编码图像的色度分量对应的第二图像区域;以及
    根据亮度分量对应的运动矢量和所述待编码图像的色度分量对应的参考区域的图像块,确定所述第二图像区域的预测值。
  13. 根据权利要求12所述的方法,其特征在于,所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块尺寸不相同。
  14. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作包括整像素运动估计、分像素运动估计、编码单元决策操作以及模式决策操作,其中,所述整像素运动估计和所述分像素运动估计使用具有相同的第一尺寸的第一图像块,所述编码单元决策操作使用第二尺寸的第二图像块,所述模式决策操作使用第三尺寸的第三图像块,所述第三尺寸大于所述第一尺寸和所述第二尺寸。
  15. 根据权利要求1所述的方法,其特征在于,所述帧间预测操作对应的第一硬件结构和第一存储方式与所述编码方法对应的解码方法中的帧间预测操作对应的第二硬件结构和第二存储方式相同;或者
    所述帧间预测操作对应的第一硬件结构能够作为视频解码方法中的帧间操作对应的硬件结构,并且所述第一存储方式对应的存储资源能够作为所述视频解码方法中的存储方式对应的存储资源。
  16. 一种视频解码方法,其特征在于,包括:
    获取已编码图像对应的全局运动矢量;
    在已编码图像中确定目标解码区域;
    基于所述全局运动矢量,在参考图像中确定与所述目标解码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
    读取所述参考图像中的所述参考区域，并将所述参考区域存储于第二存储器中，所述参考区域的尺寸大于所述目标解码区域的尺寸；
    在所述第二存储器中,读取所述参考区域的图像块;
    基于读取的所述参考区域的所述图像块,对所述目标解码区域的图像进行帧间预测操作;
    基于所述帧间预测操作的结果,对所述目标解码区域的图像进行解码处理。
  17. 根据权利要求16所述的方法,其特征在于,所述帧间预测操作对应的第二硬件结构和第二存储方式与所述解码方法对应的编码方法中的帧间预测操作对应的第一硬件结构和第一存储方式相同;或者
    所述帧间预测操作对应的第二硬件结构能够作为视频编码方法中的帧间操作对应的硬件结构,并且所述第二存储方式对应的存储资源能够作为所述视频编码方法中的存储方式对应的存储资源。
  18. 根据权利要求16所述的方法,其特征在于,模式决策操作支持的决策模式包括skip、merge或者amvp。
  19. 一种视频编码装置,其特征在于,包括存储器、处理器;其中,所述存储器上存储有可执行代码,当所述可执行代码被所述处理器执行时,使所述 处理器实现:
    获取待编码图像对应的全局运动矢量;
    在所述待编码图像中确定目标编码区域;
    基于所述全局运动矢量,在参考图像中,确定与所述目标编码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
    读取所述参考图像中的所述参考区域,并将所述参考区域存储于第二存储器中,所述参考区域的尺寸大于所述目标编码区域的尺寸;
    在所述第二存储器中,读取所述参考区域的图像块;
    基于读取的所述参考区域的所述图像块,对所述目标编码区域的图像进行帧间预测操作;
    基于所述帧间预测操作的结果,对所述目标编码区域的图像进行编码处理。
  20. 根据权利要求19所述的装置,其特征在于,所述处理器,用于:
    确定所述目标编码区域中预设像素点的初始位置;
    将所述初始位置叠加所述全局运动矢量,得到所述预设像素点的移动位置;
    基于所述移动位置,在所述参考图像中,确定与所述目标编码区域对应的参考区域。
  21. 根据权利要求20所述的装置,其特征在于,所述处理器,用于:
    获取预先设置的所述参考区域的尺寸;
    在所述参考图像中,确定尺寸等于所述参考区域的尺寸且覆盖所述移动位置的图像区域,作为与所述目标编码区域对应的参考区域。
  22. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作包括整像素运动估计IME,所述参考区域的尺寸为根据在所述整像素运动估计过程中进行运动搜索的范围以及在分像素运动估计FME的过程中预留像素的大小确定的。
  23. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作包括整像素运动估计和分像素运动估计;所述处理器,用于:
    读取所述参考区域的第一图像块,并将所述第一图像块存储于IME的存储单元和FME的存储单元中,其中,所述第一图像块为所述图像块的一部分,所 述IME的存储单元和所述FME的存储单元不同于所述第一存储器和所述第二存储器;
    在所述整像素运动估计过程中和所述分像素运动估计的过程中,基于所述第一图像块,分别确定所述目标编码区域对应的整像素运动矢量和分像素运动矢量。
  24. 根据权利要求19所述的装置,其特征在于,所述处理器,用于:
    读取所述参考区域的第一图像块,并将所述第一图像块存储于第三存储器中,其中,所述第一图像块为所述图像块的一部分,所述第三存储器不同于所述第一存储器和所述第二存储器;以及
    根据所述参考区域的所述第一图像块,获取在整像素运动估计过程中计算出的所述目标编码区域对应的整像素运动矢量。
  25. 根据权利要求24所述的装置,其特征在于,所述参考区域的所述第一图像块进一步用于分像素运动估计,以用于确定所述目标编码区域的最优分像素运动矢量。
  26. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作包括分像素运动估计,所述参考区域的第一图像块的尺寸为根据在整像素运动估计过程中进行运动搜索的范围以及在所述分像素运动估计的过程中预留像素大小确定的。
  27. 根据权利要求26所述的装置,其特征在于,所述处理器,用于:
    根据所述整像素运动估计得到的运动矢量和所述整像素运动估计过程中使用到的关于参考区域的图像块,获取在所述分像素运动估计过程中计算出的所述目标编码区域对应的分像素运动矢量;
    其中,所述参考区域的图像块为亮度分量对应的图像块。
  28. 根据权利要求27所述的装置,其特征在于,将所述目标编码区域对应的所述分像素运动矢量用于编码单元决策操作,以用于确定所述待编码图像的色度分量对应的图像区域的预测值。
  29. 根据权利要求19所述的装置,其特征在于,所述全局运动矢量是基于 所述待编码图像的上一帧图像中的图像块对应的运动矢量而被确定的;或者
    所述全局运动矢量是从图像信号处理器获取的;其中,所述全局运动矢量反映所述待编码图像中的物体整体在参考图像中偏移的方向与距离。
  30. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作包括编码单元决策操作,所述目标编码区域的数量为两个,两个目标编码区域分别为所述待编码图像的亮度分量对应的第一图像区域和所述待编码图像的色度分量对应的第二图像区域;所述处理器,用于:
    根据亮度分量对应的运动矢量和所述待编码图像的色度分量对应的参考区域的图像块,确定所述第二图像区域的预测值。
  31. 根据权利要求30所述的装置,其特征在于,所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块的像素数据不同,以及所述待编码图像的亮度分量对应的参考区域的图像块与所述待编码图像的色度分量对应的参考区域的图像块尺寸不相同。
  32. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作包括整像素运动估计、分像素运动估计、编码单元决策操作以及模式决策操作,其中,所述整像素运动估计和所述分像素运动估计使用具有相同的第一尺寸的第一图像块,所述编码单元决策操作使用第二尺寸的第二图像块,所述模式决策操作使用第三尺寸的第三图像块,所述第三尺寸大于所述第一尺寸和所述第二尺寸。
  33. 根据权利要求19所述的装置,其特征在于,所述帧间预测操作对应的第一硬件结构和第一存储方式与所述编码装置对应的解码装置执行帧间预测操作对应的第二硬件结构和第二存储方式相同。
  34. 根据权利要求19所述的装置,其特征在于,所述视频编码装置与视频解码装置包含于同一芯片或同一IP核中;
    其中,所述帧间预测操作对应的第一硬件结构与所述视频解码装置执行帧间预测操作对应的第二硬件结构共用同一套逻辑电路,并且,所述帧间预测操作对应的第一存储方式与所述视频解码装置执行帧间预测操作对应的第二存储方式共用同一存储资源。
  35. 一种视频解码装置,其特征在于,包括存储器、处理器;其中,所述存储器上存储有可执行代码,当所述可执行代码被所述处理器执行时,使所述处理器实现:
    获取已编码图像对应的全局运动矢量;
    在已编码图像中确定目标解码区域;
    基于所述全局运动矢量,在参考图像中确定与所述目标解码区域对应的参考区域;其中,所述参考图像被存储于第一存储器中;
    读取所述参考图像中的所述参考区域，并将所述参考区域存储于第二存储器中，所述参考区域的尺寸大于所述目标解码区域的尺寸；
    在所述第二存储器中,读取所述参考区域的图像块;
    基于读取的所述参考区域的所述图像块,对所述目标解码区域的图像进行帧间预测操作;
    基于所述帧间预测操作的结果,对所述目标解码区域的图像进行解码处理。
  36. 根据权利要求35所述的装置,其特征在于,所述帧间预测操作对应的第二硬件结构和第二存储方式与所述解码装置对应的编码装置执行帧间预测操作对应的第一硬件结构和第一存储方式相同。
  37. 根据权利要求35所述的装置,其特征在于,所述视频解码装置与视频编码装置包含于同一芯片或同一IP核中;
    其中,所述帧间预测操作对应的第二硬件结构与所述视频编码装置实现帧间预测操作对应的第一硬件结构共用同一套逻辑电路,并且,所述帧间预测操作对应的第二存储方式与所述视频编码装置实现帧间预测操作对应的第一存储方式共用同一存储资源。
  38. 根据权利要求35所述的装置,其特征在于,模式决策操作支持的决策模式包括skip、merge或者amvp。
  39. 一种可移动平台，其特征在于，包括权利要求19-34中任意一项所述的视频编码装置。
  40. 一种可移动平台，其特征在于，包括权利要求35-38中任意一项所述的视频解码装置。
  41. 一种计算机可读存储介质，其特征在于，所述存储介质为计算机可读存储介质，该计算机可读存储介质中存储有程序指令，所述程序指令用于实现权利要求1-15中任一项所述的视频编码方法。
  42. 一种计算机可读存储介质，其特征在于，所述存储介质为计算机可读存储介质，该计算机可读存储介质中存储有程序指令，所述程序指令用于实现权利要求16-18中任一项所述的视频解码方法。
PCT/CN2020/130367 2020-11-20 2020-11-20 视频编解码方法、装置、可移动平台和存储介质 WO2022104678A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/130367 WO2022104678A1 (zh) 2020-11-20 2020-11-20 视频编解码方法、装置、可移动平台和存储介质
CN202080070713.9A CN114762331A (zh) 2020-11-20 2020-11-20 视频编解码方法、装置、可移动平台和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/130367 WO2022104678A1 (zh) 2020-11-20 2020-11-20 视频编解码方法、装置、可移动平台和存储介质

Publications (1)

Publication Number Publication Date
WO2022104678A1 true WO2022104678A1 (zh) 2022-05-27

Family

ID=81708213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130367 WO2022104678A1 (zh) 2020-11-20 2020-11-20 视频编解码方法、装置、可移动平台和存储介质

Country Status (2)

Country Link
CN (1) CN114762331A (zh)
WO (1) WO2022104678A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439424A (zh) * 2022-08-23 2022-12-06 成都飞机工业(集团)有限责任公司 一种无人机航拍视频图像智能检测方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116055717B (zh) * 2023-03-31 2023-07-14 湖南国科微电子股份有限公司 视频压缩方法、装置、计算机设备及计算机可读存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1554194A (zh) * 2001-09-12 2004-12-08 �ʼҷ����ֵ��ӹɷ����޹�˾ 运动估计和/或补偿
CN1925617A (zh) * 2005-08-29 2007-03-07 三星电子株式会社 提高的运动估计、视频编码方法及使用所述方法的设备
CN101505427A (zh) * 2009-02-20 2009-08-12 杭州爱威芯科技有限公司 视频压缩编码算法中的运动估计装置
CN102611826A (zh) * 2011-01-21 2012-07-25 索尼公司 图像处理装置、图像处理方法以及程序
US20180063547A1 (en) * 2016-08-23 2018-03-01 Canon Kabushiki Kaisha Motion vector detection apparatus and method for controlling the same
CN108702512A (zh) * 2017-10-31 2018-10-23 深圳市大疆创新科技有限公司 运动估计方法和装置
CN111479115A (zh) * 2020-04-14 2020-07-31 腾讯科技(深圳)有限公司 一种视频图像处理方法、装置及计算机可读存储介质
US10743023B2 (en) * 2015-12-04 2020-08-11 Sony Corporation Image processing apparatus and image processing method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1554194A (zh) * 2001-09-12 2004-12-08 �ʼҷ����ֵ��ӹɷ����޹�˾ 运动估计和/或补偿
CN1925617A (zh) * 2005-08-29 2007-03-07 三星电子株式会社 提高的运动估计、视频编码方法及使用所述方法的设备
CN101505427A (zh) * 2009-02-20 2009-08-12 杭州爱威芯科技有限公司 视频压缩编码算法中的运动估计装置
CN102611826A (zh) * 2011-01-21 2012-07-25 索尼公司 图像处理装置、图像处理方法以及程序
US10743023B2 (en) * 2015-12-04 2020-08-11 Sony Corporation Image processing apparatus and image processing method
US20180063547A1 (en) * 2016-08-23 2018-03-01 Canon Kabushiki Kaisha Motion vector detection apparatus and method for controlling the same
CN108702512A (zh) * 2017-10-31 2018-10-23 深圳市大疆创新科技有限公司 运动估计方法和装置
CN111479115A (zh) * 2020-04-14 2020-07-31 腾讯科技(深圳)有限公司 一种视频图像处理方法、装置及计算机可读存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439424A (zh) * 2022-08-23 2022-12-06 成都飞机工业(集团)有限责任公司 一种无人机航拍视频图像智能检测方法
CN115439424B (zh) * 2022-08-23 2023-09-29 成都飞机工业(集团)有限责任公司 一种无人机航拍视频图像智能检测方法

Also Published As

Publication number Publication date
CN114762331A (zh) 2022-07-15

Similar Documents

Publication Publication Date Title
JP7269257B2 (ja) フレームレベル超解像ベースビデオ符号化
US9877044B2 (en) Video encoder and operation method thereof
JP4723025B2 (ja) 画像符号化方法および画像符号化装置
JP3861698B2 (ja) 画像情報符号化装置及び方法、画像情報復号装置及び方法、並びにプログラム
KR20060054485A (ko) 경계강도에 기초한 적응 필터링
KR20110039516A (ko) 움직임 추정을 위한 방법, 시스템 및 애플리케이션
CN113196783B (zh) 去块效应滤波自适应的编码器、解码器及对应方法
WO2022104678A1 (zh) 视频编解码方法、装置、可移动平台和存储介质
US9872017B2 (en) Method for coding a sequence of digitized images
WO2020232845A1 (zh) 一种帧间预测的方法和装置
JP2023521295A (ja) 映像符号化データをシグナリングするための方法
WO2020006690A1 (zh) 视频处理方法和设备
WO2023092256A1 (zh) 一种视频编码方法及其相关装置
WO2021244182A1 (zh) 视频编码方法、视频解码方法及相关设备
CN114071161B (zh) 图像编码方法、图像解码方法及相关装置
US20130156114A1 (en) Data Movement Reduction In Video Compression Systems
CN116250240A (zh) 图像编码方法、图像解码方法及相关装置
US8249373B2 (en) Image data decoding apparatus and method for decoding image data
WO2022110131A1 (zh) 帧间预测方法、装置、编码器、解码器和存储介质
WO2022037458A1 (zh) 视频编解码中的运动信息列表构建方法、装置及设备
KR20230162988A (ko) 멀티미디어 데이터 프로세싱 방법 및 장치, 컴퓨터 디바이스, 및 컴퓨터-판독가능 저장 매체
WO2020135368A1 (zh) 一种帧间预测的方法和装置
JP6234770B2 (ja) 動画像復号処理装置、動画像符号化処理装置およびその動作方法
JP2024513993A (ja) 方法、電子装置、非一時的コンピュータ可読記憶媒体、およびコンピュータプログラム
CN116527912A (zh) 编码视频数据处理方法和视频编码处理器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961969

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961969

Country of ref document: EP

Kind code of ref document: A1