WO2017124298A1 - Video encoding and decoding method, and inter-frame prediction method, apparatus and system therefor - Google Patents

Video encoding and decoding method, and inter-frame prediction method, apparatus and system therefor Download PDF

Info

Publication number
WO2017124298A1
WO2017124298A1 (PCT/CN2016/071341)
Authority
WO
WIPO (PCT)
Prior art keywords
image block
current
pixel
motion vector
current image
Prior art date
Application number
PCT/CN2016/071341
Other languages
English (en)
French (fr)
Inventor
王振宇
王荣刚
姜秀宝
高文
Original Assignee
北京大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学深圳研究生院 filed Critical 北京大学深圳研究生院
Priority to PCT/CN2016/071341 priority Critical patent/WO2017124298A1/zh
Priority to US15/746,932 priority patent/US10425656B2/en
Publication of WO2017124298A1 publication Critical patent/WO2017124298A1/zh

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/523Motion estimation or motion compensation with sub-pixel accuracy
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • the present application relates to the field of digital video codec technology, and in particular, to a video encoding and decoding method, and an interframe prediction method, apparatus, and system.
  • virtual reality technology and related applications are developing rapidly, and panoramic images and panoramic video are an important part of them. Since panoramic video records the entire picture of a 360-degree view, its data volume is very high, so compression of panoramic video is a key technology in virtual reality applications. As an emerging medium, panoramic video has a large field of view, high resolution and a large data volume compared with traditional video. With panoramic video, the observer's viewpoint is fixed, but the observation direction can be changed to view the entire surrounding scene, whereas ordinary two-dimensional video only reflects a certain part of the panoramic video.
  • Cylindrical panoramic video is a common type of panoramic video; it is equivalent to a virtual camera that projects three-dimensional objects in space onto a cylinder.
  • cylindrical panoramic video can be generated using a multi-camera or single-camera acquisition system.
  • since the field of view of panoramic video is 5-6 times that of ordinary video, the amount of data of panoramic video is also 5-6 times that of ordinary video when the same visual quality is provided to the user.
  • with a traditional video transmission scheme, the use of panoramic video in a network environment therefore becomes difficult.
  • however, since at any given moment the user only needs to see a certain part of the panoramic video, block-based coding and transmission has become a common scheme for panoramic video network transmission.
  • the transmission method of the cylindrical panoramic video mainly includes the following steps:
  • the panoramic image is segmented and the sequence of each image block is independently encoded.
  • the required encoded data are then selected for transmission, for example according to the user's current viewing angle; the transmission medium can be the Internet, a wireless network, a local area network, an optical network, other suitable transmission media, or a suitable combination of these transmission media.
  • the block sequences are independently decoded and projected to obtain the desired image.
  • the size of the block has an important influence on the coding efficiency of the panoramic video and the transmission area, and these two factors directly determine the amount of data to be transmitted. If the coding block size is small, the transmission area is small, but the coding efficiency is low; if the coding block size is large, the coding efficiency is high, but the transmission area is also large. Therefore, under the same visual quality, the amount of data to be transmitted is different for different coding block sizes.
  • since panoramic video has certain special characteristics compared with ordinary video, for example it is cyclic and its picture contains large distortion, a special encoding technique is needed to improve the compression efficiency of panoramic video.
  • in traditional video coding standards, inter-frame prediction operates in units of image blocks and selects a block of the same size in the reference image as the prediction block of the current image block.
  • in panoramic video, however, the picture contains large distortion: when an object moves within the picture, its size is enlarged or reduced along with the motion, which degrades the prediction performance and compression efficiency of the coding.
  • the present application provides a video encoding and decoding method and an inter-frame prediction method, apparatus and system thereof, which solve the problem of poor inter-frame prediction performance and poor compression efficiency when encoding and decoding video with severe lens distortion.
  • the present application provides an inter-frame prediction method for video coding and decoding, including:
  • acquiring a motion vector of a current image block and related spatial position information of a current pixel;
  • obtaining a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel;
  • obtaining a predicted value of the current pixel according to the obtained motion vector of the current pixel.
  • the present application further provides an inter prediction apparatus for video codec, including:
  • An information acquiring module configured to acquire a motion vector of a current image block and related spatial location information of the current pixel
  • a calculating module configured to obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel
  • a prediction module configured to obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel.
  • the present application further provides a video encoding method, including:
  • dividing a current image into a number of image blocks;
  • obtaining a predicted image block of a current image block by the above inter-frame prediction method;
  • subtracting the predicted image block from the current image block to obtain a residual block;
  • transforming, quantizing and entropy encoding the residual block to obtain an encoded code stream.
  • the present application further provides a video decoding method, including:
  • performing entropy decoding, inverse quantization and inverse transform on an encoded code stream to obtain a reconstructed residual block;
  • obtaining a predicted image block of a current image block by the above inter-frame prediction method;
  • adding the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
  • the present application further provides a video coding system, including:
  • Image block dividing means for dividing the current image into a plurality of image blocks
  • the inter prediction apparatus is configured to obtain a predicted image block of a current image block
  • a residual calculation device configured to subtract the current image block from the predicted image block to obtain a residual block
  • the code stream generating means is configured to transform, quantize and entropy encode the residual block to obtain an encoded code stream.
  • the present application further provides a video decoding system, including:
  • a residual block reconstruction apparatus configured to perform entropy decoding, inverse quantization, and inverse transform on the encoded code stream to obtain a reconstructed residual block
  • the inter prediction apparatus is configured to obtain a predicted image block of a current image block
  • An image block reconstruction device is configured to add the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
  • FIG. 1 is a schematic diagram of a transmission method of cylindrical panoramic video;
  • FIG. 2 is a schematic flowchart of a panoramic video encoding method according to an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of inter-frame prediction for a panoramic video codec according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the principle of inter-frame prediction according to an embodiment of the present application;
  • FIG. 5 is a schematic block diagram of an inter prediction apparatus for a panoramic video codec according to an embodiment of the present application
  • FIG. 6 is a schematic flowchart of a method for decoding a panoramic video according to an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of a panoramic video coding system according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a panoramic video decoding system according to an embodiment of the present application.
  • the video encoding and decoding method and the inter-frame prediction method, apparatus and system provided by the present application can be applied to panoramic video coding and decoding, and can also be applied to the coding and decoding of semi-panoramic or other sequences with large lens distortion; to simplify the description, the present application only describes the panoramic video codec as an example.
  • the inventive idea of the present application is as follows: in typical panoramic video coding, the panoramic video is obtained by cylinder mapping, so the pictures located at the top and bottom of the panoramic image are stretched laterally.
  • when an object moves from the middle of the image toward the top or bottom, its width in the image increases; conversely, its width in the image decreases.
  • at the same time, the magnitude of the stretching or shrinking is related to the vertical coordinate and the vertical motion vector of the object in the image, so the motion vector of each pixel in an image block can be calculated more accurately from these data (the related spatial position information), thereby improving the performance and compression efficiency of inter-frame prediction during panoramic video encoding and decoding.
  • this embodiment provides a panoramic video encoding method, including the following steps:
  • Step 1.1 Divide the current image into several image blocks. Specifically, the size of the sliced image block can be selected according to actual needs.
  • Step 1.2 The motion vector (MVx, MVy) of the current image block is obtained by motion estimation.
  • the motion estimation adopts any feasible method in the prior art.
  • Step 1.3 Obtain a predicted image block by inter prediction.
  • the inter prediction method includes the following steps:
  • Step 2.1 Obtain the motion vector of the current image block and the relevant spatial position information of the current pixel.
  • Step 2.2 Obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel.
  • the relevant spatial location information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
  • FIG. 4 is a schematic diagram of the principle of the inter prediction method in this embodiment.
  • the current image has a width of width and a height of height.
  • the motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy).
  • of course, in other embodiments, the motion vector of the current image block may also adopt other definitions.
  • in addition, the current image block has a width w and a height h.
  • the coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • the coordinate system used for the coordinates (i, j) of the current pixel within the current image block may likewise take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • in other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs.
  • the present application is described using only the above information as an example; it should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of the panoramic image and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
  • in this embodiment, the motion vector of the current pixel is obtained in step 2.2 through the following functional relationship:
  (MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
  • where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
  • specifically, MV′x and MV′y can each be obtained from the block motion vector and the above spatial quantities, where α is the horizontal scaling factor; α can be approximated from the vertical coordinate and the vertical motion, and the concrete formulas are given only as images (PCTCN2016071341-appb-000001 and -000002) in the published application. An illustrative sketch of this per-pixel computation follows.
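  • As an illustration only, the following Python sketch shows how a per-pixel motion vector could be derived from the block motion vector and the related spatial position information. The application does not reproduce f or α in the text, so the default alpha_fn below is a hypothetical stand-in (a cosine-ratio model of lateral stretching under cylinder mapping), not the formula from the filing.

```python
import math

def pixel_motion_vector(mv_block, block_xy, block_size, pixel_ij, image_size,
                        alpha_fn=None):
    """Per-pixel motion vector (MV'x, MV'y), following the general relationship
    (MV'x, MV'y) = f(x, y, w, h, i, j, MVx, MVy, width, height).
    The concrete f and horizontal scaling factor alpha are not given in the
    text, so alpha_fn is a placeholder, not the patented formula."""
    MVx, MVy = mv_block
    x, y = block_xy
    w, h = block_size            # block width/height (unused by this toy alpha)
    i, j = pixel_ij
    width, height = image_size

    if alpha_fn is None:
        def alpha_fn(row_cur, row_ref, img_h):
            # Hypothetical: ratio of lateral stretching between the current row
            # and the row the block motion vector points to, under cylinder mapping.
            lat_cur = math.pi * (row_cur / img_h - 0.5)
            lat_ref = math.pi * (row_ref / img_h - 0.5)
            return max(math.cos(lat_ref), 1e-3) / max(math.cos(lat_cur), 1e-3)

    row_cur = y + j              # current pixel row within the current image
    row_ref = row_cur + MVy      # row reached by the block motion vector
    alpha = alpha_fn(row_cur, row_ref, height)

    # Vertical motion is kept as the block motion; horizontal motion is scaled.
    return alpha * MVx, MVy
```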
  • Step 2.3 Obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining a predicted image block of the current image block.
  • in this embodiment, preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the sample value of the reference sample position is calculated by interpolation and used as the predicted value of the current pixel.
  • when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the pixel value at that integer-pixel position is taken as the sample value of the reference sample position and used as the predicted value of the current pixel.
  • the interpolation method uses adaptive interpolation.
  • Adaptive interpolation includes different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample.
  • the selection method of the interpolation filter includes but is not limited to the following manners: the coordinates of the reference sample position are assumed to be (refX, refY), when refY is less than height/2, the horizontal interpolation uses a 4-tap filter, and the vertical interpolation uses an 8-tap filter; Otherwise, the horizontal interpolation uses an 8-tap filter and the vertical interpolation uses a 4-tap filter.
  • the interpolation uses 1/4-pixel precision.
  • for the 8-tap filter, the filter corresponding to the 1/4-pixel position is {-1, 4, -10, 57, 19, -7, 3, -1}, the filter corresponding to the 2/4-pixel position is {-1, 4, -11, 40, 40, -11, 4, -1}, and the filter corresponding to the 3/4-pixel position is {-1, 3, -7, 19, 57, -10, 4, -1}.
  • for the 4-tap filter, the filter corresponding to the 1/4-pixel position is {2, -9, 57, 17, -4, 1}, the filter corresponding to the 2/4-pixel position is {2, -9, 39, 39, -9, 2}, and the filter corresponding to the 3/4-pixel position is {1, -4, 17, 57, -9, 2}.
  • therefore, in this embodiment, the interpolation filter is selected in the above manner; of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs. A sketch of this filter selection appears below.
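  • A minimal sketch of the adaptive filter selection and of applying one quarter-pel filter is given below. It assumes the coefficient sets above are normalized to a sum of 64; the rounding convention is an assumption, since the text does not specify one.

```python
# Quarter-pel filter banks exactly as listed above (each set of taps sums to 64).
FILTERS_8TAP = {1: [-1, 4, -10, 57, 19, -7, 3, -1],
                2: [-1, 4, -11, 40, 40, -11, 4, -1],
                3: [-1, 3, -7, 19, 57, -10, 4, -1]}
FILTERS_4TAP = {1: [2, -9, 57, 17, -4, 1],
                2: [2, -9, 39, 39, -9, 2],
                3: [1, -4, 17, 57, -9, 2]}

def pick_filters(ref_y, height):
    """Return (horizontal, vertical) filter banks for a reference sample at row ref_y.
    Upper half of the picture: 4-tap horizontal and 8-tap vertical interpolation;
    otherwise the opposite, per the selection rule described above."""
    if ref_y < height / 2:
        return FILTERS_4TAP, FILTERS_8TAP
    return FILTERS_8TAP, FILTERS_4TAP

def apply_filter(samples, phase, bank):
    """Filter a window of integer-position samples at quarter-pel phase 1, 2 or 3.
    The window length must match the filter length; dividing by 64 with rounding
    is an assumed normalization."""
    taps = bank[phase]
    assert len(samples) == len(taps)
    return (sum(c * s for c, s in zip(taps, samples)) + 32) >> 6
```

  • In use, pick_filters would be queried once per reference sample, and apply_filter run first along one axis and then along the other when the reference position is fractional in both dimensions.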
  • Step 1.4 The pixel at the same position in the predicted image block is subtracted from each pixel of the current image block to obtain a residual block.
  • Step 1.5 transforming and quantizing the residual block to obtain a quantized block; finally, each coefficient of the quantized block and the motion vector of the current image block are written into the encoded code stream by entropy coding.
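  • Steps 1.4 and 1.5 amount to a conventional residual-coding loop. The sketch below outlines them; transform, quantize and entropy_write are placeholder callables, since the application does not prescribe particular transform, quantization or entropy-coding tools.

```python
def encode_block(cur_block, pred_block, mv_block, transform, quantize, entropy_write):
    """Outline of steps 1.4-1.5 for one image block.
    cur_block and pred_block are 2-D lists of samples; the three callables stand in
    for whatever codec primitives are actually used."""
    # Step 1.4: residual = current block minus co-located predicted samples.
    residual = [[c - p for c, p in zip(crow, prow)]
                for crow, prow in zip(cur_block, pred_block)]
    # Step 1.5: transform and quantize the residual ...
    qcoeffs = quantize(transform(residual))
    # ... then entropy-code the quantized coefficients and the block motion vector.
    entropy_write(qcoeffs, mv_block)
    return qcoeffs
```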
  • based on the inter-frame prediction method for panoramic video coding and decoding provided in the foregoing Embodiment 1, this embodiment correspondingly provides an inter-frame prediction apparatus for panoramic video coding and decoding, including an information acquisition module 101, a calculation module 102 and a prediction module 103.
  • the information obtaining module 101 is configured to acquire a motion vector of a current image block and related spatial position information of the current pixel.
  • the calculation module 102 is configured to obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel.
  • the relevant spatial location information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
  • FIG. 4 is a schematic diagram of the principle of the inter prediction apparatus in this embodiment.
  • the current image has a width of width and a height of height.
  • the motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy).
  • of course, in other embodiments, the motion vector of the current image block may also adopt other definitions.
  • in addition, the current image block has a width w and a height h.
  • the coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • the coordinate system used for the coordinates (i, j) of the current pixel within the current image block may likewise take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • in other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs.
  • the present application is described using only the above information as an example; it should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of the panoramic image and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
  • in this embodiment, the calculation module 102 obtains the motion vector of the current pixel through the following functional relationship:
  (MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
  • where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
  • specifically, MV′x and MV′y can each be obtained from the block motion vector and the above spatial quantities, where α is the horizontal scaling factor; α can be approximated from the vertical coordinate and the vertical motion, and the concrete formulas are given only as images in the published application.
  • the prediction module 103 is configured to obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining a predicted image block of the current image block.
  • preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the prediction module 103 calculates the sample value of the reference sample position by interpolation and uses it as the predicted value of the current pixel.
  • when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the prediction module 103 takes the pixel value at that integer-pixel position as the sample value of the reference sample position and uses it as the predicted value of the current pixel.
  • the interpolation method uses adaptive interpolation.
  • Adaptive interpolation includes different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample.
  • the selection method of the interpolation filter includes but is not limited to the following manners: the coordinates of the reference sample position are assumed to be (refX, refY), when refY is less than height/2, the horizontal interpolation uses a 4-tap filter, and the vertical interpolation uses an 8-tap filter; Otherwise, the horizontal interpolation uses an 8-tap filter and the vertical interpolation uses a 4-tap filter.
  • the interpolation uses 1/4 pixel precision.
  • for the 8-tap filter, the filter corresponding to the 1/4-pixel position is {-1, 4, -10, 57, 19, -7, 3, -1}, the filter corresponding to the 2/4-pixel position is {-1, 4, -11, 40, 40, -11, 4, -1}, and the filter corresponding to the 3/4-pixel position is {-1, 3, -7, 19, 57, -10, 4, -1}.
  • for the 4-tap filter, the filter corresponding to the 1/4-pixel position is {2, -9, 57, 17, -4, 1}, the filter corresponding to the 2/4-pixel position is {2, -9, 39, 39, -9, 2}, and the filter corresponding to the 3/4-pixel position is {1, -4, 17, 57, -9, 2}.
  • therefore, in this embodiment, the interpolation filter is selected in the above manner; of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs.
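  • Putting the modules together, the following sketch shows what the prediction module does with the per-pixel motion vectors: it samples the reference image at the position each vector points to. Here pixel_mv and sample_at are assumed helpers (a per-pixel motion-vector function and an interpolating sampler as discussed above), not APIs defined by the application.

```python
def predict_block(ref_image, block_xy, block_size, mv_block, image_size,
                  pixel_mv, sample_at):
    """Predicted image block for the current block (sketch of prediction module 103)."""
    x, y = block_xy
    w, h = block_size
    pred = [[0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            # Per-pixel motion vector from the block MV and spatial position information.
            mvx, mvy = pixel_mv(mv_block, block_xy, block_size, (i, j), image_size)
            ref_x = x + i + mvx      # reference sample position pointed to by
            ref_y = y + j + mvy      # the motion vector of the current pixel
            # Integer position: copy the sample; fractional position: interpolate.
            pred[j][i] = sample_at(ref_image, ref_x, ref_y)
    return pred
```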
  • the embodiment provides a panoramic video decoding method, including the following steps:
  • Step 3.1 Entropy decoding, inverse quantization, and inverse transform on the encoded code stream to obtain a reconstructed residual block.
  • Step 3.2 Obtain a predicted image block by inter prediction.
  • the inter prediction method includes the following steps:
  • Step 2.1 Obtain the motion vector of the current image block and the relevant spatial position information of the current pixel.
  • Step 2.2 Obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel. Specifically, the motion vector of the current image block can be obtained by motion estimation.
  • the relevant spatial location information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
  • FIG. 4 is a schematic diagram of the principle of the inter prediction method in this embodiment.
  • the current image has a width of width and a height of height.
  • the motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy).
  • of course, in other embodiments, the motion vector of the current image block may also adopt other definitions.
  • in addition, the current image block has a width w and a height h.
  • the coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • the coordinate system used for the coordinates (i, j) of the current pixel within the current image block may likewise take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
  • in other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs.
  • the present application is described using only the above information as an example; it should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of the panoramic image and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
  • in this embodiment, the motion vector of the current pixel is obtained in step 3.2 through the following functional relationship:
  (MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
  • where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
  • specifically, MV′x and MV′y can each be obtained from the block motion vector and the above spatial quantities, where α is the horizontal scaling factor; α can be approximated from the vertical coordinate and the vertical motion, and the concrete formulas are given only as images in the published application.
  • Step 2.3 Obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining a predicted image block of the current image block.
  • in this embodiment, preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the sample value of the reference sample position is calculated by interpolation and used as the predicted value of the current pixel.
  • when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the pixel value at that integer-pixel position is taken as the sample value of the reference sample position and used as the predicted value of the current pixel.
  • the interpolation method uses adaptive interpolation.
  • Adaptive interpolation includes different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample.
  • the selection method of the interpolation filter includes but is not limited to the following manners: the coordinates of the reference sample position are assumed to be (refX, refY), when refY is less than height/2, the horizontal interpolation uses a 4-tap filter, and the vertical interpolation uses an 8-tap filter; Otherwise, the horizontal interpolation uses an 8-tap filter and the vertical interpolation uses a 4-tap filter.
  • the interpolation uses 1/4 pixel precision.
  • for the 8-tap filter, the filter corresponding to the 1/4-pixel position is {-1, 4, -10, 57, 19, -7, 3, -1}, the filter corresponding to the 2/4-pixel position is {-1, 4, -11, 40, 40, -11, 4, -1}, and the filter corresponding to the 3/4-pixel position is {-1, 3, -7, 19, 57, -10, 4, -1}.
  • for the 4-tap filter, the filter corresponding to the 1/4-pixel position is {2, -9, 57, 17, -4, 1}, the filter corresponding to the 2/4-pixel position is {2, -9, 39, 39, -9, 2}, and the filter corresponding to the 3/4-pixel position is {1, -4, 17, 57, -9, 2}.
  • since part of the panorama is captured by a fisheye camera, when a normal panorama is obtained through cylinder mapping, the upper part of the picture has a higher vertical resolution and a lower horizontal resolution, while, conversely, the lower part of the picture has a lower vertical resolution and a higher horizontal resolution.
  • during cylinder mapping, the low-resolution part is itself obtained by interpolation, so that part of the picture is relatively smooth in the horizontal (or vertical) direction; the interpolation therefore does not need as many taps, and compared with the traditional approach of using the same filter everywhere, the amount of computation can be reduced. Therefore, in the present embodiment, the interpolation filter is selected in the above manner. Of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs.
  • Step 3.3 Perform motion compensation, and add the pixel values of the same position of the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
  • the reconstructed image block is the decoded image block.
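  • A minimal sketch of step 3.3 follows. Clipping to the sample range and the 8-bit depth are assumptions; the text only states that co-located samples of the prediction and the reconstructed residual are added.

```python
def reconstruct_block(pred_block, residual_block, bit_depth=8):
    """Step 3.3: add co-located samples of the predicted image block and the
    reconstructed residual block, clipping to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_block, residual_block)]
```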
  • the present embodiment provides a panoramic video encoding system, including an image block dividing device 201, an inter prediction device 202, a residual calculation device 203 and a code stream generating device 204.
  • the image block dividing means 201 is for dividing the current image into a plurality of image blocks.
  • the inter prediction device 202 is configured to obtain a predicted image block of the current image block. Furthermore, in the present embodiment, the inter prediction apparatus 202 employs the inter prediction apparatus provided in the second embodiment.
  • the residual calculation means 203 is for subtracting the current image block from the predicted image block to obtain a residual block.
  • the code stream generating means 204 is configured to transform, quantize and entropy encode the residual block to obtain an encoded code stream.
  • the embodiment provides a panoramic video decoding system, including a residual block reconstruction device 301, an inter prediction device 302 and an image block reconstruction device 303.
  • the residual block reconstruction means 301 is configured to perform entropy decoding, inverse quantization and inverse transform on the encoded code stream to obtain a reconstructed residual block.
  • the inter prediction device 302 is configured to obtain a predicted image block of the current image block. Furthermore, in the present embodiment, the inter prediction apparatus 302 employs the inter prediction apparatus provided in the second embodiment.
  • the image block reconstruction means 303 is for adding the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
  • in general, a video processing device may include an encoding device and/or a decoding device; the encoding device includes an encoding process and a decoding process, and the decoding device includes a decoding process.
  • the decoding process of the decoding device is the same as the decoding process of the encoding device.
  • the program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video encoding method, a video decoding method, and an inter-frame prediction method, apparatus and system therefor. The inter-frame prediction method includes: acquiring a motion vector of a current image block and related spatial position information of a current pixel; obtaining a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel; and obtaining a predicted value of the current pixel according to the obtained motion vector of the current pixel. Because inter-frame prediction considers not only the motion vector of the current image block but also the related spatial position information of the current pixel, it can adapt to the lens-distortion characteristics of different images and to the enlargement/reduction that objects undergo as they move within the picture, improving the accuracy of the calculated per-pixel motion vectors and the performance and compression efficiency of inter-frame prediction in video encoding and decoding.

Description

Video encoding and decoding method, and inter-frame prediction method, apparatus and system therefor
TECHNICAL FIELD
The present application relates to the field of digital video coding and decoding technology, and in particular to a video encoding method, a video decoding method, and an inter-frame prediction method, apparatus and system therefor.
BACKGROUND
At present, virtual reality technology and related applications are developing rapidly. In virtual reality technology, panoramic images and panoramic video are an important component. Since panoramic video records the entire picture of a 360-degree view and has an extremely high data volume, panoramic video compression is a key technology in virtual reality applications. As an emerging medium, panoramic video has a large field of view, high resolution and a large data volume compared with traditional video. With panoramic video, the observer's viewpoint is fixed, and by changing the observation direction the observer can see the entire surrounding scene, whereas ordinary two-dimensional video only reflects a certain part of the panoramic video.
Cylindrical panoramic video is a common type of panoramic video; it is equivalent to a virtual camera that projects three-dimensional objects in space onto a cylinder. Cylindrical panoramic video can be generated with a multi-camera or single-camera acquisition system.
Since the field of view of panoramic video is 5-6 times that of ordinary video, the data volume of panoramic video is 5-6 times that of ordinary video when the same visual quality is provided to the user. Under a traditional video transmission scheme, using panoramic video in a network environment therefore becomes very difficult. However, since at any given moment the user only needs to see a certain part of the panoramic video, block-based coding and transmission has become a common scheme for panoramic video network transmission.
Referring to FIG. 1, the transmission method of cylindrical panoramic video mainly includes the following steps:
The panoramic image is divided into blocks, and the sequence of each image block is encoded independently.
The required encoded data are then selected for transmission; here the data can be selected according to the user's current viewing angle. The transmission medium can be the Internet, a wireless network, a local area network, an optical network, other suitable transmission media, or a suitable combination of these transmission media.
Finally, after receiving the data, the decoding end independently decodes and projection-transforms these block sequences to obtain the desired image.
In block-based coding of panoramic video, the block size has an important influence on the coding efficiency of the panoramic video and on the transmission area, and these two factors directly determine the amount of data to be transmitted. If the coding block size is small, the transmission area is small but the coding efficiency is low; if the coding block size is large, the coding efficiency is high but the transmission area is also large. Therefore, at the same visual quality, different coding block sizes require different amounts of transmitted data.
In addition, since panoramic video has certain special characteristics compared with ordinary video, for example it is cyclic and its picture contains large distortion, a special coding technique is needed to improve the compression efficiency of panoramic video.
In traditional video coding standards, inter-frame prediction is used; the inter-frame prediction method operates in units of image blocks and selects a block of the same size in the reference image as the prediction block of the current image block. In panoramic video, however, the picture contains large distortion: when an object moves within the picture, its size is enlarged or reduced along with the motion, which affects the prediction performance and compression efficiency of the coding.
SUMMARY OF THE INVENTION
The present application provides a video encoding method, a video decoding method, and an inter-frame prediction method, apparatus and system therefor, which solve the problem of poor inter-frame prediction performance and poor compression efficiency when encoding and decoding video with severe lens distortion.
According to a first aspect, the present application provides an inter-frame prediction method for video coding and decoding, including:
acquiring a motion vector of a current image block and related spatial position information of a current pixel;
obtaining a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel;
obtaining a predicted value of the current pixel according to the obtained motion vector of the current pixel.
According to a second aspect, the present application further provides an inter-frame prediction apparatus for video coding and decoding, including:
an information acquisition module configured to acquire a motion vector of a current image block and related spatial position information of a current pixel;
a calculation module configured to obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel;
a prediction module configured to obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel.
According to a third aspect, the present application further provides a video encoding method, including:
dividing a current image into a number of image blocks;
obtaining a predicted image block of a current image block by the above inter-frame prediction method;
subtracting the predicted image block from the current image block to obtain a residual block;
transforming, quantizing and entropy encoding the residual block to obtain an encoded code stream.
According to a fourth aspect, the present application further provides a video decoding method, including:
performing entropy decoding, inverse quantization and inverse transform on an encoded code stream to obtain a reconstructed residual block;
obtaining a predicted image block of a current image block by the above inter-frame prediction method;
adding the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
According to a fifth aspect, the present application further provides a video encoding system, including:
an image block dividing device configured to divide a current image into a number of image blocks;
the above inter-frame prediction apparatus, configured to obtain a predicted image block of a current image block;
a residual calculation device configured to subtract the predicted image block from the current image block to obtain a residual block;
a code stream generating device configured to transform, quantize and entropy encode the residual block to obtain an encoded code stream.
According to a sixth aspect, the present application further provides a video decoding system, including:
a residual block reconstruction device configured to perform entropy decoding, inverse quantization and inverse transform on an encoded code stream to obtain a reconstructed residual block;
the above inter-frame prediction apparatus, configured to obtain a predicted image block of a current image block;
an image block reconstruction device configured to add the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
In the video encoding method, video decoding method and inter-frame prediction method, apparatus and system provided by the present application, inter-frame prediction considers not only the motion vector of the current image block but also the related spatial position information of the current pixel; this can adapt to the lens-distortion characteristics of different images and to the enlargement/reduction that objects undergo as they move within the picture, improving the accuracy of the calculated per-pixel motion vectors and the performance and compression efficiency of inter-frame prediction in video encoding and decoding.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a transmission method of cylindrical panoramic video;
FIG. 2 is a schematic flowchart of a panoramic video encoding method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of inter-frame prediction for panoramic video coding and decoding according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the principle of inter-frame prediction according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of an inter-frame prediction apparatus for panoramic video coding and decoding according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a panoramic video decoding method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a panoramic video encoding system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a panoramic video decoding system according to an embodiment of the present application.
DETAILED DESCRIPTION
It should first be noted that the video encoding method, video decoding method and inter-frame prediction method, apparatus and system provided by the present application can be applied to panoramic video coding and decoding, and can also be applied to the coding and decoding of semi-panoramic or other sequences with large lens distortion; to simplify the description, the present application is described using only panoramic video coding and decoding as an example.
The inventive concept of the present application is as follows: for typical panoramic video coding, the panoramic video is obtained by cylinder mapping, so the pictures located at the top and bottom of the panoramic image are stretched laterally. When an object moves from the middle of the image toward the top or bottom, its width in the image increases; conversely, its width in the image decreases. At the same time, the magnitude of the stretching or shrinking is related to the vertical coordinate and the vertical motion vector of the object in the image, so the motion vector of each pixel in an image block can be calculated more accurately from these data (the related spatial position information), thereby improving the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
The present application is further described in detail below through specific embodiments in conjunction with the accompanying drawings.
Embodiment 1
Referring to FIG. 2, this embodiment provides a panoramic video encoding method, including the following steps:
Step 1.1: Divide the current image into a number of image blocks. Specifically, the size of the divided image blocks can be selected according to actual needs.
Step 1.2: Obtain the motion vector (MVx, MVy) of the current image block by motion estimation.
Specifically, the motion estimation may use any feasible method in the prior art.
Step 1.3: Obtain a predicted image block by inter-frame prediction.
Referring to FIG. 3, in this embodiment the inter-frame prediction method includes the following steps:
Step 2.1: Acquire the motion vector of the current image block and the related spatial position information of the current pixel.
Step 2.2: Obtain the motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel.
In this embodiment, the related spatial position information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
Specifically, refer to FIG. 4, which is a schematic diagram of the principle of the inter-frame prediction method in this embodiment.
The current image has a width of width and a height of height. The motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy); of course, in other embodiments, the motion vector of the current image block may adopt other definitions. In addition, the current image block has a width w and a height h. The coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block within the current image; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively. The coordinate system used for the coordinates (i, j) of the current pixel within the current image block may take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
In other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs. This embodiment describes the present application using only the above information as an example. It should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of panoramic images and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
Therefore, in this embodiment, the motion vector of the current pixel is obtained in step 2.2 through the following functional relationship:
(MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
Specifically, MV′x and MV′y can each be obtained by the following method:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000001]
where α is the horizontal scaling factor, which can be approximated by the following formula:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000002]
Step 2.3: Obtain the predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining the predicted image block of the current image block.
In this embodiment, preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the sample value of the reference sample position is calculated by interpolation and used as the predicted value of the current pixel; when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the pixel value at that integer-pixel position is taken as the sample value of the reference sample position and used as the predicted value of the current pixel.
Specifically, the interpolation uses adaptive interpolation. Adaptive interpolation involves different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample. The selection method of the interpolation filter includes, but is not limited to, the following: assuming the coordinates of the reference sample position are (refX, refY), when refY is less than height/2, horizontal interpolation uses a 4-tap filter and vertical interpolation uses an 8-tap filter; otherwise, horizontal interpolation uses an 8-tap filter and vertical interpolation uses a 4-tap filter. The interpolation uses 1/4-pixel precision. For the 8-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {-1,4,-10,57,19,-7,3,-1}, {-1,4,-11,40,40,-11,4,-1} and {-1,3,-7,19,57,-10,4,-1}, respectively. For the 4-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {2,-9,57,17,-4,1}, {2,-9,39,39,-9,2} and {1,-4,17,57,-9,2}, respectively.
Since part of the panorama is captured by a fisheye camera, when a normal panorama is obtained by cylinder mapping, the upper part of the picture has a higher vertical resolution and a lower horizontal resolution, while, conversely, the lower part of the picture has a lower vertical resolution and a higher horizontal resolution. During cylinder mapping, the low-resolution part is itself obtained by interpolation, so that part of the picture is relatively smooth in the horizontal (or vertical) direction; the interpolation therefore does not need as many taps, and compared with the traditional method of using the same filter everywhere, the amount of computation can be reduced. Therefore, in this embodiment, the interpolation filter is selected in the above manner. Of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs.
Step 1.4: Subtract the pixel at the same position in the predicted image block from each pixel of the current image block to obtain a residual block.
Step 1.5: Transform and quantize the residual block to obtain a quantized block; finally, write each coefficient of the quantized block and the motion vector of the current image block into the encoded code stream by entropy coding.
Embodiment 2
Referring to FIG. 5, based on the inter-frame prediction method for panoramic video coding and decoding provided in Embodiment 1 above, this embodiment correspondingly provides an inter-frame prediction apparatus for panoramic video coding and decoding, including an information acquisition module 101, a calculation module 102 and a prediction module 103.
The information acquisition module 101 is configured to acquire the motion vector of the current image block and the related spatial position information of the current pixel.
The calculation module 102 is configured to obtain the motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel.
In this embodiment, the related spatial position information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
Specifically, refer to FIG. 4, which is a schematic diagram of the principle of the inter-frame prediction apparatus in this embodiment.
The current image has a width of width and a height of height. The motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy); of course, in other embodiments, the motion vector of the current image block may adopt other definitions. In addition, the current image block has a width w and a height h. The coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block within the current image; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively. The coordinate system used for the coordinates (i, j) of the current pixel within the current image block may take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
In other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs. This embodiment describes the present application using only the above information as an example. It should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of panoramic images and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in panoramic video encoding and decoding.
Therefore, in this embodiment, the calculation module 102 obtains the motion vector of the current pixel through the following functional relationship:
(MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
Specifically, MV′x and MV′y can each be obtained by the following method:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000003]
where α is the horizontal scaling factor, which can be approximated by the following formula:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000004]
The prediction module 103 is configured to obtain the predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining the predicted image block of the current image block.
In this embodiment, preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the prediction module 103 calculates the sample value of the reference sample position by interpolation and uses it as the predicted value of the current pixel; when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the prediction module 103 takes the pixel value at that integer-pixel position as the sample value of the reference sample position and uses it as the predicted value of the current pixel.
Specifically, the interpolation uses adaptive interpolation. Adaptive interpolation involves different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample. The selection method of the interpolation filter includes, but is not limited to, the following: assuming the coordinates of the reference sample position are (refX, refY), when refY is less than height/2, horizontal interpolation uses a 4-tap filter and vertical interpolation uses an 8-tap filter; otherwise, horizontal interpolation uses an 8-tap filter and vertical interpolation uses a 4-tap filter. The interpolation uses 1/4-pixel precision. For the 8-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {-1,4,-10,57,19,-7,3,-1}, {-1,4,-11,40,40,-11,4,-1} and {-1,3,-7,19,57,-10,4,-1}, respectively. For the 4-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {2,-9,57,17,-4,1}, {2,-9,39,39,-9,2} and {1,-4,17,57,-9,2}, respectively.
Since part of the panorama is captured by a fisheye camera, when a normal panorama is obtained by cylinder mapping, the upper part of the picture has a higher vertical resolution and a lower horizontal resolution, while, conversely, the lower part of the picture has a lower vertical resolution and a higher horizontal resolution. During cylinder mapping, the low-resolution part is itself obtained by interpolation, so that part of the picture is relatively smooth in the horizontal (or vertical) direction; the interpolation therefore does not need as many taps, and compared with the traditional method of using the same filter everywhere, the amount of computation can be reduced. Therefore, in this embodiment, the interpolation filter is selected in the above manner. Of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs.
Embodiment 3
Referring to FIG. 6, this embodiment provides a panoramic video decoding method, including the following steps:
Step 3.1: Perform entropy decoding, inverse quantization and inverse transform on the encoded code stream to obtain a reconstructed residual block.
Step 3.2: Obtain a predicted image block by inter-frame prediction.
Referring to FIG. 3, in this embodiment the inter-frame prediction method includes the following steps:
Step 2.1: Acquire the motion vector of the current image block and the related spatial position information of the current pixel.
Step 2.2: Obtain the motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel. Specifically, the motion vector of the current image block can be obtained by motion estimation.
In this embodiment, the related spatial position information of the current pixel includes the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
Specifically, refer to FIG. 4, which is a schematic diagram of the principle of the inter-frame prediction method in this embodiment.
The current image has a width of width and a height of height. The motion vector of the current image block is defined as the motion vector of the pixel at the center of the image block, denoted (MVx, MVy); of course, in other embodiments, the motion vector of the current image block may adopt other definitions. In addition, the current image block has a width w and a height h. The coordinates of the current image block within the current image are defined as the coordinates (x, y) of the upper-left pixel of the current image block within the current image; in this coordinate system, the upper-left corner of the current image may be taken as the origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively. The coordinate system used for the coordinates (i, j) of the current pixel within the current image block may take the upper-left pixel of the current image block as its origin, with downward and rightward being the positive directions of the ordinate and the abscissa, respectively.
In other embodiments, other available information may be selected as the related spatial position information of the current pixel according to actual needs. This embodiment describes the present application using only the above information as an example. It should be understood that it is precisely because the related spatial position information of the current pixel is taken into account that the present application overcomes the problems caused by the lens-distortion characteristics of panoramic images and by the enlargement/reduction that objects undergo as they move within the picture, improves the accuracy of the calculated per-pixel motion vectors, and improves the performance and compression efficiency of inter-frame prediction in video encoding and decoding.
Therefore, in this embodiment, the motion vector of the current pixel is obtained in step 3.2 through the following functional relationship:
(MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
where f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
Specifically, MV′x and MV′y can each be obtained by the following method:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000005]
where α is the horizontal scaling factor, which can be approximated by the following formula:
[formula, reproduced in the original publication as image PCTCN2016071341-appb-000006]
Step 2.3: Obtain the predicted value of the current pixel according to the obtained motion vector of the current pixel, thereby obtaining the predicted image block of the current image block.
In this embodiment, preferably, when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the sample value of the reference sample position is calculated by interpolation and used as the predicted value of the current pixel; when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the pixel value at that integer-pixel position is taken as the sample value of the reference sample position and used as the predicted value of the current pixel.
Specifically, the interpolation uses adaptive interpolation. Adaptive interpolation involves different interpolation filters, and the selection of the interpolation filter is determined by the coordinates (abscissa and ordinate) of the reference sample. The selection method of the interpolation filter includes, but is not limited to, the following: assuming the coordinates of the reference sample position are (refX, refY), when refY is less than height/2, horizontal interpolation uses a 4-tap filter and vertical interpolation uses an 8-tap filter; otherwise, horizontal interpolation uses an 8-tap filter and vertical interpolation uses a 4-tap filter. The interpolation uses 1/4-pixel precision. For the 8-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {-1,4,-10,57,19,-7,3,-1}, {-1,4,-11,40,40,-11,4,-1} and {-1,3,-7,19,57,-10,4,-1}, respectively. For the 4-tap filter, the filters corresponding to the 1/4-, 2/4- and 3/4-pixel positions are {2,-9,57,17,-4,1}, {2,-9,39,39,-9,2} and {1,-4,17,57,-9,2}, respectively.
Since part of the panorama is captured by a fisheye camera, when a normal panorama is obtained by cylinder mapping, the upper part of the picture has a higher vertical resolution and a lower horizontal resolution, while, conversely, the lower part of the picture has a lower vertical resolution and a higher horizontal resolution. During cylinder mapping, the low-resolution part is itself obtained by interpolation, so that part of the picture is relatively smooth in the horizontal (or vertical) direction; the interpolation therefore does not need as many taps, and compared with the traditional method of using the same filter everywhere, the amount of computation can be reduced. Therefore, in this embodiment, the interpolation filter is selected in the above manner. Of course, in other embodiments, the selection of the interpolation filter can be freely designed according to actual needs.
Step 3.3: Perform motion compensation by adding the pixel values at the same positions of the predicted image block and the reconstructed residual block to obtain a reconstructed image block. The reconstructed image block is the decoded image block.
Embodiment 4
Referring to FIG. 7, corresponding to the panoramic video encoding method provided in Embodiment 1 above, this embodiment correspondingly provides a panoramic video encoding system, including an image block dividing device 201, an inter-frame prediction device 202, a residual calculation device 203 and a code stream generating device 204.
The image block dividing device 201 is configured to divide the current image into a number of image blocks.
The inter-frame prediction device 202 is configured to obtain the predicted image block of the current image block; in this embodiment, the inter-frame prediction device 202 uses the inter-frame prediction apparatus provided in Embodiment 2 above.
The residual calculation device 203 is configured to subtract the predicted image block from the current image block to obtain a residual block.
The code stream generating device 204 is configured to transform, quantize and entropy encode the residual block to obtain an encoded code stream.
Embodiment 5
Referring to FIG. 8, corresponding to the panoramic video decoding method provided in Embodiment 3 above, this embodiment correspondingly provides a panoramic video decoding system, including a residual block reconstruction device 301, an inter-frame prediction device 302 and an image block reconstruction device 303.
The residual block reconstruction device 301 is configured to perform entropy decoding, inverse quantization and inverse transform on the encoded code stream to obtain a reconstructed residual block.
The inter-frame prediction device 302 is configured to obtain the predicted image block of the current image block; in this embodiment, the inter-frame prediction device 302 uses the inter-frame prediction apparatus provided in Embodiment 2 above.
The image block reconstruction device 303 is configured to add the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
It should be noted that the embodiments of the present application describe in detail only the inter-frame prediction method in the panoramic video coding and decoding process; for the other steps of panoramic video coding and decoding, any feasible method in the prior art may be used. In addition, in general, a video processing device may include an encoding device and/or a decoding device; the encoding device includes an encoding process and a decoding process, and the decoding device includes a decoding process. The decoding process of the decoding device is the same as the decoding process of the encoding device.
A person skilled in the art will understand that all or part of the steps of the various methods in the above embodiments may be completed by a program controlling the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
The above content is a further detailed description of the present application in conjunction with specific embodiments, and the specific implementation of the present application is not limited to these descriptions. For a person of ordinary skill in the art to which the present application belongs, a number of simple deductions or substitutions can also be made without departing from the inventive concept of the present application.

Claims (14)

  1. An inter-frame prediction method for video coding and decoding, comprising:
    acquiring a motion vector of a current image block and related spatial position information of a current pixel;
    obtaining a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel;
    obtaining a predicted value of the current pixel according to the obtained motion vector of the current pixel.
  2. The method according to claim 1, wherein the related spatial position information of the current pixel comprises the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
  3. The method according to claim 2, wherein the step of obtaining the motion vector of the current pixel according to the motion vector of the current image block and the spatial position information of the current pixel comprises:
    obtaining the motion vector of the current pixel through the following functional relationship:
    (MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
    wherein f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
  4. The method according to any one of claims 1 to 3, wherein the step of obtaining the predicted value of the current pixel according to the obtained motion vector of the current pixel comprises: when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, calculating the sample value of the reference sample position by interpolation and using the sample value as the predicted value of the current pixel; when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, taking the pixel value at the integer-pixel position as the sample value of the reference sample position and using the sample value as the predicted value of the current pixel.
  5. The method according to claim 4, wherein the interpolation uses adaptive interpolation, and the interpolation filter used by the adaptive interpolation is determined according to the coordinates of the reference sample.
  6. An inter-frame prediction apparatus for video coding and decoding, comprising:
    an information acquisition module configured to acquire a motion vector of a current image block and related spatial position information of a current pixel;
    a calculation module configured to obtain a motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel;
    a prediction module configured to obtain a predicted value of the current pixel according to the obtained motion vector of the current pixel.
  7. The apparatus according to claim 6, wherein the related spatial position information of the current pixel comprises the size of the current image, the coordinates of the current image block within the current image, the size of the current image block, and the coordinates of the current pixel within the current image block.
  8. The apparatus according to claim 7, wherein, when obtaining the motion vector of the current pixel according to the motion vector of the current image block and the related spatial position information of the current pixel, the calculation module is configured to obtain the motion vector of the current pixel through the following functional relationship:
    (MV′x, MV′y) = f(x, y, w, h, i, j, MVx, MVy, width, height)
    wherein f is a preset function, (x, y) are the coordinates of the upper-left pixel of the current image block within the current image, (i, j) are the coordinates of the current pixel within the current image block, w and h are respectively the width and height of the current image block, width and height are respectively the width and height of the current image, and (MVx, MVy) is the motion vector of the current image block.
  9. The apparatus according to any one of claims 6 to 8, wherein, when obtaining the predicted value of the current pixel according to the obtained motion vector of the current pixel: when the reference sample position pointed to by the obtained motion vector of the current pixel is not an integer-pixel position, the prediction module is configured to calculate the sample value of the reference sample position by interpolation and use the sample value as the predicted value of the current pixel; when the reference sample position pointed to by the obtained motion vector of the current pixel is an integer-pixel position, the prediction module is configured to take the pixel value at the integer-pixel position as the sample value of the reference sample position and use the sample value as the predicted value of the current pixel.
  10. The apparatus according to claim 9, wherein the interpolation uses adaptive interpolation, and the interpolation filter used by the adaptive interpolation is determined according to the coordinates of the reference sample.
  11. A video encoding method, comprising:
    dividing a current image into a number of image blocks;
    obtaining a predicted image block of a current image block by the inter-frame prediction method according to any one of claims 1 to 5;
    subtracting the predicted image block from the current image block to obtain a residual block;
    transforming, quantizing and entropy encoding the residual block to obtain an encoded code stream.
  12. A video decoding method, comprising:
    performing entropy decoding, inverse quantization and inverse transform on an encoded code stream to obtain a reconstructed residual block;
    obtaining a predicted image block of a current image block by the inter-frame prediction method according to any one of claims 1 to 5;
    adding the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
  13. A video encoding system, comprising:
    an image block dividing device configured to divide a current image into a number of image blocks;
    the inter-frame prediction apparatus according to any one of claims 6 to 10, configured to obtain a predicted image block of a current image block;
    a residual calculation device configured to subtract the predicted image block from the current image block to obtain a residual block;
    a code stream generating device configured to transform, quantize and entropy encode the residual block to obtain an encoded code stream.
  14. A video decoding system, comprising:
    a residual block reconstruction device configured to perform entropy decoding, inverse quantization and inverse transform on an encoded code stream to obtain a reconstructed residual block;
    the inter-frame prediction apparatus according to any one of claims 6 to 10, configured to obtain a predicted image block of a current image block;
    an image block reconstruction device configured to add the predicted image block and the reconstructed residual block to obtain a reconstructed image block.
PCT/CN2016/071341 2016-01-19 2016-01-19 视频编码、解码方法及其帧间预测方法、装置和系统 WO2017124298A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/071341 WO2017124298A1 (zh) 2016-01-19 2016-01-19 视频编码、解码方法及其帧间预测方法、装置和系统
US15/746,932 US10425656B2 (en) 2016-01-19 2016-01-19 Method of inter-frame prediction for video encoding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/071341 WO2017124298A1 (zh) 2016-01-19 2016-01-19 视频编码、解码方法及其帧间预测方法、装置和系统

Publications (1)

Publication Number Publication Date
WO2017124298A1 true WO2017124298A1 (zh) 2017-07-27

Family

ID=59361090

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/071341 WO2017124298A1 (zh) 2016-01-19 2016-01-19 视频编码、解码方法及其帧间预测方法、装置和系统

Country Status (2)

Country Link
US (1) US10425656B2 (zh)
WO (1) WO2017124298A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10728551B2 (en) * 2017-11-09 2020-07-28 Gopro, Inc. Methods and apparatus for block-based layout for non-rectangular regions between non-contiguous imaging regions
CN110234013B (zh) * 2019-06-20 2022-04-26 电子科技大学 一种帧级运动矢量精度比特分配的优化方法
WO2020257484A1 (en) * 2019-06-21 2020-12-24 Vid Scale, Inc. Precision refinement for motion compensation with optical flow
CN113302929A (zh) * 2019-06-24 2021-08-24 华为技术有限公司 几何分割模式的样本距离计算
CN113365074B (zh) * 2021-06-07 2022-11-08 同济大学 限制点预测常现位置及其点矢量数目的编解码方法及装置
CN117499664B (zh) * 2023-12-29 2024-03-19 南京博润类脑智能技术有限公司 一种基于比特替换的图像数据嵌入和提取方法、装置

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101415122A (zh) * 2007-10-15 2009-04-22 华为技术有限公司 一种帧间预测编解码方法及装置
CN101600112A (zh) * 2009-07-09 2009-12-09 杭州士兰微电子股份有限公司 分像素运动估计装置和方法
CN101771867A (zh) * 2008-12-29 2010-07-07 富士通株式会社 缩小尺寸解码方法和系统
CN101820547A (zh) * 2009-02-27 2010-09-01 源见科技(苏州)有限公司 帧间模式选择方法
CN103108181A (zh) * 2007-03-23 2013-05-15 三星电子株式会社 用于图像编码和图像解码的方法和设备
CN103563370A (zh) * 2011-05-27 2014-02-05 思科技术公司 用于图像运动预测的方法、装置及计算机程序产品
WO2015106126A1 (en) * 2014-01-09 2015-07-16 Qualcomm Incorporated Adaptive motion vector resolution signaling for video coding
CN105681805A (zh) * 2016-01-19 2016-06-15 北京大学深圳研究生院 视频编码、解码方法及其帧间预测方法和装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1279293A1 (en) * 2000-01-21 2003-01-29 Nokia Corporation A motion estimation method and a system for a video coder
US6996180B2 (en) * 2001-09-05 2006-02-07 Intel Corporation Fast half-pixel motion estimation using steepest descent
WO2008027249A2 (en) * 2006-08-28 2008-03-06 Thomson Licensing Method and apparatus for determining expected distortion in decoded video blocks
US8259804B2 (en) * 2007-01-03 2012-09-04 International Business Machines Corporation Method and system for signal prediction in predictive coding
JP4544334B2 (ja) * 2008-04-15 2010-09-15 ソニー株式会社 画像処理装置および画像処理方法
US20100086051A1 (en) * 2008-10-06 2010-04-08 Lg Electronics Inc. Method and an apparatus for processing a video signal
US10154276B2 (en) * 2011-11-30 2018-12-11 Qualcomm Incorporated Nested SEI messages for multiview video coding (MVC) compatible three-dimensional video coding (3DVC)
US9313493B1 (en) * 2013-06-27 2016-04-12 Google Inc. Advanced motion estimation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103108181A (zh) * 2007-03-23 2013-05-15 三星电子株式会社 用于图像编码和图像解码的方法和设备
CN101415122A (zh) * 2007-10-15 2009-04-22 华为技术有限公司 一种帧间预测编解码方法及装置
CN101771867A (zh) * 2008-12-29 2010-07-07 富士通株式会社 缩小尺寸解码方法和系统
CN101820547A (zh) * 2009-02-27 2010-09-01 源见科技(苏州)有限公司 帧间模式选择方法
CN101600112A (zh) * 2009-07-09 2009-12-09 杭州士兰微电子股份有限公司 分像素运动估计装置和方法
CN103563370A (zh) * 2011-05-27 2014-02-05 思科技术公司 用于图像运动预测的方法、装置及计算机程序产品
WO2015106126A1 (en) * 2014-01-09 2015-07-16 Qualcomm Incorporated Adaptive motion vector resolution signaling for video coding
CN105681805A (zh) * 2016-01-19 2016-06-15 北京大学深圳研究生院 视频编码、解码方法及其帧间预测方法和装置

Also Published As

Publication number Publication date
US20190110060A1 (en) 2019-04-11
US10425656B2 (en) 2019-09-24

Similar Documents

Publication Publication Date Title
JP7313816B2 (ja) 画像予測方法および関連装置
WO2017124298A1 (zh) 视频编码、解码方法及其帧间预测方法、装置和系统
JP7335315B2 (ja) 画像予測方法および関連装置
CN110115037B (zh) 球面投影运动估计/补偿和模式决策
KR101131756B1 (ko) 도메인 변환을 이용한 메시 기반 비디오 압축
CN105681805B (zh) 视频编码、解码方法及其帧间预测方法和装置
KR102263625B1 (ko) 티어드 신호 품질 계층에서의 모션 맵들 및 다른 보조 맵들의 업샘플링 및 다운샘플링
US10506249B2 (en) Segmentation-based parameterized motion models
KR102254986B1 (ko) 구면 투영부들에 의한 왜곡을 보상하기 위한 등장방형 객체 데이터의 프로세싱
Dziembowski et al. IV-PSNR—the objective quality metric for immersive video applications
CN110692241B (zh) 使用多种全局运动模型的多样化运动
Kim et al. Dynamic frame resizing with convolutional neural network for efficient video compression
US11202099B2 (en) Apparatus and method for decoding a panoramic video
US8170110B2 (en) Method and apparatus for zoom motion estimation
CN115486068A (zh) 用于视频编码中基于深度神经网络的帧间预测的方法和设备
JP4494471B2 (ja) 環状映像の参照画素補間方法、その装置、環状映像符号化方法、その装置及び環状映像復号化方法ならびにその装置
US10979704B2 (en) Methods and apparatus for optical blur modeling for improved video encoding
WO2017124305A1 (zh) 基于多方式边界填充的全景视频编码、解码方法和装置
CN118077203A (zh) 具有明确信示的扩展旋转的经扭曲的运动补偿
CN118075493A (zh) 一种基于动态NeRF的体积视频处理方法及系统
EP4371301A1 (en) Warped motion compensation with explicitly signaled extended rotations
Bauermann et al. Analysis of the decoding-complexity of compressed image-based scene representations
Owen Temporal motion models for video mosaicing and synthesis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16885571

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16885571

Country of ref document: EP

Kind code of ref document: A1