WO2021012942A1 - Residual encoding and decoding methods and apparatuses, storage medium, and electronic apparatus - Google Patents

Residual encoding and decoding methods and apparatuses, storage medium, and electronic apparatus (残差编码、解码方法及装置、存储介质及电子装置) Download PDF

Info

Publication number
WO2021012942A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
residual
coded
decoded
prediction
Prior art date
Application number
PCT/CN2020/100558
Other languages
English (en)
French (fr)
Inventor
于婧
黎天送
曾幸
王宁
喻莉
李君临
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
华中科技大学 (Huazhong University of Science and Technology)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司) and Huazhong University of Science and Technology (华中科技大学)
Publication of WO2021012942A1 publication Critical patent/WO2021012942A1/zh

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction

Definitions

  • the present disclosure relates to the field of communications, and in particular, to residual coding and decoding methods and devices, storage media, and electronic devices.
  • Inter-frame prediction is a core component of video coding standards such as HEVC, AV1 and AVS. It exploits the temporal correlation of video, using pixels of temporally adjacent coded images to predict the pixels of the current image, so as to effectively remove temporal redundancy in the video.
  • the current mainstream video coding standards all adopt block-based motion compensation. Its principle is to find, through motion estimation in a previously encoded image, a best matching block for each pixel block of the current image.
  • the image used for prediction is called the reference image, and the displacement from the reference block to the current pixel block is called the motion vector (MV for short).
  • the difference between the original pixel values of the current block and the pixel values of the prediction block obtained after motion compensation of the reference block is called the residual (also called the residual value or residual block).
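The residual computation described above can be sketched numerically; the block values below are hypothetical and serve only to illustrate the per-pixel subtraction:

```python
import numpy as np

# Hypothetical 4x4 original block (e.g. luma samples of the current block).
original_block = np.array([[52, 55, 61, 66],
                           [63, 59, 55, 90],
                           [62, 59, 68, 113],
                           [63, 58, 71, 122]], dtype=np.int16)

# Hypothetical prediction block produced by motion compensation of the
# reference block.
prediction_block = np.array([[50, 54, 60, 64],
                             [60, 58, 54, 88],
                             [60, 58, 66, 110],
                             [62, 57, 70, 120]], dtype=np.int16)

# The residual is the per-pixel difference between the original block and
# the motion-compensated prediction block.
residual = original_block - prediction_block
```

The residual values are typically much smaller in magnitude than the pixel values themselves, which is what makes residual coding worthwhile.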
  • Inter-frame prediction only needs to encode the optimal MV, the reference frame index and the residual value of the coded block into the code stream and transmit them to the decoder.
  • the decoder side finds the corresponding reference block in the reference frame according to the optimal MV and the reference frame index, and then adds the decoded residual value to recover the original pixel values of the decoded block.
  • the bit rate consumed by inter-frame prediction is mainly spent on encoding the residual block.
  • Traditional inter-frame prediction directly encodes the actual residual obtained after prediction. For most complex motion, however, the values of the residual block are very large, which makes the bit rate of encoding the residual very high.
  • a residual coding method is provided, including: determining the residual of a reference block of a block to be coded, wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or at least two second reference blocks of the block to be coded in the spatial domain; or at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain; predicting the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the prediction residual of the block to be coded; and encoding the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a bitstream.
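A minimal encoder-side sketch of the claimed flow, assuming NumPy and a placeholder prediction rule (a simple average of the reference residuals, standing in for the disclosure's residual prediction module, which may instead be trained or weighted):

```python
import numpy as np

def predict_residual(reference_residuals):
    # Placeholder prediction rule: average the reference residual blocks.
    # The disclosure's actual module may be a trained model or a weighted sum.
    stacked = np.stack(reference_residuals).astype(np.float64)
    return np.rint(stacked.mean(axis=0)).astype(np.int16)

def encode_residual_diff(actual_residual, reference_residuals):
    # Only the difference between the actual residual and the predicted
    # residual needs to be written to the bitstream.
    prediction = predict_residual(reference_residuals)
    return actual_residual - prediction

# Hypothetical 2x2 reference residuals and actual residual.
ref_a = np.array([[4, 6], [8, 10]], dtype=np.int16)
ref_b = np.array([[6, 8], [10, 12]], dtype=np.int16)
actual = np.array([[5, 7], [9, 12]], dtype=np.int16)

diff = encode_residual_diff(actual, [ref_a, ref_b])
```

When the reference residuals correlate well with the actual residual, the transmitted difference is near zero and cheap to entropy-code.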
  • a residual decoding method is provided, including: determining the prediction block of a block to be decoded based on the motion vector (MV) parsed from the code stream, and obtaining the residual of the reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain; predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded; and determining the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual and the actual residual of the block to be decoded.
  • a residual coding device is provided, including: an encoder-side reference residual determining module, configured to determine the residual of the reference block of the block to be coded, wherein the reference block includes: at least two first reference blocks of the block to be coded in the time domain; or at least two second reference blocks of the block to be coded in the spatial domain; or at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain; an encoder-side residual prediction module, configured to predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the prediction residual of the block to be coded; and a residual coding module, configured to encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into the bitstream.
  • a residual decoding device is provided, including: a decoder-side reference residual determining module, configured to determine the prediction block of the block to be decoded based on the motion vector (MV) parsed from the code stream, and to obtain the residual of the reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain;
  • a decoder-side residual prediction module, configured to predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded; and a residual decoding module, configured to determine the actual residual of the block to be decoded based on the prediction residual of the block to be decoded and the residual difference parsed from the code stream.
  • a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when running.
  • an electronic device including a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to execute any one of the foregoing Steps in the method embodiment.
  • FIG. 1 is a hardware structural block diagram of a mobile terminal 10 provided with an inter-frame prediction module capable of applying a residual coding method and a residual decoding method according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of a residual coding method according to Embodiment 1 of the present disclosure.
  • FIG. 3 is a structural block diagram of a residual coding device according to Embodiment 1 of the present disclosure.
  • FIG. 4 is a flowchart of a residual decoding method according to Embodiment 2 of the present disclosure.
  • FIG. 5 is a structural block diagram of a residual decoding device according to Embodiment 2 of the present disclosure.
  • FIG. 6 is a schematic diagram of a convolutional neural network for generating a residual prediction module according to Embodiment 3 of the present disclosure.
  • FIG. 7 is a schematic diagram of selecting a time-domain reference residual block in the case of a P frame and a B frame according to Embodiment 3 of the present disclosure.
  • FIG. 8 is a schematic diagram of selecting a spatial reference residual block according to Embodiment 3 of the present disclosure.
  • FIG. 9 is a schematic diagram of a method for improving the performance of video inter-frame coding by using spatio-temporal correlation according to Embodiment 3 of the present disclosure.
  • the embodiments of the present disclosure provide a solution for improving coding performance (for example, video inter-frame coding performance) using spatio-temporal correlation.
  • the solution uses a residual prediction module to predict the residual value of a coded block in video inter-frame prediction, so as to reduce the bit rate of encoding the residual and thereby improve the coding efficiency of inter-frame prediction in video coding.
  • the codec adds a residual prediction module to inter-frame prediction, and uses the residuals of the reference blocks in the spatial and temporal domains of the current block to be coded to predict the residual value of the current block to be coded (hereinafter referred to as the prediction residual).
  • compared with a coding scheme that directly encodes the residual block of the block to be coded into the code stream, the encoder only needs to transmit the difference between the actual residual and the prediction residual of the current block, thereby reducing the bit rate required for encoding the residual and achieving a bit-rate saving.
  • Embodiment 1 describes a residual coding scheme that can reduce the bit rate required for encoding residuals from the encoder side.
  • Embodiment 2 of the present application describes, from the decoder side, the residual decoding scheme for correctly parsing the block to be decoded, corresponding to the encoder side. The two schemes can be applied to the inter-frame prediction modules of the encoder and decoder sides respectively (for example, to inter-frame prediction in existing video coding standards).
  • these inter-frame prediction modules can be deployed in a mobile terminal, a computer terminal or a similar computing device. Taking an inter-frame prediction module deployed on a mobile terminal as an example, FIG. 1 shows the hardware structure of such a terminal.
  • the mobile terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microcontroller (MCU) or a programmable logic device (FPGA), on which the corresponding program can run to realize the inter-frame prediction module applying the residual coding method and the residual decoding method) and a memory 104 for storing data.
  • the above-mentioned mobile terminal may also include a transmission device 106 and an input/output device 108 for communication functions.
  • the structure shown in FIG. 1 is only for illustration, and does not limit the structure of the above-mentioned mobile terminal.
  • the mobile terminal 10 may also include more or fewer components than those shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
  • the memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the encoder-side embodiment provided in Embodiment 1 of the present disclosure and the decoder-side embodiment provided in Embodiment 2. The processor 102 executes various functional applications and data processing, that is, realizes the above-mentioned methods, by running the computer programs stored in the memory 104.
  • the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the mobile terminal 10 via a network.
  • networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission device 106 is used to receive or send data via the network.
  • a specific example of the aforementioned network may include a wireless network provided by the communication provider of the mobile terminal 10.
  • the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station so as to communicate with the Internet.
  • the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • FIG. 2 is a flowchart of the residual coding method according to Embodiment 1 of the present disclosure. As shown in FIG. 2, the process includes the following steps:
  • Step S202 Determine the residual of the reference block of the block to be coded (in the embodiment of the present disclosure, this term is also called residual block, reference residual, reference residual block), wherein
  • the reference block includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, the block to be coded At least one first reference block in the time domain and at least one second reference block in the spatial domain of the block to be coded;
  • Step S204 Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded to obtain the prediction residual of the block to be coded;
  • Step S206 encoding the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter-frame coding code stream).
  • the residual of the reference block mentioned in step S202 refers to the actual coding residual of the reference block itself, obtained by the traditional method during encoding. Because the reference block has already been coded, its residual can be obtained by technical means.
  • the actual residual of the block to be coded mentioned in step S206 is the difference between the pixel values of the original image block of the block to be coded and the pixel values of the prediction block of the block to be coded, where the prediction block of the block to be coded is the block obtained after motion compensation is performed on the reference block of the block to be coded.
  • the residual of the block to be coded is predicted from the residual of the reference block to obtain the prediction residual, and the residual difference between the prediction residual and the actual residual is encoded into the code stream. Because the encoder only needs to transmit the difference between the actual residual of the current block to be coded and the prediction residual, and the magnitude of this residual difference is much smaller than that of the actual residual, the bit rate required for encoding the residual is effectively reduced, achieving a bit-rate saving.
  • in view of the continuity of video images, selecting appropriate reference blocks makes it possible to predict the residual of the block to be coded accurately.
  • the first reference block of the block to be coded in the time domain may include: the block to be coded The optimal prediction unit block PU in the forward reference frame is recorded as the first optimal PU; and/or, the optimal PU of the block to be coded in the frame before the forward reference frame is recorded as The second best PU.
  • in another case, the first reference block of the block to be coded in the time domain includes: the optimal PU of the block to be coded in the forward reference frame, recorded as the third optimal PU; and/or the optimal PU of the block to be coded in the backward reference frame, recorded as the fourth optimal PU.
  • the first reference block of the block to be coded in the time domain can be determined in the following manner: the first optimal PU of the block to be coded is determined in the forward reference frame by motion estimation, and the motion vector MV of the block to be coded relative to the first optimal PU is determined; then, according to the position of the first optimal PU and the MV, the second optimal PU is determined by motion estimation in the frame preceding the forward reference frame.
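The motion-estimation step above relies on block matching. A simplified exhaustive (full) search, with the sum of absolute differences (SAD) as an assumed matching criterion, could look like this (function names and parameters are illustrative, not taken from the disclosure):

```python
import numpy as np

def full_search(current_block, ref_frame, top, left, search_range=2):
    # Exhaustive block matching: try every displacement within the search
    # window and keep the one minimising the sum of absolute differences.
    h, w = current_block.shape
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the frame
            cand = ref_frame[y:y + h, x:x + w].astype(np.int32)
            sad = int(np.abs(current_block.astype(np.int32) - cand).sum())
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    return best[1], best[0]

# Reference frame with unique pixel values so the best match is unambiguous.
ref_frame = np.arange(256, dtype=np.int32).reshape(16, 16)
current_block = ref_frame[5:9, 6:10]          # the block truly lives at (5, 6)
mv, sad = full_search(current_block, ref_frame, top=4, left=5)
```

Real encoders use faster search patterns and sub-pixel refinement, but the returned displacement plays the same role as the MV in the text.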
  • the second reference block of the block to be encoded in the spatial domain is located in the image frame where the block to be encoded is located, and is adjacent to the block to be encoded in the spatial domain.
  • the second reference block of the block to be coded in the spatial domain includes: in the image frame where the block to be coded is located, the left block and/or the upper block adjacent to the block to be coded (for example, the spatially adjacent left and/or upper block of the same size).
  • the first reference block of the block to be coded in the time domain and/or the second reference block of the block to be coded in the spatial domain can be determined in the above-mentioned manner.
  • the residual corresponding to a first reference block in the time domain can be called a time-domain reference residual block, and the residual corresponding to a second reference block in the spatial domain can be called a spatial reference residual block; based on at least one time-domain reference residual block and/or at least one spatial reference residual block, the residual information of the block to be coded can be predicted.
  • two time-domain reference residual blocks and two spatial-domain reference residual blocks can be selected, and residual prediction is performed based on these four reference residual blocks.
  • in this case, the reference blocks of the block to be coded include: two first reference blocks of the block to be coded in the time domain whose corresponding residuals are not all zero, and two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are not all zero.
  • alternatively, one time-domain reference residual block and one spatial reference residual block can be selected, and residual prediction is performed based on these two reference residual blocks.
  • in that case, the reference blocks of the block to be coded include: one first reference block of the block to be coded in the time domain whose corresponding residual is not all zero, and one second reference block of the block to be coded in the spatial domain whose corresponding residual is not all zero.
  • the reference blocks of the block to be coded may also include: two first reference blocks in the time domain whose corresponding residuals are not all zero, and one second reference block in the spatial domain whose corresponding residual is not all zero.
  • or: one first reference block in the time domain whose corresponding residual is not all zero, and two second reference blocks in the spatial domain whose corresponding residuals are not all zero.
  • or: two first reference blocks of the block to be coded in the time domain whose corresponding residuals are not all zero.
  • two spatial reference residual blocks may be selected, and residual prediction is performed based on the two spatial reference residual blocks.
  • the reference blocks of the to-be-coded block include: two second reference blocks whose corresponding residuals in the spatial domain of the to-be-coded block are not all zeros.
  • predicting the residual of the to-be-coded block based on the residual of the reference block can be implemented in a variety of prediction methods.
  • the manner of predicting the residual of the block to be coded to obtain the prediction residual of the block to be coded may include one of the following:
  • using a residual prediction model trained by a deep-learning network, where the training samples include: the residuals of the reference blocks of coding blocks whose residuals are known, and the actual residuals of those coding blocks;
  • a linear weighted sum with a single pair of weights can be calculated using the following formula:

    ResiPred(i, j) = W1 * ResiA(i, j) + W2 * ResiB(i, j)

  • where W1 and W2 are weights, ResiA(i, j) and ResiB(i, j) are the pixel values of the selected reference residual blocks at pixel (i, j), and ResiPred(i, j) is the pixel value of the prediction residual block at pixel (i, j).
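A minimal numerical sketch of this single-weight case; the weight values 0.5/0.5 and the block contents are placeholders, not trained or disclosed values:

```python
import numpy as np

# Placeholder scalar weights; the disclosure obtains the actual weights
# by other means (e.g. training).
W1, W2 = 0.5, 0.5

resi_a = np.array([[4.0, 8.0], [12.0, 16.0]])   # ResiA: first reference residual block
resi_b = np.array([[2.0, 4.0], [6.0, 8.0]])     # ResiB: second reference residual block

# ResiPred(i, j) = W1 * ResiA(i, j) + W2 * ResiB(i, j), applied per pixel.
resi_pred = W1 * resi_a + W2 * resi_b
```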
  • a linear weighted sum with multiple (per-pixel) weights can be calculated using the following formula:

    ResiPred(i, j) = W1_ij * ResiA(i, j) + W2_ij * ResiB(i, j)

  • where W1_ij and W2_ij are the weight values corresponding to each pixel of the reference residual blocks, which can be obtained through training; ResiA(i, j) and ResiB(i, j) are the pixel values of the selected reference residual blocks at pixel (i, j); and ResiPred(i, j) is the pixel value of the prediction residual block at pixel (i, j).
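The per-pixel-weight case differs only in that the weights are arrays of the same shape as the residual blocks; the weight values below are arbitrary placeholders rather than trained values:

```python
import numpy as np

# Placeholder per-pixel weight maps W1_ij and W2_ij; in the disclosure
# these are obtained through training.
w1 = np.array([[0.25, 0.50], [0.75, 1.00]])
w2 = 1.0 - w1   # complementary weights, an illustrative choice

resi_a = np.array([[4.0, 8.0], [12.0, 16.0]])   # ResiA
resi_b = np.array([[2.0, 4.0], [6.0, 8.0]])     # ResiB

# ResiPred(i, j) = W1_ij * ResiA(i, j) + W2_ij * ResiB(i, j),
# computed element-wise over the whole block.
resi_pred = w1 * resi_a + w2 * resi_b
```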
  • the methods according to the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation.
  • on that basis, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product.
  • the computer software product can be stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, or a network device) execute the methods described in the embodiments of the present disclosure.
  • a residual coding device corresponding to the above residual coding method is provided for implementing that method; what has already been explained will not be repeated.
  • the term "module" may refer to a combination of software and/or hardware that implements predetermined functions. Although the device described below is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceivable.
  • Fig. 3 is a structural block diagram of a residual coding device according to Embodiment 1 of the present disclosure. As shown in Fig. 3, the device includes:
  • the encoder-side reference residual determining module 32 is configured to determine the residual of the reference block of the block to be encoded, wherein the reference block of the block to be encoded includes: at least two first reference blocks of the block to be encoded in the time domain; or at least two second reference blocks of the block to be encoded in the spatial domain; or at least one first reference block of the block to be encoded in the time domain and at least one second reference block of the block to be encoded in the spatial domain;
  • the encoder-side residual prediction module 34 is configured to predict the residual of the block to be encoded based on the residual of the reference block of the block to be encoded to obtain the prediction residual of the block to be encoded;
  • the residual coding module 36 is configured to encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a bitstream.
  • each of the above modules can be implemented by software or hardware.
  • this can be implemented in the following manner, but is not limited to it: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
  • This embodiment also provides a storage medium in which a computer program is stored, wherein the computer program is set to execute the steps in the above residual coding method in this embodiment when running.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • S1 Determine the residual of the reference block of the block to be coded, wherein the reference block includes: at least two first reference blocks of the block to be coded in the time domain; or at least two second reference blocks of the block to be coded in the spatial domain; or at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
  • S2 Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the prediction residual of the block to be coded;
  • S3 Encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter-frame coding code stream).
  • the foregoing storage medium may include, but is not limited to: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store computer programs.
  • An embodiment of the present disclosure also provides an electronic device including a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to execute the steps of the above-mentioned residual coding method in this embodiment.
  • the above electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the above processor, and the input/output device is connected to the above processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1 Determine the residual of the reference block of the block to be coded, wherein the reference block includes: at least two first reference blocks of the block to be coded in the time domain; or at least two second reference blocks of the block to be coded in the spatial domain; or at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
  • S2 Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the prediction residual of the block to be coded;
  • S3 Encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter-frame coding code stream).
  • FIG. 4 is a flowchart of the residual decoding method according to Embodiment 2 of the present disclosure. As shown in FIG. 4, the process includes the following steps:
  • Step S402 Determine the prediction block of the block to be decoded based on the motion vector MV in the code stream (for example, the video inter-frame coding code stream), and obtain the residual of the reference block of the block to be decoded (in the embodiments of the present disclosure, this term is also called residual block, reference residual, or reference residual block), where the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain;
  • Step S404 predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded;
  • Step S406 According to the prediction residual of the block to be decoded and the residual difference between the prediction residual of the block to be decoded and the actual residual of the block to be decoded parsed from the code stream, Determine the actual residual of the block to be decoded.
  • the residual of the reference block mentioned in step S402 refers to the actual coding residual of the reference block itself, obtained during coding according to the traditional method; because the reference block has already been coded, its residual can be obtained by technical means.
  • in step S402, the prediction block of the block to be decoded is the block obtained after motion compensation is performed on the block to be decoded.
  • the actual residual of the block to be decoded mentioned in step S406 is the difference between the pixel values of the original image block of the block to be decoded and the pixel values of the prediction block of the block to be decoded.
  • through the above steps, the decoder side predicts the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, and determines the actual residual of the block to be decoded according to the prediction residual and the residual difference carried in the code stream. This ensures that, at the decoder side, the actual residual of the block to be decoded can still be correctly recovered from the bit-rate-saving code stream.
  • in a case where the image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain includes: the optimal prediction unit block PU of the block to be decoded in the forward reference frame, recorded as the first optimal PU; and/or, the optimal PU of the block to be decoded in the frame preceding the forward reference frame, recorded as the second optimal PU.
  • in a case where the image frame in which the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the time domain includes: the optimal PU of the block to be decoded in the forward reference frame, recorded as the third optimal PU; and/or, the optimal PU of the block to be decoded in the backward reference frame, recorded as the fourth optimal PU.
  • in a case where the image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the time domain is determined in the following manner: determine the first optimal PU of the block to be decoded in the forward reference frame according to the MV parsed from the bitstream; then, according to the co-located PU of the first optimal PU in the frame preceding the forward reference frame, determine the second optimal PU in that preceding frame through motion estimation.
  • the second reference block of the block to be decoded in the spatial domain is located in the image frame where the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
  • the second reference block of the block to be decoded in the spatial domain includes: in the image frame where the block to be decoded is located, the left block adjacent to the block to be decoded and /Or the upper block (for example, it may be the adjacent left and/or upper block in the same size space).
  • the first reference block of the block to be decoded in the time domain and/or the second reference block of the block to be decoded in the spatial domain can be determined in the above-mentioned manner.
  • the residual corresponding to the first reference block in the time domain can be called a time-domain reference residual block, and the residual corresponding to the second reference block in the spatial domain can be called a spatial-domain reference residual block; based on at least one time-domain reference residual block and/or at least one spatial-domain reference residual block, the residual information of the block to be decoded can be predicted.
  • two time-domain reference residual blocks and two spatial-domain reference residual blocks can be selected, and residual prediction is performed based on these four reference residual blocks.
  • in this case, the reference block of the block to be decoded includes: two first reference blocks of the block to be decoded in the time domain whose corresponding residuals are not all zeros, and two second reference blocks of the block to be decoded in the spatial domain whose corresponding residuals are not all zeros.
  • when only one time-domain reference residual block and one spatial-domain reference residual block are non-all-zero blocks, one time-domain reference residual block and one spatial-domain reference residual block can be selected, and residual prediction is performed based on these two reference residual blocks.
  • in this case, the reference block of the block to be decoded includes: one first reference block of the block to be decoded in the time domain whose corresponding residual is not all zeros, and one second reference block of the block to be decoded in the spatial domain whose corresponding residual is not all zeros.
  • when only two time-domain reference residual blocks and one spatial-domain reference residual block are non-all-zero, the reference block of the block to be decoded includes: two first reference blocks of the block to be decoded in the time domain whose corresponding residuals are not all zeros, and one second reference block in the spatial domain whose corresponding residual is not all zeros.
  • when only one time-domain reference residual block and two spatial-domain reference residual blocks are non-all-zero, the reference block of the block to be decoded includes: one first reference block of the block to be decoded in the time domain whose corresponding residual is not all zeros, and two second reference blocks in the spatial domain whose corresponding residuals are not all zeros.
  • when only two time-domain reference residual blocks are non-all-zero, the reference block of the block to be decoded includes two first reference blocks of the block to be decoded in the time domain whose corresponding residuals are not all zeros.
  • when only two spatial-domain reference residual blocks are non-all-zero, two spatial-domain reference residual blocks may be selected, and residual prediction is performed based on these two spatial-domain reference residual blocks.
  • in this case, the reference block of the block to be decoded includes two second reference blocks of the block to be decoded in the spatial domain whose corresponding residuals are not all zeros.
  • predicting the residual of the block to be decoded based on the residual of the reference block can be implemented with a variety of prediction methods.
  • the manner of predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded, may include one of the following: (1) inputting the residual of the reference block into a residual prediction model trained with a deep learning network; or (2) performing linear weighting on the residual of the reference block, where the linear weighting includes single-weight linear weighting or multi-weight linear weighting.
  • the single-weight linear weighted sum can be calculated using the following formula:
  • ReiPred(i,j) = W1·ResiA(i,j) + W2·ResiB(i,j)
  • where W1 and W2 are weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ReiPred(i,j) is the pixel value of the prediction residual block at pixel (i,j).
  • the multi-weight linear weighted sum can be calculated using the following formula:
  • ReiPred(i,j) = W1_ij·ResiA(i,j) + W2_ij·ResiB(i,j)
  • where W1_ij and W2_ij are the weight values corresponding to each pixel of the reference residual blocks, which can be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ReiPred(i,j) is the pixel value of the prediction residual block at pixel (i,j).
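The two weighting formulas can be sketched in code as follows. This is a minimal illustration: the weight values here are made up, and in the multi-weight case they would be obtained through training as the text describes.

```python
import numpy as np

def predict_residual_single_weight(resi_a, resi_b, w1, w2):
    # ReiPred(i,j) = W1*ResiA(i,j) + W2*ResiB(i,j)
    return w1 * resi_a + w2 * resi_b

def predict_residual_multi_weight(resi_a, resi_b, w1_map, w2_map):
    # Per-pixel weights W1_ij, W2_ij, one weight per pixel of each
    # reference residual block (obtained through training).
    return w1_map * resi_a + w2_map * resi_b

resi_a = np.array([[2.0, 4.0], [6.0, 8.0]])   # illustrative reference residual blocks
resi_b = np.array([[1.0, 1.0], [1.0, 1.0]])

single = predict_residual_single_weight(resi_a, resi_b, 0.5, 0.5)
multi = predict_residual_multi_weight(resi_a, resi_b,
                                      np.full((2, 2), 0.25),
                                      np.full((2, 2), 0.75))
print(single)  # [[1.5 2.5] [3.5 4.5]]
```

The multi-weight form generalizes the single-weight form: it reduces to it when every entry of `w1_map` (and `w2_map`) is the same scalar.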
  • after determining the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual and the actual residual of the block to be decoded, the method further includes: adding the actual residual to the prediction block of the block to be decoded to restore the original image block of the block to be decoded.
  • from the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product.
  • the computer software product can be stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions to make a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) execute the methods described in the embodiments of the present disclosure.
  • in this embodiment, a residual decoding device corresponding to the above residual decoding method is provided for implementing that method; what has already been explained will not be repeated.
  • as used below, the term "module" may be a combination of software and/or hardware that implements predetermined functions.
  • although the devices described below are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
  • Fig. 5 is a structural block diagram of a residual decoding device according to Embodiment 2 of the present disclosure. As shown in Fig. 5, the device includes:
  • the decoder-side reference residual determination module 52 is configured to determine the prediction block of the block to be decoded based on the motion vector MV parsed from the code stream, and to obtain the residual of the reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or, at least two second reference blocks of the block to be decoded in the spatial domain; or, at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain;
  • the decoder-side residual prediction module 54 is configured to predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded;
  • the residual decoding module 56 is configured to determine the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual of the block to be decoded and the actual residual of the block to be decoded.
  • each of the above modules can be implemented by software or hardware. For the latter, it can be implemented in the following manner, but is not limited to this: the above modules are all located in the same processor; or, the above modules are respectively located in different processors in any combination.
  • This embodiment also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in the above-mentioned residual decoding method in this embodiment when the computer program is run.
  • the foregoing storage medium may be configured to store a computer program for executing the following steps:
  • S1 Determine the prediction block of the block to be decoded based on the motion vector MV parsed from the code stream, and obtain the residual of the reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or, at least two second reference blocks of the block to be decoded in the spatial domain; or, at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain;
  • S2 Predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded;
  • S3 Determine the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual of the block to be decoded and the actual residual of the block to be decoded.
  • the foregoing storage medium may include, but is not limited to: U disk, Read-Only Memory (Read-Only Memory, ROM for short), Random Access Memory (RAM for short), mobile hard disk, magnetic disk Various media that can store computer programs such as discs or optical discs.
  • This embodiment also provides an electronic device including a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to execute the steps of the above residual decoding method in this embodiment.
  • the above electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the above processor, and the input/output device is connected to the above processor.
  • the foregoing processor may be configured to execute the following steps through a computer program:
  • S1 Determine the prediction block of the block to be decoded based on the motion vector MV parsed from the code stream, and obtain the residual of the reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or, at least two second reference blocks of the block to be decoded in the spatial domain; or, at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain;
  • S2 Predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the prediction residual of the block to be decoded;
  • S3 Determine the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual of the block to be decoded and the actual residual of the block to be decoded.
  • the following embodiment 3 takes the training of the residual prediction model (also referred to as the residual prediction module) through the deep learning network as an example, and describes in detail the specific implementation of the residual coding and decoding scheme. It should be noted that, for the solution of using linear weighted sum to perform residual prediction, the principle of overall residual coding and decoding is similar to the following embodiment, and will not be repeated.
  • This embodiment designs a method for the coding unit of video coding to improve the performance of video inter-frame coding by using spatio-temporal correlation.
  • the encoder will call this method to complete the inter-frame prediction. The content involved in the method is described in detail below.
  • the residual prediction module can be generated by a convolutional neural network.
  • FIG. 6 is a schematic diagram of a convolutional neural network for generating the residual prediction module according to Embodiment 3 of the present disclosure.
  • the convolutional neural network takes two time-domain reference residual blocks and two spatial-domain reference residual blocks related to the current coding block as input (other numbers of time-domain and spatial-domain reference residual blocks, or only time-domain or only spatial-domain reference residual blocks, are also possible), extracts the feature information of the residual image through convolution and pooling, and then outputs the prediction residual through a deconvolution process.
  • the input of the convolutional neural network is four reference residual blocks: two reference residual blocks selected in the time-domain reference frames, and two selected in the adjacent spatial domain.
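A conv-pool-deconv network of the kind described can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions, not the patent's actual architecture: the layer sizes, channel counts, and block size are made up; only the input/output shape (four stacked reference residual blocks in, one prediction residual block out) follows the text.

```python
import torch
import torch.nn as nn

class ResidualPredictionNet(nn.Module):
    """Hypothetical conv/pool/deconv sketch: the four reference residual
    blocks (2 temporal + 2 spatial) are stacked as input channels, and the
    output is the predicted residual block of the current coding unit."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),  # fuse the 4 reference residuals
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # pooling extracts coarse features
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),  # deconvolution upsamples
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # 1-channel prediction residual
        )

    def forward(self, refs):  # refs: (N, 4, H, W)
        return self.decode(self.encode(refs))

net = ResidualPredictionNet()
refs = torch.randn(1, 4, 16, 16)  # two temporal + two spatial reference residual blocks
pred = net(refs)
print(pred.shape)  # torch.Size([1, 1, 16, 16])
```

Training would pair such reference residual inputs with the actual residual of the coded block, as described for the encoder's first step below.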
  • FIG. 7 is a schematic diagram of selecting a time domain reference residual block in the case of a P frame and a B frame according to Embodiment 3 of the present disclosure.
  • for a P frame, the optimal prediction unit block (PU) is found in the forward reference frame through motion estimation, yielding the motion vector (MV) of the current coding block. Then, based on the position of the optimal reference PU of the current coding block and the MV of the current coding block, another optimal reference PU is searched for through motion estimation in the frame preceding the forward reference frame, and the residual blocks corresponding to these two optimal reference PUs are input as the two time-domain reference residual blocks of the P frame.
  • for a B frame, there is a forward reference frame and a backward reference frame; in each of them, the residual block of the optimal PU is found through motion estimation and used as a time-domain reference residual block input.
  • FIG. 8 is a schematic diagram of selecting a spatial-domain reference residual block according to Embodiment 3 of the present disclosure. As shown in FIG. 8, the left and upper blocks adjacent to the current block and of the same size are selected as reference blocks, and the residual blocks corresponding to these reference blocks are used as the spatial-domain reference residual block input.
  • the output of the residual prediction module is the prediction residual block of the current block.
  • the prediction residual block can be subtracted from the actual residual block to obtain the coding residual.
  • the prediction residual block plus the coding residual can recover the actual residual block.
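The arithmetic relationship between the actual residual, the prediction residual, and the coding residual can be shown with a small numeric example (the block values are illustrative only):

```python
import numpy as np

# Hypothetical 2x2 residual blocks (values are illustrative only).
actual_residual = np.array([[5.0, 6.0], [7.0, 8.0]])      # traditional inter residual
predicted_residual = np.array([[4.0, 5.0], [6.0, 9.0]])   # residual prediction module output

# Encoder: only the difference between actual and predicted residual
# is written to the bitstream.
coding_residual = actual_residual - predicted_residual

# Decoder: the same prediction residual is reproduced, so adding the
# parsed coding residual recovers the actual residual exactly.
recovered = predicted_residual + coding_residual
print(np.array_equal(recovered, actual_residual))  # True
```

The bit-rate saving comes from `coding_residual` typically having much smaller magnitudes than `actual_residual` when the prediction is good.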
  • FIG. 9 is a schematic diagram of a method for improving the performance of video inter-frame coding by using spatio-temporal correlation in Embodiment 3 of the present disclosure.
  • taking the implementation of the residual prediction module through a neural network as an example, the residual coding and decoding operations at the encoder and the decoder are described in detail below.
  • the operation of the encoder includes the following four steps.
  • the first step is to pre-encode the video sequence provided by the test standard, and extract the residual data of the four reference blocks and the residual data of the current block as training data.
  • the second step is to read the block to be coded from the input video image, obtain the prediction block of the block to be coded through motion estimation and prediction, and subtract the pixel values of the prediction block from the original image pixel values to obtain the traditional residual block.
  • the third step is to use the motion vector MV obtained by motion estimation to search for two time-domain reference residual blocks as shown in Figure 7, and then obtain two spatial-domain reference residual blocks according to Figure 8.
  • the four obtained reference residual blocks are input into the residual prediction module to obtain the prediction residual block.
  • the fourth step is to use the difference between the traditional residual block and the prediction residual block as the coding residual of the current block to be coded, and write it into the bitstream file.
  • the traditional residual block is the residual value to be coded, obtained for the current coding block according to the traditional inter prediction method, and the prediction residual is the residual value predicted by the residual prediction module.
  • the reference block is found in the reference frame through motion estimation, and the prediction block is obtained after the motion compensation operation. The difference between the original block and the prediction block is the actual residual block of the traditional inter prediction mode.
  • a residual prediction module is established to predict the residual of the current block.
  • the input of the residual prediction module is the reference residual blocks in the time domain and the spatial domain corresponding to the reference blocks of the current coding block.
  • the output of the residual prediction module is the prediction residual of the current block; the difference between the actual residual block and the prediction residual block is used as the coding residual, which then undergoes the subsequent transform, quantization, and entropy coding.
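The encoder-side steps above can be condensed into a short sketch. The callables here (`get_prediction_block`, `gather_reference_residuals`, `residual_prediction_module`) are hypothetical stand-ins for motion estimation/compensation, the Fig. 7/Fig. 8 reference residual collection, and the trained prediction module; transform, quantization, and entropy coding are omitted.

```python
import numpy as np

def encode_block(original_block, get_prediction_block,
                 gather_reference_residuals, residual_prediction_module):
    """Sketch of the encoder-side residual coding steps."""
    prediction_block = get_prediction_block(original_block)      # motion estimation + compensation
    traditional_residual = original_block - prediction_block     # traditional inter residual
    reference_residuals = gather_reference_residuals()           # 2 temporal + 2 spatial blocks
    predicted_residual = residual_prediction_module(reference_residuals)
    coding_residual = traditional_residual - predicted_residual  # written to the bitstream
    return coding_residual

# Toy run with a mean predictor over the four reference residual blocks.
orig = np.full((4, 4), 10.0)
cr = encode_block(
    orig,
    get_prediction_block=lambda b: b - 3.0,                  # pretend the residual is constant 3
    gather_reference_residuals=lambda: np.full((4, 4, 4), 2.0),
    residual_prediction_module=lambda refs: refs.mean(axis=0),
)
print(cr[0, 0])  # 1.0: actual residual 3 minus predicted residual 2
```

Only the (small) value 1 per pixel would need to be coded, instead of the full residual of 3.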
  • the operation of the decoder includes the following three steps.
  • the first step is to embed the residual prediction module into the inter-frame prediction of the decoder.
  • the second step is to obtain the prediction block of the current block to be decoded through the motion vector MV parsed from the code stream, then find two time-domain reference residual blocks as shown in Fig. 7 and two spatial-domain reference residual blocks as shown in Fig. 8, and input the four obtained reference residual blocks into the residual prediction module to obtain the prediction residual block. In this step, the four reference residual blocks corresponding to the current block are generated in the same manner as the four reference residual blocks on the encoder side.
  • the third step is to read the coding residual of the block to be decoded from the code stream, then add the prediction residual block to the coding residual to restore the actual residual of the block to be decoded, and finally add the prediction block to restore the original image block.
  • the same residual prediction module as the encoder is established to predict the residual of the current block.
  • the input of the residual prediction module is the reference residual blocks in the time domain and the spatial domain corresponding to the reference blocks of the current decoded block.
  • the search method for the reference blocks is the same as at the encoder end, and the output is the prediction residual of the current decoded block. Adding the decoded actual residual to the prediction block of the block to be decoded completes the reconstruction of the current block.
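The decoder-side reconstruction can be sketched symmetrically to the encoder. As before, `residual_prediction_module` and the reference residual blocks are hypothetical stand-ins; what matters is that they match the encoder side exactly, so the same prediction residual is reproduced.

```python
import numpy as np

def decode_block(prediction_block, coding_residual,
                 reference_residuals, residual_prediction_module):
    """Sketch of the decoder-side steps: reproduce the prediction residual,
    add the parsed coding residual to restore the actual residual, then add
    the prediction block to restore the original image block."""
    predicted_residual = residual_prediction_module(reference_residuals)
    actual_residual = predicted_residual + coding_residual
    return prediction_block + actual_residual

pred_block = np.full((4, 4), 7.0)              # from MV parsed out of the bitstream
coding_residual = np.full((4, 4), 1.0)         # parsed from the bitstream
refs = np.full((4, 4, 4), 2.0)                 # same 4 reference residual blocks as encoder
rec = decode_block(pred_block, coding_residual, refs,
                   residual_prediction_module=lambda r: r.mean(axis=0))
print(rec[0, 0])  # 10.0: prediction block 7 + (predicted residual 2 + coding residual 1)
```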
  • it should be understood by those skilled in the art that the modules or steps of the present disclosure can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices.
  • they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in an order different from the one here, or they can be respectively fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. In this way, the present disclosure is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides residual encoding and decoding methods and devices, a storage medium, and an electronic device. The residual encoding method includes: determining the residual of a reference block of a block to be coded; predicting the residual of the block to be coded based on the residual of the reference block, to obtain a prediction residual of the block to be coded; and encoding the residual difference between the prediction residual and the actual residual of the block to be coded into a code stream.

Description

Residual encoding and decoding methods and devices, storage medium, and electronic device — Technical Field
The present disclosure relates to the field of communications, and in particular to residual encoding and decoding methods and devices, a storage medium, and an electronic device.
Background
In recent years, with the rapid development of network communication and multimedia technology, video content has begun to be presented to viewers in high-resolution and ultra-high-resolution forms. Compared with standard-resolution video, the higher the resolution, the better the visual experience of viewing the content. At the same time, high-resolution video also places higher demands on video coding technology. To meet this challenge, the Joint Collaborative Team on Video Coding (JCT-VC) developed the new-generation High Efficiency Video Coding (HEVC) standard. Compared with the previous-generation video coding standard H.264/MPEG-4 AVC, HEVC improves compression efficiency by 50% while maintaining the same visual quality. On September 1, 2015, Google announced the founding of the Alliance for Open Media (AOM) together with Amazon, Cisco, Intel, Microsoft, Mozilla, and Netflix; AOM developed the new-generation video coding format AV1. In December 2017, China's Audio Video coding Standard (AVS) working group proposed the new-generation AVS3 video coding. At the San Diego meeting on April 10, 2018, the Joint Video Exploration Team (JVET) named the latest-generation video coding standard Versatile Video Coding (VVC); its main goal is to improve on the existing HEVC, providing higher compression performance while being optimized for emerging applications such as 360° panoramic video and High Dynamic Range Imaging (HDR or HDRI). VVC was expected to be standardized before 2020, and the schemes proposed so far already achieve an improvement of more than 40% over HEVC.
Inter prediction is the core component of the HEVC/AV1/AVS video coding standards. It exploits the temporal correlation of video, using the pixels of temporally adjacent coded pictures to predict the pixels of the current picture, so as to effectively remove temporal redundancy. Current mainstream video coding standards all adopt block-based motion compensation: for each pixel block of the current picture, a best matching block is found in a previously coded picture through motion estimation. The picture used for prediction is called the reference picture; the displacement from the reference block to the current pixel block is called the motion vector (MV); and the difference between the original pixel values of the current block and the pixel values of the prediction block obtained by motion-compensating the reference block is called the residual (also called the residual value or residual block). Inter prediction only needs to encode the optimal MV, the reference frame index, and the residual value of the coding block into the code stream and transmit them to the decoder. The decoder finds the corresponding reference block in the reference frame according to the optimal MV and the reference frame index, and then adds the decoded residual value to recover the original pixel values of the decoded block. In inter prediction, most of the bit rate is consumed by coding the residual block; traditional inter prediction directly encodes the actual residual obtained after prediction. However, in most cases of complex motion, the values of the residual block are very large, which leads to a very high bit rate for coding the residual.
Summary of the Disclosure
According to one embodiment of the present disclosure, a residual encoding method is provided, including: determining the residual of a reference block of a block to be coded, wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain; predicting the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded; and encoding the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream.
According to another embodiment of the present disclosure, a residual decoding method is provided, including: determining a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream, and obtaining the residual of a reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or, at least two second reference blocks of the block to be decoded in the spatial domain; or, at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain; predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain a prediction residual of the block to be decoded; and determining the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual of the block to be decoded and the actual residual of the block to be decoded.
According to yet another embodiment of the present disclosure, a residual encoding device is provided, including: an encoder-side reference residual determination module, configured to determine the residual of a reference block of a block to be coded, wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain; an encoder-side residual prediction module, configured to predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded; and a residual coding module, configured to encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream.
According to yet another embodiment of the present disclosure, a residual decoding device is provided, including: a decoder-side reference residual determination module, configured to determine a prediction block of a block to be decoded based on a motion vector MV parsed from a code stream, and to obtain the residual of a reference block of the block to be decoded, wherein the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the time domain; or, at least two second reference blocks of the block to be decoded in the spatial domain; or, at least one first reference block of the block to be decoded in the time domain and at least one second reference block of the block to be decoded in the spatial domain; a decoder-side residual prediction module, configured to predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain a prediction residual of the block to be decoded; and a residual decoding module, configured to determine the actual residual of the block to be decoded according to the prediction residual of the block to be decoded and the residual difference, parsed from the code stream, between the prediction residual of the block to be decoded and the actual residual of the block to be decoded.
According to yet another embodiment of the present disclosure, a storage medium is further provided, in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the above method embodiments when run.
According to yet another embodiment of the present disclosure, an electronic device is further provided, including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any one of the above method embodiments.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of this application. The exemplary embodiments of the present disclosure and their description are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
FIG. 1 is a hardware structure block diagram of a mobile terminal 10 provided with an inter prediction module capable of applying the residual encoding method and the residual decoding method, according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of the residual encoding method according to Embodiment 1 of the present disclosure;
FIG. 3 is a structural block diagram of the residual encoding device according to Embodiment 1 of the present disclosure;
FIG. 4 is a flowchart of the residual decoding method according to Embodiment 2 of the present disclosure;
FIG. 5 is a structural block diagram of the residual decoding device according to Embodiment 2 of the present disclosure;
FIG. 6 is a schematic diagram of a convolutional neural network for generating the residual prediction module according to Embodiment 3 of the present disclosure;
FIG. 7 is a schematic diagram of the selection of time-domain reference residual blocks in the P-frame and B-frame cases according to Embodiment 3 of the present disclosure;
FIG. 8 is a schematic diagram of the selection of spatial-domain reference residual blocks according to Embodiment 3 of the present disclosure;
FIG. 9 is a schematic diagram of a method for improving video inter coding performance by using spatio-temporal correlation according to Embodiment 3 of the present disclosure.
Detailed Description
Hereinafter, the present disclosure will be described in detail with reference to the drawings and in combination with the embodiments. It should be noted that the embodiments in this application, and the features in the embodiments, can be combined with each other without conflict.
It should be noted that the terms "first", "second", etc. in the specification, claims, and above drawings of the present disclosure are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence.
The embodiments of the present disclosure provide a scheme for improving coding performance (for example, video inter coding performance) by using spatio-temporal correlation. In video inter prediction, the scheme uses a residual prediction module to predict the residual value of a coding block, so as to reduce the bit rate of the coded residual and improve the efficiency of inter prediction coding in video coding. The codec adds a residual prediction module to inter prediction and uses the residuals of the reference blocks of the current block to be coded in the spatial and time domains to predict the residual value of the current block to be coded (hereinafter referred to as the prediction residual). Compared with a coding scheme that directly encodes the residual block of the block to be coded into the code stream, the encoder only needs to transmit the difference between the actual residual and the prediction residual of the current block, thereby reducing the bit rate required for coding the residual and saving bit rate.
The following Embodiment 1 describes, from the encoder side, a residual encoding scheme that can reduce the bit rate required for coding the residual; Embodiment 2 of this application describes, from the decoder side, the corresponding residual decoding scheme that correctly parses the residual block of the block to be decoded. The encoder-side embodiment provided in Embodiment 1 and the decoder-side embodiment provided in Embodiment 2 can be applied, respectively, in the inter prediction modules at the codec side (for example, in the inter prediction modules of existing video coding standards), and these inter prediction modules can be set in a mobile terminal, a computer terminal, or a similar computing device. Taking the case where the inter prediction module is set on a mobile terminal as an example, FIG. 1 is a hardware structure block diagram of a mobile terminal 10 provided with an inter prediction module capable of applying the residual encoding method and the residual decoding method, according to an embodiment of the present disclosure. As shown in FIG. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA, on which corresponding programs can run to implement the functions of the inter prediction module capable of applying the residual encoding method and the residual decoding method) and a memory 104 for storing data. In some embodiments, the above mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. A person of ordinary skill in the art can understand that the structure shown in FIG. 1 is only schematic and does not limit the structure of the above mobile terminal. For example, the mobile terminal 10 may further include more or fewer components than shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer programs corresponding to the encoder-side embodiment provided in Embodiment 1 and the decoder-side embodiment provided in Embodiment 2 of the present disclosure. The processor 102 executes various functional applications and data processing, that is, implements the above methods, by running the computer programs stored in the memory 104. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located with respect to the processor 102; these remote memories may be connected to the mobile terminal 10 through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 106 is used to receive or send data via a network. Specific examples of the above network may include a wireless network provided by the communication provider of the mobile terminal 10. In one instance, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one instance, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
The residual coding scheme is described below from the encoder side and the decoder side through Embodiments 1 and 2, respectively.
Embodiment 1
This embodiment provides a residual encoding method. FIG. 2 is a flowchart of the residual encoding method according to Embodiment 1 of the present disclosure. As shown in FIG. 2, the process includes the following steps:
Step S202: Determine the residual of a reference block of a block to be coded (in the embodiments of the present disclosure, this term is also called residual block, reference residual, or reference residual block), wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
Step S204: Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded;
Step S206: Encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter coding code stream).
The residual of the reference block mentioned in step S202 refers to the actual coding residual of the reference block itself, obtained during coding according to the traditional method; because the reference block has already been coded, its residual can be obtained by technical means.
The actual residual of the block to be coded mentioned in step S206 is the difference between the pixel values of the original image block of the block to be coded and the pixel values of the prediction block of the block to be coded, wherein the prediction block of the block to be coded is the block obtained after motion compensation is performed on the reference block of the block to be coded.
Through the above steps, the residual of the block to be coded is predicted from the residual of the reference block to obtain the prediction residual, and the residual difference between the prediction residual and the actual residual is encoded into the code stream. Since the encoder only needs to transmit the difference between the actual residual and the prediction residual of the current block to be coded, and the magnitude of the residual difference is far lower than that of the actual residual, the bit rate required for coding the residual is effectively reduced, achieving the effect of saving bit rate.
Regarding the selection of the reference block of the block to be coded: in view of the continuity of video images, selecting a suitable reference block enables accurate prediction of the residual of the block to be coded.
In at least one exemplary embodiment, in a case where the image frame in which the block to be coded is located is a P frame, the first reference block of the block to be coded in the time domain may include: the optimal prediction unit block PU of the block to be coded in the forward reference frame, recorded as the first optimal PU; and/or, the optimal PU of the block to be coded in the frame preceding the forward reference frame, recorded as the second optimal PU.
In a case where the image frame in which the block to be coded is located is a B frame, the first reference block of the block to be coded in the time domain includes: the optimal PU of the block to be coded in the forward reference frame, recorded as the third optimal PU; and/or, the optimal PU of the block to be coded in the backward reference frame, recorded as the fourth optimal PU.
In at least one exemplary embodiment, in a case where the image frame in which the block to be coded is located is a P frame, the first reference block of the block to be coded in the time domain may be determined in the following manner: determine the first optimal PU of the block to be coded in the forward reference frame through motion estimation, and determine the motion vector MV of the block to be coded relative to the first optimal PU; then, according to the position of the first optimal PU and the MV, determine the second optimal PU in the frame preceding the forward reference frame through motion estimation.
In at least one exemplary embodiment, the second reference block of the block to be coded in the spatial domain is located in the image frame in which the block to be coded is located, and is adjacent to the block to be coded in the spatial domain.
In at least one exemplary embodiment, the second reference block of the block to be coded in the spatial domain includes: in the image frame in which the block to be coded is located, the left block and/or the upper block adjacent to the block to be coded (for example, spatially adjacent left and/or upper blocks of the same size).
Those skilled in the art should understand that, in practical applications, the first reference block of the block to be coded in the time domain and/or the second reference block of the block to be coded in the spatial domain can be determined in the above manner. The residual corresponding to the first reference block in the time domain can be called a time-domain reference residual block, and the residual corresponding to the second reference block in the spatial domain can be called a spatial-domain reference residual block; based on at least one time-domain reference residual block and/or at least one spatial-domain reference residual block, the residual information of the block to be coded can be predicted.
In some embodiments, for the sake of data symmetry, two time-domain reference residual blocks and two spatial-domain reference residual blocks can be selected, and residual prediction is performed based on these four reference residual blocks. In this case, the reference block of the block to be coded includes: two first reference blocks of the block to be coded in the time domain whose corresponding residuals are not all zeros, and two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are not all zeros.
In some embodiments, when only one time-domain reference residual block and one spatial-domain reference residual block are non-all-zero blocks, one time-domain reference residual block and one spatial-domain reference residual block can be selected, and residual prediction is performed based on these two reference residual blocks. In this case, the reference block of the block to be coded includes: one first reference block in the time domain whose corresponding residual is not all zeros, and one second reference block in the spatial domain whose corresponding residual is not all zeros.
In some embodiments, when only two time-domain reference residual blocks and one spatial-domain reference residual block are non-all-zero blocks, two time-domain reference residual blocks and one spatial-domain reference residual block can be selected, and residual prediction is performed based on these three reference residual blocks. In this case, the reference block of the block to be coded includes: two first reference blocks in the time domain whose corresponding residuals are not all zeros, and one second reference block in the spatial domain whose corresponding residual is not all zeros.
In some embodiments, when only one time-domain reference residual block and two spatial-domain reference residual blocks are non-all-zero blocks, one time-domain reference residual block and two spatial-domain reference residual blocks can be selected, and residual prediction is performed based on these three reference residual blocks. In this case, the reference block of the block to be coded includes: one first reference block in the time domain whose corresponding residual is not all zeros, and two second reference blocks in the spatial domain whose corresponding residuals are not all zeros.
In some embodiments, when only two time-domain reference residual blocks are non-all-zero blocks, two time-domain reference residual blocks can be selected, and residual prediction is performed based on these two time-domain reference residual blocks. In this case, the reference block of the block to be coded includes: two first reference blocks of the block to be coded in the time domain whose corresponding residuals are not all zeros.
In some embodiments, when only two spatial-domain reference residual blocks are non-all-zero blocks, two spatial-domain reference residual blocks can be selected, and residual prediction is performed based on these two spatial-domain reference residual blocks. In this case, the reference block of the block to be coded includes: two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are not all zeros.
The process of predicting the residual of the block to be coded based on the residual of the reference block can be implemented with a variety of prediction methods. In at least one exemplary embodiment, the manner of predicting the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the prediction residual of the block to be coded, may include one of the following:
(1) Input the residual of the reference block into a residual prediction model to obtain the prediction residual of the block to be coded, wherein the residual prediction model can be trained with a deep learning network; that is, it is obtained by training a deep learning network on training samples, the training samples including: the residuals of the reference blocks of coding blocks with known residuals, and the actual residuals of those coding blocks with known residuals;
(2) Perform linear weighting on the residual of the reference block to obtain the prediction residual of the block to be coded, wherein the linear weighting includes single-weight linear weighting or multi-weight linear weighting.
For example, the single-weight linear weighted sum can be calculated using the following formula:
ReiPred(i,j) = W1·ResiA(i,j) + W2·ResiB(i,j)
where W1 and W2 are weights, ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ReiPred(i,j) is the pixel value of the prediction residual block at pixel (i,j).
For example, the multi-weight linear weighted sum can be calculated using the following formula:
ReiPred(i,j) = W1_ij·ResiA(i,j) + W2_ij·ResiB(i,j)
where W1_ij and W2_ij are the weight values corresponding to each pixel of the reference residual blocks, which can be obtained through training; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j), and ReiPred(i,j) is the pixel value of the prediction residual block at pixel (i,j).
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions to make a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) execute the methods described in the embodiments of the present disclosure.
In this embodiment, a residual encoding device corresponding to the above residual encoding method is provided for implementing that method; what has already been explained will not be repeated. As used below, the term "module" may be a combination of software and/or hardware that implements predetermined functions. Although the devices described below are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceived.
FIG. 3 is a structural block diagram of the residual encoding device according to Embodiment 1 of the present disclosure. As shown in FIG. 3, the device includes:
an encoder-side reference residual determination module 32, configured to determine the residual of a reference block of a block to be coded, wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
an encoder-side residual prediction module 34, configured to predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded;
a residual coding module 36, configured to encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream.
It should be noted that each of the above modules can be implemented by software or hardware. For the latter, it can be implemented in the following manner, but is not limited to this: the above modules are all located in the same processor; or, the above modules are respectively located in different processors in any combination.
This embodiment also provides a storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps of the above residual encoding method in this embodiment when run.
In this embodiment, the above storage medium may be configured to store a computer program for executing the following steps:
S1: Determine the residual of a reference block of a block to be coded (in the embodiments of the present disclosure, this term is also called residual block, reference residual, or reference residual block), wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
S2: Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded;
S3: Encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter coding code stream).
In this embodiment, the above storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, a magnetic disk, an optical disc, and other media that can store computer programs.
An embodiment of the present disclosure also provides an electronic device, including a memory and a processor; the memory stores a computer program, and the processor is configured to run the computer program to execute the steps of the above residual encoding method in this embodiment.
In some embodiments, the above electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the above processor, and the input/output device is connected to the above processor.
In this embodiment, the above processor may be configured to execute the following steps through a computer program:
S1: Determine the residual of a reference block of a block to be coded (in the embodiments of the present disclosure, this term is also called residual block, reference residual, or reference residual block), wherein the reference block of the block to be coded includes: at least two first reference blocks of the block to be coded in the time domain; or, at least two second reference blocks of the block to be coded in the spatial domain; or, at least one first reference block of the block to be coded in the time domain and at least one second reference block of the block to be coded in the spatial domain;
S2: Predict the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a prediction residual of the block to be coded;
S3: Encode the residual difference between the prediction residual of the block to be coded and the actual residual of the block to be coded into a code stream (for example, a video inter coding code stream).
For specific examples in this embodiment, reference may be made to the examples described above, which will not be repeated here.
Embodiment 2
This embodiment provides a residual decoding method. Fig. 4 is a flowchart of the residual decoding method according to Embodiment 2 of the present disclosure. As shown in Fig. 4, the flow includes the following steps.
Step S402: determine the prediction block of a block to be decoded based on a motion vector (MV) in a bitstream (for example, a video inter-coding bitstream), and obtain the residual of a reference block of the block to be decoded (in the embodiments of the present disclosure, this term is also referred to as a residual block, a reference residual, or a reference residual block), where the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the temporal domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain.
Step S404: predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded.
Step S406: determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the bitstream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
The residual of the reference block mentioned in step S402 refers to the actual coding residual of the reference block itself, obtained by the conventional method during encoding; since the reference block has already been encoded, its residual is available through technical means.
In step S402, the prediction block of the block to be decoded is the block obtained after performing motion compensation on the block to be decoded.
The actual residual of the block to be decoded mentioned in step S406 is the difference between the pixel values of the original image block of the block to be decoded and the pixel values of the prediction block of the block to be decoded.
Through the above steps, since the decoder side deploys the same residual prediction process as the encoder side, the decoder side predicts the residual of the block to be decoded based on the residual of the reference block of the block to be decoded and determines the actual residual of the block to be decoded according to the predicted residual and the residual difference carried in the bitstream. This ensures that, at the decoder side, the actual residual of the block to be decoded can still be correctly recovered from the rate-saving bitstream.
In at least one exemplary implementation, when the image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the temporal domain includes: the optimal prediction unit (PU) of the block to be decoded in the forward reference frame, denoted as the first optimal PU; and/or the optimal PU of the block to be decoded in the frame preceding the forward reference frame, denoted as the second optimal PU.
In at least one exemplary implementation, when the image frame in which the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the temporal domain includes: the optimal PU of the block to be decoded in the forward reference frame, denoted as the third optimal PU; and/or the optimal PU of the block to be decoded in the backward reference frame, denoted as the fourth optimal PU.
It should be noted that, to ensure the accuracy of residual prediction, the decoder side adopts the same reference block selection as the encoder side, so as to stay consistent with the predicted residual obtained through residual prediction at the encoder side.
In at least one exemplary implementation, when the image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the temporal domain is determined as follows: determine, according to the MV parsed from the bitstream, the first optimal PU of the block to be decoded in the forward reference frame; then, starting from the co-located PU of the first optimal PU in the frame preceding the forward reference frame, determine the second optimal PU in the frame preceding the forward reference frame through motion estimation.
In at least one exemplary implementation, the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
In at least one exemplary implementation, the second reference block of the block to be decoded in the spatial domain includes: in the image frame in which the block to be decoded is located, the left-adjacent block and/or the above-adjacent block of the block to be decoded (for example, spatially adjacent left and/or above blocks of the same size).
Those skilled in the art should understand that, in practical applications, the first reference block of the block to be decoded in the temporal domain and/or the second reference block of the block to be decoded in the spatial domain may be determined in the above ways. The residual corresponding to a first reference block in the temporal domain may be called a temporal reference residual block, and the residual corresponding to a second reference block in the spatial domain may be called a spatial reference residual block; based on at least one temporal reference residual block and/or at least one spatial reference residual block, the residual information of the block to be decoded can be predicted.
In some implementations, for data symmetry, two temporal reference residual blocks and two spatial reference residual blocks may be selected, and residual prediction is performed based on these four reference residual blocks. In this case, the reference block of the block to be decoded includes: two first reference blocks of the block to be decoded in the temporal domain whose corresponding residuals are non-all-zero, and two second reference blocks of the block to be decoded in the spatial domain whose corresponding residuals are non-all-zero.
In some implementations, when only one temporal reference residual block and one spatial reference residual block are non-all-zero blocks, one temporal reference residual block and one spatial reference residual block may be selected, and residual prediction is performed based on these two reference residual blocks. In this case, the reference block of the block to be decoded includes: one first reference block of the block to be decoded in the temporal domain whose corresponding residual is non-all-zero, and one second reference block of the block to be decoded in the spatial domain whose corresponding residual is non-all-zero.
In some implementations, when only two temporal reference residual blocks and one spatial reference residual block are non-all-zero blocks, two temporal reference residual blocks and one spatial reference residual block may be selected, and residual prediction is performed based on these three reference residual blocks. In this case, the reference block of the block to be decoded includes: two first reference blocks of the block to be decoded in the temporal domain whose corresponding residuals are non-all-zero, and one second reference block of the block to be decoded in the spatial domain whose corresponding residual is non-all-zero.
In some implementations, when only one temporal reference residual block and two spatial reference residual blocks are non-all-zero blocks, one temporal reference residual block and two spatial reference residual blocks may be selected, and residual prediction is performed based on these three reference residual blocks. In this case, the reference block of the block to be decoded includes: one first reference block of the block to be decoded in the temporal domain whose corresponding residual is non-all-zero, and two second reference blocks of the block to be decoded in the spatial domain whose corresponding residuals are non-all-zero.
In some implementations, when only the two temporal reference residual blocks are non-all-zero blocks, the two temporal reference residual blocks may be selected, and residual prediction is performed based on these two temporal reference residual blocks. In this case, the reference block of the block to be decoded includes: two first reference blocks of the block to be decoded in the temporal domain whose corresponding residuals are non-all-zero.
In some implementations, when only the two spatial reference residual blocks are non-all-zero blocks, the two spatial reference residual blocks may be selected, and residual prediction is performed based on these two spatial reference residual blocks. In this case, the reference block of the block to be decoded includes: two second reference blocks of the block to be decoded in the spatial domain whose corresponding residuals are non-all-zero.
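The non-all-zero case analysis above amounts to a simple filtering rule. A minimal sketch, assuming a list-of-lists block representation and illustrative function names:

```python
def is_non_all_zero(block):
    """A reference residual block qualifies if any sample is non-zero."""
    return any(v != 0 for row in block for v in row)

def select_reference_residuals(temporal_candidates, spatial_candidates):
    """Keep only the non-all-zero temporal and spatial reference residual
    blocks; residual prediction then runs on whatever subset survives
    (four, three, or two blocks, matching the cases in the text)."""
    temporal = [b for b in temporal_candidates if is_non_all_zero(b)]
    spatial = [b for b in spatial_candidates if is_non_all_zero(b)]
    return temporal, spatial
```

For instance, with one all-zero temporal candidate, the selection falls back to one temporal plus two spatial reference residual blocks.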
The process of predicting the residual of the block to be decoded based on the residual of the reference block may be implemented with a variety of prediction approaches. In at least one exemplary implementation, predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded, may include one of the following:
(1) input the residual of the reference block into a residual prediction model to obtain the predicted residual of the block to be decoded, where the residual prediction model is the same as the residual prediction model at the encoder side;
(2) apply linear weighting to the residual of the reference block to obtain the predicted residual of the block to be decoded, where the linear weighting is the same as the linear weighting at the encoder side.
For example, the single-weight linear weighted sum may be computed with the following formula:
ReiPred(i,j) = W1·ResiA(i,j) + W2·ResiB(i,j)
W1 and W2 are weights; ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j); ReiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
For example, the multi-weight linear weighted sum may be computed with the following formula:
ReiPred(i,j) = W1_ij·ResiA(i,j) + W2_ij·ResiB(i,j)
W1_ij and W2_ij are the weight values corresponding to each pixel of the reference residual blocks, and they may be obtained through training. ResiA(i,j) and ResiB(i,j) are the pixel values of the selected reference residual blocks at pixel (i,j); ReiPred(i,j) is the pixel value of the predicted residual block at pixel (i,j).
In at least one exemplary implementation, after determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the bitstream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded, the method further includes: adding the actual residual to the prediction block of the block to be decoded to recover the original image block of the block to be decoded.
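The decoder-side recovery just described is two elementwise additions: predicted residual plus parsed residual difference gives the actual residual, and prediction block plus actual residual gives the original image block. A minimal sketch under a list-of-lists block assumption (names are illustrative):

```python
def add_blocks(a, b):
    """Elementwise sum of two equally sized blocks."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def reconstruct_original(prediction_block, predicted_residual, residual_difference):
    """Decoder side: recover the actual residual, then the original block."""
    actual_residual = add_blocks(predicted_residual, residual_difference)
    return add_blocks(prediction_block, actual_residual)
```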
From the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the method described in the embodiments of the present disclosure.
In this embodiment, a residual decoding apparatus corresponding to the above residual decoding method is provided for implementing the above residual decoding method; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a structural block diagram of the residual decoding apparatus according to Embodiment 2 of the present disclosure. As shown in Fig. 5, the apparatus includes:
a decoder-side reference residual determination module 52, configured to determine the prediction block of a block to be decoded based on a motion vector (MV) parsed from a bitstream, and to obtain the residual of a reference block of the block to be decoded, where the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the temporal domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain;
a decoder-side residual prediction module 54, configured to predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded; and
a residual decoding module 56, configured to determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the bitstream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
It should be noted that the above modules may be implemented in software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the above modules are all located in the same processor; or the above modules are located in different processors in any combination.
This embodiment also provides a storage medium storing a computer program, where the computer program is configured to execute, when run, the steps of the above residual decoding method in this embodiment.
In this embodiment, the above storage medium may be configured to store a computer program for executing the following steps:
S1: determine the prediction block of a block to be decoded based on a motion vector (MV) in a bitstream (for example, a video inter-coding bitstream), and obtain the residual of a reference block of the block to be decoded (in the embodiments of the present disclosure, this term is also referred to as a residual block, a reference residual, or a reference residual block), where the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the temporal domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain;
S2: predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded;
S3: determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the bitstream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
In this embodiment, the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing a computer program.
This embodiment also provides an electronic apparatus including a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to execute the steps of the above residual decoding method in this embodiment.
In some implementations, the above electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the above processor, and the input/output device is connected to the above processor.
In this embodiment, the above processor may be configured to execute the following steps through a computer program:
S1: determine the prediction block of a block to be decoded based on a motion vector (MV) in a bitstream (for example, a video inter-coding bitstream), and obtain the residual of a reference block of the block to be decoded (in the embodiments of the present disclosure, this term is also referred to as a residual block, a reference residual, or a reference residual block), where the reference block of the block to be decoded includes: at least two first reference blocks of the block to be decoded in the temporal domain; or at least two second reference blocks of the block to be decoded in the spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain;
S2: predict the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded;
S3: determine the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference, parsed from the bitstream, between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
For specific examples in this embodiment, reference may be made to the examples described above, which are not repeated here.
Embodiment 3 below takes training a residual prediction model (which may also be called a residual prediction module) with a deep learning network as an example, and describes a specific implementation of the residual encoding and decoding scheme in detail. It should be noted that, for the scheme that performs residual prediction by a linear weighted sum, the overall principle of residual encoding and decoding is similar to the following embodiment and is not repeated.
Embodiment 3
This embodiment designs, for the coding unit of video coding, a method of exploiting temporal-spatial correlation to improve video inter-coding performance; in practical use, an encoder invokes this method to complete inter prediction. The contents involved in this method are described in detail below.
(1) Residual prediction module
The residual prediction module may be generated by a convolutional neural network. Fig. 6 is a schematic diagram of the convolutional neural network used to generate the residual prediction module according to Embodiment 3 of the present disclosure. As shown in Fig. 6, the convolutional neural network takes as input two temporal reference residual blocks and two spatial reference residual blocks related to the current coding block (other numbers of temporal and spatial reference residual blocks are also possible, as are temporal-only or spatial-only reference residual blocks), extracts feature information of the residual images through convolution and pooling, and then outputs the predicted residual through a deconvolution process.
The input to the convolutional neural network is four reference residual blocks: two reference residual blocks selected from the temporal reference frames, and two reference residual blocks selected from the adjacent spatial region.
In the temporal domain, different operations are performed according to the different prediction modes of P frames and B frames. Fig. 7 is a schematic diagram of the selection of temporal reference residual blocks for the P-frame and B-frame cases according to Embodiment 3 of the present disclosure.
As shown in Fig. 7, a P frame uses unidirectional prediction and has only one forward reference frame. First, the optimal prediction unit (PU) is found in the forward reference frame through motion estimation, generating the motion vector (MV) of the current coding block. Then, based on the position of the optimal reference PU of the current coding block and the MV of the current coding block, another optimal reference PU is found in the frame preceding the forward reference frame through motion estimation; the residual blocks corresponding to these two optimal reference PUs serve as the two temporal reference residual block inputs for the P frame.
A B frame has one forward reference frame and one backward reference frame; one optimal PU is found through motion estimation in each of the forward and backward reference frames, and their residual blocks serve as the temporal reference residual block inputs.
In the spatial domain, Fig. 8 is a schematic diagram of the selection of spatial reference residual blocks according to Embodiment 3 of the present disclosure. As shown in Fig. 8, the left and above blocks that are adjacent to the current block and of the same size are selected as reference blocks, and the residual blocks corresponding to these reference blocks serve as the spatial reference residual block inputs.
The output of the residual prediction module is the predicted residual block of the current block. At the encoder side, the coding residual can be obtained by subtracting the predicted residual block from the actual residual block. At the decoder side, the actual residual block can be recovered by adding the coding residual to the predicted residual block.
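The encoder/decoder relationship stated above is an exact round trip: subtracting the predicted residual at the encoder and adding it back at the decoder recovers the actual residual losslessly (before the subsequent transform and quantization). A sketch under the same list-of-lists assumption, with illustrative names:

```python
def coding_residual(actual, predicted):
    # Encoder side: coding residual = actual residual - predicted residual.
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]

def recovered_actual(predicted, coding):
    # Decoder side: actual residual = predicted residual + coding residual.
    return [[p + c for p, c in zip(rp, rc)] for rp, rc in zip(predicted, coding)]
```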
(2) Inter-coding block residual prediction method based on temporal-spatial correlation
Fig. 9 is a schematic diagram of a method of exploiting temporal-spatial correlation to improve video inter-coding performance according to Embodiment 3 of the present disclosure. Taking a residual prediction module implemented by a neural network as an example, the figure shows the residual encoding and decoding operations at the encoder side and the decoder side, which are described in detail below.
The encoder-side operations include the following four steps.
Step 1: pre-encode the video sequences provided by the common test standard, and extract the residual data of the four reference blocks and the residual data of the current block as training data. Train the deep neural network built as shown in Fig. 6, and then embed the trained deep neural network into the inter prediction of the encoder.
Step 2: read the block to be coded from the input video image, obtain the prediction block of the block to be coded through motion estimation, and subtract the pixel values of the prediction block from the pixel values of the original image to obtain the conventional residual block.
Step 3: according to the different prediction modes of P frames and B frames, search for the two temporal reference residual blocks shown in Fig. 7 using the motion vector (MV) obtained through motion estimation, obtain the two spatial reference residual blocks according to Fig. 8, and input the four obtained reference residual blocks into the residual prediction module to obtain the predicted residual block.
Step 4: take the difference between the conventional residual block and the predicted residual block as the coding residual of the current block to be coded, and write it into the bitstream file. In this step, the conventional residual block is the to-be-coded residual value of the current coding block obtained by the conventional inter prediction method, and the predicted residual is the residual value predicted by the residual prediction module.
In summary, at the encoder side, motion estimation is used to find the reference block in the reference frame, and the prediction block is obtained after the motion compensation operation; the difference between the original block and the prediction block is the actual residual block of the conventional inter prediction mode. In this process, a residual prediction module is built to predict the residual of the current block. The input of the residual prediction module is the two temporal and two spatial reference residual blocks corresponding to the reference blocks of the current coding block, and its output is the predicted residual of the current block. The difference between the actual residual block and the predicted residual block serves as the coding residual and undergoes the subsequent transform, quantization, and entropy coding.
The decoder-side operations include the following three steps.
Step 1: embed the residual prediction module into the inter prediction of the decoder.
Step 2: obtain the prediction block of the current block to be decoded from the motion vector (MV) parsed from the bitstream, find the two temporal reference residual blocks shown in Fig. 7, obtain the two spatial reference residual blocks shown in Fig. 8, and input the four obtained reference residual blocks into the residual prediction module to obtain the predicted residual block. In this step, the four reference residual blocks corresponding to the current block are produced in the same way as the four reference residual blocks at the encoder side.
Step 3: read the coding residual of the block to be decoded from the bitstream, add the predicted residual block to the coding residual to recover the actual residual of the block to be decoded, and finally add the prediction block to recover the original image block.
In summary, in the decoder-side inter prediction decoding operation, a residual prediction module identical to that of the encoder is built to predict the residual of the current block. The input of the residual prediction module is the two temporal and two spatial reference residual blocks corresponding to the reference blocks of the current decoding block, the reference blocks being found in the same way as at the encoder side; the output is the predicted residual of the current decoding block, which, added to the decoded residual and to the prediction block of the block to be decoded, completes the reconstruction of the current block.
Obviously, those skilled in the art should understand that each of the above modules or steps of the present disclosure may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed over a network composed of multiple computing devices. In some implementations, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from that described here, or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present disclosure is not limited to any specific combination of hardware and software.
The above are only exemplary embodiments of the present disclosure and are not intended to limit the present disclosure; for those skilled in the art, the present disclosure may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (19)

  1. A residual encoding method, comprising:
    determining a residual of a reference block of a block to be coded, wherein the reference block of the block to be coded comprises: at least two first reference blocks of the block to be coded in a temporal domain; or at least two second reference blocks of the block to be coded in a spatial domain; or at least one first reference block of the block to be coded in the temporal domain and at least one second reference block of the block to be coded in the spatial domain;
    predicting a residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a predicted residual of the block to be coded; and
    encoding a residual difference between the predicted residual of the block to be coded and an actual residual of the block to be coded into a bitstream.
  2. The method according to claim 1, wherein,
    when an image frame in which the block to be coded is located is a P frame, the first reference block of the block to be coded in the temporal domain comprises: an optimal prediction unit (PU) of the block to be coded in a forward reference frame, denoted as a first optimal PU; and/or an optimal PU of the block to be coded in a frame preceding the forward reference frame, denoted as a second optimal PU;
    or,
    when the image frame in which the block to be coded is located is a B frame, the first reference block of the block to be coded in the temporal domain comprises: an optimal PU of the block to be coded in a forward reference frame, denoted as a third optimal PU; and/or an optimal PU of the block to be coded in a backward reference frame, denoted as a fourth optimal PU.
  3. The method according to claim 2, wherein, when the image frame in which the block to be coded is located is a P frame, the first reference block of the block to be coded in the temporal domain is determined as follows:
    determining the first optimal PU of the block to be coded in the forward reference frame through motion estimation, and determining a motion vector (MV) of the block to be coded relative to the first optimal PU;
    determining the second optimal PU in the frame preceding the forward reference frame through motion estimation, according to a position of the first optimal PU and the MV.
  4. The method according to claim 1, wherein the second reference block of the block to be coded in the spatial domain is located in the image frame in which the block to be coded is located, and is adjacent to the block to be coded in the spatial domain.
  5. The method according to claim 4, wherein the second reference block of the block to be coded in the spatial domain comprises: in the image frame in which the block to be coded is located, a left-adjacent block and/or an above-adjacent block of the block to be coded.
  6. The method according to any one of claims 1 to 5, wherein the reference block of the block to be coded comprises:
    two first reference blocks of the block to be coded in the temporal domain whose corresponding residuals are non-all-zero, and two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are non-all-zero; or,
    one first reference block of the block to be coded in the temporal domain whose corresponding residual is non-all-zero, and one second reference block of the block to be coded in the spatial domain whose corresponding residual is non-all-zero; or,
    two first reference blocks of the block to be coded in the temporal domain whose corresponding residuals are non-all-zero, and one second reference block of the block to be coded in the spatial domain whose corresponding residual is non-all-zero; or,
    one first reference block of the block to be coded in the temporal domain whose corresponding residual is non-all-zero, and two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are non-all-zero; or,
    two first reference blocks of the block to be coded in the temporal domain whose corresponding residuals are non-all-zero; or,
    two second reference blocks of the block to be coded in the spatial domain whose corresponding residuals are non-all-zero.
  7. The method according to any one of claims 1 to 5, wherein predicting the residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain the predicted residual of the block to be coded, comprises:
    inputting the residual of the reference block into a residual prediction model to obtain the predicted residual of the block to be coded, wherein the residual prediction model is trained with a deep learning network based on training samples, the training samples comprising: residuals of reference blocks of coded blocks with known residuals, and actual residuals of the coded blocks with known residuals;
    or,
    applying linear weighting to the residual of the reference block to obtain the predicted residual of the block to be coded, wherein the linear weighting comprises single-weight linear weighting or multi-weight linear weighting.
  8. The method according to any one of claims 1 to 5, wherein the actual residual of the block to be coded is a difference between pixel values of an original image block of the block to be coded and pixel values of a prediction block of the block to be coded, wherein the prediction block of the block to be coded is a block obtained after performing motion compensation on the reference block of the coding block.
  9. A residual decoding method, comprising:
    determining a prediction block of a block to be decoded based on a motion vector (MV) parsed from a bitstream, and obtaining a residual of a reference block of the block to be decoded, wherein the reference block of the block to be decoded comprises: at least two first reference blocks of the block to be decoded in a temporal domain; or at least two second reference blocks of the block to be decoded in a spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain;
    predicting a residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain a predicted residual of the block to be decoded; and
    determining an actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference parsed from the bitstream, wherein the residual difference is a difference between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
  10. The method according to claim 9, wherein,
    when an image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the temporal domain comprises: an optimal prediction unit (PU) of the block to be decoded in a forward reference frame, denoted as a first optimal PU; and/or an optimal PU of the block to be decoded in a frame preceding the forward reference frame, denoted as a second optimal PU;
    or,
    when the image frame in which the block to be decoded is located is a B frame, the first reference block of the block to be decoded in the temporal domain comprises: an optimal PU of the block to be decoded in a forward reference frame, denoted as a third optimal PU; and/or an optimal PU of the block to be decoded in a backward reference frame, denoted as a fourth optimal PU.
  11. The method according to claim 10, wherein, when the image frame in which the block to be decoded is located is a P frame, the first reference block of the block to be decoded in the temporal domain is determined as follows:
    determining, according to the MV parsed from the bitstream, the first optimal PU of the block to be decoded in the forward reference frame;
    determining the second optimal PU in the frame preceding the forward reference frame through motion estimation, starting from a co-located PU of the first optimal PU in the frame preceding the forward reference frame.
  12. The method according to claim 9, wherein the second reference block of the block to be decoded in the spatial domain is located in the image frame in which the block to be decoded is located, and is adjacent to the block to be decoded in the spatial domain.
  13. The method according to claim 12, wherein the second reference block of the block to be decoded in the spatial domain comprises: in the image frame in which the block to be decoded is located, a left-adjacent block and/or an above-adjacent block of the block to be decoded.
  14. The method according to any one of claims 9 to 13, wherein predicting the residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain the predicted residual of the block to be decoded, comprises:
    inputting the residual of the reference block into a residual prediction model to obtain the predicted residual of the block to be decoded, wherein the residual prediction model is the same as a residual prediction model at an encoder side;
    or,
    applying linear weighting to the residual of the reference block to obtain the predicted residual of the block to be decoded, wherein the linear weighting is the same as linear weighting at the encoder side.
  15. The method according to any one of claims 9 to 13, wherein, after determining the actual residual of the block to be decoded according to the predicted residual of the block to be decoded and the residual difference parsed from the bitstream, the method further comprises:
    adding the actual residual to the prediction block of the block to be decoded to recover an original image block of the block to be decoded.
  16. A residual encoding apparatus, comprising:
    an encoder-side reference residual determination module, configured to determine a residual of a reference block of a block to be coded, wherein the reference block of the block to be coded comprises: at least two first reference blocks of the block to be coded in a temporal domain; or at least two second reference blocks of the block to be coded in a spatial domain; or at least one first reference block of the block to be coded in the temporal domain and at least one second reference block of the block to be coded in the spatial domain;
    an encoder-side residual prediction module, configured to predict a residual of the block to be coded based on the residual of the reference block of the block to be coded, to obtain a predicted residual of the block to be coded; and
    a residual encoding module, configured to encode a residual difference between the predicted residual of the block to be coded and an actual residual of the block to be coded into a bitstream.
  17. A residual decoding apparatus, comprising:
    a decoder-side reference residual determination module, configured to determine a prediction block of a block to be decoded based on a motion vector (MV) parsed from a bitstream, and to obtain a residual of a reference block of the block to be decoded, wherein the reference block of the block to be decoded comprises: at least two first reference blocks of the block to be decoded in a temporal domain; or at least two second reference blocks of the block to be decoded in a spatial domain; or at least one first reference block of the block to be decoded in the temporal domain and at least one second reference block of the block to be decoded in the spatial domain;
    a decoder-side residual prediction module, configured to predict a residual of the block to be decoded based on the residual of the reference block of the block to be decoded, to obtain a predicted residual of the block to be decoded; and
    a residual decoding module, configured to determine an actual residual of the block to be decoded according to the predicted residual of the block to be decoded and a residual difference parsed from the bitstream, wherein the residual difference is a difference between the predicted residual of the block to be decoded and the actual residual of the block to be decoded.
  18. A storage medium storing a computer program, wherein the computer program is configured to execute, when run, the method according to any one of claims 1 to 15.
  19. An electronic apparatus comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to execute the method according to any one of claims 1 to 15.
PCT/CN2020/100558 2019-07-22 2020-07-07 Residual encoding and decoding method and apparatus, storage medium, and electronic apparatus WO2021012942A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910663278.X 2019-07-22
CN201910663278.XA CN112261409A (zh) 2019-07-22 2019-07-22 Residual encoding and decoding method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
WO2021012942A1 true WO2021012942A1 (zh) 2021-01-28

Family

ID=74193180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/100558 WO2021012942A1 (zh) 2019-07-22 2020-07-07 Residual encoding and decoding method and apparatus, storage medium, and electronic apparatus

Country Status (2)

Country Link
CN (1) CN112261409A (zh)
WO (1) WO2021012942A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115695812A (zh) * 2021-07-30 2023-02-03 中兴通讯股份有限公司 Video encoding and decoding methods and apparatuses, electronic device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007024106A1 (en) * 2005-08-24 2007-03-01 Samsung Electronics Co., Ltd. Method for enhancing performance of residual prediction and video encoder and decoder using the same
CN101228796A (zh) * 2005-07-21 2008-07-23 三星电子株式会社 Method and apparatus for encoding and decoding a video signal according to directional intra-residual prediction
CN101478672A (zh) * 2008-01-04 2009-07-08 华为技术有限公司 Video encoding and decoding methods and apparatuses, and video processing system
CN102148989A (zh) * 2011-04-22 2011-08-10 西安交通大学 Method for detecting all-zero blocks in H.264
GB2506853A (en) * 2012-09-28 2014-04-16 Canon Kk Image Encoding / Decoding Including Determination of Second Order Residual as Difference Between an Enhancement and Reference Layer Residuals
CN103916672A (zh) * 2014-03-21 2014-07-09 华为技术有限公司 Data encoding/decoding method, related apparatus, and system
CN106063272A (zh) * 2014-01-02 2016-10-26 世宗大学校产学协力团 Method and device for encoding multi-view video, and method and device for decoding multi-view video
CN109121465A (zh) * 2016-05-06 2019-01-01 Vid拓展公司 Systems and methods for motion compensated residual prediction


Also Published As

Publication number Publication date
CN112261409A (zh) 2021-01-22

Similar Documents

Publication Publication Date Title
Hu et al. Learning end-to-end lossy image compression: A benchmark
US20220353534A1 (en) Transform Kernel Selection and Entropy Coding
TWI536811B (zh) Image processing method and system, decoding method, encoder, and decoder
JP2022105007A (ja) Method and apparatus for multi-line intra prediction in video compression
WO2016131229A1 (zh) Method for encoding and decoding video images, encoding device, and decoding device
US10091526B2 (en) Method and apparatus for motion vector encoding/decoding using spatial division, and method and apparatus for image encoding/decoding using same
US20140098856A1 (en) Lossless video coding with sub-frame level optimal quantization values
WO2013113217A1 (zh) Decoding method and apparatus
Abou-Elailah et al. Fusion of global and local motion estimation for distributed video coding
US10271062B2 (en) Motion vector prediction through scaling
US10194147B2 (en) DC coefficient sign coding scheme
US10681374B2 (en) Diversified motion using multiple global motion models
US20230239464A1 (en) Video processing method with partial picture replacement
WO2020159982A1 (en) Shape adaptive discrete cosine transform for geometric partitioning with an adaptive number of regions
JP7448558B2 (ja) Method and device for image encoding and decoding
WO2021012942A1 (zh) Residual encoding and decoding method and apparatus, storage medium, and electronic apparatus
Hu et al. Complexity-guided slimmable decoder for efficient deep video compression
JP2024045388A (ja) Image decoding method, decoder, and computer storage medium
US9210424B1 (en) Adaptive prediction block size in video coding
JP7437426B2 (ja) Inter prediction method and apparatus, device, and storage medium
WO2013064114A1 (zh) Intra-frame decoding method and apparatus for signal component sampling points of an image block
Guo et al. Enhanced motion compensation for deep video compression
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
CN114449286A (zh) Video encoding method, decoding method, and apparatus
KR102335184B1 (ko) Compound motion-compensated prediction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20844788

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20844788

Country of ref document: EP

Kind code of ref document: A1
