WO2022088631A1 - Image encoding method, image decoding method, and related apparatus - Google Patents

Image encoding method, image decoding method, and related apparatus

Info

Publication number
WO2022088631A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
residual
prediction
current
current coding
Prior art date
Application number
PCT/CN2021/090270
Other languages
English (en)
French (fr)
Inventor
马展
刘浩杰
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2022088631A1 publication Critical patent/WO2022088631A1/zh

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: the unit being an image region, e.g. an object
    • H04N19/176: the region being a block, e.g. a macroblock
    • H04N19/50: using predictive coding
    • H04N19/90: using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94: Vector quantisation

Definitions

  • the present application relates to the technical field of electronic devices, and in particular, to an image encoding method, an image decoding method, and related apparatuses.
  • Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques, such as those defined by the MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC) standards, and those described in extensions of such standards, to transmit and receive digital video information more efficiently.
  • Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video codec techniques.
  • the embodiments of the present application provide an image encoding method, an image decoding method, and related devices, so as to realize adaptive dynamic residual compensation and effectively encode different forms of inter-frame residual information.
  • an embodiment of the present application provides an image encoding method, including: obtaining an original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame; obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model; quantizing the transform feature to obtain a quantization feature of the current coding block; determining the probability of each pixel in the quantization feature through a pre-trained probability prediction model; and generating a binary code stream of the current coding block using the probability of each pixel.
  • the solution of the present application performs adaptive dynamic residual compensation on the current prediction frame to obtain the final inter-frame reconstruction, which can effectively encode different forms of inter-frame residual information.
  • an embodiment of the present application provides an image decoding method, including: acquiring a binary code stream of a current decoding block, where the current decoding block includes the code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame; transforming the binary code stream into a quantized feature of the current decoding block through a pre-trained probability prediction model; determining a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and determining a reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
  • the solution of the present application performs adaptive dynamic residual compensation on the current prediction frame to obtain the final inter-frame reconstruction, which can effectively encode different forms of inter-frame residual information.
  • an embodiment of the present application provides an image encoding apparatus, including:
  • an obtaining unit configured to obtain an original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame;
  • a first prediction unit configured to obtain the transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
  • a quantization unit configured to quantize the transform feature of the current coding block to obtain the quantization feature of the current coding block;
  • a second prediction unit configured to determine the probability of each pixel in the quantization feature of the current coding block through a pre-trained probability prediction model; and
  • a generating unit configured to generate a binary code stream of the current coding block by using the probability of each pixel.
  • an embodiment of the present application provides an image decoding apparatus, including:
  • an acquisition unit configured to acquire a binary code stream of a current decoding block, where the current decoding block includes a code stream of a currently processed video frame or a decoding unit obtained by dividing the currently processed video frame;
  • a first prediction unit configured to transform the binary code stream into the quantized feature of the current decoding block through a pre-trained probability prediction model;
  • a second prediction unit configured to determine the residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and
  • a determination unit configured to determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
  • an embodiment of the present application provides an encoder, including: a processor and a memory coupled to the processor; the processor is configured to execute the method described in the first aspect.
  • an embodiment of the present application provides a decoder, including: a processor and a memory coupled to the processor; the processor is configured to execute the method described in the second aspect above.
  • an embodiment of the present application provides a terminal, where the terminal includes one or more processors, a memory, and a communication interface; the memory and the communication interface are connected to the one or more processors; the terminal communicates with other devices through the communication interface; and the memory is used for storing computer program code, the computer program code including instructions which, when executed by the one or more processors, cause the terminal to perform the method of the first aspect or the second aspect.
  • embodiments of the present application provide a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed on a computer, the computer is caused to execute the method described in the first aspect or the second aspect.
  • an embodiment of the present application provides a computer program product containing instructions, when the instructions are run on a computer, the instructions cause the computer to execute the method described in the first aspect or the second aspect.
  • FIG. 1 is a schematic block diagram of a coding tree unit in an embodiment of the application
  • FIG. 2 is a schematic block diagram of a CTU and a coding block CU in an embodiment of the present application
  • FIG. 3 is a schematic block diagram of a color format in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an image division manner in an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an encoding and decoding system in an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a video encoder in an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a video decoder in an embodiment of the present application.
  • FIG. 8A is a schematic flowchart of an image encoding method in an embodiment of the present application.
  • FIG. 8B is a schematic diagram of residual maps generated after processing with different thresholds in an embodiment of the present application.
  • FIG. 8C is a structural diagram of a feature prediction model in an embodiment of the present application.
  • FIG. 9A is a schematic flowchart of an image decoding method in an embodiment of the present application.
  • FIG. 9B is a structural diagram of a residual prediction model in an embodiment of the present application.
  • FIG. 10 is a block diagram of a functional unit of an image encoding apparatus in an embodiment of the application.
  • FIG. 11 is a block diagram of another functional unit of the image encoding apparatus in the embodiment of the application.
  • FIG. 12 is a block diagram of a functional unit of an image decoding apparatus in an embodiment of the application.
  • FIG. 13 is a block diagram of another functional unit of the image decoding apparatus in the embodiment of the present application.
  • first, second, etc. may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish a first element from another element.
  • a first client could be referred to as a second client, and similarly, a second client could be referred to as a first client, without departing from the scope of this disclosure.
  • Both the first client and the second client are clients, but they are not the same client.
  • a complete image in a video is usually called a "frame", and a video composed of many frames in chronological order is also called a video sequence (Video Sequence).
  • a video sequence contains a series of redundant information, such as spatial redundancy, temporal redundancy, visual redundancy, information entropy redundancy, structural redundancy, knowledge redundancy, and importance redundancy.
  • a video coding (Video Coding) technology is proposed to reduce the storage space and save the transmission bandwidth.
  • Video coding techniques are also known as video compression techniques.
  • video coding technologies mainly include intra-frame prediction, inter-frame prediction, transform and quantization, entropy coding, and deblocking filtering processing.
  • mainstream video compression coding standards include MPEG-2 and MPEG-4 Part 10 Advanced Video Coding (AVC) formulated by the Moving Picture Experts Group (MPEG), and H.263, H.264, and the H.265 High Efficiency Video Coding (HEVC) standard formulated by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
  • Predictive coding: the data information of previously coded frames is used to predict the current frame to be coded.
  • the encoding end obtains a predicted value through prediction, and there is a certain residual value between the predicted value and the actual value; the more accurate the prediction, the closer the predicted value is to the actual value and the smaller the residual value, so the encoding end can greatly reduce the amount of data by encoding the residual value.
  • the decoding end uses the residual value plus the prediction value to restore and reconstruct the original image.
  • predictive coding is divided into two basic types: intra-frame prediction and inter-frame prediction.
  • Inter prediction is a prediction technology based on motion compensation.
  • the main processing is to determine the motion information of the current block, obtain the reference image block from the reference frame of the current block according to the motion information, and generate the predicted image of the current block.
  • prediction of the current block is performed using one of forward prediction, backward prediction, or bidirectional prediction; the prediction direction is indicated by the inter prediction direction in the motion information, the displacement of the reference image block used to predict the current block in the reference frame relative to the current block is indicated by the motion vector in the motion information, and one motion vector corresponds to one reference frame.
  • the inter prediction of an image block can use only one motion vector to generate a predicted image from pixels in one reference frame, which is called unidirectional prediction; it can also use two motion vectors to generate a predicted image by combining pixels in two reference frames, which is called bidirectional prediction. That is, an image block generally contains one or two motion vectors; for some multi-hypothesis inter prediction techniques, an image block may contain more than two motion vectors.
  • Inter-frame prediction indicates the reference frame through the reference frame index (ref_idx), and indicates the position offset of the reference block of the current block in the reference frame relative to the current block through the motion vector (MV).
  • An MV is a two-dimensional vector including a horizontal displacement component and a vertical displacement component; an MV corresponds to two frames, and each frame has a picture order count (POC) used to indicate the display order of the image, so an MV also corresponds to a POC difference.
  • the POC difference has a linear relationship with the time interval.
  • the scaling of the motion vector usually adopts a scaling method based on the POC difference, which converts the motion vector between one pair of images into the motion vector between another pair of images.
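  • For illustration only, a minimal sketch of POC-difference-based motion vector scaling; the function name and picture-order values are hypothetical, not taken from the patent:

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector from one picture pair to another using the
    ratio of POC differences (assumes a linear POC/time relationship)."""
    td = poc_cur - poc_ref_src   # POC difference the source MV spans
    tb = poc_cur - poc_ref_dst   # POC difference to the target reference
    s = tb / td
    return (round(mv[0] * s), round(mv[1] * s))

# e.g. an MV of (8, -4) spanning 2 pictures, rescaled to span 1 picture:
# scale_mv((8, -4), poc_cur=10, poc_ref_src=8, poc_ref_dst=9) -> (4, -2)
```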
  • Advanced motion vector prediction (AMVP) mode: the reference frame queue to be used is determined by the inter prediction direction, and the reference frame pointed to by the MV of the current block is determined by the reference frame index; one motion vector predictor (MVP) in the MVP list is indicated by the motion vector predictor index as the predictor of the MV of the current block, and one MV is obtained by adding one motion vector difference (MVD) to one MVP.
  • Merge/skip mode: a merge index is identified in the code stream, and a merge candidate is selected from the merge candidate list according to the merge index; the motion information of the current block (including the prediction direction, reference frame, and motion vector) is determined by this merge candidate.
  • merge mode implies that the current block has residual information, while skip mode implies that the current block has no residual information (i.e., the residual is 0); the two modes derive motion information in the same way.
  • the merge candidate is specifically a motion information data structure, which includes information such as the inter-frame prediction direction, reference frame, and motion vector.
  • the current block can select the corresponding merge candidate from the merge candidate list according to the merge index, and use the motion information of the merge candidate directly as the motion information of the current block, or use it as the motion information of the current block after scaling.
  • the merge candidate can be the motion information of an image block adjacent to the current block, called a spatial merge candidate; it can also be the motion information of the image block at the corresponding position of the current block in another coded image, called a temporal merge candidate.
  • the merge candidate can also be a bi-predictive merge candidate composed of the forward motion information of one merge candidate and the backward motion information of another merge candidate, or a zero motion vector merge candidate whose motion vector is forced to be a zero vector.
  • the division of the inter-frame prediction unit includes the 2N×2N division method (as shown in A in FIG. 4), the N×N division method (as shown in B in FIG. 4), the N×2N division method (as shown in C in FIG. 4), the 2N×N division method (as shown in D in FIG. 4), the 2N×nD division method (as shown in E in FIG. 4), the 2N×nU division method (as shown in F in FIG. 4), the nL×2N division method (as shown in G in FIG. 4), and the nR×2N division method (as shown in H in FIG. 4).
  • where N is any positive integer and n = x × N, with 0 < x < 1.
  • the 2N×2N division method does not divide the image block; the N×N division method divides the image block into four sub-image blocks of equal size; the N×2N division method divides the image block into two left and right sub-image blocks of equal size; the 2N×N division method divides the image block into two upper and lower sub-image blocks of equal size; the 2N×nD division method divides the image block into two upper and lower sub-image blocks, with the dividing line shifted down by n relative to the horizontal bisector of the image block, where D indicates a downward shift; the 2N×nU division method divides the image block into two upper and lower sub-image blocks, with the dividing line shifted up by n relative to the horizontal bisector, where U indicates an upward shift; the nL×2N division method divides the image block into two left and right sub-image blocks, with the dividing line shifted left by n relative to the vertical bisector, where L indicates a leftward shift; and the nR×2N division method divides the image block into two left and right sub-image blocks, with the dividing line shifted right by n relative to the vertical bisector, where R indicates a rightward shift.
  • the High Efficiency Video Coding (HEVC) standard defines the coding tree unit (CTU), coding unit (CU), prediction unit (PU), and transform unit (TU).
  • CTU, CU, PU and TU are all image blocks.
  • Coding tree unit (CTU): an image is composed of multiple CTUs. A CTU usually corresponds to a square image area and includes the luminance pixels and chrominance pixels in this image area (or may contain only luminance pixels, or only chrominance pixels). The CTU also contains syntax elements that indicate how to divide the CTU into at least one coding unit (CU), and a method for decoding each coding unit to obtain a reconstructed image.
  • the image 10 is composed of a plurality of CTUs (including CTU A, CTU B, CTU C, etc.).
  • the encoding information corresponding to a certain CTU contains the luminance value and/or the chrominance value of the pixels in the square image area corresponding to the CTU.
  • the encoding information corresponding to a certain CTU may also contain syntax elements indicating how to divide the CTU into at least one CU, and a method of decoding each CU to obtain a reconstructed image.
  • An image area corresponding to one CTU may include 64×64, 128×128, or 256×256 pixels. A 64×64-pixel CTU contains a rectangular pixel lattice of 64 columns, each of 64 pixels, and each pixel contains a luminance component and/or a chrominance component.
  • A CTU can also correspond to a rectangular image area or an image area of another shape, and the image area corresponding to a CTU can have a different number of pixels in the horizontal direction than in the vertical direction, for example, 64×128 pixels.
  • Coding unit (CU): usually corresponds to an A×B rectangular area in the image, including A×B luminance pixels and/or the corresponding chrominance pixels, where A is the width of the rectangle and B is the height; A and B can be the same or different. The values of A and B are usually integer powers of 2, such as 128, 64, 32, 16, 8, and 4.
  • the width involved in the embodiments of the present application refers to the length along the X-axis direction (horizontal direction) in the two-dimensional rectangular coordinate system XoY shown in FIG. 1, and the height refers to the length along the Y-axis direction (vertical direction) in the same coordinate system.
  • the reconstructed image of a CU can be obtained by adding the predicted image and the residual image.
  • the predicted image is generated by intra-frame prediction or inter-frame prediction and may specifically be composed of one or more prediction blocks (PB); the residual image is generated from the transform coefficients by inverse quantization and inverse transform processing and may specifically be composed of one or more transform blocks (TB).
  • a CU includes encoding information, and the encoding information includes information such as prediction mode and transform coefficients. According to the encoding information, the CU is subjected to corresponding decoding processing such as prediction, inverse quantization, and inverse transformation, and a reconstructed image corresponding to the CU is generated.
  • the relationship between the coding tree unit CTU and the coding block CU is shown in FIG. 2 .
  • digital video compression technology works on video sequences whose color coding method is YCbCr (also called YUV) and whose color format is 4:2:0, 4:2:2, or 4:4:4.
  • Y represents the brightness (Luminance or Luma), that is, the grayscale value
  • Cb represents the blue chrominance component
  • Cr represents the red chrominance component
  • U and V represent the chrominance (Chrominance or Chroma), which is used to describe color and saturation. In the color formats, 4:2:0 means that every 4 pixels have 4 luminance components and 2 chrominance components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luminance components and 4 chrominance components (YYYYCbCrCbCr), and 4:4:4 represents full-pixel display.
  • Figure 3 shows the distribution of each component in different color formats, where the circle is the Y component and the triangle is the UV component.
  • the prediction unit PU is the basic unit of intra prediction and inter prediction.
  • the motion information that defines the image block includes inter prediction direction, reference frame, motion vector, etc.
  • the image block that is being encoded is called the current coding block (CCB), and the image block that is being decoded is called the current decoding block (CDB); for example, when prediction processing is being performed on an image block, the current coding block or current decoding block is a prediction block, and when residual processing is being performed on an image block, the current coding block or current decoding block is a transform block.
  • the picture in which the current coding block or the current decoding block is located is called the current frame.
  • the image blocks located on the left or upper side of the current block may be inside the current frame and have completed the encoding/decoding process to obtain a reconstructed image; they are called reconstructed blocks, and information such as the encoding mode and reconstructed pixels of a reconstructed block is available.
  • a frame that has completed encoding/decoding before the current frame is encoded/decoded is called a reconstructed frame.
  • when the current frame is a unidirectional prediction frame (P frame) or a bidirectional prediction frame (B frame), it has one or two reference frame lists, respectively; the two lists are called L0 and L1, and each list contains at least one reconstructed frame, referred to as a reference frame of the current frame.
  • the reference frame provides reference pixels for inter prediction of the current frame.
  • the transform unit TU processes the residuals of the original image block and the predicted image block.
  • Pixels refer to pixels in an image, such as pixels in coding blocks, pixels in luminance component pixel blocks (also known as luminance pixels), and pixels in chrominance component pixel blocks (also known as chroma pixels), etc.
  • Sample (also known as pixel value or sample value) refers to the pixel value of a pixel point. The pixel value in the luminance component domain specifically refers to the luminance (i.e., grayscale value), and the pixel value in the chrominance component domain specifically refers to the chrominance value (i.e., color and saturation). The samples of a pixel specifically include original samples, predicted samples, and reconstructed samples.
  • Deep neural networks can optimize the entire system end-to-end based on rate-distortion.
  • Convolutional neural networks adopt learnable feature transforms, differentiable quantization, and dynamic probability distribution estimation, which can more efficiently remove redundancy between video images and obtain a more compact feature-space representation of video images; under the same code rate, a higher reconstruction quality can be obtained.
  • hardware acceleration and development based on a specific neural network is conducive to further promoting the acceleration and implementation of learning-based encoding and decoding systems.
  • end-to-end intra-frame coding is directly used to process residual information, without considering the particularity of residual information and its uneven distribution after prediction, and there is no embedded residual sparsification method to approximate the skip mode of traditional encoding methods.
  • the embodiments of the present application provide an image encoding method, an image decoding method, and related apparatuses.
  • the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
  • FIG. 5 is a block diagram of an example encoding/decoding system 1 described in the embodiments of the application.
  • the encoding/decoding system 1 includes a video encoder 100 and a video decoder 200.
  • the video encoder 100 and the video decoder 200 are used to implement the learning-based end-to-end adaptive inter-frame residual coding method proposed in this application.
  • the codec system 1 includes a source device 10 and a destination device 20 .
  • Source device 10 produces encoded video data. Accordingly, source device 10 may be referred to as a video encoding device.
  • Destination device 20 may decode the encoded video data generated by source device 10 . Accordingly, destination device 20 may be referred to as a video decoding device.
  • Various implementations of source device 10, destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that may be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
  • Source device 10 and destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Link 30 may include one or more media or devices capable of moving encoded video data from source device 10 to destination device 20 .
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real-time.
  • source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 20 .
  • the one or more communication media may include wireless and/or wired communication media, such as radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from source device 10 to destination device 20 .
  • the encoded data may be output from output interface 140 to storage device 40 .
  • the image encoding and decoding techniques of the present application can be applied to video encoding and decoding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (eg, via the Internet), for storage in data storage Encoding of video data on media, decoding of video data stored on data storage media, or other applications.
  • codec system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • the codec system 1 illustrated in FIG. 5 is merely an example, and the techniques of this application may be applicable to a video coding setup (eg, video encoding or video decoding) that does not necessarily include any data communication between an encoding device and a decoding device.
  • data is retrieved from local storage, streamed over a network, and the like.
  • a video encoding device may encode and store data to memory, and/or a video decoding device may retrieve and decode data from memory.
  • encoding and decoding is performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
  • source device 10 includes video source 120 , video encoder 100 , and output interface 140 .
  • output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
  • Video encoder 100 may encode video data from video source 120 .
  • source device 10 transmits the encoded video data directly to destination device 20 via output interface 140 .
  • the encoded video data may also be stored on storage device 40 for later access by destination device 20 for decoding and/or playback.
  • destination device 20 includes input interface 240 , video decoder 200 , and display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • Input interface 240 may receive encoded video data via link 30 and/or from storage device 40 .
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20 . Generally, display device 220 displays decoded video data.
  • the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle the encoding of both audio and video in a common data stream or separate data streams.
  • Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the application is implemented partially in software, a device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and the instructions may be executed in hardware using one or more processors to implement the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device.
  • FIG. 6 is an exemplary block diagram of a video encoder 100 described in an embodiment of the present application.
  • the video encoder 100 is used to output the video to the post-processing entity 41 .
  • Post-processing entity 41 represents an example of a video entity that can process encoded video data from video encoder 100, such as a Media Aware Network Element (MANE) or a splicing/editing device.
  • post-processing entity 41 may be an instance of a network entity.
  • post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100.
  • post-processing entity 41 is an example of storage device 40 of FIG. 5.
  • video encoder 100 includes prediction processing unit 108 , filter unit 106 , memory 107 , summer 112 , transformer 101 , quantizer 102 , and entropy encoder 103 .
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109 .
  • the video encoder 100 also includes an inverse quantizer 104 , an inverse transformer 105 and a summer 111 .
  • Filter unit 106 represents one or more loop filters, such as deblocking filters, adaptive loop filters (ALF), and sample adaptive offset (SAO) filters.
  • although filter unit 106 is shown in FIG. 6 as an in-loop filter, in other implementations filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data memory and a division unit (not shown in the figure).
  • FIG. 7 is an exemplary block diagram of a video decoder 200 described in an embodiment of the present application.
  • video decoder 200 includes entropy decoder 203 , prediction processing unit 208 , inverse quantizer 204 , inverse transformer 205 , summer 211 , filter unit 206 , and memory 207 .
  • Prediction processing unit 208 may include inter predictor 210 and intra predictor 209 .
  • video decoder 200 may perform a decoding process that is substantially the inverse of the encoding process described with respect to video encoder 100 from FIG. 6 .
  • video decoder 200 receives from video encoder 100 an encoded video codestream representing image blocks of an encoded video slice and associated syntax elements.
  • the video decoder 200 may receive video data from the network entity 42, and optionally, may also store the video data in a video data storage (not shown in the figure).
  • Video data memory may store video data to be decoded by components of video decoder 200, such as an encoded video codestream.
  • the video data stored in the video data store may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of the video data, or by accessing a physical data storage medium.
  • the video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video codestream.
  • Network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or other such device for implementing one or more of the techniques described above.
  • Network entity 42 may or may not include a video encoder, such as video encoder 100 .
  • Network entity 42 may implement portions of the techniques described in this application before network entity 42 sends the encoded video stream to video decoder 200 .
  • network entity 42 and video decoder 200 may be part of separate devices, while in other cases, functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200 .
  • video decoder 200 may generate an output video stream without processing by filter unit 206; alternatively, for some image blocks or image frames, entropy decoder 203 of video decoder 200 does not decode quantized coefficients, and accordingly processing by inverse quantizer 204 and inverse transformer 205 is not required.
  • FIG. 8A is a schematic flowchart of an image encoding method in an embodiment of the present application, and the image encoding method may be applied to the source device 10 in the encoding/decoding system 1 shown in FIG. 5 or the video encoder 100 shown in FIG. 6 .
  • the flow shown in FIG. 8A is described by taking the execution subject as the video encoder 100 shown in FIG. 6 as an example.
  • the image coding method provided by the embodiment of the present application includes:
  • Step 110 Obtain an original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame.
  • the division manner of the coding unit includes various division manners as shown in FIG. 4 , which is not uniquely limited here.
  • when the current coding block is the currently processed video frame itself, the processing efficiency of the method is higher, but accuracy and performance are lost to a certain extent.
  • when the current coding block is a coding unit obtained by dividing the currently processed video frame, the minimum data processing granularity is the divided coding unit; the overall algorithm processing complexity becomes higher and the processing time becomes longer, but the accuracy and performance are relatively high.
  • Step 120 Obtain the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model.
  • the feature prediction model can implement data processing through the graphics processing unit (GPU) of the local device, and can adopt any commonly used neural network architecture, such as a deep neural network (DNN), a support vector machine, etc.; the input of the model is the residual block and the output is the transform feature.
  • Step 130 Quantize the transform feature of the current coding block to obtain the quantized feature of the current coding block.
  • Step 140 Determine the probability of each pixel in the quantized feature of the current coding block by using the pre-trained probability prediction model.
  • Step 150 Generate a binary code stream of the current coding block by using the probability of each pixel.
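  • As a sketch of how the per-pixel probabilities drive code stream generation: an entropy (arithmetic) coder driven by those probabilities approaches the ideal code length computed below; the function and parameter names are illustrative, not from the patent:

```python
import numpy as np

def estimated_bits(pixel_probs: np.ndarray) -> float:
    """Ideal code length (in bits) of the quantized features given the
    per-pixel probabilities from the probability prediction model; an
    arithmetic coder driven by the same probabilities approaches this
    bound when producing the actual binary code stream."""
    return float(-np.log2(pixel_probs).sum())
```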
  • the obtaining of the original residual block of the current coding block includes: determining the prediction block of the current coding block; and taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.
  • specifically, numerical transformation and quantization are performed on the prediction block of the current coding block to turn the original (0, 1) continuous floating-point distribution into a (0, 255) discrete distribution, and the difference between this discrete distribution and the current coding block X_t is taken to obtain the integer-signal residual r_t.
  • that is, the taking of the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block includes: performing numerical transformation and quantization according to the prediction block of the current coding block to generate the discrete distribution of the prediction block; and taking the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the integer-signal original residual block.
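  • A minimal sketch of this residual computation, assuming the prediction lies in a (0, 1) floating-point range and the original block is 8-bit (names are illustrative):

```python
import numpy as np

def integer_residual(pred_block: np.ndarray, orig_block: np.ndarray) -> np.ndarray:
    """Discretize the floating-point prediction (assumed in (0, 1)) to the
    (0, 255) integer range, then subtract it from the original 8-bit block
    to obtain the integer-signal residual r_t."""
    pred_discrete = np.clip(np.round(pred_block * 255.0), 0, 255).astype(np.int16)
    return orig_block.astype(np.int16) - pred_discrete
```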
  • obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model includes: renormalizing the original residual block, obtaining a normalized first residual block; performing sparse processing on the first residual block to obtain a processed second residual block; inputting the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.
  • the energy-based renormalization is used to uniformly normalize the residuals of different distributions after prediction to be between (-1, 1).
  • the energy-based normalization can unify the data distribution, making training more stable.
  • instead of energy-based renormalization, other normalization methods can be used, such as 0-1 normalization or linear function normalization; the goal is to unify the post-prediction residual distribution, which has a large variance, and to speed up model training and convergence.
  • the threshold sparseness can allocate more code rates in the moving boundary, occlusion and other areas in the end-to-end encoding under the same code rate constraint, saving more code rates required for the background area.
  • the energy-based renormalization can speed up the training and convergence of the model, making the model more robust to different residual distributions.
  • the renormalizing of the original residual block to obtain the normalized first residual block includes: converging, according to an energy unification mechanism, different residual distributions of the original residual block to the same distribution space to obtain the normalized first residual block.
  • the converging of the different residual distributions of the original residual block to the same distribution space to obtain a normalized first residual block includes: normalizing the original residual block to the interval (0, 1) by a preset normalization formula.
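  • The normalization formula itself does not survive on this page; the following stand-in uses a simple peak-magnitude scaling purely to illustrate the idea of converging differently distributed residuals into one common interval, and should not be read as the patent's energy-based formula:

```python
import numpy as np

def renormalize(residual: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # stand-in normalization, NOT the patent's energy-based formula:
    # scale by the peak magnitude so every residual block lands in (-1, 1);
    # an affine shift maps it onto (0, 1) when that interval is required
    scale = max(float(np.abs(residual).max()), eps)
    return residual / scale
```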
  • the performing of sparsification on the first residual block to obtain the processed second residual block includes: acquiring a preset threshold set, where the preset threshold set includes multiple thresholds; screening, from the preset threshold set, a target threshold adapted to the current coding block; and traversing the residual samples of each pixel in the first residual block and setting to zero the residual samples of pixels whose residual samples are less than the target threshold, to obtain the processed second residual block.
  • the target threshold can be obtained in the following manner: starting from the smallest threshold in the preset threshold set, rate-distortion optimization is performed for each threshold at the encoding end to obtain a corresponding result, and the threshold with the optimal result is selected as the most suitable threshold for residual coding of the current frame.
  • rate-distortion optimization for each threshold means that each time a threshold is selected, one encoding/decoding pass is required to obtain a corresponding result, and the optimal result is selected from the final results.
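  • A sketch of this rate-distortion threshold search, where encode_decode stands in for one full encode/decode pass of the pipeline (all names are illustrative, not from the patent):

```python
def pick_threshold(residual_norm, thresholds, encode_decode, lam):
    """One encode/decode pass per candidate threshold, keeping the one
    with the lowest rate-distortion cost R + lambda * D."""
    best_cost, best_m = float("inf"), None
    for m in sorted(thresholds):                # start from the smallest
        sparse = residual_norm.copy()
        sparse[abs(sparse) < m] = 0.0           # zero sub-threshold samples
        rate, distortion = encode_decode(sparse)
        cost = rate + lam * distortion
        if cost < best_cost:
            best_cost, best_m = cost, m
    return best_m
```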
  • in the formula, r_t represents the pixel value before normalization, m_1 represents the first threshold in the preset threshold set, and m_n represents the nth threshold in the preset threshold set.
  • the generated residual maps have different sparsity. The larger the threshold, the sparser the residual, and the smaller the residual space interval that needs to be encoded. By traversing the preset threshold set, the most suitable threshold for residual coding of the current frame can be accurately screened, thereby improving coding efficiency.
  • different thresholds are set, and the normalized residuals are sparsed, so that more effective information can be allocated to the effective pixels.
  • threshold-based sparsification draws on the traditional mode selection method and implements a skip-mode-like behavior to adaptively encode the residual information.
  • the threshold sparsification here can be directly operated on the quantized features.
  • threshold sparseness can allocate more code rates in the moving boundary, occlusion and other areas in the end-to-end encoding under the same code rate constraint, saving more code rates required for background areas.
  • each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block according to a preset sampling interval.
  • the value range of the sampling interval is determined by generating a histogram of the numerical distribution according to the residual distribution of the current frame and obtaining the interval corresponding to the peak 1/δ portion of the residual distribution, where δ can be 4, 6, 8, etc., and is not uniquely limited here.
  • alternatively, each of the plurality of thresholds is obtained by non-uniformly sampling the pixels of the current coding block according to a preset sampling interval; under normal conditions, no more than 4 thresholds better balance complexity and performance.
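  • A rough sketch of generating the candidate threshold set from the residual distribution, under the assumption that the peak 1/δ portion of the magnitude mass bounds the sampling interval (the patent's exact histogram rule is not fully recoverable here, so this is an illustrative stand-in):

```python
import numpy as np

def candidate_thresholds(residual_norm: np.ndarray, delta: int = 4, num: int = 4):
    # assumption: bound candidates by the region holding 1/delta of the
    # residual magnitude mass, then sample `num` thresholds inside it
    hi = float(np.quantile(np.abs(residual_norm), 1.0 / delta))
    return np.linspace(0.0, hi, num + 1)[1:].tolist()  # drop the zero threshold
```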
  • the quantizing of the transform feature of the current coding block to obtain the quantization feature of the current coding block includes: using a differentiable quantization mechanism to transform the floating-point transform feature of the current coding block into a quantized integer feature, obtaining the quantized feature of the current coding block.
  • in the forward calculation, a differentiable quantization method is applied to the extracted features, transforming the floating-point (float32) features into quantized integer features: Round(·) is the rounding function, which during training may be replaced by additive zero-mean uniform noise; backpropagation approximates this function as a linear function, using 1 as the gradient of the reverse derivation (a straight-through approximation).
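  • A minimal sketch of this differentiable quantization in PyTorch, with both the rounding form (straight-through gradient) and the additive-uniform-noise form commonly used during training; the function names are illustrative:

```python
import torch

def quantize_ste(features: torch.Tensor) -> torch.Tensor:
    """Forward: round float32 features to integers. Backward: treat the
    rounding as a linear function with gradient 1 (straight-through)."""
    return features + (torch.round(features) - features).detach()

def quantize_noise(features: torch.Tensor) -> torch.Tensor:
    """Training-time alternative: additive zero-mean uniform noise in
    [-0.5, 0.5) keeps the operation differentiable while mimicking
    rounding."""
    return features + (torch.rand_like(features) - 0.5)
```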
  • the feature prediction model includes a first branch and a second branch connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.
  • the residual extraction module can use any mainstream neural network module, such as residual block, dense connection block, etc.
  • the downsampling module uses a strided convolution kernel; the other branch uses cascaded convolution layers to extract features and activates them with the sigmoid function, obtaining an adaptive spatial-channel-wise activation mask that adaptively activates the extracted features.
  • the upsampling module can be implemented by transposed convolution.
  • the residual extraction module is used for feature extraction for the input residual block, and multiple residual extraction modules are used for extracting multiple features and stacking, thereby realizing cascade feature extraction.
  • the first branch is the main feature extraction module
  • the module after the sigmoid of the second branch is the self-attention activation mapping module, and the outputs of the two branches are multiplied to generate the final transformed feature.
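  • A sketch of such a two-branch feature prediction model in PyTorch, assuming plain two-convolution residual blocks and a stride-2 convolution as the downsampling module (all module and parameter choices are illustrative, not the patent's exact architecture):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A plain two-convolution residual block standing in for the patent's
    'residual extraction module' (any mainstream block would do)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

def _branch(in_ch: int, ch: int) -> nn.Sequential:
    # three cascaded residual extraction modules plus a strided-convolution
    # downsampling module, as described above
    return nn.Sequential(
        nn.Conv2d(in_ch, ch, 3, padding=1),
        ResidualBlock(ch), ResidualBlock(ch), ResidualBlock(ch),
        nn.Conv2d(ch, ch, 3, stride=2, padding=1))

class FeaturePredictionModel(nn.Module):
    """Two parallel branches; the sigmoid of the second yields a
    spatial-channel-wise self-attention mask that gates the first."""
    def __init__(self, in_ch: int = 1, ch: int = 64):
        super().__init__()
        self.main = _branch(in_ch, ch)   # main feature extraction branch
        self.mask = nn.Sequential(_branch(in_ch, ch), nn.Sigmoid())

    def forward(self, residual_block: torch.Tensor) -> torch.Tensor:
        return self.main(residual_block) * self.mask(residual_block)
```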
  • the code rate and the loss function can be determined in the following manner.
  • D(·) is the mean square error (MSE), i.e., L2, loss function; the reconstruction loss of each frame is L = D(X_t, X̄_t + r̂_t), where X̄_t is the discrete distribution of the prediction block of the current coding block, X_t is the current coding block, and r̂_t is the reconstructed integer-signal residual.
  • the total loss for the code rate and the reconstruction is L_total = L + λR, where L is the reconstruction loss of each frame, R is the loss of the code rate constraint, and λ is a trade-off weight; feature prediction models for different code rates are obtained by training with different values of λ.
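  • A sketch of this rate-distortion loss (symbols as defined above; the rate term would come from the probability model, and the function name is illustrative):

```python
import torch

def rd_loss(x_t, x_bar_t, r_hat_t, rate, lam):
    """L_total = L + lambda * R, with L = D(x_t, x_bar_t + r_hat_t) the
    per-frame MSE reconstruction loss and R the code-rate constraint loss
    (e.g. the estimated bits of the quantized features)."""
    recon = torch.mean((x_t - (x_bar_t + r_hat_t)) ** 2)  # D(.) as MSE
    return recon + lam * rate
```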
  • the feature prediction model can adopt a self-attention mechanism, the number of residual extraction modules used in the two branches can be flexibly adjusted as needed, and simple convolutions can also be used to replace the residual extraction modules, which is applicable to codec acceleration and simplification.
  • the first branch and the second branch may respectively include four residual extraction modules, or include four convolution modules respectively.
  • FIG. 9A is a schematic flowchart of an image decoding method in an embodiment of the present application; the image decoding method can be applied to the destination device 20 in the encoding/decoding system 1 shown in FIG. 5 or the video decoder 200 shown in FIG. 7.
  • the flow shown in FIG. 9A is described by taking the execution subject as the video decoder 200 shown in FIG. 7 as an example.
  • the image decoding method provided by the embodiment of the present application includes:
  • Step 210 Acquire a binary code stream of a current decoding block, where the current decoding block includes a code stream of a currently processed video frame or a decoding unit obtained by dividing the currently processed video frame.
  • the division manner of the decoding unit includes various division manners as shown in FIG. 4 , which is not uniquely limited here.
  • the decoding block corresponds to the encoding block involved in the foregoing encoding method embodiments, and may specifically be represented as having the same size.
  • when the current decoding block is the code stream of the currently processed video frame itself, the processing efficiency of the method is higher, but accuracy and performance are lost to a certain extent.
  • when the current decoding block is the code stream of a decoding unit obtained by dividing the currently processed video frame, the minimum data processing granularity is the divided unit; the overall algorithm processing complexity becomes higher and the processing time becomes longer, but the accuracy and performance are relatively high.
  • Step 220 Transform the binary code stream into a quantized feature of the current decoding block by using a pre-trained probability prediction model.
  • the transformation is a lossless transformation.
  • Step 230 Determine the residual block of the current decoding block according to the quantized feature and the pre-trained residual prediction model.
  • the residual prediction model can implement data processing through the graphics processing unit (GPU) of the local device, and can adopt any commonly used neural network architecture, such as a deep neural network (DNN), a recurrent neural network (RNN), a convolutional neural network (CNN), etc.; the input of the model is the quantized feature and the output is the residual block.
  • Step 240 Determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
  • the determining of the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block includes: determining the prediction block of the current decoding block; and performing residual compensation on the prediction block of the current decoding block using the residual block, to obtain the reconstructed block of the current decoding block.
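  • A minimal sketch of the residual compensation step, assuming 8-bit samples (the clipping back to the sample range is an assumption, not stated above):

```python
import numpy as np

def reconstruct(pred_block: np.ndarray, residual_block: np.ndarray) -> np.ndarray:
    # residual compensation: add the decoded residual to the prediction;
    # clip to the 8-bit range to keep valid sample values
    summed = pred_block.astype(np.int16) + residual_block.astype(np.int16)
    return np.clip(summed, 0, 255).astype(np.uint8)
```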
  • the image decoding method in the embodiment of the present application can be specifically explained as the following steps.
  • First, the code stream is acquired; it corresponds to the binary code stream of the current decoding block and may specifically include the common parameter set of the current decoding block and the coding information of the image of the current decoding block.
  • Next, starting from initialized all-zero features, the values read from the binary code stream are input to the pre-trained probability prediction model, which is run to output the quantized feature of the current decoding block.
  • Then the quantized feature predicted by the model is input to the pre-trained residual prediction model, which is run to output the corresponding residual block; finally, the reconstructed block or reconstructed image is computed from the residual block and the prediction block of the current decoding block.
  • the prediction block may be obtained by predicting the current decoding block according to the inter-frame prediction mode carried in the decoding information.
  • Determining the prediction block of the current decoding block includes: performing entropy decoding on the current decoding block to generate syntax elements; determining, according to the syntax elements, the inter prediction mode for decoding the current decoding block; and performing inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.
  • The residual prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.
  • During training of the residual prediction model, the code rate and the loss function can be determined in the following manner.
  • The rate estimate is R = Σ −log(p), where R is the loss of the code-rate constraint and p is the probability of each pixel in the quantized transform feature; the loss function is L = D(X_t, X̄_t + r̂_t), where D(·) is the mean-squared-error (MSE) or L2 loss function, X̂_t is the prediction block of the current coding block, X_t is the current coding block, r̂_t is the reconstructed integer-signal residual, and X̄_t is the discrete distribution of the prediction block of the current coding block.
  • Rate-distortion optimization L_total = L + λR is applied to the code rate and the loss function, where L is the reconstruction loss of each frame and R is the loss of the code-rate constraint.
  • The residual prediction model can adopt a self-attention mechanism; the number of residual extraction modules used in the two branches can be adjusted flexibly as needed, and simple convolutions can replace the residual extraction modules, which suits acceleration and simplification of the codec.
  • the residual prediction model is used for feature extraction for the input residual block, and multiple residual extraction modules are used for extracting multiple features for stacking, thereby realizing cascade feature extraction.
  • The first branch is the main feature extraction module; the module after the sigmoid in the second branch is the self-attention activation-mapping module; the outputs of the two branches are multiplied to generate the final residual block.
  • An embodiment of the present application provides an image encoding apparatus, and the image encoding apparatus may be a video encoder. Specifically, the image encoding apparatus is configured to perform the steps performed by the video encoder in the above encoding method.
  • the image encoding apparatus provided in the embodiments of the present application may include modules corresponding to corresponding steps.
  • the image coding apparatus may be divided into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
  • the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • FIG. 10 shows a possible schematic structural diagram of the image coding apparatus involved in the above embodiment.
  • The image encoding apparatus 10 includes: an obtaining unit 100 configured to obtain the original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame; a first prediction unit 101 configured to obtain the transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model; a quantization unit 102 configured to quantize the transform feature to obtain the quantized feature of the current coding block; a second prediction unit 103 configured to determine, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and a generating unit 104 configured to generate the binary code stream of the current coding block using the probability of each pixel.
  • The obtaining unit 100 is specifically configured to: determine the prediction block of the current coding block; and take the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.
  • The obtaining unit 100 is specifically configured to: perform numerical transformation and quantization according to the prediction block of the current coding block to generate the discrete distribution of the prediction block; and take the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.
  • The first prediction unit 101 is specifically configured to: re-normalize the original residual block to obtain the normalized first residual block; sparsify the first residual block to obtain the processed second residual block; and input the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.
  • The first prediction unit 101 is specifically configured to: converge, according to an energy-unification mechanism, the different residual distributions of the original residual block to the same distribution space to obtain the normalized first residual block.
  • The first prediction unit 101 is specifically configured to: extract the minimum pixel value x_min and the maximum pixel value x_max in the original residual block; normalize the original residual block to the interval (0, 1) by the formula r̄_t = (r_t − x_min) / (x_max − x_min); and apply a second transform, r̃_t = 2·r̄_t − 1, to obtain the normalized first residual block in the interval (−1, 1).
  • The first prediction unit 101 is specifically configured to: obtain a preset threshold set, the preset threshold set including a plurality of thresholds; select, from the preset threshold set, the target threshold adapted to the current coding block; and traverse the pixel value of each pixel in the first residual block, setting to zero the pixel values of pixels whose pixel values are smaller than the target threshold, to obtain the processed second residual block.
  • each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block according to a preset sampling interval.
  • The quantization unit 102 is specifically configured to: apply a differentiable quantization mechanism to the transform feature of the current coding block, converting the floating-point feature into a quantized integer feature to obtain the quantized feature of the current coding block.
  • The feature prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.
  • the image encoding apparatus 10 provided in this embodiment of the present application includes but is not limited to the above-mentioned modules.
  • the image encoding apparatus 10 may further include a storage unit.
  • the storage unit may be used to store program codes and data of the image encoding apparatus.
  • the image encoding apparatus 11 includes: a processing module 110 and a communication module 111 .
  • The processing module 110 is used to control and manage the actions of the image encoding apparatus, for example, to perform the steps performed by the acquisition unit 100, the first prediction unit 101, the quantization unit 102, the second prediction unit 103, and the generation unit 104, and/or other processes of the techniques described herein.
  • the communication module 111 is used to support the interaction between the image coding apparatus and other devices.
  • the image encoding apparatus may further include a storage module 112, and the storage module 112 is configured to store program codes and data of the image encoding apparatus, for example, to store the content stored in the above-mentioned storage unit.
  • The processing module 110 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination implementing computing functions, such as a combination comprising one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication module 111 may be a transceiver, an RF circuit, a communication interface, or the like.
  • the storage module 112 may be a memory.
  • the image encoding apparatus 10 and the image encoding apparatus 11 can both execute the image encoding method shown in FIG. 8A , and the image encoding apparatus 10 and the image encoding apparatus 11 may be video image encoding apparatuses or other devices with video encoding functions.
  • The present application also provides a video encoder, including a non-volatile storage medium and a central processing unit, where the non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the image encoding method of the embodiments of the present application.
  • An embodiment of the present application provides an image decoding apparatus, and the image decoding apparatus may be a video decoder. Specifically, the image decoding apparatus is configured to perform the steps performed by the video decoder in the above decoding method.
  • the image decoding apparatus provided by the embodiments of the present application may include modules corresponding to corresponding steps.
  • the image decoding apparatus may be divided into functional modules according to the above method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
  • the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • FIG. 12 shows a possible schematic structural diagram of the image decoding apparatus involved in the above embodiment.
  • the image decoding device 12 includes:
  • an obtaining unit 120 configured to obtain a binary code stream of a current decoding block, where the current decoding block includes a code stream of a currently processed video frame or a decoding unit obtained by dividing the currently processed video frame;
  • the first prediction unit 121 is used to transform the binary code stream into the quantization feature of the current decoding block through a pre-trained probability prediction model
  • the second prediction unit 122 is configured to determine the residual block of the current decoding block according to the quantized feature and the pre-trained residual prediction model;
  • the determining unit 123 is configured to determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
  • The determining unit 123 is specifically configured to: determine the prediction block of the current decoding block; and perform residual compensation on the prediction block of the current decoding block with the residual block to obtain the reconstructed block of the current decoding block.
  • The determining unit 123 is specifically configured to: perform entropy decoding on the current decoding block to generate syntax elements; determine, according to the syntax elements, the inter prediction mode for decoding the current decoding block; and perform inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.
  • The residual prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.
  • the image decoding apparatus includes but is not limited to the above-mentioned modules.
  • the image decoding apparatus may further include a storage unit.
  • the storage unit may be used to store program codes and data of the image decoding apparatus.
  • the image decoding apparatus 13 includes: a processing module 130 and a communication module 131.
  • The processing module 130 is used to control and manage the actions of the image decoding apparatus, for example, to perform the steps performed by the acquisition unit 120, the first prediction unit 121, the second prediction unit 122, and the determination unit 123, and/or other processes of the techniques described herein.
  • the communication module 131 is used to support the interaction between the image decoding apparatus and other devices.
  • The image decoding apparatus may further include a storage module 132, and the storage module 132 is used for storing program codes and data of the image decoding apparatus, for example, the content stored in the above-mentioned storage unit.
  • The processing module 130 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication module 131 may be a transceiver, an RF circuit, a communication interface, or the like.
  • the storage module 132 may be a memory.
  • the image decoding apparatus 12 and the image decoding apparatus 13 can both execute the image decoding method shown in FIG. 9A , and the image decoding apparatus 12 and the image decoding apparatus 13 may be video image decoding apparatuses or other devices with video decoding functions.
  • The present application also provides a video decoder, including a non-volatile storage medium and a central processing unit, where the non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the image decoding method of the embodiments of the present application.
  • the present application also provides a terminal, where the terminal includes: one or more processors, a memory, and a communication interface.
  • The memory and the communication interface are coupled with the one or more processors; the memory is used to store computer program code, and the computer program code includes instructions.
  • When the one or more processors execute the instructions, the terminal executes the image encoding and/or image decoding method of the embodiments of the present application.
  • the terminal here can be a video display device, a smart phone, a portable computer, and other devices that can process or play videos.
  • Another embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium includes one or more program codes, the one or more programs include instructions, and when a processor in the decoding device executes the program code, the decoding device executes the image encoding method and the image decoding method of the embodiments of the present application.
  • A computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium; at least one processor of the decoding device can read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the terminal to implement the image encoding method and the image decoding method of the embodiments of the present application.
  • In the above-mentioned embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented with a software program, it may take the form of a computer program product, in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVD), or semiconductor media (e.g., a Solid State Disk (SSD)), and the like.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • The mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • The units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • The technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes media that can store program code: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose an image encoding method, an image decoding method, and related apparatuses. The image encoding method includes: acquiring the original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame; obtaining the transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model; quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block; determining, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and generating the binary code stream of the current coding block using the probability of each pixel. The embodiments of the present application achieve adaptive dynamic residual compensation and can efficiently encode inter-frame residual information of different forms.

Description

Image encoding method, image decoding method, and related apparatus — Technical Field
This application relates to the technical field of electronic devices, and in particular to an image encoding method, an image decoding method, and related apparatuses.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live-broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like.
Digital video devices implement video compression techniques, for example those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC), and in extensions of those standards, to transmit and receive digital video information more efficiently. Video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video codec techniques.
With the proliferation of Internet video, ever higher video compression ratios are demanded even as digital video compression technology continues to evolve.
Summary
Embodiments of the present application provide an image encoding method, an image decoding method, and related apparatuses, with a view to achieving adaptive dynamic residual compensation and efficiently encoding inter-frame residual information of different forms.
In a first aspect, an embodiment of the present application provides an image encoding method, including:
acquiring an original residual block of a current coding block, the current coding block including a currently processed video frame or a coding unit obtained by dividing the currently processed video frame;
obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
quantizing the transform feature of the current coding block to obtain a quantized feature of the current coding block;
determining, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block;
generating a binary code stream of the current coding block using the probability of each pixel.
Compared with the prior art, the present solution performs adaptive dynamic residual compensation on the current predicted frame and obtains the final inter-frame reconstruction, and can efficiently encode inter-frame residual information of different forms.
In a second aspect, an embodiment of the present application provides an image decoding method, including:
acquiring a binary code stream of a current decoding block, the current decoding block including a code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame;
transforming, through a pre-trained probability prediction model, the binary code stream into a quantized feature of the current decoding block;
determining a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model;
determining a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.
Compared with the prior art, the present solution performs adaptive dynamic residual compensation on the current predicted frame and obtains the final inter-frame reconstruction, and can efficiently encode inter-frame residual information of different forms.
In a third aspect, an embodiment of the present application provides an image encoding apparatus, including:
an acquisition unit configured to acquire an original residual block of a current coding block, the current coding block including a currently processed video frame or a coding unit obtained by dividing the currently processed video frame;
a first prediction unit configured to obtain a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
a quantization unit configured to quantize the transform feature of the current coding block to obtain a quantized feature of the current coding block;
a second prediction unit configured to determine, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block;
a generation unit configured to generate a binary code stream of the current coding block using the probability of each pixel.
In a fourth aspect, an embodiment of the present application provides an image decoding apparatus, including:
an acquisition unit configured to acquire a binary code stream of a current decoding block, the current decoding block including a code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame;
a first prediction unit configured to transform, through a pre-trained probability prediction model, the binary code stream into a quantized feature of the current decoding block;
a second prediction unit configured to determine a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model;
a determination unit configured to determine a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.
In a fifth aspect, an embodiment of the present application provides an encoder, including a processor and a memory coupled to the processor, the processor being configured to perform the method of the first aspect.
In a sixth aspect, an embodiment of the present application provides a decoder, including a processor and a memory coupled to the processor, the processor being configured to perform the method of the second aspect.
In a seventh aspect, an embodiment of the present application provides a terminal including one or more processors, a memory, and a communication interface; the memory and the communication interface are connected to the one or more processors; the terminal communicates with other devices through the communication interface; the memory is configured to store computer program code including instructions, and when the one or more processors execute the instructions, the terminal performs the method of the first or second aspect.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method of the first or second aspect.
In a ninth aspect, an embodiment of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first or second aspect.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic block diagram of a coding tree unit in an embodiment of the present application;
FIG. 2 is a schematic block diagram of a CTU and a coding block CU in an embodiment of the present application;
FIG. 3 is a schematic block diagram of color formats in an embodiment of the present application;
FIG. 4 is a schematic diagram of image partitioning modes in an embodiment of the present application;
FIG. 5 is a schematic block diagram of a codec system in an embodiment of the present application;
FIG. 6 is a schematic block diagram of a video encoder in an embodiment of the present application;
FIG. 7 is a schematic block diagram of a video decoder in an embodiment of the present application;
FIG. 8A is a schematic flowchart of an image encoding method in an embodiment of the present application;
FIG. 8B is a schematic diagram of residual maps generated after processing with different thresholds in an embodiment of the present application;
FIG. 8C is a structural diagram of a feature prediction model in an embodiment of the present application;
FIG. 9A is a schematic flowchart of an image decoding method in an embodiment of the present application;
FIG. 9B is a structural diagram of a residual prediction model in an embodiment of the present application;
FIG. 10 is a functional-unit block diagram of an image encoding apparatus in an embodiment of the present application;
FIG. 11 is another functional-unit block diagram of the image encoding apparatus in an embodiment of the present application;
FIG. 12 is a functional-unit block diagram of an image decoding apparatus in an embodiment of the present application;
FIG. 13 is another functional-unit block diagram of the image decoding apparatus in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it.
It can be understood that the terms "first", "second", and the like used in the present invention may be used herein to describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, without departing from the scope of the present invention, a first client could be called a second client, and similarly a second client could be called a first client; both are clients, but they are not the same client.
The terms and related technologies used in the embodiments of the present application are introduced first.
A complete picture in a video is usually called a "frame", and a video composed of many frames in temporal order is also called a video sequence. A video sequence contains a series of redundant information, such as spatial redundancy, temporal redundancy, visual redundancy, information-entropy redundancy, structural redundancy, knowledge redundancy, and importance redundancy. To remove the redundant information in a video sequence as much as possible and reduce the amount of data characterizing the video, video coding technology is proposed to reduce storage space and save transmission bandwidth. Video coding technology is also called video compression technology.
As technology currently stands, video coding mainly includes intra prediction, inter prediction, transform and quantization, entropy coding, and deblocking filtering. Within internationally common video compression standards — for example, MPEG-2 and MPEG-4 Part 10 Advanced Video Coding (AVC) formulated by the Motion Picture Experts Group (MPEG), and H.263, H.264, and H.265 High Efficiency Video Coding (HEVC) formulated by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) — there are four main mainstream compression coding approaches: chroma subsampling, predictive coding, transform coding, and quantization coding.
Predictive coding: the data of previously coded frames is used to predict the frame currently to be coded. The encoder obtains a prediction value, and a residual exists between the prediction value and the actual value. The better the prediction fits, the closer the prediction value is to the actual value and the smaller the residual, so encoding the residual greatly reduces the data amount. When decoding, the decoder restores and reconstructs the initial picture by adding the residual to the prediction value. In mainstream coding standards, predictive coding is divided into two basic types: intra prediction and inter prediction.
Inter prediction is a prediction technique based on motion compensation. Its main processing is to determine the motion information of the current block, obtain a reference image block from the reference frame of the current block according to the motion information, and produce the prediction image of the current block. The current block uses one of forward prediction, backward prediction, or bidirectional prediction, indicated by the inter prediction direction in the motion information; the displacement vector of the reference image block relative to the current block is indicated by the motion vector in the motion information, and each motion vector corresponds to one reference frame. Inter prediction of an image block may use only one motion vector and the pixels of one reference frame to generate the prediction image, which is called unidirectional prediction, or may combine pixels from two reference frames via two motion vectors, which is called bidirectional prediction. That is, an image block usually contains one or two motion vectors; for some multi-hypothesis inter prediction techniques, an image block may contain more than two motion vectors.
Inter prediction indicates the reference frame by a reference index (ref_idx) and indicates, by a motion vector (MV), the positional offset of the reference block in the reference frame relative to the current block. An MV is a two-dimensional vector containing a horizontal displacement component and a vertical displacement component; an MV corresponds to two frames, each having a picture order count (POC) representing the picture's position in display order, so an MV also corresponds to a POC difference. The POC difference is linearly related to the time interval. Motion-vector scaling usually uses POC-difference-based scaling, converting a motion vector between one pair of pictures into a motion vector between another pair of pictures. A minimal sketch of this scaling follows.
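As an illustration of POC-difference-based motion-vector scaling, here is a minimal sketch; the function name and the use of simple floating-point rounding are assumptions for illustration, since practical codecs such as HEVC use fixed-point arithmetic with clipping:

    def scale_mv(mv, poc_cur, poc_ref_cur, poc_col, poc_ref_col):
        """Scale a motion vector by the ratio of POC differences.

        mv is an (mvx, mvy) pair measured between one picture pair;
        the result is the vector rescaled for another picture pair.
        """
        td = poc_col - poc_ref_col    # POC difference of the source MV
        tb = poc_cur - poc_ref_cur    # POC difference of the target MV
        if td == 0:
            return mv
        s = tb / td                   # POC difference is linear in time
        return (round(mv[0] * s), round(mv[1] * s))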
Two inter prediction modes are commonly used.
1) Advanced motion vector prediction (AMVP) mode: the bitstream signals the inter prediction direction used by the current block (forward, backward, or bidirectional), the reference index, the motion vector predictor index (MVP index), and the motion vector difference (MVD). The inter prediction direction determines which reference frame list is used; the reference index determines the reference frame pointed to by the current block's MV; the MVP index indicates one MVP in the MVP list as the predictor of the current block's MV; and one MVP plus one MVD yields one MV.
2) Merge/skip mode: the bitstream signals a merge index, and a merge candidate is selected from the merge candidate list according to the merge index; the motion information of the current block (including the prediction direction, reference frame, and motion vector) is determined by this merge candidate. The main difference between merge mode and skip mode is that merge mode implies the current block has residual information, while skip mode implies it has none (i.e., the residual is 0); the two modes derive motion information in the same way.
A merge candidate is specifically a motion-information data structure containing the inter prediction direction, reference frame, motion vector, and other information. The current block can select the corresponding merge candidate from the merge candidate list according to the merge index and use the candidate's motion information, possibly after scaling, as its own motion information. In the HEVC standard, a merge candidate may be the motion information of an image block adjacent to the current block, called a spatial merge candidate, or the motion information of the co-located image block of the current block in another coded picture, called a temporal merge candidate. In addition, a merge candidate may also be a bi-predictive merge candidate combining the forward motion information of one merge candidate with the backward motion information of another, or a zero motion vector merge candidate whose motion vector is forced to the 0 vector.
The partitioning of inter prediction units includes the 2N×2N mode (A in FIG. 4), the N×N mode (B in FIG. 4), the N×2N mode (C in FIG. 4), the 2N×N mode (D in FIG. 4), the 2N×nD mode (E in FIG. 4), the 2N×nU mode (F in FIG. 4), the nL×2N mode (G in FIG. 4), and the nR×2N mode (H in FIG. 4), where N is any positive integer and n = x×N with 0 ≤ x ≤ 1.
The 2N×2N mode performs no partitioning of the image block; the N×N mode divides the image block into four equal sub-blocks; the N×2N mode divides it into two equal left and right sub-blocks; the 2N×N mode divides it into two equal top and bottom sub-blocks. The 2N×nD mode divides the image block into top and bottom sub-blocks with the partition line moved down by n relative to the block's center line (D denotes a downward shift); the 2N×nU mode moves the partition line up by n (U denotes an upward shift); the nL×2N mode divides the image block into left and right sub-blocks with the partition line moved left by n (L denotes a leftward shift); and the nR×2N mode moves the partition line right by n (R denotes a rightward shift). A small sketch of these partition geometries follows.
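The geometry of these partition modes can be made concrete with a small sketch; representing sub-blocks as (x, y, width, height) tuples is an illustrative convention, not part of any standard:

    def partition(mode, W, H, n=0):
        """Return the sub-block rectangles (x, y, w, h) of a W x H block.

        n is the offset used by the asymmetric modes, with 0 <= n <= N,
        where W = 2N and H = 2N.
        """
        hw, hh = W // 2, H // 2
        if mode == "2Nx2N":   # no partitioning
            return [(0, 0, W, H)]
        if mode == "NxN":     # four equal sub-blocks
            return [(0, 0, hw, hh), (hw, 0, hw, hh),
                    (0, hh, hw, hh), (hw, hh, hw, hh)]
        if mode == "Nx2N":    # two equal left/right sub-blocks
            return [(0, 0, hw, H), (hw, 0, hw, H)]
        if mode == "2NxN":    # two equal top/bottom sub-blocks
            return [(0, 0, W, hh), (0, hh, W, hh)]
        if mode == "2NxnD":   # split line moved down by n
            return [(0, 0, W, hh + n), (0, hh + n, W, hh - n)]
        if mode == "2NxnU":   # split line moved up by n
            return [(0, 0, W, hh - n), (0, hh - n, W, hh + n)]
        if mode == "nLx2N":   # split line moved left by n
            return [(0, 0, hw - n, H), (hw - n, 0, hw + n, H)]
        if mode == "nRx2N":   # split line moved right by n
            return [(0, 0, hw + n, H), (hw + n, 0, hw - n, H)]
        raise ValueError(mode)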
Regarding image partitioning, to represent video content more flexibly, High Efficiency Video Coding (HEVC) defines the coding tree unit (CTU), coding unit (CU), prediction unit (PU), and transform unit (TU). CTUs, CUs, PUs, and TUs are all image blocks.
Coding tree unit (CTU): a picture is composed of multiple CTUs. A CTU usually corresponds to a square image area containing the luma and chroma pixels of that area (or only luma pixels, or only chroma pixels). The CTU also contains syntax elements indicating how to divide the CTU into at least one coding unit (CU) and the method for decoding each CU to obtain the reconstructed picture. As shown in FIG. 1, picture 10 is composed of multiple CTUs (including CTU A, CTU B, CTU C, and so on). The coding information corresponding to a CTU contains the luma values and/or chroma values of the pixels in the square image area corresponding to that CTU. In addition, the coding information corresponding to a CTU may further contain syntax elements indicating how to divide the CTU into at least one CU and the method for decoding each CU to obtain the reconstructed picture. The image area corresponding to one CTU may include 64×64, 128×128, or 256×256 pixels. In one example, a 64×64-pixel CTU contains a rectangular lattice of 64 columns of 64 pixels each, each pixel containing a luma component and/or a chroma component. A CTU may also correspond to a rectangular image area or an area of another shape, and the image area corresponding to one CTU may have a different number of pixels horizontally than vertically, for example 64×128 pixels.
Coding unit (CU): usually corresponds to an A×B rectangular area of the picture containing A×B luma pixels and/or the corresponding chroma pixels, where A is the width and B the height of the rectangle; A and B may be equal or unequal and usually take values that are integer powers of 2, such as 128, 64, 32, 16, 8, or 4. Here, width refers to the length along the X axis (horizontal direction) of the two-dimensional rectangular coordinate system XoY shown in FIG. 1, and height refers to the length along the Y axis (vertical direction). The reconstructed image of a CU can be obtained by adding the prediction image to the residual image: the prediction image is generated by intra prediction or inter prediction and may specifically be composed of one or more prediction blocks (PBs), and the residual image is generated by inverse quantization and inverse transform of transform coefficients and may specifically be composed of one or more transform blocks (TBs). Specifically, a CU contains coding information such as prediction modes and transform coefficients; according to this coding information, the corresponding prediction, inverse quantization, and inverse transform decoding processing is performed on the CU to produce the reconstructed image corresponding to that CU. The relationship between coding tree units (CTUs) and coding blocks (CUs) is shown in FIG. 2.
Digital video compression technology works on video sequences whose color coding method is YCbCr, also called YUV, with color format 4:2:0, 4:2:2, or 4:4:4. Y denotes luminance (Luma), i.e., the gray-scale value; Cb denotes the blue chroma component; Cr denotes the red chroma component; and U and V denote chrominance (Chroma), which describes color and saturation. In terms of color format, 4:2:0 means 4 luma components and 2 chroma components per 4 pixels (YYYYCbCr), 4:2:2 means 4 luma components and 4 chroma components per 4 pixels (YYYYCbCrCbCr), and 4:4:4 means full-pixel display (YYYYCbCrCbCrCbCrCbCr). FIG. 3 shows the component distributions under the different color formats, with circles denoting the Y component and triangles the UV components.
Prediction unit (PU): the basic unit of intra prediction and inter prediction. The motion information of an image block is defined to include the inter prediction direction, the reference frame, the motion vector, and so on. The image block undergoing encoding is called the current coding block (CCB), and the image block undergoing decoding is called the current decoding block (CDB); for example, when prediction processing is being performed on an image block, the current coding block or current decoding block is a prediction block, and when residual processing is being performed, it is a transform block. The picture containing the current coding block or current decoding block is called the current frame. In the current frame, image blocks located to the left of or above the current block may be inside the current frame and may already have completed encoding/decoding processing and obtained reconstructed images; they are called reconstructed blocks, and information such as their coding mode and reconstructed pixels is available. A frame whose encoding/decoding has been completed before the current frame is encoded/decoded is called a reconstructed frame. When the current frame is a uni-prediction frame (P frame) or a bi-prediction frame (B frame), it has one or two reference frame lists, respectively, called L0 and L1; each list contains at least one reconstructed frame, called a reference frame of the current frame. Reference frames provide reference pixels for inter prediction of the current frame.
Transform unit (TU): processes the residual between the original image block and the prediction image block.
Pixel (also called pixel point): a sample point in the image, such as a point in a coding block, a point in a luma-component pixel block (also called a luma pixel), or a point in a chroma-component pixel block (also called a chroma pixel).
Sample (also called sample value or pixel value): the value of a pixel; in the luma component domain this value is specifically the luminance (i.e., the gray-scale value), and in the chroma component domain it is specifically the chroma value (i.e., color and saturation). According to the processing stage, the samples of a pixel specifically include original samples, prediction samples, and reconstructed samples.
Currently, with the development and maturation of deep learning, deep-learning-based video image processing and coding are being widely studied. Through data-driven methods and end-to-end learning, deep neural networks can optimize the entire system end to end based on rate-distortion. Convolutional neural networks adopt learnable feature transforms, differentiable quantization, and dynamic probability-distribution estimation, which remove redundancy between video images more efficiently and yield a more compact feature-space representation of video images, achieving higher reconstruction quality at the same code rate. At the same time, hardware acceleration and development for specific neural networks helps further advance the acceleration and deployment of learning-based codec systems. However, due to the complexity of video coding, realizing a complete end-to-end learning-based video coding method remains an open problem in this field; the optimization and analysis of each specific module and its impact on the whole end-to-end system still carry great uncertainty and research value. Standardization work on learning-based end-to-end video coding systems has only just begun at home and abroad; MPEG and AVS are basically at the call-for-evidence stage for intelligent-coding standardization.
Existing end-to-end system schemes process residual information directly with end-to-end intra coding, without considering the particularity of residual information and its non-uniform distribution after prediction, and without embedding a residual sparsification method to approximate the skip mode of traditional coding methods.
To address the above problems, embodiments of the present application provide an image encoding method, an image decoding method, and related apparatuses. The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application.
FIG. 5 is a block diagram of an example codec system 1 described in the embodiments of the present application. Codec system 1 includes a video encoder 100 and a video decoder 200, which are used to implement the learning-based end-to-end adaptive inter residual coding method proposed in this application.
As shown in FIG. 5, codec system 1 includes a source device 10 and a destination device 20. The source device 10 produces encoded video data; therefore, the source device 10 may be called a video encoding apparatus. The destination device 20 can decode the encoded video data produced by the source device 10; therefore, the destination device 20 may be called a video decoding apparatus. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
The source device 10 and the destination device 20 may comprise various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
The destination device 20 may receive the encoded video data from the source device 10 via a link 30. The link 30 may comprise one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may comprise one or more communication media enabling the source device 10 to transmit the encoded video data directly to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol) and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local-area network, a wide-area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other equipment facilitating communication from the source device 10 to the destination device 20. In another example, the encoded data may be output from an output interface 140 to a storage device 40.
The image codec techniques of this application may be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, codec system 1 may be used to support one-way or two-way video transmission for applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The codec system 1 illustrated in FIG. 5 is only an example, and the techniques of this application may apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, and the like. A video encoding device may encode data and store the data to memory, and/or a video decoding device may retrieve data from memory and decode the data. In many examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.
In the example of FIG. 5, the source device 10 includes a video source 120, the video encoder 100, and the output interface 140. In some examples, the output interface 140 may include a regulator/demodulator (modem) and/or a transmitter. The video source 120 may comprise a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for producing video data, or a combination of such sources of video data.
The video encoder 100 may encode video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored onto the storage device 40 for later access by the destination device 20 for decoding and/or playback.
In the example of FIG. 5, the destination device 20 includes an input interface 240, the video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data via the link 30 and/or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to it. In general, the display device 220 displays the decoded video data. The display device 220 may comprise a variety of display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not illustrated in FIG. 5, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams.
The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, for example: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If this application is implemented partially in software, the device may store the instructions for the software in a suitable non-volatile computer-readable storage medium and may execute the instructions in hardware using one or more processors to implement the techniques of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the corresponding device.
FIG. 6 is an example block diagram of the video encoder 100 described in the embodiments of the present application. The video encoder 100 is used to output video to a post-processing entity 41. The post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a splicing/editing device. In some cases, the post-processing entity 41 may be an example of a network entity. In some video encoding systems, the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100. In one example, the post-processing entity 41 is an example of the storage device 40 of FIG. 5.
In the example of FIG. 6, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a memory 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image-block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. The filter unit 106 represents one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 106 is shown in FIG. 6 as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown).
FIG. 7 is an example block diagram of the video decoder 200 described in the embodiments of the present application. In the example of FIG. 7, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a memory 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process generally reciprocal to the encoding process described with respect to the video encoder 100 of FIG. 6.
During decoding, the video decoder 200 receives from the video encoder 100 an encoded video bitstream representing the image blocks of an encoded video slice and the associated syntax elements. The video decoder 200 may receive video data from a network entity 42 and, optionally, may also store the video data in a video data memory (not shown). The video data memory may store video data to be decoded by the components of the video decoder 200, for example the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a decoded picture buffer (CPB) for storing the encoded video data from the encoded video bitstream.
The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such device for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder, such as the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, the network entity 42 may implement portions of the techniques described in this application. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to the network entity 42 may be performed by the same device that includes the video decoder 200.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video bitstream. For example, the video decoder 200 may generate the output video stream without processing by the filter unit 206; or, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and accordingly no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
FIG. 8A is a schematic flowchart of an image encoding method in an embodiment of the present application; the image encoding method can be applied to the source device 10 in the codec system 1 shown in FIG. 5 or the video encoder 100 shown in FIG. 6. The flow shown in FIG. 8A is described taking the video encoder 100 shown in FIG. 6 as the execution subject. As shown in FIG. 8A, the image encoding method provided by the embodiment of the present application includes:
Step 110: acquire the original residual block of the current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by dividing the currently processed video frame.
The division manners of the coding unit include the various division manners shown in FIG. 4 and are not uniquely limited here.
In specific implementations, when the current coding block is a currently processed video frame, the minimum data-processing object is a single frame image, so this method is more efficient, but accuracy and performance are somewhat reduced.
When the current coding block is a coding unit obtained by dividing the currently processed video frame, the minimum data-processing granularity is the divided coding unit, so the overall algorithmic complexity and processing time increase, but accuracy and performance are relatively higher.
Step 120: obtain the transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model.
The feature prediction model may perform its data processing on the local device's graphics processor (GPU) and may adopt any commonly used neural network architecture, such as a deep neural network (DNN) or a support vector machine; the input of this model is the residual block and the output is the transform feature.
Step 130: quantize the transform feature of the current coding block to obtain the quantized feature of the current coding block.
Step 140: determine, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block.
In the arithmetic coding process, for every pixel to be coded, the probability of the corresponding pixel (a value between 0 and 1) needs to be predicted; the probability represents the expected frequency of the current pixel value, and the higher the predicted probability, the higher the expected frequency and the smaller the code stream generated by arithmetic coding.
Step 150: generate the binary code stream of the current coding block using the probability of each pixel. A minimal sketch of how predicted probabilities relate to code length follows.
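To make the relationship between the predicted per-pixel probabilities and the size of the generated code stream concrete, here is a minimal sketch of the ideal code length of an arithmetic coder; the array shape is an illustrative assumption, and a real entropy coder adds a small constant overhead:

    import numpy as np

    def ideal_code_length(probs):
        """Ideal bit count for symbols coded with probabilities probs:
        the sum of -log2(p) over all pixels of the quantized feature."""
        probs = np.clip(probs, 1e-9, 1.0)   # guard against log(0)
        return float(np.sum(-np.log2(probs)))

    # Higher predicted probability -> smaller code stream:
    print(ideal_code_length(np.array([0.9, 0.9, 0.9])))  # ~0.46 bits
    print(ideal_code_length(np.array([0.1, 0.1, 0.1])))  # ~9.97 bits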
In this possible example, acquiring the original residual block of the current coding block includes: determining the prediction block of the current coding block; and taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.
In specific implementations, numerical transformation and quantization are performed on the prediction block X̂_t of the current coding block to generate, from the original continuous floating-point distribution over (0, 1), a discrete distribution X̄_t over (0, 255); taking the difference with the current coding block X_t gives the integer-signal residual r_t = X_t − X̄_t.
In this possible example, taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block includes: performing numerical transformation and quantization according to the prediction block of the current coding block to generate the discrete distribution of the prediction block; and taking the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal. A minimal sketch of this residual acquisition follows.
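A minimal sketch of this residual acquisition; the 255-scaling with rounding is an assumption consistent with mapping a (0, 1) floating-point distribution onto (0, 255) integers:

    import numpy as np

    def integer_residual(x_t, x_hat_t):
        """Compute the integer-signal residual r_t = X_t - X_bar_t.

        x_t:     original image block, integer values in [0, 255].
        x_hat_t: prediction block, floating-point values in (0, 1).
        """
        x_bar_t = np.round(x_hat_t * 255.0).astype(np.int16)  # discrete distribution
        return x_t.astype(np.int16) - x_bar_t                 # integer residual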
In this possible example, obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model includes: re-normalizing the original residual block to obtain a normalized first residual block; sparsifying the first residual block to obtain a processed second residual block; and inputting the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.
In specific implementations, energy-based re-normalization is used to uniformly normalize the differently distributed post-prediction residuals into (−1, 1); for different video sequences, energy-based normalization unifies the data distribution and makes training more stable.
In addition, the energy-based re-normalization may use other standardization methods, such as 0-1 normalization or linear-function normalization; the goal is to unify the high-variance residual distribution after prediction and speed up model training and convergence.
It can be seen that, in this example, threshold sparsification can, under the same rate constraint, allocate more rate to regions such as motion boundaries and occlusions in end-to-end coding, saving much of the rate needed for background regions; in addition, energy-based re-normalization accelerates model training and convergence, making the model more robust to different residual distributions.
In this possible example, re-normalizing the original residual block to obtain the normalized first residual block includes: converging, according to an energy-unification mechanism, the different residual distributions of the original residual block to the same distribution space to obtain the normalized first residual block.
In this possible example, converging the different residual distributions of the original residual block to the same distribution space according to the energy-unification mechanism to obtain the normalized first residual block includes:
extracting the minimum pixel value x_min and the maximum pixel value x_max in the original residual block;
normalizing the original residual block to the interval (0, 1) by the formula r̄_t = (r_t − x_min) / (x_max − x_min), where r̄_t denotes the pixel value after the first transform and r_t denotes the pixel value before normalization;
applying a second transform, r̃_t = 2·r̄_t − 1, to r̄_t to obtain a continuous residual distribution in the interval (−1, 1), i.e., the normalized first residual block, where r̃_t denotes the normalized pixel value. A minimal sketch of the forward and inverse mapping follows.
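A minimal sketch of this energy-based re-normalization; the inverse function is an added illustration, since undoing the mapping assumes x_min and x_max (or equivalent side information) are available:

    import numpy as np

    def renormalize(r_t):
        """Map a residual block into (-1, 1) by min-max re-normalization."""
        x_min, x_max = float(r_t.min()), float(r_t.max())
        scale = (x_max - x_min) or 1.0        # guard constant blocks
        r_bar = (r_t - x_min) / scale         # first transform -> (0, 1)
        r_tilde = 2.0 * r_bar - 1.0           # second transform -> (-1, 1)
        return r_tilde, (x_min, x_max)

    def denormalize(r_tilde, stats):
        x_min, x_max = stats
        return (r_tilde + 1.0) / 2.0 * (x_max - x_min) + x_min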
In this possible example, sparsifying the first residual block to obtain the processed second residual block includes: acquiring a preset threshold set, the preset threshold set including a plurality of thresholds; selecting, from the preset threshold set, a target threshold adapted to the current coding block; and traversing the residual sample of each pixel in the first residual block, setting to zero the residual samples of pixels whose residual samples are smaller than the target threshold, to obtain the processed second residual block.
In specific implementations, the target threshold can be obtained as follows: starting from the smallest threshold in the preset threshold set, rate-distortion optimization is performed at the encoder for every threshold to obtain a corresponding result, and the threshold corresponding to the best result is selected as the threshold best suited to residual coding of the current frame. Performing rate-distortion optimization for every threshold means that each selected threshold requires one encode-decode pass producing a corresponding result, and the best result is chosen from the final results. As shown in FIG. 8B, r_t denotes the pixel value before normalization, m_1 denotes the first threshold in the preset threshold set, and m_n denotes the n-th threshold; residual maps generated after processing with different thresholds have different sparsity — the larger the threshold, the sparser the resulting residual, and the smaller the residual space interval that needs to be coded. By traversing the preset threshold set, the threshold best suited to residual coding of the current frame can be selected accurately, improving coding efficiency.
In specific implementations, different thresholds are set and the normalized residual is sparsified so that more effective information can be allocated to the effective pixels.
It should be noted that threshold-based sparsification follows the traditional mode-selection approach, implementing a skip-like mode to adaptively code the residual information; the threshold sparsification here can also operate directly on the quantized features.
It can be seen that, in this example, threshold sparsification can, under the same rate constraint, allocate more rate to regions such as motion boundaries and occlusions in end-to-end coding, saving much of the rate needed for background regions.
In this possible example, each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block at a preset sampling interval.
The value range of the sampling interval is determined as follows: according to the residual distribution of the current frame, a histogram of the value distribution is generated, and the interval corresponding to the 1/α peak portion of the residual distribution is taken.
The value of α may be 4, 6, 8, etc., and is not uniquely limited here.
In addition, in other possible examples, each of the plurality of thresholds is obtained by non-uniform sampling of the pixels of the current coding block at a preset sampling interval; under general conditions, using no more than 4 thresholds better balances complexity against performance. A sketch of threshold construction and selection follows.
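The following sketch illustrates threshold sparsification together with a brute-force target-threshold search; the rd_cost callback and the histogram-based construction of the threshold set stand in for the encoder's actual rate-distortion measurement and are assumptions for illustration:

    import numpy as np

    def sparsify(residual, threshold):
        """Set residual samples whose magnitude is below the threshold to zero."""
        out = residual.copy()
        out[np.abs(out) < threshold] = 0.0
        return out

    def build_threshold_set(residual, alpha=4, num=4):
        """Sample up to `num` candidate thresholds uniformly from the
        1/alpha peak region of the residual histogram."""
        hist, edges = np.histogram(np.abs(residual), bins=256)
        region = edges[:-1][hist >= hist.max() / alpha]
        return np.linspace(region.min(), region.max(), num)

    def select_threshold(residual, thresholds, rd_cost):
        """One encode/decode pass per threshold; keep the lowest RD cost."""
        costs = [rd_cost(sparsify(residual, m)) for m in thresholds]
        return thresholds[int(np.argmin(costs))]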
In this possible example, quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block includes: applying a differentiable quantization mechanism to the transform feature of the current coding block, converting the floating-point feature into a quantized integer feature to obtain the quantized feature of the current coding block.
In specific implementations, a differentiable quantization method is applied to the extracted feature, converting the floating-point (float32) feature into a quantized integer feature. Concretely, the forward computation is ŷ = Round(y), where Round(·) is the rounding function and, during training, quantization is simulated with additive noise distributed over ±1/2; back-propagation approximates this function as a linear function, using 1 as the gradient of the backward derivative. A minimal sketch of this quantizer follows.
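A minimal PyTorch sketch of this differentiable quantization (a straight-through estimator); using additive uniform noise during training and hard rounding otherwise is an assumption consistent with the description above:

    import torch

    class RoundSTE(torch.autograd.Function):
        @staticmethod
        def forward(ctx, y):
            return torch.round(y)      # forward: y_hat = Round(y)

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out            # backward: linear, gradient 1

    def quantize(y, training=False):
        if training:
            # simulate quantization with noise uniform in (-1/2, 1/2)
            return y + torch.empty_like(y).uniform_(-0.5, 0.5)
        return RoundSTE.apply(y)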
In this possible example, as shown in FIG. 8C, the feature prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.
The residual extraction module may adopt any mainstream neural-network module, for example residual blocks or densely connected blocks, and the downsampling module uses a strided convolution kernel; the other branch uses cascaded convolution layers to extract features activated by a sigmoid function, yielding a spatial-channel-wise adaptive mask that adaptively activates the extracted features. The upsampling module may be implemented with transposed convolution.
In specific implementations, a residual extraction module extracts features from the input residual block, and multiple residual extraction modules extract multiple features for stacking, realizing cascaded feature extraction.
It should be noted that the first branch is the main feature extraction module and the module after the sigmoid in the second branch is the self-attention activation-mapping module; the outputs of the two branches are multiplied to generate the final transform feature. A minimal sketch of this two-branch structure follows.
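A minimal PyTorch sketch of this two-branch structure; the channel count, kernel sizes, plain residual-block design, and shared stem are illustrative assumptions, not values taken from the specification:

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))

        def forward(self, x):
            return x + self.body(x)

    class FeaturePredictionModel(nn.Module):
        """Two parallel branches whose outputs are multiplied."""
        def __init__(self, ch=64):
            super().__init__()
            self.stem = nn.Conv2d(1, ch, 3, padding=1)
            # first branch: three residual extraction modules + downsampling
            self.main = nn.Sequential(
                ResBlock(ch), ResBlock(ch), ResBlock(ch),
                nn.Conv2d(ch, ch, 3, stride=2, padding=1))
            # second branch: same, plus a sigmoid activation module
            self.attn = nn.Sequential(
                ResBlock(ch), ResBlock(ch), ResBlock(ch),
                nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                nn.Sigmoid())

        def forward(self, r):
            f = self.stem(r)
            return self.main(f) * self.attn(f)  # apply self-attention mask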
In addition, during training of the feature prediction model, the code rate and the loss function can be determined as follows.
The rate estimate is obtained by the formula R = Σ −log(p), where R is the loss of the code-rate constraint and p is the probability of each pixel in the quantized transform feature.
The loss function is L = D(X_t, X̄_t + r̂_t), where D(·) is the mean-squared-error (MSE) or L2 loss function, X̂_t is the prediction block of the current coding block, X_t is the current coding block, r̂_t is the reconstructed integer-signal residual, and X̄_t is the discrete distribution of the prediction block of the current coding block.
Rate-distortion optimization L_total = L + λR is applied to the code rate and the loss function, where L is the reconstruction loss of each frame and R is the loss of the code-rate constraint; by adjusting λ, feature prediction models for different code rates are obtained by training. A minimal sketch of this objective follows.
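A minimal sketch of this rate-distortion training objective; the tensor interfaces are illustrative assumptions, and only the formulas above (MSE distortion, R = Σ −log(p), L_total = L + λR) are taken from the text:

    import torch

    def rd_loss(x_t, x_bar_t, r_hat_t, probs, lam):
        """L_total = L + lambda * R for one frame.

        x_t:     original frame or block.
        x_bar_t: discrete distribution of the prediction block.
        r_hat_t: reconstructed residual.
        probs:   per-pixel probabilities of the quantized features.
        lam:     lambda; different values train models for different rates.
        """
        recon = x_bar_t + r_hat_t                         # residual compensation
        L = torch.mean((x_t - recon) ** 2)                # D(.) = MSE / L2
        R = torch.sum(-torch.log(probs.clamp_min(1e-9)))  # rate-constraint loss
        return L + lam * R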
In specific implementations, the feature prediction model may adopt a self-attention mechanism; the number of residual extraction modules used in the two branches can be adjusted flexibly as needed, and simple convolutions may replace the residual extraction modules, which suits acceleration and simplification of the codec.
For example, the first branch and the second branch may each include four residual extraction modules, or each include four convolution modules.
It can be seen that, in the embodiments of the present application, using a pre-trained neural network model to encode the residual information allows the neural network model to implicitly learn residuals with different distortions. Compared with generic end-to-end residual coding, this method codes adaptively and performs inter-frame compensation; at the same code rate, it allocates spatial residual information more efficiently and obtains reconstructed video frames of higher quality.
Corresponding to the image encoding method of FIG. 8A, FIG. 9A is a schematic flowchart of an image decoding method in an embodiment of the present application; the image decoding method can be applied to the destination device 20 in the codec system 1 shown in FIG. 5 or the video decoder 200 shown in FIG. 7. The flow shown in FIG. 9A is described taking the video decoder 200 shown in FIG. 7 as the execution subject. As shown in FIG. 9A, the image decoding method provided by the embodiment of the present application includes:
Step 210: acquire the binary code stream of the current decoding block, where the current decoding block includes the code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame.
The division manners of the decoding unit include the various division manners shown in FIG. 4 and are not uniquely limited here.
The decoding block corresponds to the coding block involved in the foregoing encoding method embodiments, and may specifically have the same size.
In specific implementations, when the current decoding block is the code stream of a currently processed video frame, the minimum data-processing object is the code stream of a single frame, so this method is more efficient, but accuracy and performance are somewhat reduced.
When the current decoding block is the code stream of a coding unit obtained by dividing the currently processed video frame, the minimum data-processing granularity is the divided coding unit, so the overall algorithmic complexity and processing time increase, but accuracy and performance are relatively higher.
Step 220: transform, through a pre-trained probability prediction model, the binary code stream into the quantized feature of the current decoding block.
This transformation is a lossless transformation.
In the arithmetic coding process, for every pixel to be coded, the probability of the corresponding pixel (a value between 0 and 1) needs to be predicted; the probability represents the expected frequency of the current pixel value, and the higher the predicted probability, the higher the expected frequency and the smaller the code stream generated by arithmetic coding.
Step 230: determine the residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model.
The residual prediction model may perform its data processing on the local device's graphics processor (GPU) and may adopt any commonly used neural network architecture, such as a deep neural network (DNN), a recurrent neural network (RNN), or a convolutional neural network (CNN); the input of this model is the quantized feature and the output is the residual block.
Step 240: determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
In this possible example, determining the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block includes: determining the prediction block of the current decoding block; and performing residual compensation on the prediction block of the current decoding block with the residual block to obtain the reconstructed block of the current decoding block.
The image decoding method of the embodiment of the present application can be explained concretely as the following steps.
First, the code stream is acquired; it corresponds to the binary code stream of the current decoding block and may specifically include the common parameter set of the current decoding block and the coding information of the image of the current decoding block.
Next, starting from initialized all-zero features, the values read from the binary code stream are input to the pre-trained probability prediction model, which is run to output the quantized feature of the current decoding block.
Then the quantized feature predicted by the model is input to the pre-trained residual prediction model, which is run to output the corresponding residual block.
Finally, the reconstructed block or reconstructed image is computed from the residual block predicted by the model and the prediction block of the current decoding block.
The prediction block may be obtained by predicting the current decoding block according to the inter prediction mode carried in the decoding information. A minimal sketch of this decoder-side flow follows.
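A minimal sketch of this decoder-side flow; the prob_model, residual_model, and inter_predictor objects and the entropy_decode method are placeholders named only for illustration, standing in for the pre-trained networks and the arithmetic decoder:

    import numpy as np

    def decode_block(bitstream, prob_model, residual_model, inter_predictor):
        """Reconstruct one decoding block from its binary code stream."""
        # Step 220: losslessly recover the quantized feature, starting
        # from initialized all-zero features and the model's probabilities.
        q_feat = prob_model.entropy_decode(bitstream)

        # Step 230: quantized feature -> residual block.
        residual = residual_model(q_feat)

        # Step 240: residual compensation on the inter prediction block.
        pred = inter_predictor()                   # per the signaled inter mode
        recon = np.clip(pred + residual, 0, 255)   # reconstructed block
        return recon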
In this possible example, determining the prediction block of the current decoding block includes: performing entropy decoding on the current decoding block to generate syntax elements; determining, according to the syntax elements, the inter prediction mode for decoding the current decoding block; and performing inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.
In this possible example, as shown in FIG. 9B, the residual prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.
In addition, during training of the residual prediction model, the code rate and the loss function can be determined as follows.
The rate estimate is obtained by the formula R = Σ −log(p), where R is the loss of the code-rate constraint and p is the probability of each pixel in the quantized transform feature.
The loss function is L = D(X_t, X̄_t + r̂_t), where D(·) is the mean-squared-error (MSE) or L2 loss function, X̂_t is the prediction block of the current coding block, X_t is the current coding block, r̂_t is the reconstructed integer-signal residual, and X̄_t is the discrete distribution of the prediction block of the current coding block.
Rate-distortion optimization L_total = L + λR is applied to the code rate and the loss function, where L is the reconstruction loss of each frame and R is the loss of the code-rate constraint; by adjusting λ, residual prediction models for different code rates are obtained by training.
In specific implementations, the residual prediction model may adopt a self-attention mechanism; the number of residual extraction modules used in the two branches can be adjusted flexibly as needed, and simple convolutions may replace the residual extraction modules, which suits acceleration and simplification of the codec.
In specific implementations, the residual prediction model extracts features from the input residual block, and multiple residual extraction modules extract multiple features for stacking, realizing cascaded feature extraction.
It should be noted that the first branch is the main feature extraction module, the module after the sigmoid in the second branch is the self-attention activation-mapping module, and the outputs of the two branches are multiplied to generate the final residual block.
It can be seen that, in the embodiments of the present application, using a pre-trained neural network model to encode the residual information allows the neural network model to implicitly learn residuals with different distortions. Compared with generic end-to-end residual coding, this method codes adaptively and performs inter-frame compensation; at the same code rate, it allocates spatial residual information more efficiently and obtains reconstructed video frames of higher quality.
An embodiment of the present application provides an image encoding apparatus, which may be a video encoder. Specifically, the image encoding apparatus is configured to perform the steps performed by the video encoder in the above encoding method. The image encoding apparatus provided by the embodiments of the present application may include modules corresponding to the respective steps.
In the embodiments of the present application, the image encoding apparatus may be divided into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. The division of modules in the embodiments of the present application is schematic and is only a logical functional division; there may be other division manners in actual implementation.
In the case of dividing functional modules corresponding to each function, FIG. 10 shows a possible schematic structure of the image encoding apparatus involved in the above embodiments. As shown in FIG. 10, the image encoding apparatus 10 includes: an acquisition unit 100 configured to acquire the original residual block of the current coding block, the current coding block including a currently processed video frame or a coding unit obtained by dividing the currently processed video frame; a first prediction unit 101 configured to obtain the transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model; a quantization unit 102 configured to quantize the transform feature of the current coding block to obtain the quantized feature of the current coding block; a second prediction unit 103 configured to determine, through a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and a generation unit 104 configured to generate the binary code stream of the current coding block using the probability of each pixel.
In this possible example, in terms of acquiring the original residual block of the current coding block, the acquisition unit 100 is specifically configured to: determine the prediction block of the current coding block; and take the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.
In this possible example, in terms of taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block, the acquisition unit 100 is specifically configured to: perform numerical transformation and quantization according to the prediction block of the current coding block to generate the discrete distribution of the prediction block; and take the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.
In this possible example, in terms of obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model, the first prediction unit 101 is specifically configured to: re-normalize the original residual block to obtain the normalized first residual block; sparsify the first residual block to obtain the processed second residual block; and input the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.
In this possible example, in terms of re-normalizing the original residual block to obtain the normalized first residual block, the first prediction unit 101 is specifically configured to: converge, according to the energy-unification mechanism, the different residual distributions of the original residual block to the same distribution space to obtain the normalized first residual block.
In this possible example, in terms of converging the different residual distributions of the original residual block to the same distribution space according to the energy-unification mechanism, the first prediction unit 101 is specifically configured to: extract the minimum pixel value x_min and the maximum pixel value x_max in the original residual block; normalize the original residual block to the interval (0, 1) by the formula r̄_t = (r_t − x_min) / (x_max − x_min), where r̄_t denotes the pixel value after the first transform and r_t denotes the pixel value before normalization; and apply a second transform, r̃_t = 2·r̄_t − 1, to obtain the continuous residual distribution in the interval (−1, 1), i.e., the normalized first residual block, where r̃_t denotes the normalized pixel value.
In this possible example, in terms of sparsifying the first residual block to obtain the processed second residual block, the first prediction unit 101 is specifically configured to: acquire a preset threshold set including a plurality of thresholds; select, from the preset threshold set, the target threshold adapted to the current coding block; and traverse the pixel value of each pixel in the first residual block, setting to zero the pixel values of pixels whose pixel values are smaller than the target threshold, to obtain the processed second residual block.
In this possible example, each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block at a preset sampling interval.
In this possible example, in terms of quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block, the quantization unit 102 is specifically configured to: apply a differentiable quantization mechanism to the transform feature of the current coding block, converting the floating-point feature into a quantized integer feature to obtain the quantized feature of the current coding block.
In this possible example, the feature prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.
All relevant content of the steps involved in the above method embodiments can be cited for the functional descriptions of the corresponding functional modules and is not repeated here. Of course, the image encoding apparatus 10 provided by the embodiments of the present application includes but is not limited to the above modules; for example, the image encoding apparatus 10 may further include a storage unit, which may be used to store the program code and data of the image encoding apparatus.
In the case of adopting an integrated unit, FIG. 11 is a schematic structural diagram of the image encoding apparatus provided by the embodiments of the present application. In FIG. 11, the image encoding apparatus 11 includes a processing module 110 and a communication module 111. The processing module 110 is configured to control and manage the actions of the image encoding apparatus, for example, to perform the steps performed by the acquisition unit 100, the first prediction unit 101, the quantization unit 102, the second prediction unit 103, and the generation unit 104, and/or other processes of the techniques described herein. The communication module 111 is configured to support interaction between the image encoding apparatus and other devices. As shown in FIG. 11, the image encoding apparatus may further include a storage module 112 configured to store the program code and data of the image encoding apparatus, for example, the content stored in the above storage unit.
The processing module 110 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 111 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 112 may be a memory.
All relevant content of the scenarios involved in the above method embodiments can be cited for the functional descriptions of the corresponding functional modules and is not repeated here. Both the image encoding apparatus 10 and the image encoding apparatus 11 can execute the image encoding method shown in FIG. 8A; specifically, they may be video image encoding apparatuses or other devices with video encoding functions.
The present application further provides a video encoder, including a non-volatile storage medium and a central processing unit, where the non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the image encoding method of the embodiments of the present application.
An embodiment of the present application provides an image decoding apparatus, which may be a video decoder. Specifically, the image decoding apparatus is configured to perform the steps performed by the video decoder in the above decoding method. The image decoding apparatus provided by the embodiments of the present application may include modules corresponding to the respective steps.
In the embodiments of the present application, the image decoding apparatus may be divided into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. The division of modules in the embodiments of the present application is schematic and is only a logical functional division; there may be other division manners in actual implementation.
In the case of dividing functional modules corresponding to each function, FIG. 12 shows a possible schematic structure of the image decoding apparatus involved in the above embodiments. As shown in FIG. 12, the image decoding apparatus 12 includes:
an acquisition unit 120 configured to acquire the binary code stream of the current decoding block, the current decoding block including the code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame;
a first prediction unit 121 configured to transform, through a pre-trained probability prediction model, the binary code stream into the quantized feature of the current decoding block;
a second prediction unit 122 configured to determine the residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model;
a determination unit 123 configured to determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.
In one possible example, in terms of determining the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block, the determination unit 123 is specifically configured to: determine the prediction block of the current decoding block; and perform residual compensation on the prediction block of the current decoding block with the residual block to obtain the reconstructed block of the current decoding block.
In one possible example, in terms of determining the prediction block of the current decoding block, the determination unit 123 is specifically configured to: perform entropy decoding on the current decoding block to generate syntax elements; determine, according to the syntax elements, the inter prediction mode for decoding the current decoding block; and perform inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.
In one possible example, the residual prediction model includes a first branch and a second branch, the first branch and the second branch being connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.
All relevant content of the steps involved in the above method embodiments can be cited for the functional descriptions of the corresponding functional modules and is not repeated here. Of course, the image decoding apparatus provided by the embodiments of the present application includes but is not limited to the above modules; for example, the image decoding apparatus may further include a storage unit, which may be used to store the program code and data of the image decoding apparatus.
In the case of adopting an integrated unit, FIG. 13 is a schematic structural diagram of the image decoding apparatus provided by the embodiments of the present application. In FIG. 13, the image decoding apparatus 13 includes a processing module 130 and a communication module 131. The processing module 130 is configured to control and manage the actions of the image decoding apparatus, for example, to perform the steps performed by the acquisition unit 120, the first prediction unit 121, the second prediction unit 122, and the determination unit 123, and/or other processes of the techniques described herein. The communication module 131 is configured to support interaction between the image decoding apparatus and other devices. As shown in FIG. 13, the image decoding apparatus may further include a storage module 132 configured to store the program code and data of the image decoding apparatus, for example, the content stored in the above storage unit.
The processing module 130 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination implementing computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 131 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 132 may be a memory.
All relevant content of the scenarios involved in the above method embodiments can be cited for the functional descriptions of the corresponding functional modules and is not repeated here. Both the image decoding apparatus 12 and the image decoding apparatus 13 can execute the image decoding method shown in FIG. 9A; specifically, they may be video image decoding apparatuses or other devices with video decoding functions.
The present application further provides a video decoder, including a non-volatile storage medium and a central processing unit, where the non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the image decoding method of the embodiments of the present application.
The present application further provides a terminal, the terminal including one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled to the one or more processors; the memory is configured to store computer program code, the computer program code includes instructions, and when the one or more processors execute the instructions, the terminal performs the image encoding and/or image decoding method of the embodiments of the present application. The terminal here may be a video display device, a smartphone, a portable computer, or another device that can process or play video.
Another embodiment of the present application further provides a computer-readable storage medium including one or more program codes, where the one or more programs include instructions, and when a processor in the decoding device executes the program code, the decoding device performs the image encoding method and the image decoding method of the embodiments of the present application.
In another embodiment of the present application, a computer program product is further provided, the computer program product including computer-executable instructions stored in a computer-readable storage medium; at least one processor of the decoding device can read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the terminal to implement the image encoding method and the image decoding method of the embodiments of the present application.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented with a software program, it may appear in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVD), or semiconductor media (e.g., a Solid State Disk (SSD)), and the like.
From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example; in practical applications, the above functions can be assigned to different functional modules as needed, i.e., the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; for example, the division into modules or units is only a logical functional division, and there may be other division manners in actual implementation; multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes media that can store program code: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

  1. An image encoding method, characterized by comprising:
    acquiring an original residual block of a current coding block, the current coding block comprising a currently processed video frame or a coding unit obtained by dividing the currently processed video frame;
    obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
    quantizing the transform feature of the current coding block to obtain a quantized feature of the current coding block;
    determining, through a pre-trained probability prediction model, a probability of each pixel in the quantized feature of the current coding block;
    generating a binary code stream of the current coding block using the probability of each pixel.
  2. The method according to claim 1, characterized in that acquiring the original residual block of the current coding block comprises:
    determining a prediction block of the current coding block;
    taking a difference between the prediction block of the current coding block and an original image block of the current coding block to obtain the original residual block.
  3. The method according to claim 2, characterized in that taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block comprises:
    performing numerical transformation and quantization according to the prediction block of the current coding block to generate a discrete distribution of the prediction block;
    taking a difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.
  4. The method according to claim 1, characterized in that obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model comprises:
    re-normalizing the original residual block to obtain a normalized first residual block;
    sparsifying the first residual block to obtain a processed second residual block;
    inputting the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.
  5. The method according to claim 4, characterized in that re-normalizing the original residual block to obtain the normalized first residual block comprises:
    converging, according to an energy-unification mechanism, different residual distributions of the original residual block to a same distribution space to obtain the normalized first residual block.
  6. The method according to claim 5, characterized in that converging, according to the energy-unification mechanism, the different residual distributions of the original residual block to the same distribution space to obtain the normalized first residual block comprises:
    extracting a minimum pixel value x_min and a maximum pixel value x_max in the original residual block;
    normalizing the original residual block to the interval (0, 1) by the formula r̄_t = (r_t − x_min) / (x_max − x_min), where r̄_t denotes the pixel value after the first transform and r_t denotes the pixel value before normalization;
    applying a second transform, r̃_t = 2·r̄_t − 1, to r̄_t to obtain a continuous residual distribution in the interval (−1, 1), i.e., the normalized first residual block, where r̃_t denotes the normalized pixel value.
  7. The method according to any one of claims 4-6, characterized in that sparsifying the first residual block to obtain the processed second residual block comprises:
    acquiring a preset threshold set, the preset threshold set comprising a plurality of thresholds;
    selecting, from the preset threshold set, a target threshold adapted to the current coding block;
    traversing the pixel value of each pixel in the first residual block and setting to zero the pixel values of pixels whose pixel values are smaller than the target threshold, to obtain the processed second residual block.
  8. The method according to claim 7, characterized in that each of the plurality of thresholds is obtained by uniformly sampling pixels of the current coding block at a preset sampling interval.
  9. The method according to claim 1, characterized in that quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block comprises:
    applying a differentiable quantization mechanism to the transform feature of the current coding block to convert the floating-point feature into a quantized integer feature, obtaining the quantized feature of the current coding block.
  10. The method according to any one of claims 1-9, characterized in that the feature prediction model comprises a first branch and a second branch, the first branch and the second branch being connected in parallel;
    the first branch comprises three cascaded residual extraction modules and one downsampling module;
    the second branch comprises three cascaded residual extraction modules, one downsampling module, and one activation module.
  11. An image decoding method, characterized by comprising:
    acquiring a binary code stream of a current decoding block, the current decoding block comprising a code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame;
    transforming, through a pre-trained probability prediction model, the binary code stream into a quantized feature of the current decoding block;
    determining a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model;
    determining a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.
  12. The method according to claim 11, characterized in that determining the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block comprises:
    determining the prediction block of the current decoding block;
    performing residual compensation on the prediction block of the current decoding block with the residual block to obtain the reconstructed block of the current decoding block.
  13. The method according to claim 12, characterized in that determining the prediction block of the current decoding block comprises:
    performing entropy decoding on the current decoding block to generate syntax elements;
    determining, according to the syntax elements, an inter prediction mode for decoding the current decoding block;
    performing inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.
  14. The method according to claim 11, characterized in that the residual prediction model comprises a first branch and a second branch, the first branch and the second branch being connected in parallel;
    the first branch comprises three cascaded residual extraction modules and one upsampling module;
    the second branch comprises three cascaded residual extraction modules, one upsampling module, and one activation module.
  15. An image encoding apparatus, characterized by comprising:
    an acquisition unit configured to acquire an original residual block of a current coding block, the current coding block comprising a currently processed video frame or a coding unit obtained by dividing the currently processed video frame;
    a first prediction unit configured to obtain a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
    a quantization unit configured to quantize the transform feature of the current coding block to obtain a quantized feature of the current coding block;
    a second prediction unit configured to determine, through a pre-trained probability prediction model, a probability of each pixel in the quantized feature of the current coding block;
    a generation unit configured to generate a binary code stream of the current coding block using the probability of each pixel.
  16. An image decoding apparatus, characterized by comprising:
    an acquisition unit configured to acquire a binary code stream of a current decoding block, the current decoding block comprising a code stream of a currently processed video frame or of a decoding unit obtained by dividing the currently processed video frame;
    a first prediction unit configured to transform, through a pre-trained probability prediction model, the binary code stream into a quantized feature of the current decoding block;
    a second prediction unit configured to determine a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model;
    a determination unit configured to determine a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.
  17. An encoder, comprising a non-volatile storage medium and a central processing unit, characterized in that the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and when the central processing unit executes the executable program, the encoder performs the image encoding method according to any one of claims 1-10.
  18. A decoder, comprising a non-volatile storage medium and a central processing unit, characterized in that the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and when the central processing unit executes the executable program, the decoder performs the image decoding method according to any one of claims 11-14.
  19. A terminal, characterized in that the terminal comprises one or more processors, a memory, and a communication interface; the memory and the communication interface are connected to the one or more processors; the terminal communicates with other devices through the communication interface; the memory is configured to store computer program code, and the computer program code comprises instructions;
    when the one or more processors execute the instructions, the terminal performs the method according to any one of claims 1-10 or 11-14.
  20. A computer program product containing instructions, characterized in that, when the computer program product runs on a terminal, the terminal is caused to perform the method according to any one of claims 1-10 or 11-14.
  21. A computer-readable storage medium comprising instructions, characterized in that, when the instructions run on a terminal, the terminal is caused to perform the method according to any one of claims 1-10 or 11-14.
PCT/CN2021/090270 2020-10-28 2021-04-27 WO2022088631A1 (zh) Image encoding method, image decoding method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011176891.8A CN114501010B (zh) 2020-10-28 2020-10-28 Image encoding method, image decoding method and related apparatus
CN202011176891.8 2020-10-28

Publications (1)

Publication Number Publication Date
WO2022088631A1 true WO2022088631A1 (zh) 2022-05-05

Family

ID=81383511

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090270 WO2022088631A1 (zh) 2020-10-28 2021-04-27 图像编码方法、图像解码方法及相关装置

Country Status (3)

Country Link
CN (1) CN114501010B (zh)
TW (1) TW202218428A (zh)
WO (1) WO2022088631A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115037933A (zh) * 2022-08-09 2022-09-09 Zhejiang Dahua Technology Co., Ltd. Inter-frame prediction method and device
CN115052154A (zh) * 2022-05-30 2022-09-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Model training and video coding method, apparatus, device and storage medium
CN115174908A (zh) * 2022-06-30 2022-10-11 Beijing Baidu Netcom Science And Technology Co., Ltd. Transform and quantization method, apparatus, device and storage medium for video coding
CN115941966A (zh) * 2022-12-30 2023-04-07 Shenzhen University Video compression method and electronic device
CN116112694A (zh) * 2022-12-09 2023-05-12 Wuxi Tianchen Jiahang Technology Co., Ltd. Video data coding method and system applied to model training
CN116708934A (zh) * 2023-05-16 2023-09-05 Shenzhen Dongfang Fengming Technology Co., Ltd. Video coding processing method and apparatus
WO2024093627A1 (zh) * 2022-11-04 2024-05-10 Tencent Technology (Shenzhen) Co., Ltd. Video compression method, video decoding method and related apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103329522A (zh) * 2010-12-28 2013-09-25 Mitsubishi Electric Corporation Method for coding videos using a dictionary
US20190246102A1 (en) * 2018-02-08 2019-08-08 Electronics And Telecommunications Research Institute Method and apparatus for video encoding and video decoding based on neural network
CN110753225A (zh) * 2019-11-01 2020-02-04 Hefei Tuya Information Technology Co., Ltd. Video compression method, apparatus and terminal device
CN111641832A (zh) * 2019-03-01 2020-09-08 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding method, decoding method, apparatus, electronic device and storage medium
US10771807B1 (en) * 2019-03-28 2020-09-08 Wipro Limited System and method for compressing video using deep learning

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX2013013912A (es) * 2011-06-27 2013-12-16 Panasonic Corp Image decoding method, image encoding method, image decoding apparatus, image encoding apparatus, and image encoding-decoding apparatus
KR101955374B1 (ko) * 2011-06-30 2019-05-31 SK Telecom Co., Ltd. Encoding/decoding method and apparatus using fast coding unit mode decision
CN102970536B (zh) * 2012-11-15 2015-10-28 Shanghai Jiao Tong University Improved video coding method with prediction residual adjustment
CN103117546B (zh) * 2013-02-28 2016-03-16 Wuhan University Ultra-short-term sliding prediction method for wind power
CN106412579B (zh) * 2015-07-30 2019-07-16 Zhejiang Dahua Technology Co., Ltd. Image encoding and decoding method and apparatus
CN105430416B (zh) * 2015-12-04 2019-03-01 Sichuan University Fingerprint image compression method based on adaptive sparse-domain coding
EP3471418A1 (en) * 2017-10-12 2019-04-17 Thomson Licensing Method and apparatus for adaptive transform in video encoding and decoding
US10798402B2 (en) * 2017-10-24 2020-10-06 Google Llc Same frame motion estimation and compensation
WO2019117645A1 (ko) * 2017-12-14 2019-06-20 Electronics and Telecommunications Research Institute Method and apparatus for encoding and decoding an image using a prediction network
CN113923455B (zh) * 2018-03-30 2023-07-18 Huawei Technologies Co., Ltd. Bidirectional inter prediction method and apparatus
CN108550131B (zh) * 2018-04-12 2020-10-20 Zhejiang Sci-Tech University SAR image vehicle detection method based on a feature-fusion sparse representation model
CN116405686A (zh) * 2018-12-15 2023-07-07 Huawei Technologies Co., Ltd. Image reconstruction method and apparatus
CN110503833B (zh) * 2019-08-29 2021-06-08 Guilin University of Electronic Technology On-ramp coordinated control method based on a deep residual network model
CN110740319B (zh) * 2019-10-30 2024-04-05 Tencent Technology (Shenzhen) Co., Ltd. Video encoding and decoding method and apparatus, electronic device and storage medium
CN111681298A (zh) * 2020-06-08 2020-09-18 Nankai University Compressed-sensing image reconstruction method based on a multi-feature residual network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103329522A (zh) * 2010-12-28 2013-09-25 Mitsubishi Electric Corporation Method for coding videos using a dictionary
US20190246102A1 (en) * 2018-02-08 2019-08-08 Electronics And Telecommunications Research Institute Method and apparatus for video encoding and video decoding based on neural network
CN111641832A (zh) * 2019-03-01 2020-09-08 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding method, decoding method, apparatus, electronic device and storage medium
US10771807B1 (en) * 2019-03-28 2020-09-08 Wipro Limited System and method for compressing video using deep learning
CN110753225A (zh) * 2019-11-01 2020-02-04 Hefei Tuya Information Technology Co., Ltd. Video compression method, apparatus and terminal device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115052154A (zh) * 2022-05-30 2022-09-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Model training and video coding method, apparatus, device and storage medium
CN115052154B (zh) * 2022-05-30 2023-04-14 Beijing Baidu Netcom Science And Technology Co., Ltd. Model training and video coding method, apparatus, device and storage medium
CN115174908A (zh) * 2022-06-30 2022-10-11 Beijing Baidu Netcom Science And Technology Co., Ltd. Transform and quantization method, apparatus, device and storage medium for video coding
CN115174908B (zh) * 2022-06-30 2023-09-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Transform and quantization method, apparatus, device and storage medium for video coding
CN115037933A (zh) * 2022-08-09 2022-09-09 Zhejiang Dahua Technology Co., Ltd. Inter-frame prediction method and device
WO2024093627A1 (zh) * 2022-11-04 2024-05-10 Tencent Technology (Shenzhen) Co., Ltd. Video compression method, video decoding method and related apparatus
CN116112694A (zh) * 2022-12-09 2023-05-12 Wuxi Tianchen Jiahang Technology Co., Ltd. Video data coding method and system applied to model training
CN116112694B (zh) * 2022-12-09 2023-12-15 Wuxi Tianchen Jiahang Technology Co., Ltd. Video data coding method and system applied to model training
CN115941966A (zh) * 2022-12-30 2023-04-07 Shenzhen University Video compression method and electronic device
CN115941966B (zh) * 2022-12-30 2023-08-22 Shenzhen University Video compression method and electronic device
CN116708934A (zh) * 2023-05-16 2023-09-05 Shenzhen Dongfang Fengming Technology Co., Ltd. Video coding processing method and apparatus
CN116708934B (zh) * 2023-05-16 2024-03-22 Shenzhen Dongfang Fengming Technology Co., Ltd. Video coding processing method and apparatus

Also Published As

Publication number Publication date
CN114501010B (zh) 2023-06-06
CN114501010A (zh) 2022-05-13
TW202218428A (zh) 2022-05-01

Similar Documents

Publication Publication Date Title
WO2022088631A1 (zh) Image encoding method, image decoding method and related apparatus
US9584832B2 (en) High quality seamless playback for video decoder clients
TWI741239B (zh) Inter prediction method and apparatus for video data
WO2020220884A1 (zh) Intra prediction method and apparatus for a video sequence
US11736706B2 (en) Video decoding method and apparatus, and decoding device
WO2021238540A1 (zh) Image encoding method, image decoding method and related apparatus
WO2020125595A1 (zh) Video coder and corresponding method
US11924438B2 (en) Picture reconstruction method and apparatus
WO2021185257A1 (zh) Image encoding method, image decoding method and related apparatus
WO2020114394A1 (zh) Video encoding/decoding method, video encoder and video decoder
WO2020143585A1 (zh) Video encoder, video decoder and corresponding methods
WO2021244197A1 (zh) Image encoding method, image decoding method and related apparatus
US11582444B2 (en) Intra-frame coding method and apparatus, frame coder, and frame coding system
CN111586406B (zh) VVC intra/inter skip method, system, device and storage medium
WO2021164014A1 (zh) Video encoding method and apparatus
WO2022022622A1 (zh) Image encoding method, image decoding method and related apparatus
US20220166982A1 (en) Video encoder and qp setting method
WO2022037300A1 (zh) Encoding method, decoding method and related apparatus
WO2022022299A1 (zh) Method, apparatus and device for constructing a motion information list in video coding
CN114071161B (zh) Image encoding method, image decoding method and related apparatus
CN112055970B (zh) Method for constructing a candidate motion information list, inter prediction method and apparatus
WO2020134817A1 (zh) Prediction mode determination method and apparatus, encoding device and decoding device
WO2020147514A1 (zh) Video encoder, video decoder and corresponding methods
TWI841033B (zh) Inter prediction method and apparatus for video data
WO2020259330A1 (zh) Non-separable transform method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884374

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884374

Country of ref document: EP

Kind code of ref document: A1