WO2023083245A1 - Decoding method, encoding method and device - Google Patents

Decoding method, encoding method and device

Info

Publication number
WO2023083245A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
image
information
decoding
pixel
Prior art date
Application number
PCT/CN2022/131068
Other languages
English (en)
French (fr)
Inventor
魏亮
陈方栋
王莉
Original Assignee
Hangzhou Hikvision Digital Technology Co., Ltd. (杭州海康威视数字技术股份有限公司)
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co., Ltd.
Publication of WO2023083245A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: ... using adaptive coding
    • H04N19/102: ... characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H04N19/169: ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17: ... the unit being an image region, e.g. an object
    • H04N19/172: ... the region being a picture, frame or field
    • H04N19/186: ... the unit being a colour or a chrominance component
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/90: ... using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94: Vector quantisation

Definitions

  • the present application relates to the field of video encoding and decoding, and in particular to an image decoding method, encoding method and device.
  • Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundant information inherent in video sequences.
  • The basic principle of video compression is to exploit the correlation among the spatial domain, the temporal domain, and codewords, removing redundancy as far as possible. Quantization refers to the process of mapping continuous values (or a large number of discrete values) of a signal to a finite number of discrete amplitudes, realizing a many-to-one mapping of signal values.
  • The current practice is that, for the one or more coding units (CUs) included in an image frame, the encoder obtains the quantization parameter (QP) of each CU and encodes the CU according to the QP to obtain the code stream; the decoder parses the code stream to obtain the QP of each CU and dequantizes and decodes the CU according to the QP.
  • However, CUs are divided according to the image content, and the encoding end and the decoding end use a single QP to quantize all of the image content corresponding to one CU, resulting in large quantization distortion in the image encoding and decoding process.
  • the present application provides an image decoding method, encoding method and device, which solve the problem of large quantization distortion in the image encoding and decoding process.
  • The present application provides an image decoding method. The method can be applied to a video decoding system, or to a decoding end that can support video decoding, for example a decoding end that includes a video decoder. The method includes: firstly, parsing the code stream to obtain one or more image frames, where one image frame includes one or more CUs; secondly, determining multiple QP values of the one image frame, where one CU includes multiple quantization groups (QGs) and one QG corresponds to one QP value; finally, decoding the one image frame according to the multiple QP values.
  • In this way, a CU can be divided into multiple QGs, and the one or more residual coefficients in each QG share a QP value. The video decoder can therefore make a finer-grained QP decision for the one or more CUs of an image frame, reducing the decoding distortion of image frames while ensuring a certain compression rate, which improves the authenticity and accuracy of video image decoding.
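For orientation, here is a minimal sketch, in Python, of the data layout this method implies: one frame holds CUs, each CU holds multiple QGs, and each QG carries its own QP. The names (QuantGroup, CodingUnit, Frame) and container choices are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QuantGroup:
    qp: int                                              # one QP shared by the whole QG
    levels: List[int] = field(default_factory=list)      # quantized residual levels

@dataclass
class CodingUnit:
    qgs: List[QuantGroup] = field(default_factory=list)  # one CU holds multiple QGs

@dataclass
class Frame:
    cus: List[CodingUnit] = field(default_factory=list)

def qp_values(frame: Frame) -> List[int]:
    """Collect the per-QG QP values of one image frame (finer-grained than per-CU QP)."""
    return [qg.qp for cu in frame.cus for qg in cu.qgs]
```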
  • The one CU includes a plurality of residual coefficients, the one QG includes a part of the plurality of residual coefficients, and that part of the residual coefficients shares the one QP value.
  • Before determining the multiple QP values of the one image frame, the method further includes: dividing one CU included in the one image frame according to a first rule to obtain the multiple QGs; the first rule includes a division domain and a division method, where the division domain is the transform domain or the pixel domain, and the division method includes at least one of uniform division and non-uniform division.
  • The one CU includes multiple residual coefficients, and the positions of the multiple residual coefficients are marked by coordinates, where the coordinates include an abscissa and an ordinate.
  • If the division domain is the transform domain, dividing the one CU included in the one image frame according to the first rule to obtain the multiple QGs includes: dividing the residual coefficients whose coordinate sum does not reach a first coordinate threshold into a first QG, and dividing the residual coefficients whose coordinate sum reaches the first coordinate threshold into a second QG; the coordinate sum is the sum of the abscissa and the ordinate of a residual coefficient.
  • Alternatively, the residual coefficients whose coordinate sum does not reach the first coordinate threshold are divided into the first QG, the residual coefficients whose coordinate sum reaches the first coordinate threshold but does not reach a second coordinate threshold are divided into the second QG, and the residual coefficients whose coordinate sum reaches the second coordinate threshold are divided into a third QG; the second coordinate threshold is greater than the first coordinate threshold. A sketch of this partition is given below.
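As an illustration of the coordinate-sum partition just described, the sketch below splits a block of transform-domain residual coefficients into two or three QGs; the thresholds and the dictionary representation of the block are assumptions made for the example, not values fixed by the patent.

```python
def split_by_coordinate_sum(coeffs, t1, t2=None):
    """Partition residual coefficients into 2 QGs (t2 is None) or 3 QGs by x + y."""
    qgs = [dict(), dict()] if t2 is None else [dict(), dict(), dict()]
    for (x, y), c in coeffs.items():
        s = x + y
        if s < t1:                      # coordinate sum does not reach t1
            qgs[0][(x, y)] = c
        elif t2 is None or s < t2:      # reaches t1 (but not t2, if t2 is given)
            qgs[1][(x, y)] = c
        else:                           # reaches t2
            qgs[2][(x, y)] = c
    return qgs

# Example: a 4x4 block split into 3 QGs with thresholds 2 and 4.
block = {(x, y): 1 for x in range(4) for y in range(4)}
qg1, qg2, qg3 = split_by_coordinate_sum(block, t1=2, t2=4)
```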
  • The one CU includes multiple residual coefficients, and if the division domain is the transform domain, dividing the one CU included in the one image frame according to the first rule to obtain the multiple QGs includes:
  • Sorting the multiple residual coefficients, dividing the residual coefficients that do not reach a first proportion threshold into a first QG, and dividing the residual coefficients that reach the first proportion threshold into a second QG; the ordering of the multiple residual coefficients is either zigzag or reverse zigzag.
  • Alternatively, after sorting the multiple residual coefficients, the residual coefficients that do not reach the first proportion threshold are divided into the first QG, the residual coefficients that reach the first proportion threshold but do not reach a second proportion threshold are divided into the second QG, and the residual coefficients that reach the second proportion threshold are divided into a third QG; the second proportion threshold is greater than the first proportion threshold.
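The scan-order variant can be sketched the same way: order the coefficients by a zigzag scan and cut the ordered list at the proportion thresholds. The zigzag key used below (sort by x + y, alternating direction per anti-diagonal) is one common formulation, assumed here for illustration.

```python
def zigzag_order(w, h):
    """Return (x, y) positions of a w-by-h block in zigzag scan order."""
    return sorted(((x, y) for y in range(h) for x in range(w)),
                  key=lambda p: (p[0] + p[1],
                                 p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))

def split_by_proportion(coeffs, w, h, p1, p2=None):
    """Split scan-ordered coefficients at proportion thresholds p1 (and p2)."""
    order = zigzag_order(w, h)
    n = len(order)
    cut1 = int(n * p1)
    cut2 = n if p2 is None else int(n * p2)
    qg1 = {k: coeffs[k] for k in order[:cut1]}       # does not reach p1
    qg2 = {k: coeffs[k] for k in order[cut1:cut2]}   # reaches p1, not p2
    qg3 = {k: coeffs[k] for k in order[cut2:]}       # reaches p2
    return (qg1, qg2) if p2 is None else (qg1, qg2, qg3)

coeffs = {(x, y): 1 for x in range(4) for y in range(4)}
qg1, qg2 = split_by_proportion(coeffs, 4, 4, p1=0.25)   # first 25% of the scan -> QG1
```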
  • The one CU includes multiple residual coefficients, and if the division domain is the pixel domain, dividing the one CU included in the one image frame according to the first rule to obtain the multiple QGs includes: dividing the multiple residual coefficients horizontally or vertically and symmetrically to obtain two QGs containing the same number of residual coefficients.
  • Alternatively, the multiple residual coefficients are divided horizontally or vertically to obtain two QGs containing different numbers of residual coefficients.
  • Alternatively, the multiple residual coefficients are divided horizontally or vertically to obtain three QGs; the residual coefficients included in the three QGs need not have a symmetric relationship.
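A pixel-domain division reduces to slicing the two-dimensional block of residuals, for example with NumPy; the split positions below are illustrative.

```python
import numpy as np

block = np.arange(64).reshape(8, 8)          # an 8x8 block of residuals

# Symmetric horizontal division: two QGs with the same number of coefficients.
qg_top, qg_bottom = block[:4, :], block[4:, :]

# Non-uniform horizontal division: two QGs of different sizes.
qg_small, qg_large = block[:2, :], block[2:, :]

# Three QGs without any symmetry requirement.
qg_a, qg_b, qg_c = block[:2, :], block[2:5, :], block[5:, :]
```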
  • The QP value corresponding to the one QG includes a luminance QP value and a chrominance QP value. Determining the multiple QP values of the one image frame includes: separately acquiring the luminance QP value and the chrominance QP value of the one QG.
  • Determining multiple QP values of the one image frame includes: parsing the code stream to obtain flag information of the one image frame, where the flag information is used to indicate the QP value of the one QG, and/or the flag information is used to indicate the QP value of the one CU.
  • Determining multiple QP values of the one image frame includes: first, parsing the code stream to obtain flag information of the one image frame, where the flag information is used to indicate the QP offset of the one QG; second, determining the QP value of the one QG according to the predicted QP value of the one QG and the flag information.
  • Determining the QP value of the one QG according to the predicted QP value of the one QG and the flag information includes: acquiring the predicted QP value of the one QG, and using the sum of the predicted QP value of the one QG and the QP offset as the QP value of the one QG.
  • Determining the multiple QP values of the one image frame includes: acquiring the predicted QP value of the one QG, and determining the QP value of the one QG according to the predicted QP value and derivation information of the one QG.
  • The derivation information may include distortion constraint information, which indicates a distortion threshold for any one of the multiple QGs. Determining the QP value of the one QG according to its predicted QP value and the derivation information then includes: determining the predicted distortion corresponding to the predicted QP value; if the predicted distortion is less than or equal to the distortion threshold, using the predicted QP value as the QP value of the QG; if the predicted distortion is greater than the distortion threshold, using the QP value determined by the distortion threshold as the QP value of the QG.
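A sketch of this distortion-constrained decision, assuming two codec-specific helpers that are stand-ins here: predicted_distortion maps a QP to its expected distortion, and qp_for_distortion inverts the distortion threshold back to a QP.

```python
def choose_qp(predicted_qp, distortion_threshold,
              predicted_distortion, qp_for_distortion):
    """Keep the predicted QP if its distortion fits the bound, else clamp it."""
    d = predicted_distortion(predicted_qp)
    if d <= distortion_threshold:
        return predicted_qp                         # prediction already meets the bound
    return qp_for_distortion(distortion_threshold)  # QP determined by the threshold
```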
  • If the derivation information is the content information of the one QG or the remaining space of the code stream buffer, determining the QP value of the one QG includes: determining the QP offset of the one QG according to the derivation information, and taking the sum of the predicted QP value of the one QG and the QP offset as the QP value of the one QG.
  • Obtaining the predicted QP value of the one QG includes: obtaining the QP value of at least one other QG adjacent to the one QG in the one CU, and determining the predicted QP value of the one QG according to the QP value of the at least one other QG.
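A sketch of this neighbor-based prediction; averaging the QP values of the available adjacent QGs is one plausible choice of derivation, not one mandated by the text.

```python
def predict_qp(neighbor_qps):
    """neighbor_qps: QP values of at least one adjacent, already-decoded QG."""
    if not neighbor_qps:
        raise ValueError("at least one adjacent QG QP is required")
    return round(sum(neighbor_qps) / len(neighbor_qps))

qp_pred = predict_qp([30, 34])   # e.g. left and above QGs -> 32
```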
  • The one image frame includes at least a first partial CU and a second partial CU; the first partial CU and the second partial CU have no overlapping area, and the QP values of the first partial CU and of the second partial CU are obtained in different ways.
  • Determining the multiple QP values of the one image frame includes: parsing the code stream to obtain flag information of the one image frame, where the flag information includes the QP offset of the first partial CU, and determining the QP value of the first partial CU according to the flag information. And, for the second partial CU: obtaining the predicted QP value of the second partial CU, and determining the QP value of the second partial CU according to the predicted QP value and derivation information of the second partial CU, where the derivation information is any one or a combination of the following: flatness information or texture information of the second partial CU, the remaining space of the code stream buffer, or distortion constraint information.
  • Decoding the one image frame according to the multiple QP values includes: firstly, for each QP value among the multiple QP values, obtaining the quantization step size Qstep corresponding to the QP value; secondly, obtaining the level values contained in the QG corresponding to the QP value; finally, dequantizing the level values of the QG according to a selected quantizer combination, where the quantizer combination includes one or more quantizers (a sketch of these steps follows the quantizer notes below).
  • the quantizer is a uniform quantizer or a non-uniform quantizer.
  • the quantizer combination is determined through the flag information carried in the code stream, or determined by the distribution of residual coefficients in the QG.
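The sketch below illustrates the three dequantization steps just listed, assuming the HEVC-style step-size mapping Qstep = 2^((QP - 4) / 6) and a single uniform quantizer; the patent leaves the actual quantizer combination open (flagged in the code stream or derived from the residual distribution).

```python
def qstep_from_qp(qp: int) -> float:
    """HEVC-style QP-to-step-size mapping, assumed here for illustration."""
    return 2.0 ** ((qp - 4) / 6.0)

def dequantize_qg(levels, qp):
    """Dequantize one QG's level values with its shared QP (uniform quantizer)."""
    qstep = qstep_from_qp(qp)                 # step size for this QG's QP
    return [lvl * qstep for lvl in levels]    # uniform reconstruction

residuals = dequantize_qg([3, -1, 0, 2], qp=28)
```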
  • Performing inverse quantization on the level values of the QG includes: first, determining the division domain type of the QG; second, if the division domain type of the QG is the transform domain, selecting a quantization matrix that matches the parameter information of the QG from a matrix template library at the decoding end, where the matrix template library includes multiple types of quantization matrix templates and the parameter information includes any one or a combination of the following: the size of the QG, the size of the CU where the QG resides, luma and chroma channel information, and flatness information; finally, dequantizing the level values in the QG using the quantization matrix of the QG to obtain the residual coefficients of the QG.
  • The multiple types of quantization matrix templates include flat-block templates and texture-block templates. In a flat-block template, the Qstep of the residual coefficients whose frequency is higher than the frequency threshold is greater than or equal to the Qstep of the residual coefficients whose frequency does not reach the frequency threshold; in a texture-block template, the Qstep of the residual coefficients whose frequency is higher than the frequency threshold is less than or equal to the Qstep of the residual coefficients whose frequency does not reach the frequency threshold.
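To make the two template families concrete, the sketch below treats the coordinate sum x + y as the frequency index: a flat-block template applies a coarser (greater or equal) Qstep above the frequency threshold, while a texture-block template applies a finer (less or equal) one. The sizes and Qstep values are illustrative; real templates would come from the matrix template library.

```python
import numpy as np

def make_template(size, freq_threshold, low_qstep, high_qstep):
    """Build a Qstep matrix where x + y >= freq_threshold uses high_qstep."""
    m = np.empty((size, size))
    for y in range(size):
        for x in range(size):
            m[y, x] = high_qstep if x + y >= freq_threshold else low_qstep
    return m

flat_block    = make_template(4, freq_threshold=3, low_qstep=8,  high_qstep=16)
texture_block = make_template(4, freq_threshold=3, low_qstep=16, high_qstep=8)
```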
  • The quantization matrix templates included in the matrix template library are obtained by any one or more of the following transforms: discrete cosine transform (DCT), discrete sine transform (DST), integer transform, or discrete wavelet transform (DWT).
  • the one QG includes one or more pixels of the one image frame.
  • Determining multiple QP values of the one image frame includes: parsing the code stream to determine one or more marked QGs in the one CU, where the marked QGs need to be dequantized during decoding and the unmarked QGs in the one CU are not dequantized; and, for each QG among the one or more marked QGs, determining the QP value of that QG.
  • Alternatively, all QGs are traversed in a scanning order, where the scanning order includes any one or a combination of the following: top to bottom, left to right, zigzag, or reverse zigzag; for each QG among all the QGs, the QP value of that QG is acquired according to the scanning order.
  • QP values corresponding to at least two QGs among the multiple QGs are different.
  • The present application provides an image decoding method. The method can be applied to a video decoding system, or to a decoding end that can support video decoding, and is executed by the decoding end. The method includes: parsing the code stream to obtain one or more image frames, where one image frame includes one or more CUs; determining multiple QP values of the one image frame, where one CU includes multiple pixels, one pixel corresponds to one QP value, and the QP values of at least two of the multiple pixels are different; and decoding the one image frame according to the multiple QP values.
  • Determining the multiple QP values of the one image frame includes: acquiring the predicted QP value of the one pixel, and determining the QP value of the one pixel according to the predicted QP value and derivation information of the one pixel, where the derivation information is the information of one or more reconstructed pixels around the one pixel.
  • The predicted QP value of the one pixel is the QP value of the CU or QG where the one pixel is located, or is derived from the QP values of one or more reconstructed pixels around the one pixel, where the derivation method includes calculating at least one of the mean, the median, or the mode.
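A sketch of the three named derivations applied to the QP values of reconstructed neighboring pixels; which pixels count as neighbors is specified in the next item.

```python
from statistics import mean, median, mode

def derive_pixel_qp(neighbor_qps, method="median"):
    """Derive a pixel's predicted QP from its reconstructed neighbors' QPs."""
    funcs = {"mean": lambda v: round(mean(v)),
             "median": lambda v: round(median(v)),
             "mode": mode}
    return funcs[method](neighbor_qps)

qp = derive_pixel_qp([28, 30, 30, 32])   # median of reconstructed neighbors -> 30
```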
  • The reconstructed pixels are pixels in a square area with a side length of 3 or 5 centered on the one pixel, or pixels in a diamond-shaped area with a diagonal length of 3 or 5 centered on the one pixel.
  • the reconstructed pixel information includes any one or a combination of several of the following: pixel value, flatness information or texture information, background brightness, and contrast.
  • Determining the QP value of the one pixel according to the predicted QP value and derivation information of the one pixel includes: determining indication information of the pixel according to the information of the one or more reconstructed pixels; if the indication information is less than or equal to a first threshold, and the predicted QP value is greater than or equal to the QP value corresponding to distortion that is just perceptible to the human eye, the QP value corresponding to the just-perceptible distortion is used as the QP value of the pixel.
  • The QP value corresponding to the just-perceptible distortion may be a preset value (such as image-level or CU-level information preset by the encoding end or the decoding end), may be obtained by parsing the code stream (transmitted at image level or CU level), or may be derived from the flatness or texture information, background brightness, and contrast of surrounding reconstructed CUs.
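A sketch of this perceptual clamp; qp_jnd stands for the QP corresponding to just-perceptible distortion, however obtained (preset, parsed from the code stream, or derived), and the indication information is assumed to have been computed from the reconstructed neighbors already.

```python
def clamp_to_jnd(predicted_qp, indication, first_threshold, qp_jnd):
    """Cap the pixel's QP at the visibility limit for visually sensitive pixels."""
    if indication <= first_threshold and predicted_qp >= qp_jnd:
        return qp_jnd          # quantize no coarser than the just-perceptible level
    return predicted_qp
```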
  • The present application provides an image encoding method. The method can be applied to a video decoding system, or to an encoding end that can support video encoding, and is executed by the encoding end. The method includes: dividing an image frame into one or more CUs; determining multiple QP values of the one image frame, where one CU includes multiple QGs and one QG corresponds to one QP value; and encoding the one image frame according to the multiple QP values.
  • The present application also provides an image encoding method executed by the encoding end, which includes: dividing an image frame into one or more CUs; determining multiple QP values of the one image frame, where one CU includes multiple pixels, one pixel corresponds to one QP value, and the QP values of at least two of the multiple pixels are different; and encoding the one image frame according to the multiple QP values.
  • the present application provides an image decoding device, the decoding device is applied to a decoding end, and the decoding device includes various modules for implementing the method in any possible implementation manner of the first aspect or the second aspect.
  • the decoding device has the function of implementing the behaviors in the method examples of any one of the first aspect or the second aspect above.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides an image encoding device, the encoding device is applied to an encoding end, and the encoding device includes various modules for implementing the method in any possible implementation manner of the third aspect or the fourth aspect.
  • the encoding device includes: an image segmentation unit, a QP decision unit and an image encoding unit.
  • the encoding device has the function of realizing the behavior in the method example of any one of the third aspect or the fourth aspect above.
  • the functions described above may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • The present application provides an electronic device, including a processor and a memory; the memory is used to store computer instructions, and the processor is used to call and execute the computer instructions from the memory, so as to implement the method in any one of the implementations of the first aspect to the fourth aspect.
  • the electronic device may refer to a video encoder, or include an encoding end of the video encoder.
  • the electronic device may refer to a video decoder, or include a decoding end of the video decoder.
  • The present application provides a computer-readable storage medium in which computer programs or instructions are stored; when the computer programs or instructions are executed by a computing device or by a storage system where the computing device is located, the method in any one of the implementations of the first aspect to the fourth aspect is implemented.
  • The present application provides a computer program product including instructions; when the computer program product runs on a computing device or a processor, the computing device or processor executes the instructions, so as to implement the method in any one of the implementations of the first aspect to the fourth aspect.
  • The present application provides a video decoding system, which includes an encoding end and a decoding end; the decoding end is used to implement the method in any one of the implementations of the first aspect to the second aspect, and the encoding end is used to implement the method in any one of the implementations of the third aspect to the fourth aspect.
  • FIG. 1 is an exemplary block diagram of a video decoding system provided by the present application
  • FIG. 2 is an exemplary block diagram of a video encoder provided by the present application.
  • FIG. 3 is an exemplary block diagram of a video decoder provided by the present application.
  • FIG. 4 is a schematic flow diagram of video encoding/decoding provided by the present application.
  • FIG. 5 is a schematic flowchart of an image decoding method provided by the present application.
  • FIG. 6 is a schematic diagram of a transformation domain division provided by the present application.
  • FIG. 7 is a schematic diagram of a pixel domain division provided by the present application.
  • FIG. 8 is a schematic flow chart of obtaining QP by using predictive coding provided by the present application.
  • FIG. 9 is a schematic diagram of the distribution of pixels provided by the present application.
  • FIG. 10 is a schematic flow diagram of an image frame decoding provided by the present application.
  • FIG. 11 is a schematic diagram of a quantization matrix template provided by the present application.
  • FIG. 12 is a schematic flowchart of an image encoding method provided by the present application.
  • FIG. 13 is a schematic structural diagram of a decoding device provided by the present application.
  • FIG. 14 is a schematic structural diagram of an encoding device provided by the present application.
  • FIG. 15 is a schematic structural diagram of an electronic device provided by the present application.
  • FIG. 1 is an exemplary block diagram of a video decoding system provided in the present application.
  • In this application, "video coder" generally refers to both video encoders and video decoders.
  • "Video coding" or "coding" may refer generically to video encoding or video decoding.
  • The video encoder 100 and the video decoder 200 of the video decoding system 1 are used to predict, according to the various method examples described in any one of the multiple new inter-frame prediction modes proposed in this application, the motion information, such as motion vectors, of the currently decoded image block or its sub-blocks, so that the predicted motion vector is as close as possible to the motion vector obtained with a motion estimation method. The motion vector difference then does not need to be transmitted during encoding, which further improves encoding and decoding performance.
  • the video decoding system 1 includes an encoding end 10 and a decoding end 20 .
  • the encoding end 10 generates encoded video data. Therefore, the encoder 10 can be referred to as a video encoding device.
  • the decoding end 20 may decode the encoded video data generated by the encoding end 10 . Therefore, the decoding end 20 may be referred to as a video decoding device.
  • Various implementations of encoding end 10, decoding end 20, or both may include one or more processors and memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by the computer, as described herein.
  • The encoding end 10 and decoding end 20 may comprise a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Decoding end 20 may receive encoded video data from encoding end 10 via link 30 .
  • Link 30 may include one or more media or devices capable of moving encoded video data from encoding end 10 to decoding end 20 .
  • link 30 may include one or more communication media that enable encoder 10 to transmit encoded video data directly to decoder 20 in real time.
  • the encoding end 10 may modulate the encoded video data according to a communication standard (eg, a wireless communication protocol), and may transmit the modulated video data to the decoding end 20 .
  • the one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • RF radio frequency
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the encoding end 10 to the decoding end 20 .
  • encoded data may be output from output interface 140 to storage device 40 .
  • encoded data may be accessed from storage device 40 through input interface 240 .
  • Storage device 40 may comprise any of a variety of distributed or locally accessed data storage media, such as hard drives, Blu-ray discs, digital video discs (DVDs), compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by the encoding end 10 .
  • the decoder 20 can access the stored video data from the storage device 40 via streaming or downloading.
  • the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the decoding terminal 20 .
  • Example file servers include a web server (eg, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive.
  • The decoder 20 may access the encoded video data through any standard data connection, including an Internet connection, for example a wireless channel (such as a wireless-fidelity (Wi-Fi) connection), a wired connection (such as a digital subscriber line (DSL)), or a combination of both.
  • the transmission of encoded video data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
  • The image decoding method provided by this application can be applied to video codecs to support various multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • the video coding system 1 can be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • The video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of this application are applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device.
  • data is retrieved from local storage, streamed over a network, and so on.
  • a video encoding device may encode and store data to memory, and/or a video decoding device may retrieve and decode data from memory.
  • Encoding and decoding may be performed by devices that do not communicate with each other, but merely encode data to memory and/or retrieve data from memory and decode it.
  • the encoder 10 includes a video source 120 , a video encoder 100 and an output interface 140 .
  • Output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, a computer graphics system for generating video data, or a combination of such sources of video data.
  • Video encoder 100 may encode video data from video source 120 .
  • the encoding end 10 directly transmits the encoded video data to the decoding end 20 via the output interface 140 .
  • the encoded video data may also be stored on the storage device 40 for later access by the decoding end 20 for decoding and/or playback.
  • the decoder 20 includes an input interface 240 , a video decoder 200 and a display device 220 .
  • input interface 240 includes a receiver and/or a modem.
  • Input interface 240 may receive encoded video data via link 30 and/or from storage device 40 .
  • The display device 220 may be integrated with the decoding end 20 or may be external to the decoding end 20. In general, display device 220 displays decoded video data.
  • The display device 220 may include various display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
  • Video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams.
  • If applicable, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or to other protocols such as the user datagram protocol (UDP).
  • Each of video encoder 100 and video decoder 200 may be implemented as any of a variety of circuits, such as: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the application is implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered to be one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • This application may generally refer to video encoder 100 as "signaling" or "transmitting" certain information to another device, such as video decoder 200.
  • The term "signaling" or "transmitting" may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transfer can occur in real time or near real time. Alternatively, this communication may occur over a period of time, such as when syntax elements are stored to a computer-readable storage medium in an encoded code stream at encoding time; the decoding device may then retrieve the syntax elements at any time after they are stored on this medium.
  • The JCT-VC developed the H.265 (HEVC) standard.
  • The HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM).
  • The latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and the standard document is incorporated herein by reference in its entirety.
  • The HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra prediction coding modes, while the HM can provide up to 35 intra prediction coding modes.
  • The H.266 test model is an evolution model of video decoding devices.
  • The algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet, and the latest algorithm description is included in JVET-F1001-v2, which is incorporated herein by reference in its entirety.
  • The reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
  • In the HM, a video frame or image may be divided into a sequence of tree blocks, also called largest coding units (LCUs) or coding tree units (CTUs).
  • Treeblocks have a similar purpose to macroblocks of the H.264 standard.
  • a slice contains a number of consecutive treeblocks in decoding order.
  • a video frame or image may be divided into one or more slices.
  • Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a tree block that is the root node of a quadtree may be split into four child nodes, and each child node may in turn be a parent node and be split into four additional child nodes.
  • The final non-splittable child nodes, which are leaf nodes of the quadtree, comprise decoding nodes, e.g., decoded video blocks.
  • syntax data associated with a decoded codestream may define the maximum number of times a treeblock may be split, and may also define the minimum size of a decoding node.
  • the size of the CU corresponds to the size of the decoding node and must be square in shape.
  • The size of a CU may range from 8x8 pixels up to the size of the tree block, with a maximum of 64x64 pixels or larger.
  • a video sequence usually consists of a sequence of video frames or images.
  • A group of pictures (GOP) illustratively includes a series of one or more video images.
  • a GOP may include syntax data in header information of the GOP, in header information of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP.
  • Each slice of a picture may contain slice syntax data describing the encoding mode of the corresponding picture.
  • Video encoder 100 typically operates on video blocks within individual video slices in order to encode video data.
  • a video block may correspond to a decoding node within a CU.
  • Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
  • NxN and N by N are used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels.
  • an NxN block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
  • the pixels in a block may be arranged into rows and columns.
  • a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • a block may include NxM pixels, where M is not necessarily equal to N.
  • video encoder 100 may calculate residual data for the CU.
  • A CU may include pixel data in the spatial domain (also referred to as the pixel domain), or may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a discrete wavelet transform, or a conceptually similar transform) has been applied to the residual video data.
  • the residual data may correspond to pixel differences between pixels of the uncoded picture and predictor values corresponding to the CU.
  • Video encoder 100 may form a CU including residual data and generate transform coefficients for the CU.
  • video encoder 100 may perform quantization of the transform coefficients.
  • Quantization illustratively refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be truncated down to an m-bit value during quantization, where n is greater than m.
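A one-line numeric illustration of this bit-depth reduction: dropping the n - m least significant bits truncates an n-bit value to m bits.

```python
def truncate(value, n, m):
    """Truncate an n-bit value to m bits by dropping the n - m low bits."""
    return value >> (n - m)

print(truncate(0b10110111, n=8, m=4))   # 8-bit 183 -> 4-bit 11 (0b1011)
```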
  • The video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
  • Video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector using context-adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method.
  • Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
  • video encoder 100 may assign contexts within a context model to symbols to be transmitted.
  • the context may be related to whether a symbol's adjacent value is non-zero.
  • video encoder 100 may select a variable length code for symbols to be transmitted. Codewords in variable-length codes (VLC) can be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, the use of VLC can achieve code rate savings relative to using equal length codewords for each symbol to be transmitted. Probabilities in CABAC can be determined based on the context assigned to the symbols.
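A toy prefix-free VLC illustrating the saving: with four symbols, fixed-length coding needs 2 bits per symbol, while the code below averages 1.75 bits under the illustrative probabilities shown.

```python
vlc  = {"a": "0", "b": "10", "c": "110", "d": "111"}   # prefix-free codewords
prob = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # more likely -> shorter code

avg_bits = sum(prob[s] * len(vlc[s]) for s in vlc)     # 0.5*1 + 0.25*2 + 2*(0.125*3)
print(avg_bits)                                        # 1.75 vs 2.0 for fixed length
```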
  • the video encoder may perform inter-frame prediction to reduce temporal redundancy between images.
  • the CU currently being decoded by the video decoder may be referred to as the current CU.
  • the image currently being decoded by the video decoder may be referred to as the current image.
  • FIG. 2 is an exemplary block diagram of a video encoder provided by the present application.
  • the video encoder 100 is used to output the video to the post-processing entity 41 .
  • Post-processing entity 41 represents an example of a video entity that may process encoded video data from video encoder 100, such as a media-aware network element (MANE) or a splicing/editing device.
  • post-processing entity 41 may be an instance of a network entity.
  • In some cases, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100.
  • post-processing entity 41 is an instance of storage device 40 of FIG. 1 .
  • Video encoder 100 includes prediction processing unit 108, filter unit 106, decoded picture buffer (DPB) 107, summer 112, transformer 101, quantizer 102, and entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109 .
  • the video encoder 100 also includes an inverse quantizer 104 , an inverse transformer 105 and a summer 111 .
  • the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter.
  • Although filter unit 106 is shown in FIG. 2 as an in-loop filter, in other implementations filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data storage and a segmentation unit (not shown in the figure).
  • the video data storage may store video data to be encoded by the components of the video encoder 100 .
  • the video data stored in the video data storage may be obtained from video source 120 .
  • DPB 107 may be a reference picture memory that stores reference video data used by video encoder 100 to encode video data in intra, inter coding modes.
  • The video data memory and DPB 107 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • Video data memory and DPB 107 may be provided by the same memory device or separate memory devices.
  • video encoder 100 receives video data and stores the video data in a video data memory.
  • the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on quadtree structure or binary tree structure. Such partitioning may also include partitioning into slices, tiles, or other larger units.
  • Video encoder 100 generally illustrates components that encode image blocks within a video slice to be encoded. The slice may be divided into tiles (and possibly into collections of tiles called slices).
  • Prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes. Prediction processing unit 108 may provide the resulting intra-, inter-coded blocks to summer 112 to produce a residual block and to summer 111 to reconstruct the encoded block used as a reference picture.
  • The intra predictor 109 within the prediction processing unit 108 may perform intra predictive coding of the current image block relative to one or more adjacent blocks in the same frame or slice as the current block to be coded, to remove spatial redundancy.
  • the inter predictor 110 within the prediction processing unit 108 may perform inter predictive encoding of the current image block relative to one or more prediction blocks in one or more reference images to remove temporal redundancy.
  • the inter-frame predictor 110 may be used to determine an inter-frame prediction mode for encoding the current image block.
  • The inter predictor 110 can use rate-distortion analysis to calculate the rate-distortion values of the various inter prediction modes in the set of candidate inter prediction modes, and select the inter prediction mode with the best rate-distortion characteristic.
  • Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original unencoded block from which it was encoded, as well as the bit rate (that is, the number of bits) used to produce the encoded block.
  • the inter predictor 110 may determine the inter prediction mode with the smallest rate distortion cost for encoding the current image block in the set of candidate inter prediction modes as the inter prediction mode for performing inter prediction on the current image block.
  • The inter predictor 110 is configured to predict motion information (such as motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to use the motion information (such as motion vectors) of the one or more sub-blocks to obtain or generate the prediction block of the current image block.
  • the inter predictor 110 may locate the predictive block pointed to by the motion vector in one of the reference picture lists.
  • the inter predictor 110 may also generate syntax elements associated with the image blocks and video slices for use by the video decoder 200 when decoding the image blocks of the video slice.
  • The inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block; it should be understood that the inter predictor 110 here performs motion estimation and motion compensation processes.
  • The inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
  • the intra predictor 109 may perform intra prediction on the current image block.
  • the intra predictor 109 may determine the intra prediction mode used to encode the current block.
  • The intra predictor 109 can use rate-distortion analysis to calculate the rate-distortion values of the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristic among the modes to be tested.
  • The intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • Summer 112 represents one or more components that perform this subtraction operation.
  • The residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 101.
  • a transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
  • Transformer 101 may convert residual video data from a pixel value domain to a transform domain, such as a frequency domain.
  • Transformer 101 may send the resulting transform coefficients to quantizer 102 .
  • Quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
  • quantizer 102 may then perform a scan of the matrix including the quantized transform coefficients.
  • the entropy encoder 103 may perform a scan.
  • After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique.
  • The encoded code stream may be transmitted to video decoder 200, or archived for later transmission or retrieval by video decoder 200.
  • The entropy encoder 103 may also entropy encode syntax elements of the current image block for use by the video decoder 200 in decoding.
  • Inverse quantizer 104 and inverse transformer 105 respectively apply inverse quantization and inverse transformation to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference image.
  • the summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
  • the filter unit 106 may be applied to the reconstructed image blocks to reduce artifacts, such as block artifacts.
  • This reconstructed image block is then stored in the decoded image buffer 107 as a reference block, which may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or images.
  • the video encoder 100 may be used to encode video streams.
  • For some image blocks or image frames, the video encoder 100 can quantize the residual signal directly without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105. Alternatively, for some image blocks or image frames, the video encoder 100 does not produce residual data and accordingly needs no processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105. Alternatively, the video encoder 100 can store a reconstructed image block directly as a reference block without processing by the filter unit 106. Alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 can be combined.
  • FIG. 3 is an exemplary block diagram of a video decoder 200 provided in this application.
  • video decoder 200 includes entropy decoder 203 , prediction processing unit 208 , inverse quantizer 204 , inverse transformer 205 , summer 211 , filter unit 206 and DPB 207 .
  • the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209 .
  • video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from FIG. 2 .
  • video decoder 200 receives from video encoder 100 an encoded video bitstream representing image blocks of an encoded video slice and associated syntax elements.
  • the video decoder 200 may receive video data from the network entity 42, and optionally, store the video data in a video data storage (not shown in the figure).
  • the video data memory may store video data, such as an encoded video code stream, to be decoded by the components of the video decoder 200 .
  • the video data stored in the video data storage may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of the video data, or by accessing a physical data storage medium.
  • The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from an encoded video code stream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and DPB 207 may be the same memory or separate memories.
  • the video data memory and DPB 207 may be formed from any of a variety of memory devices, such as: dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM) , or other types of memory devices.
  • Network entity 42 may be a server, MANE, video editor/slicer, or other device for implementing one or more of the techniques described above.
  • Network entity 42 may or may not include a video encoder, such as video encoder 100 .
  • network entity 42 may implement portions of the techniques described in this application.
  • In some cases, network entity 42 and video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200.
  • network entity 42 may be an instance of storage device 40 of FIG. 1 .
  • The entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements. Entropy decoder 203 forwards the syntax elements to prediction processing unit 208. Video decoder 200 may receive syntax elements at a video slice level and/or a tile level.
  • The intra predictor 209 of the prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture.
• the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter-frame prediction mode for decoding the current image block of the current video slice, and decode the current image block (for example, perform inter-frame prediction) based on the determined inter-frame prediction mode.
• the inter-frame predictor 210 may determine whether to use a new inter-frame prediction mode to predict the current image block of the current video slice; if the syntax elements indicate that a new inter-frame prediction mode is used, the inter predictor 210 predicts the motion information of the current image block or a sub-block of the current image block based on the new inter-frame prediction mode (such as a new inter-frame prediction mode specified by a syntax element or a default new inter-frame prediction mode), and then uses the predicted motion information to acquire or generate the prediction block of the current image block or its sub-block through the motion compensation process.
  • the motion information here may include reference image information and motion vectors, wherein the reference image information may include but not limited to unidirectional/bidirectional prediction information, reference image list number and reference image index corresponding to the reference image list.
  • the predicted block may be generated from one of the reference pictures within one of the reference picture lists.
  • Video decoder 200 may construct reference picture lists, List 0 and List 1, based on the reference pictures stored in DPB 207.
  • the reference frame index for the current picture may be included in one or more of reference frame list 0 and list 1 .
• the video encoder 100 may signal a specific syntax element indicating whether a new inter-frame prediction mode is used to decode a specific block, or, alternatively, may signal both whether a new inter-frame prediction mode is used and which specific new inter prediction mode is adopted to decode the specific block.
  • the inter predictor 210 here performs a motion compensation process.
  • the inverse quantizer 204 inverse quantizes, ie dequantizes, the quantized transform coefficients provided in the code stream and decoded by the entropy decoder 203 .
  • the inverse quantization process may include using quantization parameters calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and likewise determining the degree of inverse quantization that should be applied.
  • An inverse transformer 205 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients to produce a residual block in the pixel domain.
• the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the prediction processing unit 208 to obtain a reconstructed block, that is, a decoded image block.
  • Summer 211 represents the component that performs this summing operation.
  • a loop filter (either in the decoding loop or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality, if desired.
• Filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 206 is shown in FIG. 3 as an in-loop filter, in other implementations filter unit 206 may be implemented as a post-loop filter.
• the filter unit 206 filters the reconstructed block to reduce blocking distortion, and the result is output as part of the decoded video stream.
• the decoded image blocks in a given frame or image are stored in the decoded picture buffer 207, and the DPB 207 stores reference images for subsequent motion compensation.
  • the DPB 207 may be part of a memory that may also store decoded video for later presentation on a display device (such as display device 220 of FIG. 1 ), or may be separate from such memory.
  • the video decoder 200 can generate an output video stream without being processed by the filter unit 206; or, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode the quantized coefficients, correspondingly It does not need to be processed by the inverse quantizer 204 and the inverse transformer 205 .
  • the techniques of this application may be performed by any of the video encoders or video decoders described in this application, such as video encoder 100 and video decoder 200 shown and described with respect to FIGS. 1-3 . That is, in one possible implementation, the video encoder 100 described with respect to FIG. 2 may perform certain techniques described below when performing inter prediction during encoding of a block of video data. In another possible implementation, the video decoder 200 described with respect to FIG. 3 may perform certain techniques described below when performing inter prediction during decoding of a block of video data. Thus, references to "video encoder” or “video decoder” in general may include video encoder 100, video decoder 200, or another video encoding or decoding unit.
  • Figures 1 to 3 are only examples provided by the present application.
• depending on the implementation, the video encoder 100, the video decoder 200, and the video decoding system may include more or fewer components or units, which is not limited in the present application.
• this application provides a possible implementation of video encoding/decoding, as shown in Figure 4, which is a schematic diagram of a video encoding/decoding process provided by this application. This implementation of video encoding/decoding includes process 1 to process 5, and process 1 to process 5 can be performed by any one or more of the above-mentioned encoding end 10, video encoder 100, decoding end 20, or video decoder 200.
• Process 1: Divide a frame of image into one or more non-overlapping parallel coding units. There is no dependency between these parallel coding units, and they can be encoded and decoded completely in parallel and independently, as shown by parallel coding unit 1 and parallel coding unit 2 in FIG. 4.
• Process 2: Each parallel coding unit can be divided into one or more independent coding units that do not overlap with each other.
  • Each independent coding unit can be independent of each other, but can share some parallel coding unit header information.
• an independent coding unit has a width w_lcu and a height h_lcu. If a parallel coding unit is divided into exactly one independent coding unit, the size of the independent coding unit is the same as that of the parallel coding unit; otherwise, the width of the independent coding unit should be larger than its height (except in edge areas).
• the independent coding unit can have a fixed size w_lcu × h_lcu, where both w_lcu and h_lcu are powers of 2 (i.e., 2^N with N ≥ 0); for example, the size of the independent coding unit may be 128×4, 64×4, 32×4, 16×4, 8×4, 32×2, 16×2, or 8×2.
• the independent coding unit may be fixed at 128×4. If the size of the parallel coding unit is 256×8, the parallel coding unit can be equally divided into four independent coding units; if the size of the parallel coding unit is 288×10, the parallel coding unit is divided as follows: the first and second rows each contain two 128×4 plus one 32×4 independent coding units, and the third row contains two 128×2 plus one 32×2 independent coding units.
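• As an illustration of the fixed 128×4 division just described, the following sketch (function and variable names are ours, not from this application) tiles a parallel coding unit with independent coding units, leaving narrower or shorter units at the edges:

```python
def split_into_independent_units(width, height, w_lcu=128, h_lcu=4):
    """Sketch: tile a parallel coding unit with fixed w_lcu x h_lcu
    independent coding units; edge areas receive the remaining size."""
    units = []
    y = 0
    while y < height:
        h = min(h_lcu, height - y)      # shorter units at the bottom edge
        x = 0
        while x < width:
            w = min(w_lcu, width - x)   # narrower units at the right edge
            units.append((x, y, w, h))
            x += w
        y += h
    return units

# 256x8 -> four 128x4 units; 288x10 -> two row bands of two 128x4 plus one
# 32x4, and a third band of two 128x2 plus one 32x2, matching the example.
print(split_into_independent_units(256, 8))
print(split_into_independent_units(288, 10))
```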
• the independent coding unit can include three components, luma Y, chroma Cb, and chroma Cr, or three components, red (R), green (G), and blue (B), or only one of them. If the independent coding unit includes three components, the sizes of the three components may be exactly the same or different, which is specifically related to the input format of the image.
• Process 3: Each independent coding unit can be divided into one or more non-overlapping coding units.
• the coding units within an independent coding unit can depend on each other; for example, multiple coding units can refer to one another during pre-encoding/decoding.
• if the size of the coding unit is the same as that of the independent coding unit (that is, the independent coding unit is divided into only one coding unit), its size can be any of the sizes described in process 2.
• feasible division examples include: horizontal equal division (the height of the coding unit is the same as that of the independent coding unit, but the width differs and can be 1/2, 1/4, 1/8, 1/16, etc. of it), vertical equal division (the width of the coding unit is the same as that of the independent coding unit, and the height can be 1/2, 1/4, 1/8, 1/16, etc. of it), and combined horizontal and vertical equal division (quadtree division), with horizontal equal division preferred.
  • the width of the coding unit is w_cu, and the height is h_cu, so its width should be greater than its height (unless it is an edge area).
• the coding unit can have a fixed size w_cu × h_cu, where both w_cu and h_cu are powers of 2 (2^N with N ≥ 0), such as 16×4, 8×4, 16×2, 8×2, 8×1, or 4×1.
• the coding unit may be fixed at 16×4. If the size of the independent coding unit is 64×4, the independent coding unit can be equally divided into four coding units; if the size of the independent coding unit is 72×4, it is divided into four 16×4 plus one 8×4 coding units.
• the coding unit may include three components, luma Y, chroma Cb, and chroma Cr (or red R, green G, and blue B), or only one of them. If it contains three components, their sizes can be exactly the same or different, depending on the image input format.
• process 3 is an optional step in the video encoding and decoding method; the video encoder/decoder can instead encode/decode the residual coefficients (or residual values) of the independent coding units obtained in process 2.
• Process 4: A coding unit can be divided into one or more non-overlapping prediction groups (Prediction Group, PG); a PG can also be referred to simply as a Group, and each PG is encoded and decoded according to a selected prediction mode.
• the predicted values of the PGs together form the predicted value of the entire coding unit, and the residual value of the coding unit is obtained from the predicted value and the original value of the coding unit.
• Process 5: Based on the residual value of the coding unit, the coding unit is grouped to obtain one or more non-overlapping residual blocks (residual block, RB), and the residual coefficients of each RB are encoded and decoded according to a selected mode, forming a residual coefficient stream. Specifically, the residual coefficients may either be transformed or not transformed.
• the selected mode of the residual coefficient encoding method in process 5 may include, but is not limited to, any of the following: semi-fixed-length coding, exponential Golomb coding, Golomb-Rice coding, truncated unary coding, run-length coding, or direct coding of the raw residual values.
  • a video encoder may directly encode coefficients within RBs.
  • the video encoder may also perform transformation on the residual block, such as DCT, DST, Hadamard transformation, etc., and then encode the transformed coefficients.
• the video encoder may directly and uniformly quantize each coefficient in the RB and then perform binarization coding; if the RB is large, it can be further divided into multiple coefficient groups (CG), and each CG is uniformly quantized and then binarized.
  • the maximum value of the absolute value of the residual within an RB block is defined as the modified maximum value (mm).
• based on mm, the number of coded bits for the residual coefficients in the RB block is determined (the number of coded bits is the same for all residual coefficients within one RB block). For example, if the critical limit (CL) of the current RB block is 2 and the current residual coefficient is 1, then 2 bits are required to encode the residual coefficient 1, expressed as 01; if the CL of the current RB block is 7, an 8-bit residual coefficient and a 1-bit sign bit are encoded.
• CL is determined by finding the minimum M such that all residuals of the current sub-block lie within the range [-2^(M-1), 2^(M-1)]. If both boundary values -2^(M-1) and 2^(M-1) are present, M is increased by 1, that is, M+1 bits are required to encode all residuals of the current RB block; if only one of the two boundary values -2^(M-1) and 2^(M-1) is present, a trailing bit is encoded to determine whether the boundary value is -2^(M-1) or 2^(M-1); if neither -2^(M-1) nor 2^(M-1) is present among the residuals, no trailing bit needs to be encoded.
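• The CL determination described above can be sketched as follows (a minimal illustration; the helper name is ours and the bitstream packing is not reproduced):

```python
def critical_limit(residuals):
    """Sketch: find the minimum M with every residual inside
    [-2**(M-1), 2**(M-1)], then apply the boundary-value rules above."""
    M = 1
    while not all(-(1 << (M - 1)) <= r <= (1 << (M - 1)) for r in residuals):
        M += 1
    lo, hi = -(1 << (M - 1)), 1 << (M - 1)
    has_lo, has_hi = lo in residuals, hi in residuals
    if has_lo and has_hi:
        return M + 1, None              # both boundaries: M+1 bits, no trailing bit
    if has_lo or has_hi:
        return M, (1 if has_hi else 0)  # one boundary: a trailing bit is encoded
    return M, None                      # no boundary value: no trailing bit

print(critical_limit([1, -2, 2]))  # both boundaries of M=2 -> (3, None)
print(critical_limit([2, 1, 0]))   # only the +2 boundary  -> (2, 1)
```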
  • the video encoder can also directly encode the original value of the image instead of the residual value.
  • FIG. 5 is a schematic flow chart of an image decoding method provided in the present application.
• the image decoding method can be applied to the video decoding system shown in FIG. 1 and can be executed by the decoding end 20; specifically, the decoding method can be executed by the video decoder 200 included in the decoding end 20. Referring to FIG. 5, the decoding method provided in this embodiment includes the following steps.
  • the video decoder 200 parses the acquired code stream to obtain one or more image frames.
  • the above-mentioned one image frame includes one or more CUs.
  • the video decoder 200 determines multiple QP values of an image frame.
  • one CU includes multiple QGs, and one QG corresponds to one QP value.
  • the aforementioned one CU includes a plurality of residual coefficients, and the one QG includes a part of the aforementioned plurality of residual coefficients, and the part of the residual coefficients shares one QP value.
• the residual coefficient is also called a level value; the residual coefficient and the level value may be collectively referred to as residual coefficients, the term used in this embodiment, which should not be construed as a limitation of the present application.
• in this way, a CU can be divided into multiple QGs, and one or more residual coefficients in each QG share a QP value; the video decoder can then make a finer-grained QP decision for the one or more CUs of the image frame, reducing the decoding distortion of the image frame while ensuring a certain compression rate, which improves the authenticity and accuracy of video image decoding.
  • a QG may include one pixel or multiple pixels, and each pixel has a corresponding residual coefficient.
  • the quantization process of QG can also be called adaptive point-by-point quantization of image frames, and this point-by-point quantization can also be applied to CUs without QG division .
  • the feature of point-by-point quantization is that it allows each pixel to use a different QP, which is equivalent to refining the granularity of QP to the pixel level. In this way, the dequantization of the image frame by using the point-by-point quantization method can greatly improve the subjective quality of the image frame under the condition that the compression rate of the image frame remains unchanged.
  • the residual coefficients corresponding to the multiple pixels included in one QG may share one QP value.
• the QGs may be determined in the following manner: the video decoder 200 divides a CU included in an image frame according to a first rule to obtain multiple QGs.
  • the first rule includes a division domain and a division manner, the division domain is a transform domain or a pixel domain, and the division manner includes at least one of uniform division and non-uniform division.
• the positions of the multiple residual coefficients included in a CU are marked by coordinates, and the marks may include an abscissa and an ordinate.
  • the position coordinates of the residual coefficients are (i, j), where i is the abscissa and j is the ordinate.
• the process of the video decoder 200 dividing a CU to obtain multiple QGs includes: residual coefficients whose coordinate sum does not reach the first coordinate threshold are assigned to the first QG, and residual coefficients whose coordinate sum reaches the first coordinate threshold are assigned to the second QG.
  • the coordinate sum is the sum of the abscissa and ordinate of the residual coefficient.
  • Figure 6 is a schematic diagram of a transformation domain division provided by the present application.
  • a CU includes 16 regions corresponding to residual coefficients, where the coordinates of the residual coefficient in the upper left corner are (1,1), and the residual coefficient in the lower right corner is The coordinates of are (4,4).
  • QG(2-1) shows a QG dichotomy method in the transform domain.
• among the residual coefficients at positions (i, j), those satisfying "i+j < threshold 1" form the first QG, and the others form the second QG; for example, threshold 1 is 5.5.
• alternatively, the residual coefficients satisfying "i+j ≤ threshold 1" may form the first QG, and the others the second QG.
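• Assuming a 4×4 block with 1-based coordinates and threshold 1 = 5.5 as above, the dichotomy can be sketched as:

```python
def split_by_diagonal(n=4, threshold=5.5):
    """Sketch of QG(2-1): transform-domain QG dichotomy by the
    coordinate sum i+j of each residual coefficient (1-based)."""
    qg1 = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
           if i + j < threshold]
    qg2 = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)
           if i + j >= threshold]
    return qg1, qg2

qg1, qg2 = split_by_diagonal()
print(len(qg1), len(qg2))  # 10 and 6 coefficients for threshold 5.5
```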
• in another example, the process of the video decoder 200 dividing a CU to obtain multiple QGs includes: residual coefficients whose coordinate sum does not reach the first coordinate threshold are assigned to the first QG, residual coefficients whose coordinate sum reaches the first coordinate threshold but not the second coordinate threshold are assigned to the second QG, and residual coefficients whose coordinate sum reaches the second coordinate threshold are assigned to the third QG.
  • the second coordinate threshold is greater than the first coordinate threshold.
  • QG(3-1) shows a QG three-point method in the transform domain.
• for the residual coefficient at position (i, j): coefficients satisfying "i+j < threshold 1" form the first QG, coefficients satisfying "threshold 1 ≤ i+j < threshold 2" form the second QG, and coefficients satisfying "i+j ≥ threshold 2" form the third QG.
• in another example, the process of the video decoder 200 dividing a CU to obtain multiple QGs includes: sorting the multiple residual coefficients; residual coefficients before the first ratio threshold are assigned to the first QG, and residual coefficients from the first ratio threshold onward are assigned to the second QG.
• the sorting manner of the plurality of residual coefficients is any of the following: zigzag order or reverse zigzag order.
• QG(2-2) shows a zigzag QG dichotomy in the transform domain: the first 7/16 (43.75%) of the residual coefficients are assigned to the first QG, and the remaining residual coefficients are assigned to the second QG.
• QG(2-3) shows a reverse-zigzag QG dichotomy in the transform domain; the first 7/16 (43.75%) of the residual coefficients are assigned to the first QG, and the remaining residual coefficients to the second QG.
• in another example, the process of the video decoder 200 dividing a CU to obtain multiple QGs includes: sorting the multiple residual coefficients; residual coefficients before the first ratio threshold are assigned to the first QG, residual coefficients from the first ratio threshold up to (but not reaching) the second ratio threshold are assigned to the second QG, and residual coefficients from the second ratio threshold onward are assigned to the third QG.
• QG(3-2) shows a zigzag QG three-way division in the transform domain: the first 5/16 (31.25%) of the residual coefficients are assigned to the first QG, the next 3/16 (18.75%) to the second QG, and the remaining residual coefficients to the third QG.
• QG(3-3) shows a reverse-zigzag QG three-way division in the transform domain; the first 5/16 (31.25%) of the residual coefficients are assigned to the first QG, the next 3/16 (18.75%) to the second QG, and the remaining residual coefficients to the third QG.
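• A sketch of the ratio-based three-way division under a zigzag scan; the exact scan pattern is an assumption here, since it is not spelled out above:

```python
def zigzag_order(n=4):
    """Assumed zigzag scan over an n x n block: anti-diagonals i+j,
    with the traversal direction alternating per diagonal."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[1] if (p[0] + p[1]) % 2 else p[0]))

def split_by_ratio(coords, ratios=(5 / 16, 3 / 16)):
    """QG(3-2): first 5/16 -> first QG, next 3/16 -> second QG,
    remainder -> third QG; reversing coords gives QG(3-3)."""
    n = len(coords)
    a = round(n * ratios[0])
    b = a + round(n * ratios[1])
    return coords[:a], coords[a:b], coords[b:]

print([len(g) for g in split_by_ratio(zigzag_order())])  # [5, 3, 8]
```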
  • the foregoing first to fourth possible examples are only examples given by this embodiment to describe division of transform domains in a CU, and should not be construed as limiting the present application.
• when the division domain is the transform domain, the coordinate threshold and the ratio threshold can be selected according to the image content of the CU or the requirements of the video codec, which is not limited in this application.
  • the transform domain can be divided into more QGs, such as 4, 5, 10 or more, which is not limited in this application.
  • the video decoder 200 divides the plurality of residual coefficients symmetrically in the horizontal or vertical direction to obtain two QGs containing the same number of residual coefficients. These two QGs contain residual coefficients in a ratio of 1:1.
  • FIG. 7 is a schematic diagram of a pixel domain division provided by the present application.
• QG_P (where P stands for pixel) (2-1) shows an example of symmetrical dichotomy in the vertical direction of the CU.
• QG_P(2-4) shows an example of symmetrical dichotomy in the horizontal direction of the CU.
  • the video decoder 200 divides the multiple residual coefficients symmetrically in the horizontal or vertical direction to obtain three QGs.
• among the three QGs, the two non-adjacent QGs contain the same number of residual coefficients, and the number of residual coefficients in the remaining QG equals the sum of the numbers in the two non-adjacent QGs.
• QG_P(3-1) shows an example of symmetrical three-way division in the vertical direction of the CU: the QGs on both sides contain the same number of residual coefficients, and the ratio of the numbers of residual coefficients in the three QGs is 1:2:1.
• QG_P(3-4) shows an example of symmetrical three-way division in the horizontal direction of the CU; the ratio of the numbers of residual coefficients in the three QGs is likewise 1:2:1.
• in another example, the video decoder 200 divides the plurality of residual coefficients horizontally or vertically to obtain two QGs containing different numbers of residual coefficients.
  • QG_P(2-2) as shown in FIG. 7 provides an example of bisection in the vertical direction of the CU, and the ratio of the residual coefficients included in the two QGs is 1:3.
  • QG_P(2-3) gives an example of bisection in the vertical direction of the CU, and the ratio of the residual coefficients included in the two QGs is 3:1.
  • QG_P(2-5) gives an example of bisection in the horizontal direction of the CU, and the ratio of the residual coefficients included in the two QGs is 1:3.
• QG_P(2-6) gives an example of bisection in the horizontal direction of the CU, and the ratio of the residual coefficients included in the two QGs is 3:1.
  • the video decoder 200 divides the multiple residual coefficients horizontally or vertically to obtain three QGs.
  • the residual coefficients contained in these three QGs do not have a symmetrical relationship.
  • QG_P(3-2) as shown in FIG. 7 gives an example of performing three divisions in the vertical direction of the CU, and the ratio of the residual coefficients contained in these three QGs is 1:1:2.
  • QG_P(3-3) gives an example of performing three divisions in the vertical direction of the CU, and the ratio of the residual coefficients included in the three QGs is 2:1:1.
  • QG_P(3-5) gives an example of performing three divisions in the horizontal direction of the CU, and the ratio of the residual coefficients included in the three QGs is 1:1:2.
• QG_P(3-6) gives an example of three-way division in the horizontal direction of the CU, and the ratio of the residual coefficients included in the three QGs is 2:1:1.
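• A minimal sketch of such pixel-domain division, assuming a vertical division that splits the CU's columns according to the stated ratios:

```python
def split_columns(width, ratios):
    """Sketch: vertical pixel-domain division of a CU's columns into QGs
    with the given ratios, e.g. (1, 2, 1) for QG_P(3-1)."""
    total = sum(ratios)
    bounds, x = [], 0
    for r in ratios:
        w = width * r // total
        bounds.append((x, x + w))
        x += w
    return bounds

print(split_columns(16, (1, 2, 1)))  # [(0, 4), (4, 12), (12, 16)]
print(split_columns(16, (1, 3)))     # QG_P(2-2): [(0, 4), (4, 16)]
```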
  • the foregoing fifth to eighth possible examples are only examples given by this embodiment to describe division of pixel domains in a CU, and should not be construed as limiting the present application.
  • the ratio of the residual coefficients included in the QG can be determined according to the image content of the CU or the requirement of video codec, which is not limited in this application.
  • the pixel domain can be divided into more QGs, such as 4, 5, 10 or more, which is not limited in this application.
  • the video decoder 200 adopts different QG division methods for the residual coefficients in the pixel domain and the transform domain, and the QP quantization process is also different, thereby reducing the distortion of image decoding.
• for QP quantization, refer to the description of the process of obtaining the QP value herein, which is not repeated here.
  • the image decoding method provided by this embodiment further includes the following step S530 .
• the video decoder 200 decodes an image frame according to the multiple QP values.
• in this way, a decoded image is obtained for each image frame, and decoding all the image frames yields the decoded video of the code stream.
• the video decoder can perform QP quantization for image decoding at the granularity of QGs. Since one CU can be divided into multiple QGs and one QG corresponds to one QP value, this avoids the greater image distortion that results from using the same QP value for all residual coefficients.
  • the video decoder can make a finer-grained QP decision for one or more CUs corresponding to the image frame. In this case, the decoding distortion of the image frame is reduced, and the authenticity and accuracy of video image decoding are improved.
  • the video encoder/video decoder will obtain the QP value of each QG (or CU).
  • a QP value corresponding to a QG includes a luma QP value and a chroma QP value.
• the luma QP value (QP_Y) refers to the QP value required to quantize or dequantize the luminance (Luma) of the image frame, and the chroma QP value refers to the QP value required to quantize or dequantize the chrominance (Chroma) of the image frame.
  • the video decoder 200 determines multiple QP values of an image frame, which may include the following two possible situations.
  • the video decoder 200 acquires a QG luma QP value and chrominance QP value respectively.
• in the second case, the video decoder 200 first acquires the luma QP value of a QG, and then determines the chroma QP value of the QG based on the luma QP value.
• for example, the chroma QP value is QP_Y plus the QP offset values of the picture parameter set (PPS) layer and the slice layer.
  • the above-mentioned luma QP value and chrominance QP value may be obtained by the video decoder 200 by parsing the code stream.
  • the video decoder 200 may first obtain the CU-level QP value, and then obtain the QG-level QP value in the CU.
• the video decoder 200 may parse the QP value using methods such as truncated unary codes, truncated Rice codes, or exponential Golomb codes, and use the parsed QP value to dequantize the residual coefficients (level values) included in the QG.
  • the flag information may carry the QP value of the CU.
• in this way, the video decoder 200 can determine the QP value of the CU from the flag information carried in the code stream, which saves the video decoder from deriving the QP value of the CU from the image frames in the code stream, reduces the computing resource consumption of the video decoder, and improves the efficiency of image decoding.
  • the flag information may carry the QP value of any one of the multiple QGs.
  • the flag information may carry the QP value of the CU and the QP value of any QG in the CU.
• in this way, the flag information can carry the QP value of the QG, which saves the video decoder from deriving the QP value of the QG from the image frames in the code stream, reduces the computing resource consumption of the video decoder, and improves the efficiency of image decoding.
• the following takes the video decoder 200 determining the QP value of one QG in an image frame as an example for a detailed description, as shown in FIG. 8.
  • the process of obtaining the QP may be implemented by a video decoder or a video encoder.
  • the video decoder 200 is taken as an example for illustration.
  • the process of obtaining the QP includes the following steps.
  • the video decoder 200 acquires a predicted QP value of the QG.
  • the video decoder 200 uses the QP value of the CU where the QG is located as the predicted QP value of the QG.
  • the QP value of the CU may be determined by the video decoder 200 analyzing the flag information of the code stream.
• the video decoder 200 may first obtain the QP value of at least one other QG adjacent to the QG in the CU where the QG is located, and determine the predicted QP value of the QG from the QP value of that at least one other QG; for example, the video decoder 200 uses the QP value of an adjacent QG as the predicted QP value of the QG.
  • the video decoder 200 acquires the QP offset of the QG.
  • the QP offset can be represented by deltaQP.
  • this embodiment provides two possible implementation manners.
  • the video decoder 200 may determine the QP offset of the QG by using the flag information carried in the code stream, as shown in S820A in FIG. 8 .
  • the video decoder 200 parses the bitstream to obtain flag information indicating the QP offset of the QG.
• after the video decoder 200 acquires the code stream, it parses the code stream to obtain the flag information of the image frame where the QG is located, and the flag information is used to indicate the QP offset (deltaQP) of the QG.
  • the video decoder 200 may use the derivation information to determine the QP offset of the QG, as shown in S820B in FIG. 8 .
  • the video decoder 200 determines the QP offset of the QG according to the derivation information of the QG.
  • the derivation information may be any one or a combination of several of the following: flatness information or texture information of the QG, remaining space of the code stream buffer, or distortion constraint information.
• the flatness information or texture information is used to indicate the image gradient of the QG; the distortion constraint information indicates the distortion threshold of any one of the multiple QGs included in an image frame; the remaining space of the code stream buffer indicates the available headroom of the code stream buffer (such as buffer space).
• when the derivation information is flatness information or texture information, the video decoder 200 can derive the QP offset of the QG from the flatness information or texture information.
• for example, the video decoder 200 calculates the texture complexity of the current block (QG): for a QG with high texture complexity (e.g., reaching the texture complexity threshold), a large QP (such as 20) is used; for a QG with low texture complexity (e.g., below the texture complexity threshold), a small QP (such as 5) is used.
• when the derivation information is the remaining space of the code stream buffer, the video decoder 200 calculates the average number of bits BPPtotal over all pixels of the entire image and the average number of bits BPPleft over the remaining uncoded pixels; if BPPleft > BPPtotal, the QP is decreased, otherwise the QP is increased.
• BPPtotal and BPPleft can be obtained, for example, as BPPtotal = (number of bits available for the entire image) / (number of pixels in the entire image) and BPPleft = (number of bits still available) / (number of uncoded pixels).
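• A sketch of this buffer-based adjustment, under the assumption that BPPtotal and BPPleft are the plain averages reconstructed above:

```python
def adjust_qp(qp, bits_budget, total_pixels, bits_left, pixels_left, step=1):
    """Sketch: decrease QP when more bits per remaining pixel are
    available than the whole-image average, otherwise increase it."""
    bpp_total = bits_budget / total_pixels   # assumed form of BPPtotal
    bpp_left = bits_left / pixels_left       # assumed form of BPPleft
    return qp - step if bpp_left > bpp_total else qp + step

# 9000/2048 bits per remaining pixel > 16000/4096 overall -> QP drops to 19.
print(adjust_qp(20, bits_budget=16000, total_pixels=4096,
                bits_left=9000, pixels_left=2048))
```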
• when the derivation information is distortion constraint information, let Δ denote the distortion threshold: the necessary and sufficient condition for the maximum distortion in the pixel domain not to exceed Δ is that the maximum quantization distortion ΔD_max satisfies ΔD_max ≤ Δ, from which the QP value in the pixel domain can be derived; a sufficient condition for the maximum distortion in the pixel domain not to exceed Δ is that the maximum distortion in the transform domain satisfies a corresponding bound, from which the QP value in the transform domain can be derived.
  • the process of obtaining the QP provided by this embodiment further includes the following steps.
  • the video decoder 200 determines the QP value of the QG according to the predicted QP value of the QG and the QP offset.
  • the video decoder 200 uses the sum of the predicted QP value of a QG and the QP offset as the QP value of the QG.
• Video decoder 200 determines the predicted distortion corresponding to the reference QP value of the QG; if the predicted distortion is less than or equal to the distortion threshold, the reference QP value is used as the QP value of the QG; if the predicted distortion is greater than the distortion threshold, the QP value determined by the distortion threshold is used as the QP value of the QG.
  • the reference QP value refers to the predicted QP value of the QG determined in S810.
• alternatively, the reference QP value may be the sum of the deltaQP determined by the texture information (or flatness information) and the predicted QP value determined in S810.
• alternatively, the reference QP value may be the sum of the deltaQP determined by the remaining space of the code stream buffer and the predicted QP value determined in S810.
• the above derivation information can be used to determine the deltaQP of the QG, or to directly determine the actual coded QP value of the QG; how the derivation information is used can be determined according to the requirements of QP quantization/inverse quantization in the video codec, and the above three situations and three examples should not be construed as limiting the present application.
• the aforementioned examples 1 to 3 are only examples given in this embodiment to illustrate that the video decoder 200 uses derivation information to determine the QP value of a QG, and should not be construed as limiting the present application; for example, the derivation information may also combine distortion constraint information, texture information (or flatness information), and the remaining space of the code stream buffer, which is not limited in this application.
  • an image frame includes at least a first part of CUs and a second part of CUs, the first part of CUs and the second part of CUs do not have overlapping areas, and the QP values of the first part of CUs and the second part of CUs Obtained in different ways.
  • the acquisition method of the QP value of the first part of CUs is carried by the tag information of the code stream, and the acquisition method of the QP value of the second part of CUs is derived by the video decoder 200 .
• the process for the video decoder 200 to determine multiple QP values of an image frame may be as follows: first, the video decoder 200 parses the code stream to obtain the flag information of an image frame, the flag information including the QP offsets of the first part of CUs, and determines the QP values of the first part of CUs according to the flag information; second, for the second part of CUs, the video decoder 200 obtains their predicted QP values and determines the QP values of the second part of CUs according to the predicted QP values and the derivation information.
  • the relevant content of the derivation information refer to the foregoing description of the derivation information of the QG, and just replace the QG with the CU, which will not be repeated here.
  • the video decoder 200 divides an image into several regions, and adopts different QP processing manners for CUs in different regions.
  • the benchmark QP values of different regions are transmitted at the image level, and the label information of different regions is transmitted at the CU level.
  • the CU-level QP of different regions can be obtained through code stream transmission or derivation at the decoder.
• for example, the video decoder divides the image into a region of interest (ROI) and non-ROI regions; for CUs in the ROI, the QP value is obtained through code stream transmission (such as the above-mentioned flag information), while for CUs in non-ROI regions, the QP value is obtained through derivation at the decoding end (such as via the above-mentioned derivation information).
  • one CU of one image frame may include multiple QGs, and the multiple QGs may be partly QP-quantized, or all of them may be QP-quantized.
• in one case, the video decoder 200 may first determine the scanning order of all QGs included in a CU of an image frame; then, for each of these QGs, the video decoder 200 acquires its QP value in scanning order.
  • the scanning order includes any one of the following: top-to-bottom, left-to-right, zigzag or reverse zigzag order.
  • a QP offset may be encoded for each of these QGs.
• in another case, the video decoder 200 can parse the code stream to determine one or more marked QGs in a CU of an image frame, where the marked QG(s) need to be QP-dequantized in the decoding process; then, for each of the marked QG(s), the video decoder 200 obtains its QP value.
  • the scanning order is related to the division method, which can be from top to bottom, from left to right, Zigzag or reverse zigzag order.
  • a QP offset may be encoded for each of these QGs.
• in this way, the flag information carried by the code stream can be used to distinguish the QP quantization mode of the QGs (partial quantization or full quantization), avoiding indiscriminate QP quantization by the video decoder, which reduces the computing resources required for QP quantization and the image distortion, and improves the decoding efficiency and accuracy of the video.
• this point-by-point quantization technique comprises the following process: the video decoder 200 can adaptively adjust the QP value of the current pixel according to the reconstructed pixel information around the current pixel.
  • the reconstructed pixel information includes but not limited to pixel value, flatness information or texture information, background brightness, contrast and so on.
  • the adaptive point-by-point quantization technology can be applied to QG, and can also be applied to CUs without QG division.
  • the point-by-point quantization technology is characterized by allowing each pixel in an image frame to use a different QP value, which is equivalent to refining the granularity of the QP value quantization to the pixel level.
• let QP_pred be the QP value of the current CU or QG; QP_JND ≥ 0 is the QP value corresponding to the just noticeable distortion (JND) of the human eye; offset > 0 is the QP offset value (it can be transmitted in the code stream or set in advance); and threshold 2 > threshold 1.
  • the video decoder adopts the point-by-point quantization technology, which can greatly improve the subjective quality of the image frame and reduce the distortion of the image frame without changing the compression rate of the image frame.
  • the video decoder 200 acquires a predicted QP value of a pixel, and determines the QP value of a pixel according to the predicted QP value of a pixel and derivation information.
  • the predicted QP value of a pixel is the QP value of the CU or QG where the pixel is located.
• the predicted QP value of a pixel may instead be derived from the QP values of one or more reconstructed pixels around the pixel, where the derivation method includes at least one of: the mean (such as the average of the QP values of multiple pixels), the median (such as the median of the QP values of multiple pixels), or the mode (such as the most frequent QP value among the QP values of multiple pixels).
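• A minimal sketch of these three derivation methods (function and parameter names are ours):

```python
from statistics import mean, median, mode

def predict_pixel_qp(neighbor_qps, method="median"):
    """Sketch: derive a pixel's predicted QP from the QP values of the
    surrounding reconstructed pixels by mean, median, or mode."""
    if method == "mean":
        return round(mean(neighbor_qps))
    if method == "median":
        return round(median(neighbor_qps))
    return mode(neighbor_qps)  # the most frequent QP among the neighbors

print(predict_pixel_qp([18, 20, 20, 22]))          # median -> 20
print(predict_pixel_qp([18, 20, 20, 22], "mode"))  # mode   -> 20
```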
  • the above-mentioned derivation information of the pixel point may be information of one or more reconstructed pixel points around the pixel point.
  • the reconstructed pixel information includes any one or a combination of several of the following: pixel value, flatness information or texture information, background brightness, and contrast. It should be noted that the foregoing information is only an example provided by this embodiment, and should not be construed as a limitation to this application.
• Figure 9 is a schematic diagram of pixel distribution provided by this application.
• (A) in Figure 9 shows a square-area division centered on the current pixel; two possible cases are given: in case 1, the reconstructed pixels are the pixels in a square area centered on the current pixel with side length 3, such as the surrounding pixels 1 shown in (A) of Figure 9; in case 2, the reconstructed pixels are the pixels in a square area centered on the current pixel with side length 5, such as the surrounding pixels 2 shown in (A) of Figure 9.
• (B) in Figure 9 shows a diamond-shaped area division centered on the current pixel; two possible cases are given: in case 1, the reconstructed pixels are the pixels in a diamond-shaped area centered on the current pixel with diagonal length 3, such as the surrounding pixels 1 shown in (B) of Figure 9; in case 2, the reconstructed pixels are the pixels in a diamond-shaped area centered on the current pixel with diagonal length 5, such as the surrounding pixels 2 shown in (B) of Figure 9.
  • FIG. 9 is only an example for illustrating the reconstructed pixel points around the current pixel point in this embodiment, and should not be construed as a limitation to the present application.
• the reconstructed pixels around the current pixel may also be one or two pixels vertically or horizontally adjacent to the current pixel.
• determining the QP value of a pixel according to its predicted QP value and derivation information includes: determining indication information of the pixel according to the information of one or more reconstructed pixels around the pixel; if the indication information is less than or equal to the first threshold, and the predicted QP value is greater than or equal to the QP value corresponding to the distortion just noticeable by the human eye, then the QP value corresponding to the just-noticeable distortion is taken as the QP value of the pixel.
  • the first threshold may be preset, or may be determined according to the compression rate or distortion rate requirements of the video codec. In addition, the first threshold may also be determined according to user input information.
  • the QP value corresponding to the distortion that can be detected by the human eye is image-level or CU-level information.
  • the QP value corresponding to the distortion just perceivable by the human eye is obtained by parsing the bitstream, for example, the bitstream carries the QP value corresponding to the distortion just perceivable by the human eye, such as 20.
  • the QP value corresponding to the distortion that can be perceived by the human eye is derived based on the flatness information or texture information, background brightness, and contrast information of the surrounding reconstructed CU.
• for the process of deriving this QP value, reference may be made to the relevant content of FIG. 9 above, which is not repeated here.
• the QP value corresponding to the just-noticeable distortion may also be a preset value set by the video encoder or video decoder, such as 15. That is, the QP value corresponding to the just-noticeable distortion can be carried in the flag information of the code stream, derived by the video encoder or video decoder during encoding and decoding, or set as a preset QP value.
• in this way, the QP value corresponding to the just-noticeable distortion is introduced into the QP decision of the current pixel, so that each pixel satisfies the judgment information corresponding to the just-noticeable distortion, which reduces image distortion and improves the subjective quality of the image.
• offset > 0 is a QP offset value (which can be transmitted in the code stream or set in advance).
• Method 1: When the indication information is less than or equal to the threshold, the smaller of QP_pred and QP_JND is used as the QP value of the current pixel; when the indication information is greater than the threshold, the predicted QP value (QP_pred) of the pixel is used as the QP value of the current pixel.
• Method 2: When the indication information is less than or equal to the threshold, the smaller of QP_pred and QP_JND is used as the QP value of the current pixel; when the indication information is greater than the threshold, the difference between QP_pred and the QP offset (offset) is used as the QP value of the current pixel.
• Method 3: When the indication information is less than or equal to the threshold, QP_pred is used as the QP value of the current pixel; when the indication information is greater than the threshold, the sum of QP_pred and the QP offset (offset) is used as the QP value of the current pixel.
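• Methods 1 to 3 can be sketched together as follows (a hedged illustration; parameter names are ours):

```python
def pixel_qp(indication, threshold, qp_pred, qp_jnd, offset, method=1):
    """Sketch of methods 1-3 above: decide the current pixel's QP from
    the indication information derived from reconstructed neighbors."""
    if indication <= threshold:
        # Methods 1 and 2 cap the QP at the just-noticeable-distortion QP.
        return min(qp_pred, qp_jnd) if method in (1, 2) else qp_pred
    if method == 1:
        return qp_pred           # keep the predicted QP
    if method == 2:
        return qp_pred - offset  # subtract the QP offset
    return qp_pred + offset      # method 3: add the QP offset

print(pixel_qp(3, 5, qp_pred=24, qp_jnd=20, offset=2))            # -> 20
print(pixel_qp(7, 5, qp_pred=24, qp_jnd=20, offset=2, method=3))  # -> 26
```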
• in this way, the QP value corresponding to the just-noticeable distortion is introduced into the QP decision process of the current pixel, so that each pixel can satisfy the judgment information corresponding to the just-noticeable distortion, which reduces image distortion and improves the subjective quality of the image.
  • FIG. 10 is a schematic flowchart of an image frame decoding provided by the present application.
• in a possible implementation, S530 may include the following steps S1010-S1030.
  • the video decoder can obtain the Qstep through at least one of formula derivation and table lookup, and four possible implementation methods are provided below.
• Method 2: Denote by octave the rank of QP, that is, Qstep doubles every time QP increases by octave; octave is usually chosen as 6 or 8. Denote by offset an integer offset value; then, for example, Qstep = 2^((QP - offset)/octave).
• Method 3: As in method 2, denote by octave the rank of QP and by offset an integer offset value, and let ⌈·⌉ and ⌊·⌋ denote rounding up and rounding down, respectively; Qstep is then obtained from an integer approximation of 2^((QP - offset)/octave) built with these rounding operations.
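• Under the exponential relationship assumed for method 2, a sketch:

```python
def qstep_from_qp(qp, octave=6, offset=0):
    """Sketch of method 2: Qstep doubles every time QP grows by octave.
    The offset value is codec-specific; 0 here is only a placeholder."""
    return 2.0 ** ((qp - offset) / octave)

assert qstep_from_qp(12) == 2 * qstep_from_qp(6)  # doubling per octave
print(qstep_from_qp(24))  # 2^(24/6) = 16.0
```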
• Method 4: An example of quantization and inverse quantization is given below, where c is the residual coefficient to be quantized (transform domain or pixel domain), l is the quantized level value, c′ is the reconstructed value after inverse quantization, Qstep is the quantization step size, f ∈ [0,1) is the rounding control parameter, and [0, 1-f) (in units of Qstep) is the quantization dead zone (the interval whose level value is 0); for example, quantization may take the form l = sign(c)·⌊|c|/Qstep + f⌋, with inverse quantization c′ = l·Qstep.
• when quantizing residuals in the pixel domain, method 3 requires a clip operation to ensure that the quantized coefficient can be represented with "M-T" bits, whereas the quantized coefficient of method 4 can naturally be represented with "M-T" bits without clipping.
• the parameter f is related to the length of the quantization dead zone: the smaller f is, the longer the quantization dead zone, and the closer the quantized level value is to zero.
• when f ≤ 0.5, the smaller f is, the larger the quantization distortion and the lower the bit rate of coefficient coding.
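• A sketch of the method-4 quantizer and inverse quantizer, using the formulas reconstructed above:

```python
import math

def quantize(c, qstep, f=0.5):
    """Sketch: dead-zone uniform scalar quantization with rounding
    control f in [0, 1); level 0 whenever |c| < (1 - f) * qstep."""
    return int(math.copysign(math.floor(abs(c) / qstep + f), c))

def dequantize(level, qstep):
    """Inverse quantization: reconstruct from the level value."""
    return level * qstep

# With f = 0.4 the dead zone is |c| < 0.6 * qstep, so c = 2 maps to level 0,
# while c = 9 maps to level 2 and reconstructs to 8.
print(quantize(2, 4, f=0.4), dequantize(quantize(9, 4, f=0.4), 4))
```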
  • the above S530 may further include the following steps.
  • the quantizer combination includes one or more quantizers, and the quantizers are uniform quantizers or non-uniform quantizers.
• a uniform quantizer refers to a uniform scalar quantizer; for its quantization or inverse quantization formula, refer to the quantization and inverse quantization formulas provided in method 4 of S1010 above.
• the parameter f ∈ [0,1) can be determined in the following ways:
• Method 1: f is set to 0.5 or another fixed value.
• Method 2: f is adaptively determined according to the QP value, the prediction mode, and whether transformation is performed.
  • the non-uniform quantizer refers to a non-uniform scalar quantizer, and the corresponding relationship between the quantization level value, the quantization interval and the reconstruction value of inverse quantization can be obtained by looking up a table.
• a possible implementation manner is shown in Table 1 below.
  • the reconstruction value and the quantization interval are non-uniform, and the reconstruction value is the probability centroid of the quantization interval.
  • Quantizer combinations can use one non-uniform scalar quantizer, or multiple non-uniform scalar quantizers.
• for example, the quantizer combination is determined by the flag information carried in the code stream; for another example, it is determined by the distribution of the residual coefficients in the QG.
  • the video decoder adaptively selects which quantizer to use, and the selection basis may be mode information or transformation information, and the mode information or transformation information is related to the distribution of residual coefficients in the QG.
  • dequantizing the level value of QG may include the following process.
  • a quantization matrix that matches the parameter information of the QG is selected from the matrix template library at the decoder.
  • the matrix template library includes various types of quantization matrix templates, and the parameter information includes any one or a combination of the following: the size of the QG, the size of the CU in which the QG is located, luma and chroma channel information, and flatness information.
• then, the QG's quantization matrix is used to dequantize the level values in the QG to obtain the residual coefficients of the QG.
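• A minimal sketch of dequantization with a quantization matrix (per-position step sizes; the names and values are illustrative only):

```python
def dequantize_with_matrix(levels, qstep_matrix):
    """Sketch: per-position inverse quantization of a QG's level values
    using a quantization matrix of the same shape."""
    return [[lv * qs for lv, qs in zip(row_l, row_q)]
            for row_l, row_q in zip(levels, qstep_matrix)]

levels = [[4, 1], [0, 1]]
qsteps = [[2, 4], [4, 8]]  # e.g. larger steps for higher-frequency positions
print(dequantize_with_matrix(levels, qsteps))  # [[8, 4], [0, 8]]
```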
  • the residual coefficients at different positions in the pixel domain are equally important, so no quantization matrix is used.
  • the coefficients in the transform domain are divided into low-frequency coefficients and high-frequency coefficients. Through the quantization matrix, the high and low-frequency coefficients adopt different quantization steps, which can improve the subjective quality of the image while ensuring a certain compression rate.
  • the element distribution of the quantization matrix has a specific template.
  • This application allows encoding blocks of different sizes to adopt different quantization matrix templates, and a large-size quantization matrix can be obtained by upsampling one or more small-size quantization matrices.
  • the quantized matrix template included in the matrix template library is obtained by any one or more of the following transformation methods: discrete cosine transform (DCT), discrete sine transform (DST), integer transform or discrete wavelet transform (DWT).
• FIG. 11 is a schematic diagram of quantization matrix templates provided by the present application.
  • the multiple types of quantization matrix templates included in the matrix template library may include flat block templates and texture block templates.
  • the Qstep of the residual coefficients whose frequency is higher than the frequency threshold in the flat block template is greater than or equal to the Qstep of the residual coefficients whose frequency does not reach the frequency threshold in the flat block template.
  • the Qstep of the residual coefficients whose frequency is higher than the frequency threshold in the texture block template is less than or equal to the Qstep of the residual coefficients whose frequency does not reach the frequency threshold in the texture block template.
• for example, the video decoder marks the current block as a flat block or a texture block according to the flatness information, and designs the quantization matrix template according to the texture masking effect: (1) if the current block (QG) is a flat block, the Qstep of the high-frequency coefficients in the quantization matrix is greater than or equal to the Qstep of the low-frequency coefficients, because the human eye is more sensitive to the low-frequency distortion of a flat block than to its high-frequency distortion, so the high-frequency coefficients allow greater loss.
• (2) if the current block (QG) is a texture block, the Qstep of the high-frequency coefficients in the quantization matrix is less than or equal to the Qstep of the low-frequency coefficients, because the human eye is more sensitive to the high-frequency distortion of a texture block than to its low-frequency distortion, so the high-frequency coefficients of the texture block are protected preferentially.
• in summary, the video decoder first obtains the QP and Qstep of the QG, parses the level values from the code stream, and then adaptively selects the quantizer to dequantize the level values and obtain the reconstructed values, thus implementing decoding of image frames.
• in the video image decoding process provided by this embodiment, a CU can be divided into multiple QGs, and one or more residual coefficients in each QG share a QP value. In this way, the video decoder can make a finer-grained QP decision for the one or more CUs of the image frame; while ensuring a certain compression rate, the decoding distortion of the image frame is reduced, and the authenticity and accuracy of video image decoding are improved.
• correspondingly, the video encoder first obtains the QP, Qstep, and residual coefficients of the QG, adaptively selects the quantizer to quantize the residual coefficients, and finally adjusts the quantized coefficients to obtain the final level values, thus implementing encoding of image frames.
  • this application also provides an image encoding method, as shown in FIG. 12 , which is a schematic flowchart of an image encoding method provided by this application.
• the image encoding method can be executed by the video encoder 100, or by an encoding end (the encoding end 10 shown in FIG. 1) that supports the functions of the video encoder 100. Here, the execution of the encoding method by the video encoder 100 is taken as an example. The image encoding method includes the following steps.
  • the video encoder 100 divides an image frame into one or more CUs.
  • the video encoder 100 determines multiple QP values of an image frame.
  • a CU includes multiple QGs, and one QG corresponds to one QP value.
  • one CU includes multiple pixels, one pixel corresponds to one QP value, and at least two of the multiple pixels have different QP values.
  • the video encoder 100 encodes an image frame according to multiple QP values.
• in the video image encoding process provided by this embodiment, a CU can be divided into multiple QGs (or pixels), and one or more residual coefficients in each QG share a QP value; the video encoder can then make a finer-grained QP decision for the one or more CUs of the image frame, and while ensuring a certain compression rate, the coding distortion of the image frame is reduced and the authenticity and accuracy of video image encoding are improved.
  • the video encoder/video decoder includes hardware structures and/or software modules corresponding to each function.
• the present application can be implemented in the form of hardware or a combination of hardware and computer software, with reference to the units and method steps of the examples described in the embodiments disclosed herein. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application scenario and design constraints of the technical solution.
  • FIG. 13 is a schematic structural diagram of a decoding device provided in the present application.
  • the decoding device 1300 includes a code stream analysis unit 1310 , a QP decision unit 1320 and an image decoding unit 1330 .
  • the decoding device 1300 can be used to implement the functions of the video decoder or the decoding end in the above decoding method embodiments, and thus can also realize the beneficial effects of the above decoding method embodiments.
  • the decoding device 1300 may be the decoding end 20 or the video decoder 200 shown in FIG. 1, or the video decoder 200 shown in FIG. 3, or a module applied to the decoding end 20 or the video decoder 200.
  • the code stream analysis unit 1310, the QP decision unit 1320 and the image decoding unit 1330 are used to realize the decoding method provided by any one of the embodiments in Fig. 4 to Fig. 11 . More detailed descriptions about the code stream parsing unit 1310, QP decision unit 1320, and image decoding unit 1330 can be directly obtained by referring to the relevant descriptions in the method embodiments shown in FIG. 4 to FIG. 11 , and will not be repeated here.
  • FIG. 14 is a schematic structural diagram of an encoding device provided in the present application.
  • the encoding device 1400 includes: an image segmentation unit 1410 , a QP decision unit 1420 and an image encoding unit 1430 .
  • the encoding device 1400 can be used to implement the functions of the video encoder or the encoding end in the above encoding method embodiments, and thus can also realize the beneficial effects of the above encoding method embodiments.
  • the encoding device 1400 may be the encoding end 10 or the video encoder 100 shown in FIG. 1, or the video encoder 100 shown in FIG. 2, or a module applied to the encoding end 10 or the video encoder 100.
  • the image segmentation unit 1410, the QP decision unit 1420 and the image encoding unit 1430 are used to implement the encoding method provided in FIG. 12 . More detailed descriptions about the image segmentation unit 1410, QP decision unit 1420, and image encoding unit 1430 can be directly obtained by referring to the relevant descriptions in the method embodiments shown in FIG. 4 to FIG. 12, and will not be repeated here.
  • the present application also provides an electronic device, as shown in FIG. 15 , which is a schematic structural diagram of an electronic device provided in the present application.
  • the electronic device 1500 includes a processor 1510 and an interface circuit 1520 .
  • the processor 1510 and the interface circuit 1520 are coupled to each other.
  • the interface circuit 1520 may be a transceiver or an input/output interface.
  • the electronic device 1500 may further include a memory 1530 for storing instructions executed by the processor 1510 or storing input data required by the processor 1510 to execute the instructions or storing data generated by the processor 1510 after executing the instructions.
  • the processor 1510 and the interface circuit 1520 are used to execute the functions of the code stream analysis unit 1310, the QP decision unit 1320 and the image decoding unit 1330.
  • the processor 1510 and the interface circuit 1520 are used to execute the functions of the image segmentation unit 1410 , the QP decision unit 1420 and the image encoding unit 1430 .
  • the specific connection medium among the communication interface 1520, the processor 1510, and the memory 1530 is not limited in this embodiment of the present application.
  • in FIG. 15, the communication interface 1520, the processor 1510, and the memory 1530 are connected through the bus 1540; the bus is represented by a thick line, and the connection mode between other components is only schematically illustrated and is not limited thereto.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 15, but this does not mean that there is only one bus or one type of bus.
  • the memory 1530 can be used to store software programs and modules, such as the program instructions/modules corresponding to the decoding method or encoding method provided in the embodiment of the present application.
  • the processor 1510 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 1530.
  • the communication interface 1520 can be used for signaling or data communication with other devices. In this application, the electronic device 1500 may have multiple communication interfaces 1520 .
  • the processor in the embodiments of the present application may be a central processing unit (CPU), a neural processing unit (NPU) or a graphics processing unit (GPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • a general-purpose processor can be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing software instructions.
  • software instructions may be composed of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art.
  • an exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC can be located in a network device or a terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium, for example, a floppy disk, a hard disk or a magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

一种图像的解码方法、编码方法及装置,涉及视频编解码领域。该解码方法包括:首先,解析码流以获得一个或多个图像帧,一个图像帧包括一个或多个CU;其次,确定所述一个图像帧的多个QP值;其中,一个CU包括多个QG,一个QG对应一个QP值;最后,依据所述多个QP值对所述一个图像帧进行解码。

Description

解码方法、编码方法及装置
相关申请的交叉引用
本申请要求于2021年11月11日提交的、申请号为202111334223.8的中国专利申请的优先权,该申请以引用的方式并入本文中。
技术领域
本申请涉及视频编解码领域,尤其涉及一种图像的解码方法、编码方法及装置。
背景技术
在视频编解码技术中,视频压缩技术尤为重要。视频压缩技术执行空间(图像内)预测和/或时间(图像间)预测,以减少或移除视频序列中固有的冗余信息。视频压缩的基本原理是,对空域、时间域和码字之间的相关性进行量化,尽可能去除冗余。量化(quantization)是指将信号的连续取值(或大量的离散取值)映射为有限多个离散幅值的过程,实现信号取值多对一的映射。
目前的做法是,对于一帧图像包括的一个或多个编码单元(coding unit,CU),编码端获取每个CU的量化参数(quantization parameter,QP),并依据该QP对CU进行编码获得码流;相应的,解码端对码流进行反量化(dequantization),获得CU的QP,并依据该QP对CU进行解码。CU是依据图像内容划分的,编码端和解码端对一个CU对应的图像内容采用相同的QP进行量化,导致图像编解码过程的量化失真较大。
发明内容
本申请提供一种图像的解码方法、编码方法及装置,解决了图像编解码过程的量化失真较大的问题。
本申请采用如下技术方案。
第一方面,本申请提供了一种图像的解码方法,所述方法可应用于视频译码系统,或者该方法可应用于可以支持视频译码实现该方法的解码端,例如该解码端包括视频解码器,所述方法包括:首先,解析码流以获得一个或多个图像帧,一个图像帧包括一个或多个CU;其次,确定所述一个图像帧的多个QP值;其中,一个CU包括多个QG,一个QG对应一个QP值;最后,依据所述多个QP值对所述一个图像帧进行解码。
相较于一个CU中所有的残差系数都采用相同的QP值导致的图像失真较大,在本实施例提供的视频图像的解码过程中,一个CU可以被划分为多个QG,每个QG中的一个或多个残差系数共用一个QP值,进而,视频解码器可以对图像帧对应的一个或多个CU进行更精细粒度的QP决策,在保证一定压缩率的情况下,降低了图像帧的解码失真,提高了视频图像解码的真实性和准确性。
在一种可选的实现方式中,所述一个CU包含多个残差系数,所述一个QG包含所述多个残差系数中的一部分残差系数,所述一部分残差系数共用所述一个QP值。
在一种可选的实现方式中,在所述确定所述一个图像帧的多个QP值之前,所述方法还包括:按照第一规则划分所述一个图像帧包括的一个CU,获取所述多个QG;其中,所述第一规则包括划分域和划分方式,所述划分域为变换域或像素域,所述划分方式包括均匀划分和非均匀划分中至少一种。
在一种可选的实现方式中,所述一个CU包含多个残差系数,所述多个残差系数的位置由坐标进行标记,所述坐标包括横坐标和纵坐标。若所述划分域为变换域,则按照第一规则划分所述一个图像帧包括的一个CU,获取所述多个QG,包括:将所述多个 残差系数中,坐标和未达到第一坐标阈值的残差系数划分至第一QG,所述坐标和达到所述第一坐标阈值的残差系数划分至第二QG;所述坐标和为残差系数的横坐标与纵坐标之和。
或者,将所述多个残差系数中,坐标和未达到第一坐标阈值的残差系数划分至第一QG,所述坐标和达到所述第一坐标阈值、未达到第二坐标阈值的残差系数划分至第二QG,所述坐标和达到所述第二坐标阈值的残差系数划分至第三QG;所述第二坐标阈值大于所述第一坐标阈值。
在一种可选的实现方式中,所述一个CU包含多个残差系数,若所述划分域为变换域,按照第一规则划分所述一个图像帧包括的一个CU,获取所述多个QG,包括:
对所述多个残差系数进行排序,将所述多个残差系数中未达到第一比例阈值的残差系数划分至第一QG,达到所述第一比例阈值的残差系数划分至第二QG;所述多个残差系数的排序方式为以下任一种:Z字形、反向Z字形。
或者,对所述多个残差系数进行排序,将所述多个残差系数中未达到第一比例阈值的残差系数划分至第一QG,达到所述第一比例阈值、但未达到第二比例阈值的残差系数划分至第二QG,达到所述第二比例阈值的残差系数划分至第三QG;所述第二比例阈值大于所述第一比例阈值。
在一种可选的实现方式中,所述一个CU包含多个残差系数,若所述划分域为像素域,则按照第一规则划分所述一个图像帧包括的一个CU,获取所述多个QG,包括:将所述多个残差系数进行水平或垂直方向的对称划分,获得包含的残差系数的数量一致的两个QG。
或者,将所述多个残差系数进行水平或垂直方向的对称划分,获得三个QG;所述三个QG中两个不相邻的QG包含的残差系数的数量一致,且另一个QG包含的残差系数的数量与所述两个不相邻的QG包含的残差系数的数量和是一致的。
或者,将所述多个残差系数进行水平或垂直方向的划分,获得包含的残差系数的数量不一致的两个QG。
或者,将所述多个残差系数进行水平或垂直方向的划分,获得三个QG;所述三个QG包含的残差系数不存在对称关系。
在一种可选的实现方式中,所述一个QG对应的一个QP值包括亮度QP值和色度QP值。确定所述一个图像帧的多个QP值,包括:分别获取所述一个QG的亮度QP值和色度QP值。
或者,获取所述一个QG的亮度QP值;基于所述亮度QP值确定所述一个QG的色度QP值。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:解析所述码流以获得所述一个图像帧的标记信息,所述标记信息用于指示所述一个QG的QP值,和/或,所述标记信息用于指示所述一个CU的QP值。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:第一,解析所述码流以获得所述一个图像帧的标记信息,所述标记信息用于指示所述一个QG的QP偏移量;第二,依据所述一个QG的预测QP值和所述标记信息,确定所述一个QG的QP值。
在一种可选的实现方式中,依据所述一个QG的预测QP值和所述标记信息,确定所述一个QG的QP值,包括:获取所述一个QG的预测QP值,并将所述一个QG的预测QP值与所述QP偏移量之和作为所述一个QG的QP值。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:获取所述一个QG的预测QP值,并依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值;其中,所述推导信息为以下任意一种或几种的组合:所述一个QG的平坦度信息或纹理度信息、所述码流缓冲区的剩余空间或失真约束信息。
在一种可选的实现方式中,若所述推导信息为所述失真约束信息,所述失真约束信息指示了所述多个QG中任一个QG的失真阈值。则依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值,包括:确定所述预测QP值对应的预测失真;若所述预测失真小于或等于所述失真阈值,将所述预测QP值作为所述QG的QP值;若所述预测失真大于所述失真阈值,将由所述失真阈值确定的QP值作为所述QG的QP值。
在一种可选的实现方式中,若所述推导信息为所述一个QG的内容信息或所述码流缓冲区的剩余空间,则依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值,包括:依据所述推导信息确定所述一个QG的QP偏移量;将所述一个QG的预测QP值与所述QP偏移量之和作为所述一个QG的QP值。
在一种可选的实现方式中,获取所述一个QG的预测QP值,包括:获取所述一个CU中与所述一个QG相邻的至少一个其他QG的QP值;依据所述至少一个其他QG的QP值,确定所述一个QG的预测QP值。
或者,将所述一个CU的QP值作为所述一个QG的预测QP值。
在一种可选的实现方式中,所述一个图像帧至少包括第一部分CU和第二部分CU,所述第一部分CU和所述第二部分CU不具有重叠区域,且所述第一部分CU和所述第二部分CU的QP值的获取方式不同。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:解析所述码流以获得所述一个图像帧的标记信息,所述标记信息包括所述第一部分CU的QP偏移量;依据所述标记信息确定所述第一部分CU的QP值。以及,针对于所述第二部分CU,获取所述第二部分CU的预测QP值;依据所述第二部分CU的预测QP值和推导信息,确定所述第二部分CU的QP值;其中,所述推导信息为以下任意一种或几种的组合:所述第二部分CU的平坦度信息或纹理度信息、所述码流缓冲区的剩余空间或失真约束信息。
在一种可选的实现方式中,依据所述多个QP值对所述一个图像帧进行解码,包括:首先,针对于所述多个QP值中每一个QP值,获取所述QP值对应的量化步长Qstep;其次,获取所述QP值对应的QG所包含的水平值;最后,依据选择的量化器组合,对所述QG的水平值进行反量化;所述量化器组合包括一个或多个量化器。示例的,所述量化器为均匀量化器或非均匀量化器。
在一种可选的实现方式中,所述量化器组合是通过所述码流携带的标记信息确定的,或者,残差系数在所述QG中的分布情况确定的。
在一种可选的实现方式中,对所述QG的水平值进行反量化,包括:首先,确定所述QG的划分域类型。其次,若所述QG为划分域类型为变换域,则从所述解码端的矩阵模板库中选择与所述QG的参数信息匹配的量化矩阵;所述矩阵模板库包括多种类型的量化矩阵模板,所述参数信息包括以下任意一种或多种的组合:所述QG的尺寸、所述QG所在的CU的尺寸、亮色度通道信息和平坦度信息。最后,利用所述QG的量化矩阵对所述QG中的水平值进行反量化,获得所述QG的残差系数。
在一种可选的实现方式中,所述多种类型的量化矩阵模板包括平坦块模板和纹理块模板;所述平坦块模板中频率高于频率阈值的残差系数的Qstep大于或等于所述平坦块模板中频率未达到所述频率阈值的残差系数的Qstep;所述纹理块模板中频率高于频率阈值的残差系数的Qstep小于或等于所述纹理块模板中频率未达到所述频率阈值的残差系数的Qstep。
在一种可选的实现方式中,所述矩阵模板库包括的量化矩阵模板由以下任意一种或多种类型的变换方式获得:离散余弦变换(discrete cosine transform,DCT)、离散正弦变换(discrete sine transform,DST)、整数变换或离散小波变换(discrete wave transform,DWT)。
在一种可选的实现方式中,所述一个QG包括所述一个图像帧的一个或多个像素点。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:解析所述码流以确定所述一个CU中被标记的一个或多个QG;所述被标记的一个或多个QG在解码过程中需进行反量化,所述一个CU中未被标记的QG不进行反量化;针对于所述被标记的一个或多个QG中每一个QG,获取所述每一个QG的QP值。
或者,确定所述一个CU包括的所有QG的扫描顺序;其中,所述扫描顺序包括以下任意一种或几种的组合:从上到下、从左到右、Z字形或反向Z字形顺序;针对于所述所有QG中的每一个QG,按照所述扫描顺序获取所述每一个QG的QP值。
在一种可选的实现方式中,所述多个QG中至少两个QG对应的QP值是不同的。
第二方面,本申请提供一种图像的解码方法,所述方法可应用于视频译码系统,或者该方法可应用于可以支持视频译码实现该方法的解码端,所述方法由解码端执行,所述方法包括:解析码流以获得一个或多个图像帧,一个图像帧包括一个或多个CU;确定所述一个图像帧的多个QP值;其中,一个CU包括多个像素点,一个像素点对应一个QP值,且所述多个像素点中至少两个像素点的QP值不同;依据所述多个QP值对所述一个图像帧进行解码。
在一种可选的实现方式中,确定所述一个图像帧的多个QP值,包括:获取所述一个像素点的预测QP值,并依据所述一个像素点的预测QP值和推导信息,确定所述一个像素点的QP值;其中,推导信息为所述一个像素点周围的一个或多个已重建像素点的信息。
在另一种可选的实现方式中,所述一个像素点的预测QP值为所述一个像素点所在的CU或QG的QP值,或者根据所述一个像素点周围的一个或多个已重建像素点的QP值推导得到,其中,推导的方法包括计算均值、中位数或众数中至少一种。
在另一种可选的实现方式中,所述已重建像素点为以所述一个像素点为中心,边长为3或5的正方形区域中的像素点,或者对角线长度为3或5的菱形区域中的像素点。
在另一种可选的实现方式中,所述已重建像素点的信息包括以下任意一种或几种的组合:像素值、平坦度信息或纹理度信息、背景亮度、对比度。
在另一种可选的实现方式中,依据所述一个像素点的预测QP值和推导信息,确定所述一个像素点的QP值,包括:根据所述一个像素点周围的一个或多个已重建像素点的信息确定所述像素点的指示信息;如果所述指示信息小于或等于第一阈值,并且所述预测QP值大于或等于人眼恰可察觉失真对应的QP值,则将人眼恰可察觉失真对应的QP值作为所述像素点的QP值。其中,人眼恰可察觉失真对应的QP值是预设值(如编码端或解码端预设的图像级或CU级信息),所述人眼恰可察觉失真对应的QP值是从码流中解析得到(图像级或CU级传输),或者,人眼恰可察觉失真对应的QP值是根据周围的已重建CU的平坦度信息或纹理度信息、背景亮度、对比度信息推导得到。
第三方面,本申请提供一种图像的编码方法,所述方法可应用于视频译码系统,或者该方法可应用于可以支持视频译码实现该方法的编码端,所述方法由编码端执行,所述方法包括:将一个图像帧划分为一个或多个CU;确定所述一个图像帧的多个QP值;其中,一个CU包括多个QG,一个QG对应一个QP值;依据所述多个QP值对所述一个图像帧进行编码。
第四方面,本申请提供一种图像的编码方法,所述方法可应用于视频译码系统,或者该方法可应用于可以支持视频译码实现该方法的编码端,所述方法由编码端执行,所述方法包括:将一个图像帧划分为一个或多个CU;确定所述一个图像帧的多个QP值;其中,一个CU包括多个像素点,一个像素点对应一个QP值,且所述多个像素点中至少两个像素点的QP值不同;依据所述多个QP值对所述一个图像帧进行编码。
第五方面,本申请提供一种图像的解码装置,解码装置应用于解码端中,该解码装置包括用于实现第一方面或第二方面中任一种可能实现方式中方法的各个模块。如码流 解析单元、QP决策单元和图像解码单元。
有益效果可以参见第一方面或第二方面中任一方面的描述,此处不再赘述。所述解码装置具有实现上述第一方面或第二方面中任一方面的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
第六方面,本申请提供一种图像的编码装置,编码装置应用于编码端中,该编码装置包括用于实现第三方面或第四方面中任一种可能实现方式中的方法的各个模块。如所述编码装置包括:图像切分单元、QP决策单元和图像编码单元。有益效果可以参见第三方面或第四方面中任一种的描述,此处不再赘述。所述编码装置具有实现上述第三方面或第四方面中任一种的方法实例中行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
第七方面,本申请提供一种电子设备,包括处理器和存储器,所述存储器用于存储计算机指令,所述处理器用于从存储器中调用并运行所述计算机指令,以实现第一方面至第四方面中任一种实现方式的方法。
例如,该电子设备可以是指视频编码器,或包括视频编码器的编码端。
又如,该电子设备可以是指视频解码器,或包括视频解码器的解码端。
第八方面,本申请提供一种计算机可读存储介质,存储介质中存储有计算机程序或指令,当计算机程序或指令被计算设备或计算设备所在的存储系统执行时,以实现第一方面至第四方面中任一种实现方式的方法。
第九方面,本申请提供一种计算机程序产品,该计算程序产品包括指令,当计算机程序产品在计算设备或处理器上运行时,使得计算设备或处理器执行该指令,以实现第一方面至第四方面中任一种实现方式的方法。
第十方面,本申请提供一种视频译码系统,该视频译码系统包括编码端和解码端,解码端用于实现第一方面至第二方面中任一种实现方式的方法,编码端用于实现第三方面至第四方面中任一种实现方式的方法。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
图1为本申请提供的一种视频译码系统的示例性框图;
图2为本申请提供的一种视频编码器的示例性框图;
图3为本申请提供的一种视频解码器的示例性框图;
图4为本申请提供的一种视频编/解码的流程示意图;
图5为本申请提供的一种图像的解码方法的流程示意图;
图6为本申请提供的一种变换域划分的示意图;
图7为本申请提供的一种像素域划分的示意图;
图8为本申请提供的一种采用预测编码方式获取QP的流程示意图;
图9为本申请提供的一种像素点的分布示意图;
图10为本申请提供的一种图像帧解码的流程示意图;
图11为本申请提供的量化矩阵模板的示意图;
图12为本申请提供的一种图像的编码方法的流程示意图;
图13为本申请提供的一种解码装置的结构示意图;
图14为本申请提供的一种编码装置的结构示意图;
图15为本申请提供的一种电子设备的结构示意图。
具体实施方式
为了下述各实施例的描述清楚简洁,首先给出相关技术的简要介绍。
图1为本申请提供的一种视频译码系统的示例性框图,如本文所使用,术语“视频译码器”一般是指视频编码器和视频解码器两者。在本申请中,术语“视频译码”或“译码”可一般地指代视频编码或视频解码。视频译码系统1的视频编码器100和视频解码器200用于根据本申请提出的多种新的帧间预测模式中的任一种所描述的各种方法实例来预测当前经译码图像块或其子块的运动信息,例如运动矢量,使得预测出的运动矢量最大程度上接近使用运动估算方法得到的运动矢量,从而编码时无需传送运动矢量差值,从而进一步的改善编解码性能。
如图1中所示,视频译码系统1包含编码端10和解码端20。编码端10产生经编码视频数据。因此,编码端10可被称为视频编码装置。解码端20可对由编码端10所产生的经编码的视频数据进行解码。因此,解码端20可被称为视频解码装置。编码端10、解码端20或两个的各种实施方案可包含一或多个处理器以及耦合到所述一或多个处理器的存储器。所述存储器可包含但不限于RAM、ROM、EEPROM、快闪存储器或可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体,如本文所描述。
编码端10和解码端20可以包括各种装置,包含桌上型计算机、移动计算装置、笔记型(例如,膝上型)计算机、平板计算机、机顶盒、例如所谓的“智能”电话等电话手持机、电视机、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机或其类似者。
解码端20可经由链路30从编码端10接收经编码视频数据。链路30可包括能够将经编码视频数据从编码端10移动到解码端20的一或多个媒体或装置。在一个实例中,链路30可包括使得编码端10能够实时将经编码视频数据直接发射到解码端20的一或多个通信媒体。在此实例中,编码端10可根据通信标准(例如无线通信协议)来调制经编码视频数据,且可将经调制的视频数据发射到解码端20。所述一或多个通信媒体可包含无线和/或有线通信媒体,例如射频(radio frequency,RF)频谱或一或多个物理传输线。所述一或多个通信媒体可形成基于分组的网络的一部分,基于分组的网络例如为局域网、广域网或全球网络(例如,因特网)。所述一或多个通信媒体可包含路由器、交换器、基站或促进从编码端10到解码端20的通信的其它设备。
在另一实例中,可将经编码数据从输出接口140输出到存储装置40。类似地,可通过输入接口240从存储装置40存取经编码数据。存储装置40可包含多种分布式或本地存取的数据存储媒体中的任一者,例如硬盘驱动器、蓝光光盘、数字通用光盘(digital video disc,DVD)、只读光盘(compact disc read-only memory,CD-ROM)、快闪存储器、易失性或非易失性存储器,或用于存储经编码视频数据的任何其它合适的数字存储媒体。
在另一实例中,存储装置40可对应于文件服务器或可保持由编码端10产生的经编码视频的另一中间存储装置。解码端20可经由流式传输或下载从存储装置40存取所存储的视频数据。文件服务器可为任何类型的能够存储经编码的视频数据并且将经编码的视频数据发射到解码端20的服务器。实例文件服务器包含网络服务器(例如,用于网站)、文件传输协议(file transfer protocol,FTP)服务器、网络附接式存储(network attached storage,NAS)装置或本地磁盘驱动器。解码端20可通过任何标准数据连接(包含因特网连接)来存取经编码视频数据。这可包含无线信道(例如,无线保真(wIreless-fidelity,Wi-Fi)连接)、有线连接(例如,数字用户线路(digital subscriber line,DSL)、电缆调制解调器等),或适合于存取存储在文件服务器上的经编码视频数据的两者的组合。经编码视频数据从存储装置40的传输可为流式传输、下载传输或两者的组合。
本申请提供的图像的解码方法可应用于视频编解码以支持多种多媒体应用,例如空 中电视广播、有线电视发射、卫星电视发射、串流视频发射(例如,经由因特网)、用于存储于数据存储媒体上的视频数据的编码、存储在数据存储媒体上的视频数据的解码,或其它应用。在一些实例中,视频译码系统1可用于支持单向或双向视频传输以支持例如视频流式传输、视频回放、视频广播和/或视频电话等应用。
图1中所说明的视频译码系统1仅为实例,并且本申请的技术可适用于未必包含编码装置与解码装置之间的任何数据通信的视频译码设置(例如,视频编码或视频解码)。在其它实例中,数据从本地存储器检索、在网络上流式传输等等。视频编码装置可对数据进行编码并且将数据存储到存储器,和/或视频解码装置可从存储器检索数据并且对数据进行解码。在许多实例中,由并不彼此通信而是仅编码数据到存储器和/或从存储器检索数据且解码数据的装置执行编码和解码。
在图1的实例中,编码端10包含视频源120、视频编码器100和输出接口140。在一些实例中,输出接口140可包含调节器/解调器(调制解调器)和/或发射器。视频源120可包括视频捕获装置(例如,摄像机)、含有先前捕获的视频数据的视频存档、用以从视频内容提供者接收视频数据的视频馈入接口,和/或用于产生视频数据的计算机图形系统,或视频数据的此些来源的组合。
视频编码器100可对来自视频源120的视频数据进行编码。在一些实例中,编码端10经由输出接口140将经编码视频数据直接发射到解码端20。在其它实例中,经编码视频数据还可存储到存储装置40上,供解码端20以后存取来用于解码和/或播放。
在图1的实例中,解码端20包含输入接口240、视频解码器200和显示装置220。在一些实例中,输入接口240包含接收器和/或调制解调器。输入接口240可经由链路30和/或从存储装置40接收经编码视频数据。显示装置220可与解码端20集成或可在解码端20外部。一般来说,显示装置220显示经解码视频数据。显示装置220可包括多种显示装置,例如,液晶显示器(liquid crystal display,LCD)、等离子显示器、有机发光二极管(organic light-emitting diode,OLED)显示器或其它类型的显示装置。
尽管图1中未图示,但在一些方面,视频编码器100和视频解码器200可各自与音频编码器和解码器集成,且可包含适当的多路复用器-多路分用器单元或其它硬件和软件,以处置共同数据流或单独数据流中的音频和视频两者的编码。在一些实例中,如果适用的话,那么解复用器(MUX-DEMUX)单元可符合ITU H.223多路复用器协议,或例如用户数据报协议(user datagram protocol,UDP)等其它协议。
视频编码器100和视频解码器200各自可实施为例如以下各项的多种电路中的任一者:一或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件来实施本申请,那么装置可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一或多个处理器在硬件中执行所述指令从而实施本申请技术。前述内容(包含硬件、软件、硬件与软件的组合等)中的任一者可被视为一或多个处理器。视频编码器100和视频解码器200中的每一者可包含在一或多个编码器或解码器中,所述编码器或解码器中的任一者可集成为相应装置中的组合编码器/解码器(编码解码器)的一部分。
本申请可大体上将视频编码器100称为将某些信息“发信号通知”或“发射”到例如视频解码器200的另一装置。术语“发信号通知”或“发射”可大体上指代用以对经压缩视频数据进行解码的语法元素和/或其它数据的传送。此传送可实时或几乎实时地发生。替代地,此通信可经过一段时间后发生,例如可在编码时在经编码码流中将语法元素存储到计算机可读存储媒体时发生,解码装置接着可在所述语法元素存储到此媒体之后的任何时间检索所述语法元素。
JCT-VC开发了H.265(HEVC)标准。HEVC标准化基于称作HEVC测试模型(HEVC model,HM)的视频解码装置的演进模型。H.265的最新标准文档可从 http://www.itu.int/rec/T-REC-H.265获得,最新版本的标准文档为H.265(12/16),该标准文档以全文引用的方式并入本文中。HM假设视频解码装置相对于ITU-TH.264/AVC的现有算法具有若干额外能力。例如,H.264提供9种帧内预测编码模式,而HM可提供多达35种帧内预测编码模式。
JVET致力于开发H.266标准。H.266标准化的过程基于称作H.266测试模型的视频解码装置的演进模型。H.266的算法描述可从http://phenix.int-evry.fr/jvet获得,其中最新的算法描述包含于JVET-F1001-v2中,该算法描述文档以全文引用的方式并入本文中。同时,可从https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/获得JEM测试模型的参考软件,同样以全文引用的方式并入本文中。
一般来说,HM的工作模型描述可将视频帧或图像划分成包含亮度及色度样本两者的树块或最大编码单元(largest coding unit,LCU)的序列,LCU也被称为编码树单元(coding tree unit,CTU)。树块具有与H.264标准的宏块类似的目的。条带包含按解码次序的数个连续树块。可将视频帧或图像分割成一个或多个条带。可根据四叉树将每一树块分裂成编码单元(coding unit,CU)。例如,可将作为四叉树的根节点的树块分裂成四个子节点,且每一子节点可又为母节点且被分裂成另外四个子节点。作为四叉树的叶节点的最终不可分裂的子节点包括解码节点,例如,经解码视频块。与经解码码流相关联的语法数据可定义树块可分裂的最大次数,且也可定义解码节点的最小大小。
CU的大小对应于解码节点的大小且形状必须为正方形。CU的大小的范围可为8×8像素直到最大64×64像素或更大的树块的大小。
视频序列通常包含一系列视频帧或图像。图像群组(group of picture,GOP)示例性地包括一系列、一个或多个视频图像。GOP可在GOP的头信息中、图像中的一者或多者的头信息中或在别处包含语法数据,语法数据描述包含于GOP中的图像的数目。图像的每一条带可包含描述相应图像的编码模式的条带语法数据。视频编码器100通常对个别视频条带内的视频块进行操作以便编码视频数据。视频块可对应于CU内的解码节点。视频块可具有固定或变化的大小,且可根据指定解码标准而在大小上不同。
在本申请中，“N×N”与“N乘N”可互换使用以指依照垂直维度及水平维度的视频块的像素尺寸，例如，16×16像素或16乘16像素。一般来说，16×16块将在垂直方向上具有16个像素(y=16)，且在水平方向上具有16个像素(x=16)。同样地，N×N块一般在垂直方向上具有N个像素，且在水平方向上具有N个像素，其中N表示非负整数值。可将块中的像素排列成行及列。此外，块未必需要在水平方向上与在垂直方向上具有相同数目个像素。例如，块可包括N×M个像素，其中M未必等于N。
在使用CU的帧内/帧间预测性解码之后,视频编码器100可计算CU的残余数据。CU可包括空间域(也称作像素域)中的像素数据,且CU可包括在将变换(例如,离散余弦变换(discrete cosine transform,DCT)、整数变换、离散小波变换或概念上类似的变换)应用于残余视频数据之后变换域中的系数。残余数据可对应于未经编码图像的像素与对应于CU的预测值之间的像素差。视频编码器100可形成包含残余数据的CU,且产生CU的变换系数。
在任何变换以产生变换系数之后,视频编码器100可执行变换系数的量化。量化示例性地指对系数进行量化以可能减少用以表示系数的数据的量从而提供进一步压缩的过程。量化过程可减少与系数中的一些或全部相关联的位深度。例如,可在量化期间将n位值降值舍位到m位值,其中n大于m。
在一些可行的实施方式中,视频编码器100可利用预定义扫描次序来扫描经量化变换系数以产生可经熵编码的串行化向量。在其它可行的实施方式中,视频编码器100可执行自适应性扫描。在扫描经量化变换系数以形成一维向量之后,视频编码器100可根据上下文自适应性可变长度解码(context-based adaptive variable-length code,CAVLC)、上下文自适应性二进制算术解码(context-based adaptive binary arithmetic coding, CABAC)、基于语法的上下文自适应性二进制算术解码(syntax-based adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)解码或其他熵解码方法来熵解码一维向量。视频编码器100也可熵编码与经编码视频数据相关联的语法元素以供视频解码器200用于解码视频数据。
为了执行CABAC,视频编码器100可将上下文模型内的上下文指派给待传输的符号。上下文可与符号的相邻值是否为非零有关。为了执行CAVLC,视频编码器100可选择待传输的符号的可变长度码。可变长度解码(variable-length code,VLC)中的码字可经构建以使得相对较短码对应于可能性较大的符号,而较长码对应于可能性较小的符号。以这个方式,VLC的使用可相对于针对待传输的每一符号使用相等长度码字达成节省码率的目的。基于指派给符号的上下文可以确定CABAC中的概率。
在本申请实施例中,视频编码器可执行帧间预测以减少图像之间的时间冗余。本申请可将视频解码器当前在解码的CU称作当前CU。本申请可将视频解码器当前在解码的图像称作当前图像。
图2为本申请提供的一种视频编码器的示例性框图。视频编码器100用于将视频输出到后处理实体41。后处理实体41表示可处理来自视频编码器100的经编码视频数据的视频实体的实例,例如媒体感知网络元件(MANE)或拼接/编辑装置。在一些情况下,后处理实体41可为网络实体的实例。在一些视频编码系统中,后处理实体41和视频编码器100可为单独装置的若干部分,而在其它情况下,相对于后处理实体41所描述的功能性可由包括视频编码器100的相同装置执行。在某一实例中,后处理实体41是图1的存储装置40的实例。
在图2的实例中,视频编码器100包括预测处理单元108、滤波器单元106、经解码图像缓冲器(decoded picture buffer,DPB)107、求和器112、变换器101、量化器102和熵编码器103。预测处理单元108包括帧间预测器110和帧内预测器109。为了图像块重构,视频编码器100还包含反量化器104、反变换器105和求和器111。滤波器单元106既定表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(adaptive loop filter,ALF)和样本自适应偏移(sample adaptive offset,SAO)滤波器。尽管在图2中将滤波器单元106示出为环路内滤波器,但在其它实现方式下,可将滤波器单元106实施为环路后滤波器。在一种示例下,视频编码器100还可以包括视频数据存储器、分割单元(图中未示意)。
视频数据存储器可存储待由视频编码器100的组件编码的视频数据。可从视频源120获得存储在视频数据存储器中的视频数据。DPB 107可为参考图像存储器,其存储用于由视频编码器100在帧内、帧间译码模式中对视频数据进行编码的参考视频数据。视频数据存储器和DPB 107可由多种存储器装置中的任一者形成,例如包含同步动态随机存储器(synchronous dynamic random access memory,SDRAM)的动态随机存取存储器(dynamic random access memory,DRAM)、磁阻式RAM(magnetic random access memory,MRAM)、电阻式RAM(resistive random access memory,RRAM),或其它类型的存储器装置。视频数据存储器和DPB 107可由同一存储器装置或单独存储器装置提供。
如图2所示,视频编码器100接收视频数据,并将所述视频数据存储在视频数据存储器中。分割单元将所述视频数据分割成若干图像块,而且这些图像块可以被进一步分割为更小的块,例如基于四叉树结构或者二叉树结构的图像块分割。此分割还可包含分割成条带(slice)、片(tile)或其它较大单元。视频编码器100通常说明编码待编码的视频条带内的图像块的组件。所述条带可分成多个图像块(并且可能分成被称作片的图像块集合)。预测处理单元108可选择用于当前图像块的多个可能的译码模式中的一者,例如多个帧内译码模式中的一者或多个帧间译码模式中的一者。预测处理单元108可将所得经帧内、帧间译码的块提供给求和器112以产生残差块,且提供给求和器111以重 构用作参考图像的经编码块。
预测处理单元108内的帧内预测器109可相对于与待编码当前块在相同帧或条带中的一或多个相邻块执行当前图像块的帧内预测性编码,以去除空间冗余。预测处理单元108内的帧间预测器110可相对于一或多个参考图像中的一或多个预测块执行当前图像块的帧间预测性编码以去除时间冗余。
具体的,帧间预测器110可用于确定用于编码当前图像块的帧间预测模式。举例来说,帧间预测器110可使用码率-失真分析来计算候选帧间预测模式集合中的各种帧间预测模式的码率-失真值,并从中选择具有最佳码率-失真特性的帧间预测模式。码率失真分析通常确定经编码块与经编码以产生所述经编码块的原始的未经编码块之间的失真(或误差)的量,以及用于产生经编码块的位码率(也就是说,位数目)。例如,帧间预测器110可确定候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
帧间预测器110用于基于确定的帧间预测模式,预测当前图像块中一个或多个子块的运动信息(例如运动矢量),并利用当前图像块中一个或多个子块的运动信息(例如运动矢量)获取或产生当前图像块的预测块。帧间预测器110可在参考图像列表中的一者中定位所述运动向量指向的预测块。帧间预测器110还可产生与图像块和视频条带相关联的语法元素以供视频解码器200在对视频条带的图像块解码时使用。又或者,一种示例下,帧间预测器110利用每个子块的运动信息执行运动补偿过程,以生成每个子块的预测块,从而得到当前图像块的预测块;应当理解的是,这里的帧间预测器110执行运动估计和运动补偿过程。
具体的,在为当前图像块选择帧间预测模式之后,帧间预测器110可将指示当前图像块的所选帧间预测模式的信息提供到熵编码器103,以便于熵编码器103编码指示所选帧间预测模式的信息。
帧内预测器109可对当前图像块执行帧内预测。明确地说,帧内预测器109可确定用来编码当前块的帧内预测模式。举例来说,帧内预测器109可使用码率-失真分析来计算各种待测试的帧内预测模式的码率-失真值,并从待测试模式当中选择具有最佳码率-失真特性的帧内预测模式。在任何情况下,在为图像块选择帧内预测模式之后,帧内预测器109可将指示当前图像块的所选帧内预测模式的信息提供到熵编码器103,以便熵编码器103编码指示所选帧内预测模式的信息。
在预测处理单元108经由帧间预测、帧内预测产生当前图像块的预测块之后，视频编码器100通过从待编码的当前图像块减去所述预测块来形成残差图像块。求和器112表示执行此减法运算的一或多个组件。所述残差块中的残差视频数据可包含在一或多个变换单元(transform unit,TU)中，并应用于变换器101。变换器101使用例如离散余弦变换(discrete cosine transform,DCT)或概念上类似的变换等变换将残差视频数据变换成残差变换系数。变换器101可将残差视频数据从像素值域转换到变换域，例如频域。
变换器101可将所得变换系数发送到量化器102。量化器102量化所述变换系数以进一步减小位码率。在一些实例中,量化器102可接着执行对包含经量化的变换系数的矩阵的扫描。或者,熵编码器103可执行扫描。
在量化之后,熵编码器103对经量化变换系数进行熵编码。举例来说,熵编码器103可执行上下文自适应可变长度编码(CAVLC)、上下文自适应二进制算术编码(CABAC)、基于语法的上下文自适应二进制算术编码(SBAC)、概率区间分割熵(PIPE)编码或另一熵编码方法或技术。在由熵编码器103熵编码之后,可将经编码码流发射到视频解码器200,或经存档以供稍后发射或由视频解码器200检索。熵编码器103还可对待编码的当前图像块的语法元素进行熵编码。
反量化器104和反变化器105分别应用逆量化和逆变换以在像素域中重构所述残差块,例如以供稍后用作参考图像的参考块。求和器111将经重构的残差块添加到由帧间 预测器110或帧内预测器109产生的预测块,以产生经重构图像块。滤波器单元106可以适用于经重构图像块以减小失真,诸如方块效应(block artifacts)。然后,该经重构图像块作为参考块存储在经解码图像缓冲器107中,可由帧间预测器110用作参考块以对后续视频帧或图像中的块进行帧间预测。
应当理解的是,视频编码器100的其它的结构变化可用于编码视频流。例如,对于某些图像块或者图像帧,视频编码器100可以直接地量化残差信号而不需要经变换器101处理,相应地也不需要经反变换器105处理;或者,对于某些图像块或者图像帧,视频编码器100没有产生残差数据,相应地不需要经变换器101、量化器102、反量化器104和反变换器105处理;或者,视频编码器100可以将经重构图像块作为参考块直接地进行存储而不需要经滤波器单元106处理;或者,视频编码器100中量化器102和反量化器104可以合并在一起。
图3为本申请提供的一种视频解码器200的示例性框图。在图3的实例中,视频解码器200包括熵解码器203、预测处理单元208、反量化器204、反变换器205、求和器211、滤波器单元206以及DPB 207。预测处理单元208可以包括帧间预测器210和帧内预测器209。在一些实例中,视频解码器200可执行大体上与相对于来自图2的视频编码器100描述的编码过程互逆的解码过程。
在解码过程中,视频解码器200从视频编码器100接收表示经编码视频条带的图像块和相关联的语法元素的经编码视频码流。视频解码器200可从网络实体42接收视频数据,可选的,还可以将所述视频数据存储在视频数据存储器(图中未示意)中。视频数据存储器可存储待由视频解码器200的组件解码的视频数据,例如经编码视频码流。存储在视频数据存储器中的视频数据,例如可从存储装置40、从相机等本地视频源、经由视频数据的有线或无线网络通信或者通过存取物理数据存储媒体而获得。视频数据存储器可作为用于存储来自经编码视频码流的经编码视频数据的经解码图像缓冲器(CPB)。因此,尽管在图3中没有示意出视频数据存储器,但视频数据存储器和DPB 207可以是同一个的存储器,也可以是单独设置的存储器。视频数据存储器和DPB 207可由多种存储器装置中的任一者形成,例如:包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM),或其它类型的存储器装置。
网络实体42可为服务器、MANE、视频编辑器/剪接器,或用于实施上文所描述的技术中的一或多者的其它装置。网络实体42可包括或可不包括视频编码器,例如视频编码器100。在网络实体42将经编码视频码流发送到视频解码器200之前,网络实体42可实施本申请中描述的技术中的部分。在一些视频解码系统中,网络实体42和视频解码器200可为单独装置的部分,而在其它情况下,相对于网络实体42描述的功能性可由包括视频解码器200的相同装置执行。在一些情况下,网络实体42可为图1的存储装置40的实例。
视频解码器200的熵解码器203对码流进行熵解码以产生经量化的系数和一些语法元素。熵解码器203将语法元素转发到预测处理单元208。视频解码器200可接收在视频条带层级和/或图像块层级处的语法元素。
当视频条带被解码为经帧内解码(I)条带时,预测处理单元208的帧内预测器209可基于发信号通知的帧内预测模式和来自当前帧或图像的先前经解码块的数据而产生当前视频条带的图像块的预测块。当视频条带被解码为经帧间解码(即,B或P)条带时,预测处理单元208的帧间预测器210可基于从熵解码器203接收到的语法元素,确定用于对当前视频条带的当前图像块进行解码的帧间预测模式,基于确定的帧间预测模式,对所述当前图像块进行解码(例如执行帧间预测)。具体的,帧间预测器210可确定是否对当前视频条带的当前图像块采用新的帧间预测模式进行预测,如果语法元素指示采用新的帧间预测模式来对当前图像块进行预测,基于新的帧间预测模式(例如通过 语法元素指定的一种新的帧间预测模式或默认的一种新的帧间预测模式)预测当前视频条带的当前图像块或当前图像块的子块的运动信息,从而通过运动补偿过程使用预测出的当前图像块或当前图像块的子块的运动信息来获取或生成当前图像块或当前图像块的子块的预测块。这里的运动信息可以包括参考图像信息和运动矢量,其中参考图像信息可以包括但不限于单向/双向预测信息,参考图像列表号和参考图像列表对应的参考图像索引。对于帧间预测,可从参考图像列表中的一者内的参考图像中的一者产生预测块。视频解码器200可基于存储在DPB 207中的参考图像来建构参考图像列表,即列表0和列表1。当前图像的参考帧索引可包含于参考帧列表0和列表1中的一或多者中。在一些实例中,可以是视频编码器100发信号通知指示是否采用新的帧间预测模式来解码特定块的特定语法元素,或者,也可以是发信号通知指示是否采用新的帧间预测模式,以及指示具体采用哪一种新的帧间预测模式来解码特定块的特定语法元素。应当理解的是,这里的帧间预测器210执行运动补偿过程。
反量化器204将在码流中提供且由熵解码器203解码的经量化变换系数逆量化,即去量化。逆量化过程可包括:使用由视频编码器100针对视频条带中的每个图像块计算的量化参数来确定应施加的量化程度以及同样地确定应施加的逆量化程度。反变换器205将逆变换应用于变换系数,例如逆DCT、逆整数变换或概念上类似的逆变换过程,以便产生像素域中的残差块。
在帧间预测器210产生用于当前图像块或当前图像块的子块的预测块之后,视频解码器200通过将来自反变换器205的残差块与由帧间预测器210产生的对应预测块求和以得到重建的块,即经解码图像块。求和器211表示执行此求和操作的组件。在需要时,还可使用环路滤波器(在解码环路中或在解码环路之后)来使像素转变平滑或者以其它方式改进视频质量。滤波器单元206可以表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(ALF)以及样本自适应偏移(SAO)滤波器。尽管在图3中将滤波器单元206示出为环路内滤波器,但在其它实现方式中,可将滤波器单元206实施为环路后滤波器。在一种示例下,滤波器单元206适用于重建块以减小块失真,并且该结果作为经解码视频流输出。并且,还可以将给定帧或图像中的经解码图像块存储在经解码图像缓冲器207中,经DPB 207存储用于后续运动补偿的参考图像。经DPB 207可为存储器的一部分,其还可以存储经解码视频,以供稍后在显示装置(例如图1的显示装置220)上呈现,或可与此类存储器分开。
应当理解的是,视频解码器200的其它结构变化可用于解码经编码视频码流。例如,视频解码器200可以不经滤波器单元206处理而生成输出视频流;或者,对于某些图像块或者图像帧,视频解码器200的熵解码器203没有解码出经量化的系数,相应地不需要经反量化器204和反变换器205处理。
本申请的技术可通过本申请中所描述的视频编码器或视频解码器中的任一者进行,如关于图1到图3所展示及描述的视频编码器100及视频解码器200。即,在一种可行的实施方式中,关于图2所描述的视频编码器100可在视频数据的块的编码期间在执行帧间预测时执行下文中所描述的特定技术。在另一可行的实施方式中,关于图3所描述的视频解码器200可在视频数据的块的解码期间在执行帧间预测时执行下文中所描述的特定技术。因此,对一般性“视频编码器”或“视频解码器”的引用可包含视频编码器100、视频解码器200或另一视频编码或解码单元。
图1~图3仅为本申请提供的示例,在一些示例中,视频编码器100、视频解码器200以及视频译码系统可以包括更多或更少的部件或单元,本申请对此不予限定。
下面在图1~图3所示出的视频译码系统的基础上,本申请提供一种可能的视频编/解码实现方式,如图4所示,图4为本申请提供的一种视频编/解码的流程示意图,该视频编/解码实现方式包括过程①至过程⑤,过程①至过程⑤可以由上述的编码端10、视频解码器100、解码端20或视频解码器200中的任意一个或多个执行。
过程①:将一帧图像分成一个或多个互相不重叠的并行编码单元。该一个或多个并行编码单元间无依赖关系,可完全并行/独立编码和解码,如图4所示出的并行编码单元1和并行编码单元2。
过程②:对于每个并行编码单元,可再将其分成一个或多个互相不重叠的独立编码单元,各个独立编码单元间可相互不依赖,但可以共用一些并行编码单元头信息。
例如，独立编码单元的宽为w_lcu，高为h_lcu。若并行编码单元划分成一个独立编码单元，则独立编码单元的尺寸与并行编码单元完全相同；否则，独立编码单元的宽应大于高(除非是边缘区域)。
通常的,独立编码单元可为固定的w_lcu×h_lcu,w_lcu和h_lcu均为2的N次方(N≥0),如独立编码单元的尺寸为:128×4,64×4,32×4,16×4,8×4,32×2,16×2或8×2等。
作为一种可能的示例,独立编码单元可为固定的128×4。若并行编码单元的尺寸为256×8,则可将并行编码单元等分为4个独立编码单元;若并行编码单元的尺寸为288×10,则并行编码单元划分为:第一/二行为2个128×4+1个32×4的独立编码单元;第三行为2个128×2+1个32×2的独立编码单元。
值得注意的是,独立编码单元既可以是包括亮度Y、色度Cb、色度Cr三个分量,或红(red,R)、绿(green,G)、蓝(blue,B)三个分量,也可以仅包含其中的某一个分量。若独立编码单元包含三个分量,则这三个分量的尺寸可以完全一样,也可以不一样,具体与图像的输入格式相关。
过程③:对于每个独立编码单元,可再将其分成一个或多个互相不重叠的编码单元,独立编码单元内的各个编码单元可相互依赖,如多个编码单元可以进行相互参考预编解码。
若编码单元与独立编码单元尺寸相同(即独立编码单元仅分成一个编码单元),则其尺寸可为过程②所述的所有尺寸。
若独立编码单元分成多个互相不重叠的编码单元,则其可行划分例子有:水平等分(编码单元的高与独立编码单元相同,但宽不同,可为其1/2,1/4,1/8,1/16等),垂直等分(编码单元的宽与独立编码单元相同,高不同,可为其1/2,1/4,1/8,1/16等),水平和垂直等分(四叉树划分)等,优选为水平等分。
编码单元的宽为w_cu，高为h_cu，则其宽应大于高(除非是边缘区域)。通常的，编码单元可为固定的w_cu x h_cu，w_cu和h_cu均为2的N次方(N大于等于0)，如16x4，8x4，16x2，8x2，8x1，4x1等。
作为一种可能的示例,编码单元可为固定的16x4。若独立编码单元的尺寸为64x4,则可将独立编码单元等分为4个编码单元;若独立编码单元的尺寸为72x4,则编码单元划分为:4个16x4+1个8x4。
值得注意的是,编码单元既可以是包括亮度Y、色度Cb、色度Cr三个分量(或红R、绿G、蓝B三分量),也可以仅包含其中的某一个分量。若包含三个分量,几个分量的尺寸可以完全一样,也可以不一样,具体与图像输入格式相关。
值得注意的是,过程③是视频编解码方法中一个可选的步骤,视频编/解码器可以对过程②获得的独立编码单元进行残差系数(或残差值)进行编/解码。
过程④：对于编码单元，可以再将其分成一个或多个互相不重叠的预测组(Prediction Group,PG)，PG也可简称为Group，各个PG按照选定预测模式进行编解码，得到PG的预测值，组成整个编码单元的预测值，基于预测值和编码单元的原始值，获得编码单元的残差值。
过程⑤：基于编码单元的残差值，对编码单元进行分组，获得一个或多个互相不重叠的残差小块(residual block,RB)，各个RB的残差系数按照选定模式进行编解码，形成残差系数流。具体的，可分为对残差系数进行变换和不进行变换两类。
其中,过程⑤中残差系数编解码方法的选定模式可以包括,但不限于下述任一种:半定长编码方式、指数哥伦布(Golomb)编码方法、Golomb-Rice编码方法、截断一元码编码方法、游程编码方法、直接编码原始残差值等。
例如,视频编码器可直接对RB内的系数进行编码。
又如,视频编码器也可对残差块进行变换,如DCT、DST、Hadamard变换等,再对变换后的系数进行编码。
作为一种可能的示例,当RB较小时,视频编码器可直接对RB内的各个系数进行统一量化,再进行二值化编码。若RB较大,可进一步划分为多个系数组(coefficient group,CG),再对各个CG进行统一量化,再进行二值化编码。在本申请的一些实施例中,系数组(CG)和量化组(QG)可以相同。
下面以半定长编码方式对残差系数编码的部分进行示例性说明。首先，将一个RB块内残差绝对值的最大值定义为修整最大值(modified maximum,mm)。其次，确定该RB块内残差系数的编码比特数(同一个RB块内残差系数的编码比特数一致)。例如，若当前RB块的关键限值(critical limit,CL)为2，当前残差系数为1，则编码残差系数1需要2个比特，表示为01。若当前RB块的CL为7，则表示编码8-bit的残差系数和1-bit的符号位。CL的确定是去找满足当前子块所有残差都在[-2^(M-1),2^(M-1)]范围之内的最小M值。若同时存在-2^(M-1)和2^(M-1)两个边界值，则M应增加1，即需要M+1个比特编码当前RB块的所有残差；若仅存在-2^(M-1)和2^(M-1)两个边界值中的一个，则需要编码一个Trailing位来确定该边界值是-2^(M-1)还是2^(M-1)；若所有残差均不存在-2^(M-1)和2^(M-1)中的任何一个，则无需编码该Trailing位。
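作为帮助理解的示意，下面给出一段C语言草图，按上文规则计算一个RB块的编码比特数M（即CL）以及是否需要Trailing位；其中的函数名与变量名均为说明所设，并非本申请的规范实现。

    #include <stdlib.h>

    /* 求最小M，使RB块内所有残差都落在[-2^(M-1), 2^(M-1)]范围内，
       并按上文规则判断边界值的处理方式。 */
    static int residual_bits(const int *res, int n, int *need_trailing)
    {
        int mm = 0;                          /* 修整最大值mm：残差绝对值的最大值 */
        for (int i = 0; i < n; i++) {
            int a = abs(res[i]);
            if (a > mm) mm = a;
        }
        int m = 1;
        while ((1 << (m - 1)) < mm)          /* 最小M：mm <= 2^(M-1) */
            m++;
        int pos = 0, neg = 0;
        for (int i = 0; i < n; i++) {
            if (res[i] == (1 << (m - 1)))  pos = 1;
            if (res[i] == -(1 << (m - 1))) neg = 1;
        }
        if (pos && neg) { m++; *need_trailing = 0; }  /* 两个边界值同时存在：M加1 */
        else *need_trailing = (pos || neg);           /* 仅存在一个边界值：需编码Trailing位 */
        return m;
    }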
另外,对于某些特殊的情况,视频编码器也可以直接编码图像的原始值,而不是残差值。
下面将结合附图对本申请实施例的实施方式进行详细描述。
图5为本申请提供的一种图像的解码方法的流程示意图,该图像的解码方法可应用于图1所示出的视频译码系统,该图像的解码方法可以由解码端20执行,具体的,该解码方法可以由解码端20包括的视频解码器200执行,请参见图5,本实施例提供的解码方法包括以下步骤。
S510,视频解码器200对获取到的码流进行解析,获得一个或多个图像帧。
其中,上述的一个图像帧包括一个或多个CU。
如图5所示,一个图像帧可以包括3×5=15个CU。值得注意的是,一个图像帧也可以包括更多个CU,如20个CU;一个图像帧还可以包括更少个CU,如1个CU或2个CU等。
S520,视频解码器200确定一个图像帧的多个QP值。
其中,一个CU包括多个QG,一个QG对应一个QP值。
上述的一个CU包含多个残差系数,且该一个QG包含前述多个残差系数中的一部分残差系数,该一部分残差系数共用一个QP值。
值得注意的是,在编码过程中,视频编码器对该残差系数进行量化后,该残差系数又被称为水平值。而在本领域中,残差系数和水平值也可以统称为残差系数,本实施例均以残差系数来表示,不应理解为对本申请的限定。
相较于一个CU中所有的残差系数都采用相同的QP值导致的图像失真较大,在本实施例提供的视频图像的解码过程中,一个CU可以被划分为多个QG,每个QG中的一个或多个残差系数共用一个QP值,进而,视频解码器可以对图像帧对应的一个或多个CU进行更精细粒度的QP决策,在保证一定压缩率的情况下,降低了图像帧的解码失真,提高了视频图像解码的真实性和准确性。
在一种可选的实现方式中,一个QG可以包括一个像素点或多个像素点,每个像素点都具有相应的残差系数。
在第一种可能的情形中,当一个QG包括一个像素点时,QG的量化过程也可以称为图像帧的自适应逐点量化,该逐点量化也可以适用于不做QG划分的CU中。逐点量化的特点是允许每个像素点使用不同的QP,相当于把QP的粒度精细化到像素级。如此,采用逐点量化的方法对图像帧进行反量化,可以在图像帧的压缩率不变的情况下,大幅度提高图像帧的主观质量。
在第二种可能的情形中,当一个QG包括多个像素点时,一个QG包括的多个像素点对应的残差系数可以共用一个QP值。
在另一种可选的实现方式中,CU可以是通过以下方式确定的:视频解码器200按照第一规则划分一个图像帧包括的一个CU,获取多个QG。
该第一规则包括划分域和划分方式,该划分域为变换域或像素域,该划分方式包括均匀划分和非均匀划分中至少一种。
一个CU包含的多个残差系数的位置由坐标进行标记,该标记可以包括横坐标和纵坐标。例如,残差系数的位置坐标为(i,j),i为横坐标,j为纵坐标。下面提供几种可能的示例,对第一规则和CU中QG的划分进行说明。
在第一种可能的示例中,若划分域的类型为变换域,则视频解码器200划分一个CU获取多个QG的过程包括:将多个残差系数中,坐标和未达到第一坐标阈值的残差系数划分至第一QG,坐标和达到第一坐标阈值的残差系数划分至第二QG。该坐标和为残差系数的横坐标与纵坐标之和。
图6为本申请提供的一种变换域划分的示意图,一个CU包括16个残差系数对应的区域,其中,左上角的残差系数的坐标为(1,1),右下角的残差系数的坐标为(4,4)。
QG(2-1)示出了一种变换域的QG二分方式,对于(i,j)位置的残差系数,满足“i+j≤阈值1”的残差系数为第一QG,其他为第二QG,如该阈值1为5.5。另外,也可以是满足“i+j<阈值1”的残差系数为第一QG,其他为第二QG。
在第二种可能的示例中,若划分域的类型为变换域,视频解码器200划分一个CU获取多个QG的过程包括:将多个残差系数中,坐标和未达到第一坐标阈值的残差系数划分至第一QG,坐标和达到第一坐标阈值、未达到第二坐标阈值的残差系数划分至第二QG,坐标和达到第二坐标阈值的残差系数划分至第三QG。该第二坐标阈值大于第一坐标阈值。
如图6所示,QG(3-1)示出了一种变换域的QG三分方式,对于(i,j)位置的残差系数,满足“i+j<阈值1”的残差系数为第一QG,满足“阈值1≤i+j<阈值2”的残差系数为第二QG,满足“i+j≥阈值2”的残差系数为第三QG,阈值2大于阈值1,如阈值1=5.5,阈值2=6.5。
另外,也可以是满足“i+j≤阈值1”的残差系数为第一QG,满足“阈值1<i+j≤阈值2”的残差系数为第二QG,满足“i+j>阈值2”的残差系数为第三QG,阈值2大于阈值1,如阈值1=5,阈值2=6。
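下面给出一段示意性的C语言草图，按“坐标和i+j与阈值比较”的方式把一个4×4变换域CU的残差系数划分为两个QG；其中的尺寸、阈值与函数名均为举例假设。

    /* qg_id[i][j]为0表示第一QG，为1表示第二QG；坐标从(1,1)开始，与图6一致。 */
    #define CU_H 4
    #define CU_W 4
    static void split_qg_by_coord_sum(int qg_id[CU_H][CU_W], double thr)
    {
        for (int i = 1; i <= CU_H; i++)
            for (int j = 1; j <= CU_W; j++)
                qg_id[i - 1][j - 1] = ((double)(i + j) <= thr) ? 0 : 1;
    }

例如取thr=5.5时，即可得到图6中QG(2-1)所示的二分结果。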
在第三种可能的示例中,若划分域的类型为变换域,视频解码器200划分一个CU获取多个QG的过程包括:对多个残差系数进行排序,将多个残差系数中未达到第一比例阈值的残差系数划分至第一QG,达到第一比例阈值的残差系数划分至第二QG。该多个残差系数的排序方式为以下任一种:Z字形、反向Z字形。
例如,如图6所示,QG(2-2)示出了一种Z字形的变换域的QG二分方式,前7/16(43.75%)的残差系数被划分到第一QG,剩余的残差系数被划分到第二QG。
又如,如图6所示,QG(2-3)示出了一种反Z字形的变换域的QG二分方式,前7/16(43.75%)的残差系数被划分到第一QG,剩余的残差系数被划分到第二QG。
在第四种可能的示例中,若划分域的类型为变换域,视频解码器200划分一个CU获取多个QG的过程包括:对多个残差系数进行排序,将多个残差系数中未达到第一比例阈值的残差系数划分至第一QG,达到第一比例阈值、但未达到第二比例阈值的残差 系数划分至第二QG,达到第二比例阈值的残差系数划分至第三QG。该第二比例阈值大于第一比例阈值。
例如,如图6所示,QG(3-2)示出了一种Z字形的变换域的QG三分方式,前5/16(31.25%)的残差系数被划分到第一QG,后3/16(18.75%)的残差系数被划分到第二QG,剩余的残差系数被划分到第三QG。
又如,如图6所示,QG(3-3)示出了一种反Z字形的变换域的QG三分方式,前5/16(31.25%)的残差系数被划分到第一QG,后3/16(18.75%)的残差系数被划分到第二QG,剩余的残差系数被划分到第三QG。
上述第一至第四种可能的示例仅为本实施例为说明CU中变换域的划分而给出的示例,不应理解为对本申请的限定。在划分域的类型为变换域时,坐标阈值、比例阈值的选取可以根据CU的图像内容或者视频编解码的需求进行确定,本申请对此不予限定。另外,为提高一个CU内的QP决策精度,变换域还可以划分为更多个QG,如4、5、10或更多等,本申请对此不予限定。
在划分域的类型为像素域时,下面提供几种可能的示例,对第一规则和CU中QG的划分进行说明。
在第五种可能的示例中,视频解码器200将多个残差系数进行水平或垂直方向的对称划分,获得包含的残差系数的数量一致的两个QG。这两个QG包含的残差系数的比例为1:1。
图7为本申请提供的一种像素域划分的示意图,QG_像素(pixel,P)(2-1)示出了一种在CU的垂直方向上进行对称二分的示例,QG_P(2-4)示出了一种在CU的水平方向上进行对称二分的示例。
在第六种可能的示例中,视频解码器200将多个残差系数进行水平或垂直方向的对称划分,获得三个QG。三个QG中两个不相邻的QG包含的残差系数的数量一致,且另一个QG包含的残差系数的数量与两个不相邻的QG包含的残差系数的数量和是一致的。
如图7所示,QG_P(3-1)示出了一种在CU的垂直方向上进行对称三分的示例,两侧的QG包含的残差系数的数量相同,这三个QG包含的残差系数的数量比例为1:2:1。QG_P(3-4)示出了一种在CU的垂直方向上进行对称三分的示例,这三个QG包含的残差系数的数量比例为1:2:1。
在第七种可能的示例中,视频解码器200将多个残差系数进行水平或垂直方向的划分,获得包含的残差系数的数量不一致的两个QG。
例如,如图7所示出的QG_P(2-2)给出了一种在CU的垂直方向上进行二分的示例,这两个QG包含的残差系数的比例为1:3。
又如,QG_P(2-3)给出了一种在CU的垂直方向上进行二分的示例,这两个QG包含的残差系数的比例为3:1。
又如,QG_P(2-5)给出了一种在CU的水平方向上进行二分的示例,这两个QG包含的残差系数的比例为1:3。
又如,QG_P(2-6)给出了一种在CU的垂直方向上进行二分的示例,这两个QG包含的残差系数的比例为3:1。
在第八种可能的示例中,视频解码器200将多个残差系数进行水平或垂直方向的划分,获得三个QG。这三个QG包含的残差系数不存在对称关系。
例如,如图7所示出的QG_P(3-2)给出了一种在CU的垂直方向上进行三分的示例,这三个QG包含的残差系数的比例为1:1:2。
又如,QG_P(3-3)给出了一种在CU的垂直方向上进行三分的示例,这三个QG包含的残差系数的比例为2:1:1。
又如,QG_P(3-5)给出了一种在CU的水平方向上进行三分的示例,这三个QG 包含的残差系数的比例为1:1:2。
又如,QG_P(3-6)给出了一种在CU的垂直方向上进行三分的示例,这三个QG包含的残差系数的比例为2:1:1。
上述第五至第八种可能的示例仅为本实施例为说明CU中像素域的划分而给出的示例,不应理解为对本申请的限定。在划分域的类型为像素域时,QG包含的残差系数的比例可以根据CU的图像内容或者视频编解码的需求进行确定,本申请对此不予限定。另外,为提高一个CU内的QP决策精度,像素域还可以划分为更多个QG,如4、5、10或更多等,本申请对此不予限定。
如此,视频解码器200对像素域和变换域的残差系数采用不同的QG划分方式,QP量化过程也不同,从而降低了图像解码的失真。关于QP量化的过程,可参考本文中关于QP值的获取过程的阐述,此处不予赘述。
请继续参见图5,在视频解码器200获取到一个图像帧的多个QP值之后,本实施例提供的图像的解码方法还包括以下步骤S530。
S530,视频解码器200依据多个QP值对一个图像帧进行解码。
例如,视频解码器200依据多个QP值对一个图像帧进行解码后,获得经解码后的图像。在视频解码器200解码多个图像帧后,获取到码流解码后的视频。
如此,在本实施例中,视频解码器可以按照QG为粒度进行图像解码的QP量化,由于一个CU可以被划分为多个QG,且一个QG对应一个QP值,相较于一个CU中所有的残差系数都采用相同的QP值导致的图像失真较大,在本实施例中,视频解码器可以对图像帧对应的一个或多个CU进行更精细粒度的QP决策,在保证一定压缩率的情况下,降低了图像帧的解码失真,提高了视频图像解码的真实性和准确性。
在本申请提供的技术基础上,视频编/解码的过程中,视频编码器/视频解码器会获取每个QG(或CU)的QP值,这里提供几种可能的实现方式。
在一种可能的实现方式中,一个QG对应的一个QP值包括亮度QP值和色度QP值。亮度QP值(QP_Y)是指对图像帧的亮度(Luminance或Luma)进行量化或反量化所需的QP值,色度QP值是指对图像帧的色度(Chrominance或Chroma)进行量化或反量化所需的QP值。示例的,针对于上述的S520,视频解码器200确定一个图像帧的多个QP值,可以包括以下两种可能的情形。
第一种可能的情形,视频解码器200分别获取一个QG的亮度QP值和色度QP值。
第二种可能的情形,首先,视频解码器200获取一个QG的亮度QP值;其次,视频解码器200基于亮度QP值确定一个QG的色度QP值。如,色度QP为QP_Y加上图像参数集(picture parameter set,PPS)层和Slice层的QP偏移值。
示例的,上述的亮度QP值和色度QP值可以是视频解码器200解析码流获得的。
在对CU包括的QG进行QP决策(或称QP量化)时,视频解码器200可以先获取CU级的QP值,再获取该CU中QG级的QP值。
下面提供两种可能的实现方式来确定一个图像帧的多个QP值。
第一种可能的实现方式,直接编码/解码:视频解码器200解析码流以获得一个图像帧的标记信息,该标记信息用于指示一个QG的QP值,和/或,标记信息用于指示一个CU的QP值。
例如,在近无损压缩中,小QP值出现的概率高于大QP值,所以视频解码器200可采用截断一元码、截断莱斯码或者指数哥伦布码等方式,利用码流携带的标记信息中包括的QP值,对QG包括的残差系数(水平值)进行反量化。
如此，在视频编码器100对一个CU进行QP决策时，该标记信息可以携带有该CU的QP值。在无需对CU中的QG进行QP决策的情况下，视频解码器200可以依据码流携带的标记信息来确定CU的QP值，避免了视频解码器对码流中的图像帧进行推导来获得CU的QP值，减少了视频解码器的计算资源消耗，提高了图像解码的效率。
又如，在视频编码器100对一个CU中的多个QG进行QP决策时，该标记信息可以携带有该多个QG中任一个QG的QP值。
又如，该标记信息可以携带有CU的QP值，以及该CU中任一个QG的QP值。
如此，在视频编码器100对一个QG进行QP决策时，该标记信息可以携带有该QG的QP值，避免了视频解码器对码流中的图像帧进行推导来获得QG的QP值，减少了视频解码器的计算资源消耗，提高了图像解码的效率。
第二种可能的实现方式,预测编码/解码:视频解码器200对实际(编码)QP值与预测QP(predict QP,predQP)值的差值(deltaQP)进行编码,包括以下几个步骤:首先,获取当前块(CU或QG)的predQP;其次,确定当前块的deltaQP;最后,确定实际(编码)QP值为:QP=predQP+deltaQP。
下面以视频解码器200确定一个图像帧中一个QG的QP值为例进行详细说明,如图8所示,图8为本申请提供的一种采用预测编码方式获取QP的流程示意图,该获取QP的过程可以由视频解码器或视频编码器来实现,这里以视频解码器200为例进行说明,该获取QP的过程包括以下步骤。
S810,视频解码器200获取QG的预测QP值。
在一种可行的实现方式中,视频解码器200将QG所在CU的QP值作为该QG的预测QP值。示例的,该CU的QP值可以是视频解码器200解析码流的标记信息确定。
在另一种可行的实现方式中,视频解码器200可以先获取QG所在CU中与该QG相邻的至少一个其他QG的QP值,并依据该至少一个其他QG的QP值,确定前述QG的预测QP值。例如,视频解码器200将与该QG相邻的其他QG的QP值作为该QG的预测QP值。
S820,视频解码器200获取QG的QP偏移量。
在本文中,QP偏移量可以用deltaQP来表示。
请继续参见图8,针对于S820,本实施例提供了两种可能的实现方式。
在第一种可选的实现方式中,视频解码器200可以利用码流携带的标记信息来确定QG的QP偏移量,如图8所示出的S820A。
S820A,视频解码器200解析码流以获得指示了QG的QP偏移量的标记信息。
例如,视频解码器200获取到码流后,对该码流进行解析以获取QG所在图像帧的标记信息,该标记信息用于指示该QG的QP偏移量(deltaQP)。
在第二种可选的实现方式中,视频解码器200可以利用推导信息来确定QG的QP偏移量,如图8所示出的S820B。
S820B,视频解码器200依据QG的推导信息确定QG的QP偏移量。
该推导信息可以为以下任意一种或几种的组合:该QG的平坦度信息或纹理度信息、码流缓冲区的剩余空间或失真约束信息。其中,平坦度信息或纹理度信息,用于指示QG的图像梯度;失真约束信息指示了一个图像帧所包括的多个QG中任一个QG的失真阈值;码流缓冲区的剩余空间用于指示码流缓冲区(如buffer空间)的可用余量。
在第一种情形中,推导信息为平坦度信息或纹理度信息,视频解码器200可以依据该平坦度信息或纹理度信息推导QG的QP偏移量。例如,视频解码器200计算当前块(QG)的纹理复杂度,对于纹理复杂度高(如达到纹理复杂度阈值)的QG,使用大QP(如20);对于纹理复杂度低(如未达到纹理复杂度阈值)的QG,使用小QP(如5)。
在第二种情形中,推导信息为码流缓冲区的剩余空间,视频解码器200计算整幅图像所有像素的平均比特数BPPtotal,以及剩余未编码像素的平均比特数BPPleft,若BPPleft>BPPtotal,则减小QP,否则增大QP。其中,BPPtotal和BPPleft可以采用下述公式来获取。
BPP_total = 码流总比特数 / 整幅图像的像素总数
BPP_left = 剩余可用比特数 / 剩余未编码像素数
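作为示意，下面的C语言草图按上述BPP_total与BPP_left的比较来增减QP；其中±1的调整步长与函数名均为举例假设。

    /* 依据码流缓冲区状态调整QP：剩余预算充裕则减小QP，否则增大QP。 */
    static int adjust_qp_by_buffer(int qp, long total_bits, long total_pixels,
                                   long left_bits, long left_pixels)
    {
        double bpp_total = (double)total_bits / (double)total_pixels;
        double bpp_left  = (double)left_bits  / (double)left_pixels;
        return (bpp_left > bpp_total) ? (qp - 1) : (qp + 1);
    }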
在第三种情形中,若推导信息为失真约束信息,不妨设D为量化失真(反量化后的重建值与量化前的残差之差),并且对任意矩阵A记:
‖A‖_max = max_{i,j} |a_{i,j}|
‖A‖_1 = max_j Σ_i |a_{i,j}|
‖A‖_∞ = max_i Σ_j |a_{i,j}|
(1),如果对像素域的残差做量化,则像素域的最大失真不超过△的充分必要条件是‖D‖ max≤△,由此可推导像素域的QP值。
(2),如果对变换域的残差做量化,不妨设R为像素域的残差,U,V为水平变换矩阵和垂直变换矩阵,则像素域的最大失真与变换域的最大失真之间满足下述公式。于是,像素域的最大失真不超过△的充分条件是:变换域的最大失真满足
‖D‖_max ≤ △ / (‖U^{-1}‖_∞ · ‖V^{-1}‖_1)
由此可推导变换域的QP值。
‖U^{-1}(URV+D)V^{-1} − R‖_max = ‖U^{-1}DV^{-1}‖_max ≤ ‖U^{-1}‖_∞ · ‖D‖_max · ‖V^{-1}‖_1
请继续参见图8,本实施例提供的获取QP的过程还包括以下步骤。
S830,视频解码器200依据QG的预测QP值和QP偏移量,确定QG的QP值。
例如,视频解码器200将QG的预测QP值与QP偏移量之和作为该QG的QP值。
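下面的C语言草图示意“预测QP值与QP偏移量之和作为QP值”的推导；其中对合法范围[qp_min, qp_max]的夹取为举例所加的假设，并非上文的必然步骤。

    /* 实际QP = 预测QP + 码流中解析出的QP偏移量deltaQP。 */
    static int derive_qp(int pred_qp, int delta_qp, int qp_min, int qp_max)
    {
        int qp = pred_qp + delta_qp;
        if (qp < qp_min) qp = qp_min;
        if (qp > qp_max) qp = qp_max;
        return qp;
    }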
值得注意的是,在S820B提供的第三种情形中,若视频解码器200获取QG的QP值可能存在以下的情况。
视频解码器200依据QG的参考QP值确定对应的预测失真,若预测失真小于或等于失真阈值,将该参考QP值作为QG的QP值;若预测失真大于失真阈值,将由失真阈值确定的QP值作为QG的QP值。
在示例1中,若推导信息仅包括失真约束信息,该参考QP值是指S810确定的QG的预测QP值。
在示例2中,若推导信息包括失真约束信息和纹理度信息(或平坦度信息),该参考QP值可以是指由纹理度信息(或平坦度信息)确定的deltaQP和S810确定的预测QP值相加获得的QP值。
在示例3中,若推导信息包括失真约束信息和码流缓冲区的剩余空间,该参考QP值可以是指由码流缓冲区的剩余空间确定的deltaQP和S810确定的预测QP值相加获得的QP值。
也就是说,上述的推导信息可以是用于确定QG的deltaQP,也可以是用于直接确定QG的实际编码QP值,推导信息具体的使用过程可以根据视频编解码中QP量化/反量化的需求进行确定,上述的3种情形和3种示例不应理解为对本申请的限定。
前述示例1~示例3仅为本实施例为说明视频解码器200利用推导信息来确定QG的QP值的示例,不应理解为对本申请的限定,在另一些示例中,推导信息可以同时包括失真约束信息、纹理度信息(或平坦度信息)和码流缓冲区的剩余空间,本申请对此不予限定。
值得注意的是,上述的色度/亮度QP值、直接编码/预测编码、码流携带/推导信息解析仅为本申请提供的一些示例,不应理解为对本申请的限定。
在一种可选的实现方式中,一个图像帧至少包括第一部分CU和第二部分CU,第一部分CU和第二部分CU不具有重叠区域,且第一部分CU和第二部分CU的QP值 的获取方式不同。如第一部分CU的QP值获取方式是码流的标记信息携带的,第二部分CU的QP值获取方式是视频解码器200推导的。
具体的,上述视频解码器200确定一个图像帧的多个QP值的过程可以包括以下过程:第一,视频解码器200解析码流以获得一个图像帧的标记信息,该标记信息包括第一部分CU的QP偏移量,视频解码器200依据标记信息确定第一部分CU的QP值;第二,视频解码器200针对于第二部分CU,获取第二部分CU的预测QP值,视频解码器200还依据第二部分CU的预测QP值和推导信息,确定第二部分CU的QP值。关于推导信息的相关内容可以参考前述对QG的推导信息的描述,将其中的QG替换为CU即可,此处不予赘述。
例如,视频解码器200将一幅图像分为若干个区域,对不同区域的CU采用不同的QP处理方式。在图像级传输不同区域的基准QP值,在CU级传输不同区域的标记信息。不同区域的CU级QP可以通过码流传输的方式获取,也可以通过解码端推导的方式获取。如视频解码器将图像划分为感兴趣(region of interset,ROI)区域和非ROI区域,对于ROI区域的CU(第一部分CU),通过码流传输的方式(如上述的标记信息)获取QP值;对于非ROI区域的CU(第二部分CU),通过解码端推导(如上述的推导信息)的方式获取QP值。
作为一种可选的实现方式,一个图像帧的一个CU可以包括多个QG,这多个QG可以是部分进行了QP量化,也可以是全部进行了QP量化。
例如,在S520的过程中,若一个图像帧的一个CU包括的所有QG在编码过程中均进行了量化,则视频解码器200可以首先确定该一个图像帧的一个CU包括的所有QG的扫描顺序。继而,视频解码器200针对于所有QG中的每一个QG,按照扫描顺序获取每一个QG的QP值。其中,扫描顺序包括以下任意一种:从上到下、从左到右、Z字形或反向Z字形顺序。
具体的，如果要对所有QG都做量化，则需要按照某种扫描顺序依次为每个QG获取相应的QP，其中的扫描顺序与划分方式有关，可以是从上到下、从左到右、Z字形或反向Z字形顺序。例如，可以为这些QG每个都编码一个QP偏移量。
又如,在S520的过程中,若一个图像帧的一个CU包括的部分QG在编码过程中进行了量化,则视频解码器200可以解析码流以确定一个图像帧的一个CU中被标记的一个或多个QG,该被标记的一个或多个QG在解码过程中需进行QP反量化;进而,视频解码器200针对于被标记的一个或多个QG中每一个QG,获取每一个QG的QP值。
具体的，如果只对某些QG做量化，其他QG不做量化，则需要标识做量化的那些QG的位置，然后为这些QG获取相应的QP。例如，可以为这些QG每个都编码一个QP偏移量。
如此，针对于一个图像帧的一个CU中所有的QG，可以利用码流携带的标记信息来区分该所有QG的QP量化方式(部分量化或全部量化)，避免了视频解码器无差别的进行QP量化，减少了视频解码器进行QP量化所需的计算资源和图像失真，提高了视频的解码效率和准确性。
另外,当一个QG仅包括一个图像帧中的一个像素点的情况下,一个像素点对应一个QP值,且一个图像帧所包括的所有像素点中至少两个像素点的QP值不同,该像素点的QP值量化过程(点预测模式)可以采用逐点量化技术,例如,该逐点量化技术包括以下过程:视频解码器200可以根据当前像素点周围的已重建像素信息来自适应调整当前像素的QP值。其中已重建像素信息包括但不限于像素值、平坦度信息或纹理度信息、背景亮度、对比度等。
值得注意的是，自适应的逐点量化技术可以应用于QG中，也可以适用于不做QG划分的CU中。逐点量化技术的特点是允许一个图像帧中每个像素点使用不同的QP值，相当于把QP值量化的粒度精细化到像素级。
逐点量化技术的一种实现方式是：不妨设QP_pred是当前CU或QG的QP值，QP_JND≥0是人眼恰可察觉失真(Just Noticeable Distortion,JND)对应的QP值，offset>0是QP偏移值(可以在码流中传输，也可以预先设定)，则当前像素的QP值调整为：
当前像素的QP值 = min(QP_pred, QP_JND)，当指示信息 < 阈值1；
当前像素的QP值 = QP_pred，当阈值1 ≤ 指示信息 < 阈值2；
当前像素的QP值 = QP_pred + offset，当指示信息 ≥ 阈值2。
其中，阈值2>阈值1。或者，
当前像素的QP值 = min(QP_pred, QP_JND)，当指示信息 ≤ 阈值；
当前像素的QP值 = QP_pred + offset，当指示信息 > 阈值。
上述两种当前像素的QP值确定方式仅为本实施例提供的示例,不应理解为对本申请的限定。
如此,视频解码器采用逐点量化技术,可以在不改变图像帧的压缩率的前提下,大幅度提升图像帧的主观质量,减少图像帧的失真。
下面给出一种可能的实现方式,来说明图像帧中的像素点的QP量化过程。
示例的,视频解码器200获取一个像素点的预测QP值,并依据一个像素点的预测QP值和推导信息,确定一个像素点的QP值。
在一种可能的情形中,一个像素点的预测QP值为该像素点所在的CU或QG的QP值。在另一种可能的情形中,一个像素点的预测QP值为根据该像素点周围的一个或多个已重建像素点的QP值推导得到,其中,推导的方法包括计算均值(如多个像素点的QP值的平均值)、中位数(如多个像素点的QP值的中位数)或众数(如多个像素点的QP值中出现频次最大的QP值)中至少一种。
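下面给出一段示意性的C语言草图，演示由周围已重建像素点的QP推导当前像素点预测QP的均值与中位数两种方法；函数名与舍入方式均为举例假设。

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        return *(const int *)a - *(const int *)b;
    }

    /* 均值法：对n个已重建像素点的QP求四舍五入的平均值。 */
    static int pred_qp_mean(const int *nbr_qp, int n)
    {
        long s = 0;
        for (int i = 0; i < n; i++) s += nbr_qp[i];
        return (int)((s + n / 2) / n);
    }

    /* 中位数法：排序后取中间值（注意会改变入参数组的顺序）。 */
    static int pred_qp_median(int *nbr_qp, int n)
    {
        qsort(nbr_qp, n, sizeof(int), cmp_int);
        return nbr_qp[n / 2];
    }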
上述像素点的推导信息可以是该像素点周围的一个或多个已重建像素点的信息。其中,该已重建像素点的信息包括以下任意一种或几种的组合:像素值、平坦度信息或纹理度信息、背景亮度、对比度。值得注意的是,前述的几种信息仅为本实施例提供的示例,不应理解为对本申请的限定。
此外,上述“周围”的意思可以理解为待确定QP值的像素点的相邻像素点,这里提供几种可能的示例进行说明,如图9所示,图9为本申请提供的一种像素点的分布示意图,图9中的(A)示出了一种以当前像素点为中心的正方形区域划分为例的示意,这里给出了2种可能的情形:情形1,已重建像素点是指以当前像素点为中心,边长为3的正方形区域中的像素点,如图9的(A)中所示出的周围像素点1;情形2,已重建像素点是指以当前像素点为中心,边长为5的正方形区域中的像素点,如图9的(A)中所示出的周围像素点2。
图9中的(B)示出了一种以当前像素点为中心的菱形区域划分为例的示意,这里给出了2种可能的情形:情形1,已重建像素点是指以当前像素点为中心,对角线长度为3的菱形区域中的像素点,如图9的(B)中所示出的周围像素点1;情形2,已重建像素点是指以当前像素点为中心,对角线长度为5的菱形区域中的像素点,如图9的(B)中所示出的周围像素点2。
图9仅为本实施例为说明在当前像素点周围的已重建像素点的示例,不应理解为对本申请的限定。在另一些可能的示例中,当前像素点周围的已重建像素点还可以是指与该当前像素点上下相邻、或左右相邻的一个或两个像素点。
作为一种可选的实现方式,依据一个像素点的预测QP值和推导信息,确定一个像素点的QP值,包括:根据一个像素点周围的一个或多个已重建像素点的信息确定像素点的指示信息;如果指示信息小于或等于第一阈值,并且预测QP值大于或等于人眼恰 可察觉失真对应的QP值,则将人眼恰可察觉失真对应的QP值作为像素点的QP值。
第一阈值可以是预设的,也可以是根据视频编解码的压缩率或失真率要求进行确定的。另外,第一阈值也可以是根据使用者的输入信息确定的。
值得注意的是,人眼恰可察觉失真对应的QP值是图像级或CU级信息。
例如,人眼恰可察觉失真对应的QP值是从码流中解析得到,如该码流中携带有人眼恰可察觉失真对应的QP值,如20。
又如,人眼恰可察觉失真对应的QP值是根据周围的已重建CU的平坦度信息或纹理度信息、背景亮度、对比度信息推导得到。关于推导获得QP值的过程可以参考上述图9的相关内容,此处不予赘述。
另外,人眼恰可察觉失真对应的QP值还可以是视频编码器或视频解码器设置的预设值,如15。也就是说,人眼恰可察觉失真对应的QP值不仅可以携带在码流的标记信息中,也可以是视频编码器或视频解码器在视频编解码过程中解析的,还可以是预设值的QP值。本实施例将人眼恰可察觉失真对应的QP值引入当前像素点的QP值决策,使得每个像素点满足人眼恰可察觉失真相应的判断信息,降低了图像失真,提高了图像的主观质量。
这里给出一种可能的具体示例，以说明像素点的QP值确定过程。如表3所示，表3示出了一种当前像素点的区间二分的QP值确定示意。
表3
区间二分      指示信息≤阈值             指示信息>阈值
方式一        min(QP_pred, QP_JND)      QP_pred
方式二        min(QP_pred, QP_JND)      QP_pred + offset
方式三        QP_pred                   QP_pred + offset
其中，offset>0是QP偏移值(可以在码流中传输，也可以预先设定)。
方式一：当指示信息小于或等于阈值时，将QP_pred和QP_JND中的较小值作为当前像素点的QP值；当指示信息大于阈值时，将像素点的预测QP值(QP_pred)作为当前像素点的QP值。
方式二：当指示信息小于或等于阈值时，将QP_pred和QP_JND中的较小值作为当前像素点的QP值；当指示信息大于阈值时，将QP_pred与QP偏移量(offset)之和作为当前像素点的QP值。
方式三：当指示信息小于或等于阈值时，将QP_pred作为当前像素点的QP值；当指示信息大于阈值时，将像素点的QP_pred与QP偏移量(offset)之和作为当前像素点的QP值。
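以表3的“方式二”为例，下面的C语言草图示意该像素级QP决策过程；函数名与参数均为举例假设。

    /* 指示信息ind不超过阈值thr时取min(QP_pred, QP_JND)，否则取QP_pred+offset。 */
    static int pixel_qp_way2(int ind, int thr, int qp_pred, int qp_jnd, int offset)
    {
        if (ind <= thr)
            return (qp_pred < qp_jnd) ? qp_pred : qp_jnd;
        return qp_pred + offset;
    }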
值得注意的是,上述示例以及表3仅为本实施例提供的当前像素点的QP值的可能获取方式,不应理解为对本申请的限定。
如此,将人眼恰可察觉失真对应的QP值引入当前像素点的QP值决策过程中,使得每个像素点满足人眼恰可察觉失真相应的判断信息,降低了图像失真,提高了图像的主观质量。
在视频解码器200获取到一个图像帧的QP值之后,针对于上述的S530,本申请提供一种可能的实现方式,图10为本申请提供的一种图像帧解码的流程示意图,S530可以包括以下步骤S1010~S1030。
S1010,针对于多个QP值中每一个QP值,获取QP值对应的量化步长(Qstep)。
视频解码器根据QP值,可通过公式推导和查表中至少一种方式获取Qstep,下面提供了四种可能的实现方法。
方法一:Qstep=2×QP+1。
方法二：不妨记octave为QP的位阶，也即QP每增加octave，Qstep增加一倍，通常选取octave为6或8。记offset为一个整数偏移值，则：
Qstep = 2^((QP+offset)/octave)
方法三:不妨记octave为QP的位阶,也即QP每增加octave,Qstep增加一倍,通常选取octave为6或8。记offset为一个整数偏移值,
⌈x⌉与⌊x⌋分别表示向上取整和向下取整。记
T = ⌊(QP+offset)/octave⌋
则量化步长为：Qstep = 2^T
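下面的C语言草图示意上述三种由QP求Qstep的方法；其中假设QP+offset非负，octave通常取6或8，函数名均为举例。

    #include <math.h>

    static double qstep_method1(int qp)
    {
        return 2.0 * qp + 1.0;                            /* 方法一：Qstep = 2*QP + 1 */
    }

    static double qstep_method2(int qp, int offset, int octave)
    {
        return pow(2.0, (double)(qp + offset) / octave);  /* 方法二：2^((QP+offset)/octave) */
    }

    static double qstep_method3(int qp, int offset, int octave)
    {
        int t = (qp + offset) / octave;                   /* 非负时整数除法即下取整，对应T */
        return ldexp(1.0, t);                             /* 方法三：Qstep = 2^T */
    }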
方法四:下面给出一种量化和反量化的示例。
量化:
l = sign(c)·⌊|c|/Qstep + f⌋
反量化：c′ = l × Qstep
其中c是要量化的残差系数(变换域或像素域),l是量化后的水平值,c′是反量化后的重建值,Qstep是量化步长,f∈[0,1)是控制舍入的参数,[0,1-f)是量化死区(水平值为0的区间)。
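下面的C语言草图示意方法四所述带死区的标量量化与反量化；函数名为举例假设。

    #include <math.h>

    /* 量化：l = sign(c)·floor(|c|/Qstep + f)，[0,1-f)为量化死区。 */
    static int quantize(double c, double qstep, double f)
    {
        int sign = (c < 0.0) ? -1 : 1;
        int level = (int)floor(fabs(c) / qstep + f);
        return sign * level;
    }

    /* 反量化：c' = l * Qstep。 */
    static double dequantize(int level, double qstep)
    {
        return level * qstep;
    }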
不妨记f∈[0,1)是控制舍入的参数，{c_i}是当前QG或CU中所有待量化的残差系数，T可以利用方法三中的公式获得，记：
M = ⌈log₂(max_i|c_i| + 1)⌉
则当T<M时，Qstep由下述公式给出，否则量化系数和重建值均为零。
Qstep = 2^T / (1 − a)
该方法四中一种可能的实现方式是：取 a = f·2^(T−M) 或者取 a = 2^(T−M)，其中M可以在码流中传输，也可以直接取为比特深度。
在量化像素域的残差时,方法三需要通过clip操作才能保证量化系数可用“M-T”位比特表示,方法四的量化系数天然的可用“M-T”位比特表示,无需clip。
另外,在JPEG-XS中均匀量化的方案是在方法四中取f=0.5,并在码流中传输M。
值得注意的是,QP值越大,Qstep越大,量化越粗糙,量化失真越大,系数编码的码率越小。参数f与量化死区的长度有关,f越小,量化死区越长,量化后的水平值越靠近零点。当f=0.5时,上述S1010的方法四提供的量化和反量化公式相当于四舍五入,量化失真最小。当f<0.5时,f越小,量化失真越大,系数编码的码率越小。H.265中选取:I图像f=1/3,B/P图像f=1/6。
请继续参见图10，上述的S530还可以包括以下步骤。
S1020,获取QP值对应的QG所包含的水平值。
S1030,依据选择的量化器组合,对QG的水平值进行反量化。
其中,该量化器组合包括一个或多个量化器,该量化器为均匀量化器或非均匀量化器。
例如,均匀量化器是指均匀标量量化器,其量化或反量化公式可以参考上述S1010的方法四提供的量化和反量化公式,参数f∈[0,1)有以下取法:
取法一:f取为0.5或其他固定值。
取法二:f可根据QP值、预测模式以及是否做变换来自适应确定。
又如,非均匀量化器是指非均匀标量量化器,量化水平值、量化区间和反量化的重建值之间的对应关系可通过查表获得。一种可能的实现方式如下表1。
表1
量化水平值     量化区间          反量化的重建值
0             [0, x_0)          r_0
1             [x_0, x_1)        r_1
2             [x_1, x_2)        r_2
3             [x_2, x_3)        r_3
…             …                 …
其中，0 ≤ x_0 < x_1 < x_2 < x_3 < …，重建值和量化区间是非均匀的，并且重建值是所在量化区间的概率质心。
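下面的C语言草图示意非均匀标量量化器反量化时的查表过程；recon[]中存放各量化区间的重建值（概率质心），数组与函数名均为举例假设。

    #include <stdlib.h>

    static double dequant_nonuniform(int level, const double *recon, int n)
    {
        int a = abs(level);
        if (a >= n) a = n - 1;                /* 超出表长时夹取到表尾 */
        return (level < 0) ? -recon[a] : recon[a];
    }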
量化器组合可以采用一个非均匀标量量化器,或者采用多个非均匀标量量化器。
例如,量化器组合是通过码流携带的标记信息确定的。
又如，量化器组合是依据残差系数在QG中的分布情况确定的。如，视频解码器自适应选择使用哪个量化器，选择依据可以是模式信息或变换信息，模式信息或变换信息与残差系数在QG中的分布情况相关。
针对于上述的S1030中,对QG的水平值进行反量化,其可以包括以下过程。
首先,确定QG的划分域类型。
其次,若QG为划分域类型为变换域,则从解码端的矩阵模板库中选择与QG的参数信息匹配的量化矩阵。
其中,矩阵模板库包括多种类型的量化矩阵模板,该参数信息包括以下任意一种或多种的组合:QG的尺寸、QG所在的CU的尺寸、亮色度通道信息和平坦度信息。
最后,利用QG的量化矩阵对QG中的水平值进行反量化,获得QG的残差系数。
像素域中不同位置的残差系数同等重要,所以不使用量化矩阵。变换域的系数分为低频系数和高频系数,通过量化矩阵使得高低频系数采用不同的量化步长,可以在保证一定压缩率的同时提升图像的主观质量。
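下面的C语言草图示意变换域按量化矩阵逐位置反量化的过程；尺寸与函数名均为举例假设。

    #define QG_H 4
    #define QG_W 4

    /* 每个位置(i,j)使用量化矩阵qmat给出的Qstep，高低频系数因此可采用不同的量化步长。 */
    static void dequant_with_matrix(const int lvl[QG_H][QG_W],
                                    const double qmat[QG_H][QG_W],
                                    double rec[QG_H][QG_W])
    {
        for (int i = 0; i < QG_H; i++)
            for (int j = 0; j < QG_W; j++)
                rec[i][j] = lvl[i][j] * qmat[i][j];
    }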
量化矩阵的元素分布具有特定的模板,本申请允许不同大小的编码块采用不同的量化矩阵模板,并且大尺寸的量化矩阵可由一种或多种小尺寸的量化矩阵上采样得到。矩阵模板库包括的量化矩阵模板由以下任意一种或多种类型的变换方式获得:离散余弦变换(DCT)、离散正弦变换(DST)、整数变换或离散小波变换(DWT)。如图11所示,图11为本申请提供的量化矩阵模板的示意图,量化矩阵模板具体包括。
(1),如果水平和垂直方向都是DCT/DST变换(参见图11中的(1)),同一反对角线上的系数具有相同的频率,采用相同的量化步长,不同反对角线的系数可以采用不同的量化步长。
(2),如果水平和垂直方向都是小波变换(参见图11中的(2)),采用四叉树模板,四个子块分别对应低/中/高频,同一频段的子块采用相同的量化步长,不同频段的子块可以采用不同的量化步长。
(3),如果垂直方向用DCT/DST变换,水平方向用小波变换(参见图11中的(3)),则先垂直方向二等分,左边子块为小波变换的低频,右边子块为小波变换的高频。每个子块内部上边为DCT/DST变换的低频,下边为DCT/DST变换的高频。可以令矩阵元素A`/B`/C`/D`分别等于A/B/C/D加上同一个偏移值。
(4),如果水平方向用DCT/DST变换,垂直方向用小波变换(参见图11中的(4)),则先水平方向二等分,上边子块为小波变换的低频,下边子块为小波变换的高频。每个子块内部左边为DCT/DST变换的低频,右边为DCT/DST变换的高频。可以令矩阵元素A`/B`/C`/D`分别等于A/B/C/D加上同一个偏移值。
可选的,矩阵模板库包括的多种类型的量化矩阵模板可以包括平坦块模板和纹理块模板。平坦块模板中频率高于频率阈值的残差系数的Qstep大于或等于平坦块模板中频率未达到频率阈值的残差系数的Qstep。纹理块模板中频率高于频率阈值的残差系数的Qstep小于或等于纹理块模板中频率未达到频率阈值的残差系数的Qstep。
例如,视频解码器根据平坦度信息将当前块标记为平坦块或纹理块,然后按照纹理 掩蔽效应设计量化矩阵模板:(1),如果当前块(QG)是平坦块,则量化矩阵中高频系数的Qstep大于等于低频系数的Qstep,因为人眼对平坦块的低频失真比高频失真更敏感,所以高频系数容许更大损失。(2),如果当前块(QG)是纹理块,则量化矩阵中高频系数的Qstep小于等于低频系数的Qstep,因为人眼对纹理块的高频失真比低频失真更敏感,所以优先保护纹理块的高频系数。
也就是说,在图像的解码方法中,视频解码器首先获取QG的QP和Qstep,从码流中解析出水平值,然后自适应选择量化器,对水平值进行反量化,得到重建值,从而实现图像帧的解码。
综上,相较于一个CU中所有的残差系数都采用相同的QP值导致的图像失真较大,在本实施例提供的视频图像的解码过程中,一个CU可以被划分为多个QG,每个QG中的一个或多个残差系数共用一个QP值,进而,视频解码器可以对图像帧对应的一个或多个CU进行更精细粒度的QP决策,在保证一定压缩率的情况下,降低了图像帧的解码失真,提高了视频图像解码的真实性和准确性。
相应的,在图像的编码方法中,视频编码器首先获取QG的QP、Qstep和残差系数;自适应选择量化器,对残差系数进行量化;最后调整量化系数,得到最终的水平值,从而实现图像帧的编码。
在图2示出的视频编码器100的基础上,本申请还提供一种图像的编码方法,如图12所示,图12为本申请提供的一种图像的编码方法的流程示意图,该图像的编码方法可以由视频编码器100执行,也可以由支持视频编码器100的功能的编码端(如图1所示出的编码端10)执行,这里以视频编码器100实现编码方法为例进行说明,该图像的编码方法包括以下步骤。
S1210,视频编码器100将一个图像帧划分为一个或多个CU。
S1220,视频编码器100确定一个图像帧的多个QP值。
在一种示例中,一个CU包括多个QG,一个QG对应一个QP值。
在另一种示例中,一个CU包括多个像素点,一个像素点对应一个QP值,且多个像素点中至少两个像素点的QP值不同。
S1230,视频编码器100依据多个QP值对一个图像帧进行编码。
关于编码方法中QP值的量化,可以参考上述图4~图11中解码方法中的相应过程,此处不予赘述。
如此,相较于一个CU中所有的残差系数都采用相同的QP值导致的图像失真较大,在本实施例提供的视频图像的编码过程中,一个CU可以被划分为多个QG(或像素点),每个QG中的一个或多个残差系数共用一个QP值,进而,视频编码器可以对图像帧对应的一个或多个CU进行更精细粒度的QP决策,在保证一定压缩率的情况下,降低了图像帧的编码失真,提高了视频图像编码的真实性和准确性。
可以理解的是,为了实现上述实施例中功能,视频编码器/视频解码器包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
图13为本申请提供的一种解码装置的结构示意图,该解码装置1300包括码流解析单元1310、QP决策单元1320和图像解码单元1330。解码装置1300可以用于实现上述解码方法实施例中视频解码器或解码端的功能,因此也能实现上述解码方法实施例所具备的有益效果。在本申请的实施例中,该解码装置1300可以是如图1所示的解码端20或视频解码器200,也可以是如图3所示的视频解码器200,还可以是应用于解码端20或视频解码器200的模块。
码流解析单元1310、QP决策单元1320和图像解码单元1330,用于实现图4至图 11中任一实施例提供的解码方法。有关上述码流解析单元1310、QP决策单元1320和图像解码单元1330更详细的描述可以直接参考图4至图11所示的方法实施例中相关描述直接得到,这里不加赘述。
图14为本申请提供的一种编码装置的结构示意图，该编码装置1400包括：图像切分单元1410、QP决策单元1420和图像编码单元1430。编码装置1400可以用于实现上述编码方法实施例中视频编码器或编码端的功能，因此也能实现上述编码方法实施例所具备的有益效果。在本申请的实施例中，该编码装置1400可以是如图1所示的编码端10或视频编码器100，也可以是如图2所示的视频编码器100，还可以是应用于编码端10或视频编码器100的模块。
图像切分单元1410、QP决策单元1420和图像编码单元1430,用于实现图12所提供的编码方法。有关上述图像切分单元1410、QP决策单元1420和图像编码单元1430更详细的描述可以直接参考图4至图12所示的方法实施例中相关描述直接得到,这里不加赘述。
本申请还提供一种电子设备,如图15所示,图15为本申请提供的一种电子设备的结构示意图,电子设备1500包括处理器1510和接口电路1520。处理器1510和接口电路1520之间相互耦合。可以理解的是,接口电路1520可以为收发器或输入输出接口。可选的,电子设备1500还可以包括存储器1530,用于存储处理器1510执行的指令或存储处理器1510运行指令所需要的输入数据或存储处理器1510运行指令后产生的数据。
该电子设备1500包括处理器1510和通信接口1520。处理器1510和通信接口1520之间相互耦合。可以理解的是,通信接口1520可以为收发器或输入输出接口。可选的,电子设备1500还可以包括存储器1530,用于存储处理器1510执行的指令或存储处理器1510运行指令所需要的输入数据,或存储处理器1510运行指令后产生的数据。
当电子设备1500用于实现图4~图11所示的方法时，处理器1510和接口电路1520用于执行上述码流解析单元1310、QP决策单元1320和图像解码单元1330的功能。
当电子设备1500用于实现图12所示的方法时,处理器1510和接口电路1520用于执行上述图像切分单元1410、QP决策单元1420和图像编码单元1430的功能。
本申请实施例中不限定上述通信接口1520、处理器1510以及存储器1530之间的具体连接介质。本申请实施例在图15中以通信接口1520、处理器1510以及存储器1530之间通过总线1540连接为例，总线在图15中以粗线表示；其它部件之间的连接方式仅是进行示意性说明，并不引以为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示，图15中仅用一条粗线表示，但并不表示仅有一根总线或一种类型的总线。
存储器1530可用于存储软件程序及模块,如本申请实施例所提供的解码方法或编码方法对应的程序指令/模块,处理器1510通过执行存储在存储器1530内的软件程序及模块,从而执行各种功能应用以及数据处理。该通信接口1520可用于与其他设备进行信令或数据的通信。在本申请中该电子设备1500可以具有多个通信接口1520。
可以理解的是,本申请的实施例中的处理器可以是中央处理单元(central processing Unit,CPU)、神经处理器(neural processing unit,NPU)或图形处理器(graphic processing unit,GPU),还可以是其它通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其它可编程逻辑器件、晶体管逻辑器件,硬件部件或者其任意组合。通用处理器可以是微处理器,也可以是任何常规的处理器。
本申请的实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(random access memory,RAM)、闪存、只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM, EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于网络设备或终端设备中。当然,处理器和存储介质也可以作为分立组件存在于网络设备或终端设备中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘(digital video disc,DVD);还可以是半导体介质,例如,固态硬盘(solid state drive,SSD)。
在本申请的各个实施例中,如果没有特殊说明以及逻辑冲突,不同的实施例之间的术语和/或描述具有一致性、且可以相互引用,不同的实施例中的技术特征根据其内在的逻辑关系可以组合形成新的实施例。本申请中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。在本申请的文字描述中,字符“/”,一般表示前后关联对象是一种“或”的关系;在本申请的公式中,字符“/”,表示前后关联对象是一种“相除”的关系。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定。

Claims (24)

  1. 一种图像的解码方法,其特征在于,所述方法由解码端执行,所述方法包括:
    解析码流以获得一个或多个图像帧,一个图像帧包括一个或多个编码单元CU;
    确定所述一个图像帧的多个量化参数QP值;其中,一个CU包括多个量化组QG,一个QG对应一个QP值;
    依据所述多个QP值对所述一个图像帧进行解码。
  2. 根据权利要求1所述的方法,其特征在于,所述一个CU包含多个残差系数,所述一个QG包含所述多个残差系数中的一部分残差系数,所述一部分残差系数共用所述一个QP值。
  3. 根据权利要求1或2所述的方法,其特征在于,所述一个QG对应的一个QP值包括亮度QP值和色度QP值;
    确定所述一个图像帧的多个QP值,包括:
    分别获取所述一个QG的亮度QP值和色度QP值;
    或者,
    获取所述一个QG的亮度QP值;
    基于所述亮度QP值确定所述一个QG的色度QP值。
  4. 根据权利要求1所述的方法,其特征在于,
    确定所述一个图像帧的多个QP值,包括:
    获取所述一个QG的预测QP值;
    依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值;其中,
    所述推导信息为以下任意一种或几种的组合:所述一个QG的平坦度信息或纹理度信息、所述码流缓冲区的剩余空间或失真约束信息。
  5. 根据权利要求4所述的方法,其特征在于,若所述推导信息为所述失真约束信息,所述失真约束信息指示了所述多个QG中任一个QG的失真阈值;
    则依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值,包括:
    确定所述预测QP值对应的预测失真;
    若所述预测失真小于或等于所述失真阈值,将所述预测QP值作为所述QG的QP值;
    若所述预测失真大于所述失真阈值,将由所述失真阈值确定的QP值作为所述QG的QP值。
  6. 根据权利要求4所述的方法,其特征在于,若所述推导信息为所述一个QG的平坦度信息或纹理度信息,或所述码流缓冲区的剩余空间,
    则依据所述一个QG的预测QP值和推导信息,确定所述一个QG的QP值,包括:
    依据所述推导信息确定所述一个QG的QP偏移量;
    将所述一个QG的预测QP值与所述QP偏移量之和作为所述一个QG的QP值。
  7. 根据权利要求4-6中任一项所述的方法,其特征在于,
    获取所述一个QG的预测QP值,包括:
    获取所述一个CU中与所述一个QG相邻的至少一个其他QG的QP值;
    依据所述至少一个其他QG的QP值,确定所述一个QG的预测QP值;
    或者,
    将所述一个CU的QP值作为所述一个QG的预测QP值。
  8. 根据权利要求1所述的方法,其特征在于,
    依据所述多个QP值对所述一个图像帧进行解码,包括:
    针对于所述多个QP值中每一个QP值,获取所述QP值对应的量化步长Qstep;
    获取所述QP值对应的QG所包含的水平值;
    依据选择的量化器组合,对所述QG的水平值进行反量化;所述量化器组合包括一 个或多个量化器。
  9. 根据权利要求8所述的方法,其特征在于,所述量化器为均匀量化器或非均匀量化器。
  10. 根据权利要求1所述的方法,其特征在于,所述一个QG包括所述一个图像帧的一个或多个像素点。
  11. 权利要求1所述的方法,其特征在于,所述多个QG中至少两个QG对应的QP值是不同的。
  12. 一种图像的解码方法,其特征在于,所述方法由解码端执行,所述方法包括:
    解析码流以获得一个或多个图像帧,一个图像帧包括一个或多个编码单元CU;
    确定所述一个图像帧的多个量化参数QP值;其中,一个CU包括多个像素点,一个像素点对应一个QP值,且所述多个像素点中至少两个像素点的QP值不同;
    依据所述多个QP值对所述一个图像帧进行解码。
  13. 根据权利要求12所述的方法,其特征在于,
    确定所述一个图像帧的多个QP值,包括:
    获取所述一个像素点的预测QP值;
    依据所述一个像素点的预测QP值和推导信息,确定所述一个像素点的QP值;其中,推导信息为所述一个像素点周围的一个或多个已重建像素点的信息。
  14. 根据权利要求13所述的方法,其特征在于,
    所述一个像素点的预测QP值为所述一个像素点所在的CU或QG的QP值,或者根据所述一个像素点周围的一个或多个已重建像素点的QP值推导得到,其中,推导的方法包括计算均值、中位数或众数中至少一种。
  15. 根据权利要求13或14所述的方法,其特征在于,
    所述已重建像素点为以所述一个像素点为中心,边长为3或5的正方形区域中的像素点,或者对角线长度为3或5的菱形区域中的像素点。
  16. 根据权利要求13所述的方法,其特征在于,所述已重建像素点的信息包括以下任意一种或几种的组合:
    像素值、平坦度信息或纹理度信息、背景亮度、对比度。
  17. 根据权利要求13所述的方法,其特征在于,
    依据所述一个像素点的预测QP值和推导信息,确定所述一个像素点的QP值,包括:
    根据所述一个像素点周围的一个或多个已重建像素点的信息确定所述像素点的指示信息;
    如果所述指示信息小于或等于第一阈值,并且所述预测QP值大于或等于人眼恰可察觉失真对应的QP值,则将人眼恰可察觉失真对应的QP值作为所述像素点的QP值;
    其中,所述人眼恰可察觉失真对应的QP值是预设值,或者,从码流中解析得到,或者,根据周围的已重建CU的平坦度信息或纹理度信息、背景亮度、对比度信息推导得到。
  18. 一种图像的编码方法,其特征在于,所述方法由编码端执行,所述方法包括:
    将一个图像帧划分为一个或多个编码单元CU;
    确定所述一个图像帧的多个量化参数QP值;其中,一个CU包括多个量化组QG,一个QG对应一个QP值;
    依据所述多个QP值对所述一个图像帧进行编码。
  19. 一种图像的编码方法,其特征在于,所述方法由编码端执行,所述方法包括:
    将一个图像帧划分为一个或多个编码单元CU;
    确定所述一个图像帧的多个量化参数QP值;其中,一个CU包括多个像素点,一个像素点对应一个QP值,且所述多个像素点中至少两个像素点的QP值不同;
    依据所述多个QP值对所述一个图像帧进行编码。
  20. 一种图像的解码装置,其特征在于,所述解码装置包括:码流解析单元、QP决策单元和图像解码单元;
    所述码流解析单元、所述QP决策单元和所述图像解码单元,用于实现权利要求1至权利要求17中任一项所述的方法。
  21. 一种图像的编码装置,其特征在于,所述编码装置包括:图像切分单元、QP决策单元和图像编码单元;
    所述图像切分单元、所述QP决策单元和所述图像编码单元,用于实现权利要求18或权利要求19所述的方法。
  22. 一种视频译码系统,其特征在于,包括编码端和解码端,所述编码端与所述解码端通信连接,所述解码端用于实现权利要求1至权利要求17中任一项所述的方法,所述编码端用于实现权利要求18或权利要求19所述的方法。
  23. 一种电子设备,其特征在于,包括处理器和存储器,所述存储器用于存储计算机指令,所述处理器用于从存储器中调用并运行所述计算机指令,实现权利要求1至权利要求19中任一项所述的方法。
  24. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有计算机程序或指令,当所述计算机程序或指令被电子设备执行时,实现权利要求1至19中任一项所述的方法。
PCT/CN2022/131068 2021-11-11 2022-11-10 解码方法、编码方法及装置 WO2023083245A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111334223.8 2021-11-11
CN202111334223.8A CN116074531A (zh) 2021-11-11 2021-11-11 解码方法、编码方法及装置

Publications (1)

Publication Number Publication Date
WO2023083245A1 true WO2023083245A1 (zh) 2023-05-19

Family

ID=86170459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131068 WO2023083245A1 (zh) 2021-11-11 2022-11-10 解码方法、编码方法及装置

Country Status (3)

Country Link
CN (2) CN116527927A (zh)
TW (1) TWI829424B (zh)
WO (1) WO2023083245A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109479139A (zh) * 2016-07-28 2019-03-15 联发科技股份有限公司 视频处理系统中参考量化参数推导的方法与装置
CN111602395A (zh) * 2018-01-19 2020-08-28 高通股份有限公司 用于视频译码的量化组
CN110881129A (zh) * 2018-09-05 2020-03-13 华为技术有限公司 视频解码方法及视频解码器
CN112997486A (zh) * 2018-09-11 2021-06-18 北京达佳互联信息技术有限公司 使用低复杂度网格编码量化的视频解码的方法和装置
WO2020231228A1 (ko) * 2019-05-15 2020-11-19 현대자동차주식회사 영상 복호화 장치에서 이용하는 역양자화장치 및 방법

Also Published As

Publication number Publication date
CN116074531A (zh) 2023-05-05
TW202327358A (zh) 2023-07-01
CN116527927A (zh) 2023-08-01
TWI829424B (zh) 2024-01-11

Similar Documents

Publication Publication Date Title
US10778978B2 (en) System and method of cross-component dynamic range adjustment (CC-DRA) in video coding
TW201841501A (zh) 用於視訊寫碼之多種類型樹架構
US10362310B2 (en) Entropy coding techniques for display stream compression (DSC) of non-4:4:4 chroma sub-sampling
US20110317757A1 (en) Intra prediction mode signaling for finer spatial prediction directions
TW201836355A (zh) 視訊解碼方法
KR20190029796A (ko) 디스플레이 스트림 압축 (dsc) 을 위한 엔트로피 코딩 기법들
JP2018531556A6 (ja) 非4:4:4クロマサブサンプリングのディスプレイストリーム圧縮(dsc)のためのエントロピーコーディング技法
WO2019086033A1 (zh) 视频数据解码方法及装置
US20230379500A1 (en) Residual and coefficients coding for video coding
WO2018203202A1 (en) Quantization partitioning for enhanced image compression
WO2024022367A1 (zh) 图像解码方法、编码方法及装置
WO2024022359A1 (zh) 一种图像编解码方法及装置
WO2024022039A1 (zh) 一种视频图像解码方法、编码方法、装置及存储介质
CN116684609A (zh) 图像编解码方法、装置及存储介质
WO2023083245A1 (zh) 解码方法、编码方法及装置
WO2023138562A1 (zh) 图像解码方法、图像编码方法及相应的装置
WO2023138532A1 (zh) 一种视频解码方法、装置、视频解码器及存储介质
WO2023138391A1 (zh) 系数解码方法、装置、图像解码器及电子设备
TWI821013B (zh) 視頻編解碼方法及裝置
US20240179314A1 (en) Residual and coefficients coding for video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22892039

Country of ref document: EP

Kind code of ref document: A1