WO2022156688A1 - Layered encoding and decoding method and apparatus

Layered encoding and decoding method and apparatus

Info

Publication number
WO2022156688A1
Authority
WO
WIPO (PCT)
Prior art keywords
component
signal
signal component
quality factor
feature map
Application number
PCT/CN2022/072627
Other languages
English (en)
French (fr)
Inventor
毛珏
杨海涛
王晶
崔泽
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to MX2023008449A (MX2023008449A/es)
Priority to KR1020237026723A (KR20230129068A/ko)
Priority to JP2023543103A (JP2024503712A/ja)
Priority to EP22742167.4A (EP4277274A1/en)
Publication of WO2022156688A1 (WO2022156688A1/zh)
Priority to US18/223,126 (US20240007658A1/en)


Classifications

    • H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION > H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/172 Adaptive coding where the coding unit is a picture, frame or field
    • H04N19/182 Adaptive coding where the coding unit is a pixel
    • H04N19/184 Adaptive coding where the coding unit is bits, e.g. of the compressed video stream
    • H04N19/186 Adaptive coding where the coding unit is a colour or a chrominance component
    • H04N19/30 Coding using hierarchical techniques, e.g. scalability
    • H04N19/42 Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/70 Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks:
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods

Definitions

  • Embodiments of the present invention relate to the technical field of artificial intelligence (AI)-based video or image compression, and in particular, to a method and apparatus for layered encoding and decoding.
  • AI artificial intelligence
  • Video compression codec technology has a wide range of applications in multimedia services, broadcasting, video communication and storage, such as broadcast digital TV, video transmission over the Internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, security applications, video content capture and editing systems, and camcorders.
  • Video compression devices typically use software and/or hardware on the source side to encode video data prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. Then, the compressed data is received by the video decompression device at the destination side.
  • a module in the hyperprior structure extracts latent variables that estimate the probability distribution, and the latent variables are quantized, arithmetically coded, and transmitted to the decoding end as side information.
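As a non-normative illustration of this hyperprior idea, here is a minimal PyTorch sketch; the layer shapes, widths, and the round-based quantizer are assumptions for illustration, not the networks of this application.

```python
import torch
import torch.nn as nn

class Hyperprior(nn.Module):
    def __init__(self, ch=192):
        super().__init__()
        # Hyper-encoder h_a: summarizes the feature map y into a latent z.
        self.h_a = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1))
        # Hyper-decoder h_s: predicts entropy-model parameters (mean, scale).
        self.h_s = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 2 * ch, 4, stride=2, padding=1))

    def forward(self, y):
        z = self.h_a(y)           # latent carrying the side information
        z_hat = torch.round(z)    # quantization (entropy-coded in a real codec)
        mean, scale = self.h_s(z_hat).chunk(2, dim=1)
        return z_hat, mean, scale # parameters of the distribution of y
```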
  • in existing schemes, the bit rate ratio among the Y, U, and V components is fixed.
  • because the YUV bit rate allocation is fixed, images whose color characteristics do not match that allocation suffer large coding distortion.
  • the present application provides a method and device for layered encoding and decoding, which can adapt to image content with different color characteristics.
  • layered codec refers to dividing a video signal into a first signal component and a second signal component; or dividing a video signal into a first signal component, a second signal component and a third signal component.
  • the first signal component is a Y component
  • the second signal component is a UV component, a U component or a V component.
  • when the second signal component is a U component, the third signal component is a V component; or when the second signal component is a V component, the third signal component is a U component.
  • the present application provides an encoding method.
  • the encoding method comprises: applying a control signal of a first signal component of the video signal to a first feature map of the first signal component to obtain a second feature map of the first signal component, wherein the control signal of the first signal component is obtained by learning; applying a control signal of a second signal component of the video signal to a first feature map of the second signal component to obtain a second feature map of the second signal component, wherein the control signal of the second signal component is obtained by learning; and obtaining the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component.
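The text does not fix the exact form of "applying" a control signal to a feature map. A common realization in variable-rate learned codecs is channel-wise scaling, sketched below in NumPy under that assumption; all shapes and values are illustrative stand-ins.

```python
import numpy as np

def apply_control_signal(feature_map: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Channel-wise application of a learned control signal.

    feature_map: first feature map of one signal component, shape (C, H, W).
    q: learned control vector for that component, shape (C,).
    Returns the second feature map of the component.
    """
    return feature_map * q[:, None, None]

# One control signal per component lets the encoder rebalance
# the bit rate between the Y and UV branches.
rng = np.random.default_rng(0)
f_y1 = rng.standard_normal((192, 16, 16))    # Y-component feature map (toy shape)
f_uv1 = rng.standard_normal((192, 8, 8))     # UV-component feature map (toy shape)
q_y = rng.standard_normal(192)               # learned Y control vector (stand-in)
q_uv = rng.standard_normal(192)              # learned UV control vector (stand-in)
f_y2 = apply_control_signal(f_y1, q_y)       # Y branch
f_uv2 = apply_control_signal(f_uv1, q_uv)    # UV branch
```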
  • obtaining the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component includes: performing entropy encoding on the second feature map of the first signal component and the second feature map of the second signal component, where either feature map may first be processed by a neural network (i.e., on the two feature maps as-is, on one of them processed by the neural network, or on both of them processed by the neural network), to obtain the code stream of the video signal.
  • alternatively, obtaining the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component includes: jointly processing the second feature map of the first signal component and the second feature map of the second signal component, where either feature map may first be processed by a neural network, to obtain a joint feature map, and performing entropy coding on the joint feature map to obtain the code stream of the video signal.
  • the control signal of the first signal component is obtained from N candidate first control signals according to the quality factor of the first signal component, where N is an integer greater than 1; and the control signal of the second signal component is obtained from M candidate second control signals according to the quality factor of the second signal component, where M is an integer greater than 1.
  • N and M may be equal or unequal, which is not limited in this application.
  • the first signal component is a Y component and the second signal component is a UV component
  • a control signal matrix {q_y1, q_y2, ..., q_yi, ..., q_yN} of the first signal component and a control signal matrix {q_uv1, q_uv2, ..., q_uvj, ..., q_uvM} of the second signal component are generated by learning, where N and M are integers greater than 1;
  • the control signal q_yi of the first signal component is obtained according to the index i of the quality factor of the Y component;
  • the control signal q_uvj of the second signal component is obtained according to the index j of the quality factor of the UV component.
  • the code stream of the video signal includes the index i of the quality factor of the Y component and the index j of the quality factor of the UV component.
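A minimal sketch of this index-based selection, assuming the learned matrices are stored one row per quality factor; the sizes, values, and the header layout are stand-ins for illustration.

```python
import numpy as np

# Learned control-signal matrices (stand-in values): one row per quality factor.
N, M, C = 8, 4, 192
Q_y = np.random.randn(N, C)    # {q_y1, ..., q_yN} for the Y component
Q_uv = np.random.randn(M, C)   # {q_uv1, ..., q_uvM} for the UV component

def select_control_signals(i: int, j: int):
    """Look up q_yi and q_uvj from the candidate matrices by index."""
    return Q_y[i], Q_uv[j]

q_yi, q_uvj = select_control_signals(i=3, j=1)
# The encoder writes i and j into the code stream so the decoder can
# look up the matching response signals.
header = {"y_qf_index": 3, "uv_qf_index": 1}
```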
  • the first signal component is a Y component and the second signal component is a UV component
  • a control signal matrix {q_c1, q_c2, ..., q_ci, ..., q_cN} of the video signal is generated by learning, where c is 2, representing the Y component and the UV component, and N is an integer greater than 1; a control signal q_ci comprising the first signal component and the second signal component is obtained according to the index i of the quality factor of the video signal.
  • the code stream of the video signal includes the index i of the quality factor of the video signal.
  • the first signal component is a Y component and the second signal component is a UV component
  • the quality factor of the Y component is used as the input of a fully connected network, which outputs the control signal of the Y component; the quality factor of the UV component is used as the input of the fully connected network, which outputs the control signal of the UV component.
  • the code stream of the video signal includes the quality factor of the Y component and the quality factor of the UV component.
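A minimal sketch of such a quality-factor-to-control-signal mapping as a small fully connected network. Whether the Y and UV branches share one network, and all layer widths and inputs below, are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ControlSignalNet(nn.Module):
    """Maps a scalar quality factor to a per-channel control vector
    via a small fully connected network (widths are illustrative)."""
    def __init__(self, channels=192, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, channels))

    def forward(self, quality_factor: torch.Tensor) -> torch.Tensor:
        return self.net(quality_factor)

fc_y, fc_uv = ControlSignalNet(), ControlSignalNet()
q_y = fc_y(torch.tensor([[0.8]]))    # control signal of the Y component
q_uv = fc_uv(torch.tensor([[0.4]]))  # control signal of the UV component
```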
  • the method further includes: applying a control signal of a third signal component of the video signal to a first feature map of the third signal component to obtain a second feature map of the third signal component, wherein the control signal of the third signal component is obtained by learning; when the second signal component is a U component, the third signal component is a V component, or when the second signal component is a V component, the third signal component is a U component.
  • control signal matrices {q_y1, q_y2, ..., q_yi, ..., q_yN} of the Y component, {q_u1, q_u2, ..., q_uj, ...} of the U component, and {q_v1, q_v2, ..., q_vk, ...} of the V component are generated by learning;
  • the control signal q_yi of the first signal component is obtained according to the index i of the quality factor of the Y component;
  • the control signal q_uj of the second signal component is obtained according to the index j of the quality factor of the U component;
  • the control signal q_vk of the third signal component is obtained according to the index k of the quality factor of the V component.
  • the code stream of the video signal includes the index i of the quality factor of the Y component, the index j of the quality factor of the U component, and the index k of the quality factor of the V component.
  • a control signal matrix {q_c1, q_c2, ..., q_ci, ..., q_cN} of the video signal is generated by learning, where c is 3, representing the Y component, the U component, and the V component, and N is an integer greater than 1; a control signal q_ci comprising the first signal component, the second signal component, and the third signal component is obtained according to the index i of the quality factor of the video signal.
  • the code stream of the video signal includes the index i of the quality factor of the video signal.
  • the first signal component is a Y component
  • the second signal component is a U component
  • the third signal component is a V component
  • the quality factor of the Y component is used as the input of the fully connected network, and the control signal of the Y component is output;
  • the quality factor of the U component is used as the input of the fully connected network, and the control signal of the U component is output;
  • the quality factor of the V component is used as the input of the fully connected network, and the control signal of the V component is output.
  • the code stream of the video signal includes the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component.
  • the present application provides a decoding method.
  • the decoding method includes: obtaining a code stream of the video signal; entropy decoding the code stream to obtain a feature map of a first signal component of the video signal and a feature map of a second signal component of the video signal; obtaining a reconstruction map of the first signal component according to the response signal of the first signal component and the feature map of the first signal component, wherein the response signal of the first signal component is obtained by learning; obtaining a reconstruction map of the second signal component according to the response signal of the second signal component and the feature map of the second signal component, wherein the response signal of the second signal component is obtained by learning; and reconstructing the video signal according to the reconstruction map of the first signal component and the reconstruction map of the second signal component.
  • the response signal at the decoding end is similar to the control signal at the encoding end; to distinguish them, the decoding-end signal is called a response signal and the encoding-end signal is called a control signal.
  • the response signal at the decoding end includes a response vector, or includes a response vector and an offset vector.
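Under the assumption that the response vector scales the decoded feature map channel-wise and the offset vector, when present, is added afterwards (the text does not fix the exact operation), a minimal sketch:

```python
import numpy as np

def apply_response_signal(feature_map, g, b=None):
    """Apply a decoding-end response signal to a decoded feature map.

    g: response vector, shape (C,); b: optional offset vector, shape (C,).
    Mirrors the encoder-side control signal: scale each channel, then
    optionally add an offset, before the synthesis network reconstructs
    the component.
    """
    out = feature_map * g[:, None, None]
    if b is not None:
        out = out + b[:, None, None]
    return out

C, H, W = 192, 16, 16
f = np.random.randn(C, H, W)   # entropy-decoded feature map (stand-in)
g = np.random.rand(C)          # learned response vector (stand-in)
b = np.random.rand(C)          # learned offset vector (stand-in)
f_controlled = apply_response_signal(f, g, b)
```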
  • the code stream further includes quality factor information of the first signal component and quality factor information of the second signal component, wherein the quality factor information of the first signal component is the quality factor of the first signal component or the index of the quality factor of the first signal component, and the quality factor information of the second signal component is the quality factor of the second signal component or the index of the quality factor of the second signal component;
  • the response signal of the first signal component is obtained through the quality factor information of the first signal component, and the response signal of the second signal component is obtained through the quality factor information of the second signal component.
  • when the quality factor information of the first signal component is the quality factor of the first signal component, the quality factor of the first signal component takes one of N values; when the quality factor information of the first signal component is the index of the quality factor of the first signal component, the index ranges from 0 to N-1 or from 1 to N, where N is an integer greater than 1;
  • when the quality factor information of the second signal component is the quality factor of the second signal component, the quality factor of the second signal component takes one of M values; when the quality factor information of the second signal component is the index of the quality factor of the second signal component, the index ranges from 0 to M-1 or from 1 to M, where M is an integer greater than 1.
  • the code stream includes the index i of the quality factor of the Y component and the index j of the quality factor of the UV component; a response signal matrix {g_y1, g_y2, ..., g_yi, ..., g_yN} of the first signal component and a response signal matrix {g_uv1, g_uv2, ..., g_uvj, ..., g_uvM} of the second signal component are generated by learning;
  • the response signal g_yi of the first signal component is obtained according to the index i of the quality factor of the Y component;
  • the response signal g_uvj of the second signal component is obtained according to the index j of the quality factor of the UV component.
  • the code stream includes the index i of the quality factor of the video signal
  • a response signal matrix {g_c1, g_c2, ..., g_ci, ..., g_cN} of the video signal is generated by learning, where c is 2, representing the Y component and the UV component, and N is an integer greater than 1;
  • a response signal g_ci comprising the first signal component and the second signal component is obtained according to the index i of the quality factor of the video signal.
  • the code stream includes the quality factor of the first signal component and the quality factor of the second signal component; when the mapping is realized by a fully connected network, the quality factor of the Y component is used as the input of the fully connected network, which outputs the response signal of the Y component, and the quality factor of the UV component is used as the input of the fully connected network, which outputs the response signal of the UV component.
  • the method further includes: performing entropy decoding on the code stream to obtain a feature map of a third signal component of the video signal; and obtaining a reconstruction map of the third signal component according to the response signal of the third signal component and the feature map of the third signal component, wherein the response signal of the third signal component is obtained by learning;
  • reconstructing the video signal includes: reconstructing the video signal according to the reconstruction map of the first signal component, the reconstruction map of the second signal component, and the reconstruction map of the third signal component.
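A minimal sketch of assembling the three reconstruction maps into a frame, assuming 4:2:0 subsampling and nearest-neighbor chroma upsampling; both are assumptions for illustration, since the text does not mandate a sampling format.

```python
import numpy as np

def reconstruct_yuv420(rec_y, rec_u, rec_v):
    """Assemble a reconstructed frame from per-component reconstruction maps,
    assuming 4:2:0 subsampling (U and V are half resolution in each axis).
    Chroma is upsampled by nearest-neighbor replication for display."""
    up = lambda c: c.repeat(2, axis=0).repeat(2, axis=1)
    return np.stack([rec_y, up(rec_u), up(rec_v)], axis=0)  # shape (3, H, W)

rec = reconstruct_yuv420(np.zeros((16, 16)), np.zeros((8, 8)), np.zeros((8, 8)))
```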
  • the code stream includes the index i of the quality factor of the Y component, the index j of the quality factor of the U component, and the index k of the quality factor of the V component; a response signal matrix {g_y1, g_y2, ..., g_yi, ..., g_yN} of the first signal component, a response signal matrix {g_u1, g_u2, ..., g_uj, ...} of the second signal component, and a response signal matrix {g_v1, g_v2, ..., g_vk, ...} of the third signal component are generated by learning;
  • the response signal g_yi of the first signal component is obtained according to the index i of the quality factor of the Y component; the response signal g_uj of the second signal component is obtained according to the index j of the quality factor of the U component; and the response signal g_vk of the third signal component is obtained according to the index k of the quality factor of the V component.
  • the code stream includes the index i of the quality factor of the video signal; a response signal matrix {g_c1, g_c2, ..., g_ci, ..., g_cN} of the video signal is generated by learning, where c is 3, representing the Y component, the U component, and the V component, and N is an integer greater than 1; a response signal g_ci comprising the first signal component, the second signal component, and the third signal component is obtained according to the index i of the quality factor of the video signal.
  • the code stream includes the quality factor of the first signal component, the quality factor of the second signal component, and the quality factor of the third signal component; when the mapping is realized by a fully connected network, the quality factor of the Y component is used as the input of the fully connected network, which outputs the response signal of the Y component; the quality factor of the U component is used as the input of the fully connected network, which outputs the response signal of the U component; and the quality factor of the V component is used as the input of the fully connected network, which outputs the response signal of the V component.
  • the present application provides an encoder, including a processing circuit, configured to execute the method according to the first aspect or any implementation of the first aspect.
  • the present application provides a decoder, including a processing circuit, configured to execute the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a computer program product, comprising program code, which, when executed on a computer or processor, is used to execute the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
  • the present application provides an encoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to execute the method according to the first aspect or any implementation of the first aspect.
  • the present application provides a decoder, comprising: one or more processors; and a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to execute the method according to the second aspect or any implementation of the second aspect.
  • the present application provides a non-transitory computer-readable storage medium, comprising program code, which, when executed by a computer device, is used to execute the method according to the first aspect or any implementation of the first aspect, or the second aspect or any implementation of the second aspect.
  • the present invention relates to an encoding device, which has the function of implementing the behavior in the above-mentioned first aspect or any one of the method embodiments of the first aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the encoding device includes: a first control module, configured to apply a control signal of a first signal component of the video signal to a first feature map of the first signal component to obtain a second feature map of the first signal component, wherein the control signal of the first signal component is obtained by learning; a second control module, configured to apply a control signal of a second signal component of the video signal to a first feature map of the second signal component to obtain a second feature map of the second signal component, wherein the control signal of the second signal component is obtained by learning; and an encoding module, configured to obtain the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component.
  • These modules may perform the corresponding functions in the first aspect or any method example of the first aspect. For details, please refer to the detailed description in the method example, which will not be repeated here.
  • the present invention relates to a decoding apparatus, which has the function of implementing the behavior in the method embodiment of any one of the second aspect or the second aspect.
  • the functions can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the decoding device includes: a decoding module, configured to obtain a code stream of the video signal and perform entropy decoding on the code stream to obtain a feature map of the first signal component of the video signal and a feature map of the second signal component of the video signal; a first control module, configured to obtain a reconstruction map of the first signal component according to the response signal of the first signal component and the feature map of the first signal component, wherein the response signal of the first signal component is obtained by learning; a second control module, configured to obtain a reconstruction map of the second signal component according to the response signal of the second signal component and the feature map of the second signal component, wherein the response signal of the second signal component is obtained by learning; and a reconstruction module, configured to reconstruct the video signal according to the reconstruction map of the first signal component and the reconstruction map of the second signal component.
  • These modules may perform the corresponding functions in the second aspect or any method example of the second aspect. For details, please refer to the detailed description in the method example, which will not be repeated here.
  • in existing schemes, the bit rate allocation of the YUV components is fixed. Because different images have different color characteristics, a fixed bit rate allocation leads to poor coding performance on some video image content.
  • in the above aspects of the present application, the feature map of each signal component is controlled by the control signal of that component, so that bit rate allocation among the YUV components is supported and image content with different color characteristics can be accommodated.
  • FIG. 1A is a block diagram of an example video coding system for implementing embodiments of the present invention, wherein the system encodes or decodes video images based on deep learning;
  • FIG. 1B is a block diagram of another example of a video coding system for implementing embodiments of the present invention, wherein the system encodes or decodes video images based on deep learning;
  • FIG. 1C is a block diagram of yet another example of a video coding system for implementing embodiments of the present invention, wherein the video encoder and/or video decoder encode or decode video images based on deep learning;
  • FIG. 2 is a block diagram of an example of a video encoder for implementing embodiments of the present invention, wherein the video encoder 20 encodes video images based on deep learning;
  • FIG. 3 is a block diagram of an example of a video decoder for implementing embodiments of the present invention, wherein the video decoder 30 decodes video images based on deep learning;
  • FIG. 4 is a schematic block diagram of a video decoding apparatus for implementing an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of a video decoding apparatus for implementing an embodiment of the present invention.
  • Figure 6 is a schematic diagram of the YUV format
  • FIG. 7A is a schematic diagram of a layered coding and decoding structure provided by an embodiment of the present application.
  • Fig. 7B is an embodiment based on the encoding method of Fig. 7A;
  • Fig. 7C is an embodiment based on the decoding method of Fig. 7A;
  • FIG. 7D is another schematic diagram of a layered coding and decoding structure provided by an embodiment of the present application.
  • FIG. 8A is a schematic diagram of a layered coding and decoding structure provided by an embodiment of the present application.
  • Figure 8B is an embodiment of Figure 8A
  • FIG. 9A is a schematic diagram of a layered coding and decoding structure provided by an embodiment of the present application.
  • Figure 9B is an embodiment of Figure 9A
  • FIG. 10 is a schematic structural diagram illustrating an encoding apparatus 1000 according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram illustrating a decoding apparatus 1100 according to an embodiment of the present application.
  • Embodiments of the present application provide an AI-based video image compression technology, and specifically provide a method and apparatus for layered encoding and decoding, so as to improve a traditional end-to-end hybrid video encoding and decoding system.
  • Video coding generally refers to the processing of sequences of images that form a video or video sequence. In the field of video coding, the terms “picture”, “frame” or “image” may be used as synonyms.
  • Video coding (in the general sense) includes two parts: video encoding and video decoding. Video encoding is performed on the source side and typically involves processing (e.g., compressing) the original video image to reduce the amount of data required to represent the video image (and thus store and/or transmit it more efficiently). Video decoding is performed on the destination side and typically involves inverse processing relative to the encoder to reconstruct the video image.
  • the "encoding" of a video image in relation to the embodiments should be understood as the “encoding” or “decoding” of a video image or a video sequence.
  • the encoding part and the decoding part are also collectively referred to as codec (encoding and decoding, CODEC).
  • in the case of lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission).
  • in the case of lossy video coding, further compression is performed through quantization and the like to reduce the amount of data required to represent the video image, and the decoder side cannot completely reconstruct the video image, i.e., the quality of the reconstructed video image is lower or worse than that of the original video image.
  • encoders typically process and encode video at the block (video block) level, e.g., generating a prediction block by spatial (intra) prediction and temporal (inter) prediction, subtracting the prediction block from the current block (the block currently being processed or to be processed) to obtain a residual block, and transforming and quantizing the residual block in the transform domain to reduce the amount of data to be transmitted (compressed); the decoder side applies the inverse processing relative to the encoder to the encoded or compressed block to reconstruct the current block for representation. Additionally, the encoder needs to repeat the decoder's processing steps so that the encoder and decoder generate the same predictions (e.g., intra- and inter-prediction) and/or reconstructed pixels for processing, i.e., encoding, subsequent blocks.
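A toy sketch of this hybrid loop: predict, transform the residual, quantize, and run the encoder-side inverse path so later predictions stay in sync with the decoder. The DCT and the quantization step size are illustrative choices, not the codec's actual transform.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, prediction, qstep=8.0):
    """Toy version of the hybrid coding loop described above."""
    residual = block - prediction
    coeffs = np.round(dctn(residual, norm="ortho") / qstep)   # forward path
    rec_residual = idctn(coeffs * qstep, norm="ortho")        # inverse path
    reconstructed = prediction + rec_residual                  # same as decoder
    return coeffs, reconstructed
```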
  • FIG. 1A is a schematic block diagram of an exemplary decoding system. As shown in FIG. 1A, after a video is captured by a video capture device, a series of preprocessing steps is performed, and then the processed video is compressed and encoded to obtain an encoded code stream. A sending module sends the code stream through the transmission network to a receiving module; after decoding by the decoder, the video can be rendered and displayed. In addition, the encoded code stream can also be stored directly.
  • FIG. 1B is a schematic block diagram of an exemplary coding system 10, such as a video coding system 10 (or simply coding system 10), that may utilize the techniques of this application.
  • Video encoder 20 (or encoder 20 for short) and video decoder 30 (or decoder 30 for short) in video coding system 10 represent devices that may be used to perform techniques in accordance with the various examples described in this application.
  • the decoding system 10 includes a source device 12 for providing encoded image data 21 such as encoded images to a destination device 14 for decoding the encoded image data 21 .
  • the source device 12 includes an encoder 20 and may additionally, i.e., optionally, include an image source 16, a preprocessor (or preprocessing unit) 18 such as an image preprocessor, and a communication interface (or communication unit) 22.
  • Image source 16 may include or be any type of image capture device for capturing real-world images and the like, and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images, or any type of device for acquiring and/or providing real-world images or computer-generated images (e.g., screen content, virtual reality (VR) images, and/or any combination thereof, such as augmented reality (AR) images).
  • the image source may be any type of memory or storage that stores any of the above-mentioned images.
  • the image (or image data) 17 may also be referred to as the original image (or original image data) 17 .
  • the preprocessor 18 is used to receive (raw) image data 17 and to preprocess the image data 17 to obtain a preprocessed image (preprocessed image data) 19 .
  • the preprocessing performed by the preprocessor 18 may include trimming, color format conversion (eg, from RGB to YCbCr), toning, or denoising. It is understood that the preprocessing unit 18 may be an optional component.
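For example, a typical RGB-to-YCbCr color format conversion that such a preprocessor might perform, shown here with BT.601 full-range coefficients; the exact conversion used by any given preprocessor is not mandated by the text.

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """BT.601 full-range RGB -> YCbCr.

    rgb: float array in [0, 1], shape (H, W, 3).
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.5 + (b - y) * 0.564   # 0.564 = 0.5 / (1 - 0.114)
    cr = 0.5 + (r - y) * 0.713   # 0.713 = 0.5 / (1 - 0.299)
    return np.stack([y, cb, cr], axis=-1)
```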
  • a video encoder (or encoder) 20 is used to receive preprocessed image data 19 and to provide encoded image data 21 (described further below with respect to FIG. 2 etc.).
  • the communication interface 22 in the source device 12 may be used to receive the encoded image data 21 and send the encoded image data 21 (or any other processed version thereof) over the communication channel 13 to another device such as the destination device 14, or to any other device, for storage or direct reconstruction.
  • the destination device 14 includes a decoder 30 and may additionally, alternatively, include a communication interface (or communication unit) 28 , a post-processor (or post-processing unit) 32 and a display device 34 .
  • the communication interface 28 in the destination device 14 is used to receive the encoded image data 21 (or any other processed version thereof) directly from the source device 12 or from any other source device such as a storage device (for example, an encoded-image-data storage device), and to supply the encoded image data 21 to the decoder 30.
  • Communication interface 22 and communication interface 28 may be used to send or receive the encoded image data (or encoded data) 21 through a direct communication link between source device 12 and destination device 14, such as a direct wired or wireless connection, or through any type of network, such as a wired network, a wireless network, or any combination thereof, or any type of private or public network or any combination thereof.
  • the communication interface 22 may be used to encapsulate the encoded image data 21 into a suitable format such as a message, and/or to process the encoded image data using any type of transfer encoding or processing for transmission over a communication link or communication network.
  • the communication interface 28 corresponds to the communication interface 22 and may be used, for example, to receive transmission data and process the transmission data using any type of corresponding transmission decoding or processing and/or decapsulation to obtain encoded image data 21 .
  • Both communication interface 22 and communication interface 28 may be configured as one-way communication interfaces, as indicated by the arrow of the communication channel 13 in FIG. 1B pointing from the source device 12 to the destination device 14, or as two-way communication interfaces, and may be used to send and receive messages and the like, to establish a connection, and to acknowledge and exchange any other information related to the communication link and/or data transfer such as encoded image data transfer.
  • a video decoder (or decoder) 30 is used to receive the encoded image data 21 and provide a decoded image (or decoded image data) 31 (described further below with reference to FIG. 3 etc.).
  • the post-processor 32 is configured to perform post-processing on the decoded image data 31 (also referred to as reconstructed image data) such as a decoded image to obtain post-processed image data 33 such as a post-processed image.
  • Post-processing performed by post-processing unit 32 may include, for example, color format conversion (e.g., from YCbCr to RGB), toning, trimming, or resampling, or any other processing for preparing the decoded image data 31 for display by display device 34, for example.
  • a display device 34 is used to receive post-processed image data 33 to display the image to a user or viewer or the like.
  • Display device 34 may be or include any type of display for representing the reconstructed image, eg, an integrated or external display screen or display.
  • the display screen may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
  • the decoding system 10 also includes a training engine 25 for training the encoder 20 or the decoder 30 to perform hierarchical encoding processing on the reconstructed image.
  • the training data in the embodiment of the present application includes a training matrix set, where the training matrix set includes a pre-filtering luminance matrix, a quantization step size matrix, and a post-filtering luminance matrix of an image block; each element of the pre-filtering luminance matrix corresponds to the pre-filtering luminance value of the pixel at the corresponding position in the image block, each element of the quantization step size matrix corresponds to the quantization step size applied to the luminance value of the pixel at the corresponding position in the image block, and each element of the post-filtering luminance matrix corresponds to the filtered luminance value of the pixel at the corresponding position in the image block.
  • a plurality of matrices in the set of training matrices may be input to the training engine 25, for example, in the manner shown in Figures 6a to 6c.
  • In FIG. 6a, multiple matrices in the training matrix set are directly input to the training engine 25, and the multiple matrices are all two-dimensional matrices.
  • In FIG. 6b, some or all of the multiple matrices in the training matrix set are selected and merged to obtain a multi-dimensional matrix, and then the multi-dimensional matrix is input to the training engine 25.
  • In FIG. 6c, some or all of the multiple matrices in the training matrix set are selected and added (or multiplied) element-wise to obtain a two-dimensional matrix, and then the two-dimensional matrix is input to the training engine 25.
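A small NumPy sketch of the three input arrangements just described; the matrix contents and sizes are stand-ins for illustration.

```python
import numpy as np

H, W = 64, 64
pre_luma = np.random.rand(H, W)   # pre-filtering luminance matrix (stand-in)
qstep = np.random.rand(H, W)      # quantization step size matrix (stand-in)

# FIG. 6a style: feed the 2-D matrices to the training engine separately.
inputs_separate = [pre_luma, qstep]

# FIG. 6b style: merge selected matrices into one multi-dimensional matrix.
inputs_merged = np.stack([pre_luma, qstep], axis=0)   # shape (2, H, W)

# FIG. 6c style: combine selected matrices element-wise into one 2-D matrix.
inputs_added = pre_luma + qstep                        # shape (H, W)
```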
  • the above-mentioned training data may be stored in a database (not shown), and the training engine 25 trains to obtain a target model based on the training data (for example, it may be a neural network for hierarchical encoding and decoding, etc.). It should be noted that the embodiments of the present application do not limit the source of the training data, for example, the training data may be obtained from the cloud or other places to perform model training.
  • the training engine 25 trains the target model so that the pre-filtered pixels approximate the original pixel values.
  • Each training process may use a mini-batch of 64 images and an initial learning rate of 1e-4, with the learning rate decayed on a schedule with a step size of 10.
  • the training data may be data generated by the encoder under different quantization parameter (QP) settings.
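A minimal PyTorch sketch of that training schedule. Only the batch size of 64, the initial learning rate of 1e-4, and the step size of 10 come from the text; the decay factor, epoch count, optimizer choice, and stand-in model are assumptions.

```python
import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)   # stand-in for the target model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    # ... iterate over mini-batches of 64 images drawn from data produced
    # under different QP settings, compute the loss, and step the optimizer ...
    scheduler.step()   # decay the learning rate every 10 epochs
```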
  • the target model can be used to implement the layered encoding and decoding method provided by the embodiments of the present application, that is, the reconstructed image or image block is input into the target model after relevant preprocessing, and the filtered image or image block can be obtained.
  • the target model in the embodiment of the present application may specifically be a filter network, and the target model will be described in detail below with reference to FIGS. 7A-7D .
  • the target model trained by the training engine 25 can be applied to the decoding system 10, for example, the source device 12 (eg, the encoder 20) or the destination device 14 (eg, the decoder 30) shown in FIG. 1B.
  • the training engine 25 can train on the cloud to obtain the target model, and the decoding system 10 then downloads and uses the target model from the cloud; or, the training engine 25 can train on the cloud to obtain the target model and use the target model itself, and the decoding system 10 directly obtains the processing result from the cloud.
  • FIG. 1B shows the source device 12 and the destination device 14 as separate devices
  • device embodiments may also include both the source device 12 and the destination device 14, or the functionality of both, i.e., the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality at the same time.
  • In these embodiments, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, by separate hardware and/or software, or by any combination thereof.
  • the existence and (exact) division of the different units or functions in the source device 12 and/or the destination device 14 shown in FIG. 1B may vary depending on the actual device and application.
  • Encoder 20 (e.g., video encoder 20) or decoder 30 (e.g., video decoder 30), or both, may be implemented by processing circuitry as shown in FIG. 1C, e.g., one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, special-purpose processors for video encoding, or any combination thereof.
  • Encoder 20 may be implemented by processing circuitry 46 to include the various modules discussed with reference to encoder 20 of FIG. 2 and/or any other encoder system or subsystem described herein.
  • Decoder 30 may be implemented by processing circuitry 46 to include the various modules discussed with reference to decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
  • the processing circuitry 46 may be used to perform various operations discussed below. As shown in FIG. 5, if parts of the techniques are implemented in software, a device may store the instructions of the software in a suitable non-transitory computer-readable storage medium and execute the instructions in hardware using one or more processors, thereby implementing the techniques of the present invention.
  • Either or both of video encoder 20 and video decoder 30 may be integrated in a single device as part of a combined codec (encoder/decoder, CODEC), as shown in FIG. 1C.
  • Source device 12 and destination device 14 may include any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, a cell phone, a smartphone, a tablet or tablet computer, a camera, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game console, a video streaming device (e.g., a content service server or content distribution server), a broadcast receiving device, or a broadcast transmitting device, and may use no operating system or any type of operating system.
  • source device 12 and destination device 14 may be equipped with components for wireless communication.
  • source device 12 and destination device 14 may be wireless communication devices.
  • the video coding system 10 shown in FIG. 1B is merely exemplary, and the techniques provided herein may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
  • data is retrieved from local storage, sent over a network, and so on.
  • the video encoding device may encode and store the data in memory, and/or the video decoding device may retrieve and decode the data from the memory.
  • encoding and decoding are performed by devices that do not communicate with each other but merely encode data to and/or retrieve and decode data from memory.
  • Video coding system 40 may include imaging device 41, video encoder 20, video decoder 30 (and/or a video encoder/decoder implemented by processing circuit 46), antenna 42, one or more processors 43, one or more memory stores 44, and/or display device 45.
  • imaging device 41, antenna 42, processing circuit 46, video encoder 20, video decoder 30, processor 43, memory storage 44, and/or display device 45 can communicate with each other.
  • video coding system 40 may include only video encoder 20 or only video decoder 30 .
  • antenna 42 may be used to transmit or receive an encoded bitstream of video data.
  • display device 45 may be used to present video data.
  • Processing circuitry 46 may include application-specific integrated circuit (ASIC) logic, graphics processors, general purpose processors, and the like.
  • Video coding system 40 may also include an optional processor 43, which may similarly include application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, and the like.
  • the memory store 44 may be any type of memory, such as volatile memory (e.g., static random access memory (SRAM), dynamic random access memory (DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.).
  • memory store 44 may be implemented by cache memory.
  • processing circuitry 46 may include memory (eg, cache memory, etc.) for implementing image buffers, and the like.
  • video encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by processing circuitry 46 or memory store 44) and a graphics processing unit (e.g., implemented by processing circuitry 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include video encoder 20 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 2 and/or any other encoder system or subsystem described herein.
  • Logic circuits may be used to perform the various operations discussed herein.
  • video decoder 30 may be implemented by processing circuitry 46 in a similar manner to implement the various modules discussed with reference to video decoder 30 of FIG. 3 and/or any other decoder system or subsystem described herein.
  • logic circuit-implemented video decoder 30 may include an image buffer (implemented by processing circuit 46 or memory storage 44) and a graphics processing unit (e.g., implemented by processing circuit 46).
  • the graphics processing unit may be communicatively coupled to the image buffer.
  • the graphics processing unit may include video decoder 30 implemented by processing circuitry 46 to implement the various modules discussed with reference to FIG. 3 and/or any other decoder system or subsystem described herein.
  • antenna 42 may be used to receive an encoded bitstream of video data.
  • the encoded bitstream may include data, indicators, index values, mode selection data, etc., related to encoded video frames as discussed herein, such as data related to encoded partitions (e.g., transform coefficients or quantized transform coefficients, (as discussed) optional indicators, and/or data defining the encoded partitions).
  • Video coding system 40 may also include video decoder 30 coupled to antenna 42 for decoding the encoded bitstream.
  • Display device 45 is used to present video frames.
  • video decoder 30 may be used to perform the opposite process.
  • video decoder 30 may be operable to receive and parse such syntax elements, decoding the associated video data accordingly.
  • video encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such instances, video decoder 30 may parse such syntax elements and decode related video data accordingly.
  • VVC Versatile Video Coding
  • VCEG ITU-T Video Coding Experts Group
  • MPEG ISO/IEC Motion Picture Experts Group
  • HEVC High-Efficiency Video Coding
  • JCT-VC Joint Collaboration Team on Video Coding
  • FIG. 2 is a schematic block diagram of an example of a video encoder 20 for implementing the techniques of this application.
  • the video encoder 20 includes an input terminal (or input interface) 201 , a residual calculation unit 204 , a transform processing unit 206 , a quantization unit 208 , an inverse quantization unit 210 , an inverse transform processing unit 212 , and a reconstruction unit 214 , a loop filter 220 , a decoded picture buffer (DPB) 230 , a mode selection unit 260 , an entropy encoding unit 270 and an output terminal (or output interface) 272 .
  • DPB decoded picture buffer
  • Mode selection unit 260 may include inter prediction unit 244 , intra prediction unit 254 , and partition unit 262 .
  • Inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown).
  • the video encoder 20 shown in FIG. 2 may also be referred to as a hybrid video encoder or a hybrid video codec-based video encoder.
  • the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 constitute the forward signal path of the encoder 20, while the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 constitute the backward signal path of the encoder, wherein the backward signal path of the encoder 20 corresponds to the decoding signal path of the decoder (see decoder 30 in FIG. 3).
  • Inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, loop filter 220, decoded image buffer 230, inter prediction unit 244, and intra prediction unit 254 also make up the “built-in decoder” of video encoder 20.
  • the quantization unit 208 is configured to quantize the transform coefficients 207 by, for example, scalar quantization or vector quantization, to obtain quantized transform coefficients 209 .
  • the quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209 .
  • the quantization process may reduce the bit depth associated with some or all of the transform coefficients 207 .
  • n-bit transform coefficients may be rounded down to m-bit transform coefficients during quantization, where n is greater than m.
  • the degree of quantization can be modified by adjusting the quantization parameter (QP).
  • QP quantization parameter
  • the quantization parameter may be an index into a predefined set of suitable quantization step sizes.
  • Quantization may include dividing by the quantization step size, and corresponding or inverse dequantization performed by the inverse quantization unit 210 or the like may include multiplying by the quantization step size.
  • According to embodiments of some standards such as HEVC, the quantization parameter may be used to determine the quantization step size.
  • the quantization step size can be calculated from the quantization parameter using a fixed-point approximation of an equation involving division.
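By way of illustration only, the following sketch (not part of this application) shows the commonly cited HEVC-style relation between QP and quantization step size, Qstep = 2^((QP − 4) / 6), together with an integer-only dequantization in the spirit of the fixed-point approximation mentioned above; the scale table values follow the HM reference software and should be treated as illustrative:

```python
import numpy as np

def qstep_float(qp: int) -> float:
    """Floating-point quantization step size, Qstep = 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)

# Fixed-point approximation in the HEVC style: a 6-entry scale table
# indexed by QP % 6 and shifted by QP // 6 (illustrative values).
INV_QUANT_SCALES = [40, 45, 51, 57, 64, 72]

def dequant_fixed_point(level: int, qp: int, shift: int = 6) -> int:
    """Dequantize one coefficient using integer arithmetic only."""
    scale = INV_QUANT_SCALES[qp % 6] << (qp // 6)
    return (level * scale) >> shift

if __name__ == "__main__":
    for qp in (22, 27, 32, 37):
        # e.g. QP 22 gives Qstep 8.0, so a level of 100 dequantizes to 800
        print(qp, round(qstep_float(qp), 3), dequant_fixed_point(100, qp))
```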
  • the video encoder 20 may be used to output a quantization parameter (QP), e.g., directly or after being encoded or compressed by the entropy encoding unit 270, e.g., so that the video decoder 30 may receive and use the quantization parameter for decoding.
  • QP quantization parameter
  • the inverse quantization unit 210 is used to apply the inverse quantization of the quantization unit 208 to the quantized coefficients to obtain the dequantized coefficients 211, e.g., by applying the inverse of the quantization scheme performed by the quantization unit 208 according to or using the same quantization step size as the quantization unit 208.
  • Dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211, corresponding to transform coefficients 207, although the inverse-quantized coefficients 211 are usually not identical to the transform coefficients due to the loss caused by quantization.
  • the reconstruction unit 214 (e.g., summer 214) is used to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the pixel domain, e.g., by adding the pixel values of the reconstructed residual block 213 and the pixel values of the prediction block 265.
  • the loop filter unit 220 (or “loop filter” 220 for short) is used to filter the reconstruction block 215 to obtain the filter block 221, or generally to filter the reconstructed pixels to obtain filtered pixel values.
  • loop filter units are used to smooth pixel transitions or improve video quality.
  • the loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof.
  • the loop filter unit 220 may include a deblocking filter, a SAO filter, and an ALF filter. The order of the filtering process can be deblocking filter, SAO filter and ALF filter.
  • LMCS luma mapping with chroma scaling
  • SBT sub-block transform
  • ISP intra sub-partition
  • although loop filter unit 220 is shown in FIG. 2 as an in-loop filter, in other configurations loop filter unit 220 may be implemented as a post-loop filter.
  • Filter block 221 may also be referred to as filter reconstruction block 221 .
  • video encoder 20 may be used to output loop filter parameters (e.g., SAO filter parameters, ALF filter parameters, or LMCS parameters), e.g., directly or after entropy encoding by the entropy encoding unit 270, e.g., so that the decoder 30 can receive and decode using the same or different loop filter parameters.
  • FIG. 3 illustrates an example video decoder 30 for implementing the techniques of this application.
  • the video decoder 30 is adapted to receive the encoded image data 21 (eg, the encoded bitstream 21 ) encoded by the encoder 20 , for example, to obtain a decoded image 331 .
  • the encoded image data or bitstream includes information for decoding the encoded image data, such as data representing image blocks of an encoded video slice (and/or encoded block groups or encoded blocks) and associated syntax elements.
  • decoder 30 includes entropy decoding unit 304, inverse quantization unit 310, inverse transform processing unit 312, reconstruction unit 314 (e.g., summer 314), loop filter 320, decoded picture buffer (DPB) 330, mode application unit 360, inter prediction unit 344, and intra prediction unit 354.
  • Inter prediction unit 344 may be or include a motion compensation unit.
  • video decoder 30 may perform a decoding process that is substantially the inverse of the encoding process described with reference to video encoder 100 of FIG. 2 .
  • the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the decoded image buffer (DPB) 230, the inter prediction unit 344, and the intra prediction unit 354 also constitute the “built-in decoder” of video encoder 20.
  • the inverse quantization unit 310 may be functionally the same as the inverse quantization unit 110
  • the inverse transform processing unit 312 may be functionally the same as the inverse transform processing unit 122
  • the reconstruction unit 314 may be functionally the same as the reconstruction unit 214
  • the loop filter 320 may be functionally identical to loop filter 220
  • decoded image buffer 330 may be functionally identical to decoded image buffer 230 .
  • the explanations of the corresponding units and functions of the video encoder 20 apply correspondingly to the corresponding units and functions of the video decoder 30 .
  • Inverse quantization unit 310 may be operable to receive quantization parameters (QPs) (or information related to inverse quantization in general) and quantized coefficients from encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304), and to inverse quantize the decoded quantized coefficients 309 based on the quantization parameters to obtain dequantized coefficients 311, which may also be referred to as transform coefficients 311.
  • the inverse quantization process may include using quantization parameters calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization, as well as the degree of inverse quantization that needs to be performed.
  • the reconstruction unit 314 (e.g., summer 314) is used to add the reconstructed residual block 313 to the prediction block 365 to obtain the reconstructed block 315 in the pixel domain, e.g., by adding the pixel values of the reconstructed residual block 313 and the pixel values of the prediction block 365.
  • the loop filter unit 320 (in or after the coding loop) is used to filter the reconstruction block 315 to obtain the filter block 321, so as to smooth pixel transitions or otherwise improve video quality, etc.
  • the loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as an adaptive loop filter (ALF), a noise suppression filter (NSF), or any combination thereof.
  • the loop filter unit 320 may include a deblocking filter, a SAO filter, and an ALF filter. The order of the filtering process may be deblocking filter, SAO filter, and ALF filter.
  • LMCS luma mapping with chroma scaling
  • SBT sub-block transform
  • ISP intra sub-partition
  • although loop filter unit 320 is shown in FIG. 3 as an in-loop filter, in other configurations loop filter unit 320 may be implemented as a post-loop filter.
  • the decoder 30 is configured to output the decoded image 331 through the output terminal 332, etc., for display to the user or for the user to view.
  • embodiments of the coding system 10, encoder 20, and decoder 30, as well as other embodiments described herein, may also be used for still image processing or coding, that is, the processing or coding of a single image independent of any preceding or consecutive image, as in video coding.
  • if image processing is limited to a single image 17, the inter prediction unit 244 (encoder) and the inter prediction unit 344 (decoder) may not be available.
  • All other functions (also referred to as tools or techniques) of video encoder 20 and video decoder 30 are equally applicable to still image processing, such as residual calculation 204/304, transform 206, quantization 208, inverse quantization 210/310, (inverse) transform 212/312, partition 262/362, intra prediction 254/354 and/or loop filtering 220/320, entropy encoding 270, and entropy decoding 304.
  • FIG. 4 is a schematic diagram of a video decoding apparatus 400 according to an embodiment of the present invention.
  • Video coding apparatus 400 is suitable for implementing the disclosed embodiments described herein.
  • the video coding apparatus 400 may be a decoder, such as the video decoder 30 in FIG. 1B, or an encoder, such as the video encoder 20 in FIG. 1B.
  • the video decoding apparatus 400 includes: an ingress port 410 (or input port 410) for receiving data and a receiver unit (Rx) 420; a processor, logic unit, or central processing unit (CPU) 430 for processing data (for example, the processor 430 here may be a neural network processor 430); a transmitter unit (Tx) 440 for transmitting data and an egress port 450 (or output port 450); and a memory 460.
  • the video coding apparatus 400 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiving unit 420, the transmitting unit 440, and the egress port 450, serving as an egress or ingress for optical or electrical signals.
  • OE optical-to-electrical
  • EO electrical-to-optical
  • the processor 430 is implemented by hardware and software.
  • Processor 430 may be implemented as one or more processor chips, cores (eg, multi-core processors), FPGAs, ASICs, and DSPs.
  • the processor 430 communicates with the ingress port 410 , the receiving unit 420 , the sending unit 440 , the egress port 450 and the memory 460 .
  • the processor 430 includes a decoding module 470 (eg, a neural network (NN) based decoding module 470).
  • the decoding module 470 implements the embodiments disclosed above. For example, the decoding module 470 performs, processes, prepares, or provides various encoding operations.
  • decoding module 470 is implemented as instructions stored in memory 460 and executed by processor 430 .
  • Memory 460 includes one or more magnetic disks, tape drives, and solid-state drives, and may serve as an overflow data storage device for storing programs when such programs are selected for execution, and for storing instructions and data read during program execution.
  • Memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
  • FIG. 5 is a simplified block diagram of an apparatus 500 provided by an exemplary embodiment, and the apparatus 500 can be used as either or both of the source device 12 and the destination device 14 in FIG. 1B .
  • the processor 502 in the apparatus 500 may be a central processing unit.
  • the processor 502 may be any other type of device or devices, existing or to be developed in the future, capable of manipulating or processing information.
  • although the disclosed implementations may be implemented using a single processor, such as processor 502 as shown, speed and efficiency advantages may be achieved using more than one processor.
  • the memory 504 in the apparatus 500 may be a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may be used as memory 504 .
  • Memory 504 may include code and data 506 accessed by processor 502 via bus 512 .
  • the memory 504 may also include an operating system 508 and application programs 510 including at least one program that allows the processor 502 to perform the methods described herein.
  • applications 510 may include applications 1 through N, and also include video coding applications that perform the methods described herein.
  • Apparatus 500 may also include one or more output devices, such as display 518 .
  • display 518 may be a touch-sensitive display that combines a display with touch-sensitive elements that may be used to sense touch input.
  • Display 518 may be coupled to processor 502 through bus 512 .
  • although bus 512 in apparatus 500 is described herein as a single bus, bus 512 may include multiple buses.
  • secondary storage may be directly coupled to other components of the device 500 or accessed through a network, and may include a single integrated unit, such as a memory card, or multiple units, such as multiple memory cards. Accordingly, the apparatus 500 may have various configurations.
  • the embodiment of the present invention relates to an AI-based lossy image coding scheme oriented toward color-component rate allocation, which can be applied to the video decoding system, encoder, and decoder described in FIG. 1A to FIG. 5.
  • the method provided in this application is mainly used for the YUV component rate allocation process, which is mainly controlled by the encoder.
  • a corresponding control unit can also be added to the decoder.
  • Video image signals usually include one luminance component and two chrominance components.
  • the luminance component is usually represented by the symbol Y
  • the chrominance component is usually represented by the symbols U and V.
  • commonly used YUV formats include the following formats.
  • in the figures, the crosses represent the luminance component sampling points, and the circles represent the chrominance component sampling points:
  • 4:4:4 format: the chrominance components are not downsampled;
  • 4:2:2 format: the chrominance components are downsampled 2:1 horizontally relative to the luminance component, with no vertical downsampling. For every two U samples or V samples, each scan line contains four Y samples;
  • 4:2:0 format: the chrominance components are downsampled 2:1 horizontally and 2:1 vertically relative to the luminance component.
  • in the 4:2:0 format, if the luminance component of an image block is of size 2N×2N, the chrominance component of the image block is an image block of size N×N.
  • the embodiment of the present invention takes the 4:2:0 format as an example to explain the technical solution of the present invention (see the array sketch below). It can be understood that, in addition to the YUV 4:2:0 format, the technical solution of the present invention can also be used for other YUV formats, or for prediction between different components in other video image formats, such as the RGB format.
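As a minimal illustration of the sampling formats above (the sizes and names here are hypothetical, not taken from this application), the plane shapes for a 2N×2N luminance block can be sketched as follows:

```python
import numpy as np

H, W = 8, 8  # luminance plane size (2N x 2N with N = 4)

y = np.zeros((H, W), dtype=np.uint8)                # Y (luminance) plane
u_444 = np.zeros((H, W), dtype=np.uint8)            # 4:4:4 - no chroma downsampling
u_422 = np.zeros((H, W // 2), dtype=np.uint8)       # 4:2:2 - 2:1 horizontal only
u_420 = np.zeros((H // 2, W // 2), dtype=np.uint8)  # 4:2:0 - 2:1 horizontal and vertical

print(y.shape, u_444.shape, u_422.shape, u_420.shape)
# (8, 8) (8, 8) (8, 4) (4, 4) - in 4:2:0 a 2N x 2N luma block has an N x N chroma block
```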
  • the current block may be a square block, or a non-square rectangular block or other shaped area, and the technical solutions provided by the embodiments of the present invention are also applicable.
  • the embodiment of the present invention adopts the expression of the first signal component and the second signal component.
  • the first signal component may be a chrominance component
  • the second signal component may be a luminance component
  • the first signal component may be a luminance component.
  • the first signal component may be any one of the three signal components of R, G, and B
  • the second signal component may be one of the three signal components of R, G, and B that is different from the first signal component; if the image signal is decomposed into a plurality of signal components in a similar manner, the first signal component and the second signal component can be defined in a similar way.
  • a quality factor may be input to a rate control module (also called a rate allocation control module), which generates a control signal for each component feature map; the control vector of each component's control signal is multiplied with the corresponding feature map to obtain the feature values to be encoded after quantization.
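The channel-by-channel action of a control vector on a component feature map can be pictured with a few lines of array code. This is a minimal sketch under assumed shapes (a feature map of size (channels, height, width)); the names and dimensions are hypothetical:

```python
import numpy as np

def apply_control(feature_map: np.ndarray, control_vector: np.ndarray) -> np.ndarray:
    """Multiply each channel of the feature map by its control-vector entry."""
    # feature_map: (C, H, W); control_vector: (C,), broadcast over H and W
    return feature_map * control_vector[:, None, None]

rng = np.random.default_rng(0)
f_y = rng.standard_normal((192, 16, 16))  # hypothetical Y component first feature map
q_yi = rng.uniform(0.5, 1.0, size=192)    # control vector selected by quality-factor index i
to_encode = np.round(apply_control(f_y, q_yi))  # feature values to encode after quantization
```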
  • the AI image decoding system 700 for YUV bit rate allocation can be used for a video image encoder and decoder.
  • the image decoding system 700 includes, at the encoding end, a first signal component processing module (e.g., a Y component processing module), a second signal component processing module (e.g., a UV component processing module), a rate allocation control module, and an entropy encoding module; and, at the decoding end, a first signal component processing module (Y component processing module 2), a second signal component processing module (e.g., UV component processing module 2), and an entropy decoding module.
  • the image decoding system 700 optionally includes a joint processing module, a joint processing module 2, and a quality response module (also referred to as a rate allocation control module 2 or a rate control module 2).
  • the quality factor of the Y component and the quality factor of the UV component are input to the rate allocation control module, and the control signal output by this module acts respectively on the Y component feature map of the Y component processing module and the UV component feature map of the UV component processing module (these may be called the first feature maps), and the second feature map of each signal component is output, so as to realize the rate allocation between Y and UV. Then, the code stream of the video signal is obtained according to the second feature map of each signal component.
  • the feature maps output by the Y component processing module and the UV component processing module are directly concatenated together, or directly added, to form the final output feature map of the encoder.
  • the feature maps are entropy encoded.
  • the feature map output by the Y component processing module and the UV component processing module is input to the joint processing module to obtain the feature map finally output by the encoder, and entropy coding is performed on the feature map.
  • FIG. 7B is an embodiment of an encoding method.
  • Step 701 Obtain a control signal of the first signal component according to the quality factor of the first signal component.
  • Step 702 Obtain a control signal of the second signal component according to the quality factor of the second signal component.
  • the embodiment shown in FIG. 7B may obtain the control signal of the first signal component from N candidate first control signals according to the quality factor of the first signal component, where N is an integer greater than 1; and A control signal of the second signal component is obtained from M candidate second control signals according to the quality factor of the second signal component, where M is an integer greater than 1.
  • N and M may be equal or unequal, which is not limited in this application.
  • Step 703 Apply the control signal of the first signal component to the first feature map of the first signal component to obtain a second feature map of the first signal component.
  • Step 704 Apply the control signal of the second signal component to the first feature map of the second signal component to obtain a second feature map of the second signal component.
  • the control signal is generated by network learning and acts on the feature map of at least one network layer in each of the Y component processing module and the UV component processing module (which may be called the first feature map), and the second feature map is then output. For example, the control signal may act on the output of the last network layer.
  • Step 705 Obtain a code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component (a code sketch of steps 701 to 705 follows the entropy coding options below).
  • obtaining the code stream of the video signal includes:
  • Entropy coding is performed on the second feature map of the first signal component and the second feature map of the second signal component to obtain a code stream of the video signal;
  • Entropy coding is performed on the second feature map of the first signal component and the second feature map of the second signal component processed by the neural network to obtain a code stream of the video signal;
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network and the second feature map of the second signal component to obtain a code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network and the second feature map of the second signal component processed by the neural network to obtain a code stream of the video signal.
  • obtaining the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component includes:
  • Joint processing is performed on the second feature map of the first signal component processed by the neural network and the second feature map of the second signal component processed by the neural network to obtain a joint feature map, and entropy coding is performed on the joint feature map to obtain the code stream of the video signal.
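Putting steps 701 to 705 together, a minimal end-to-end sketch might look as follows; the processing modules are toy stand-ins for trained networks, the entropy coder is a placeholder, and all shapes and names are assumptions rather than details of this application:

```python
import numpy as np

rng = np.random.default_rng(0)

def y_processing(y_plane):
    # toy "analysis network": 2:1 subsampling stacked into 4 channels
    return np.stack([y_plane[::2, ::2]] * 4)

def uv_processing(u_plane, v_plane):
    # toy stand-in producing a feature map with the same spatial size as Y's
    return np.stack([u_plane, v_plane, u_plane, v_plane])

def entropy_encode(symbols):
    return symbols.astype(np.int32).tobytes()  # placeholder for a real entropy coder

# Candidate control vectors, one row per quality-factor index (learned in practice).
Q_Y  = rng.uniform(0.5, 1.0, size=(4, 4))   # N = 4 candidates, 4 channels
Q_UV = rng.uniform(0.5, 1.0, size=(4, 4))   # M = 4 candidates, 4 channels

def encode(y, u, v, i, j):
    q_y, q_uv = Q_Y[i], Q_UV[j]                        # steps 701/702: control signals
    f2_y  = y_processing(y) * q_y[:, None, None]       # step 703: second feature map of Y
    f2_uv = uv_processing(u, v) * q_uv[:, None, None]  # step 704: second feature map of UV
    joint = np.concatenate([f2_y, f2_uv], axis=0)      # step 705: combine (concatenate) ...
    return entropy_encode(np.round(joint))             # ... and entropy-code

y = rng.integers(0, 256, (16, 16)).astype(float)       # 4:2:0 input planes
u = rng.integers(0, 256, (8, 8)).astype(float)
v = rng.integers(0, 256, (8, 8)).astype(float)
bitstream = encode(y, u, v, i=1, j=0)
```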
  • the rate allocation control module learns to generate N candidate first control signals of the first signal component (such as the control vector matrix {q y1 , q y2 , ... q yi ... q yN }) and M candidate second control signals of the second signal component (such as the control vector matrix {q uv1 , q uv2 , ... q uvj ... q uvM }); the control signal q yi of the first signal component is obtained according to the index i of the quality factor of the Y component, and the control signal q uvj of the second signal component is obtained according to the index j of the quality factor of the UV component.
  • the control vector realizes the control of different component feature maps by channel-by-channel multiplication with the corresponding feature maps.
  • the code stream of the video signal sent to the decoding end includes the index i of the quality factor of the Y component and the index j of the quality factor of the UV component.
  • the control signal includes the control vectors and offset vectors of the Y and UV feature maps: as in the aforementioned method, the control vector q yi and the offset vector b yi of the first signal component are obtained according to the index i of the quality factor of the Y component, and the control vector q uvj and the offset vector b uvj of the second signal component are obtained according to the index j of the quality factor of the UV component. Then, the control vector is multiplied with the corresponding feature map channel by channel, and the corresponding offset vector is added, to realize the control of different component feature maps.
  • the control signal of the Y component and the control signal of the UV component are used as a two-tuple, and the rate allocation control module learns to generate N candidate control signals of the video signal (such as the control vector matrix {q c1 , q c2 , ... q ci ... q cN }); at this time, c is 2, and each control vector q ci includes both the control vector of the first signal component and the control vector of the second signal component.
  • the control signal q ci containing the first signal component and the second signal component is then obtained by the index i of the quality factor of the video signal.
  • each offset vector of the video signal includes both the offset component of the first signal component and the offset component of the second signal component.
  • the code stream of the video signal sent to the decoding end includes the index i of the quality factor of the video signal.
  • the Y component quality factor and the UV component quality factor are used as the input of a fully connected network, and the control vector q yi and the control vector q uvj are output; the control vector is multiplied with the corresponding feature map channel by channel to realize the control of different component feature maps.
  • the Y component quality factor and the UV component quality factor are used as the input of the fully connected network, and the offset vector b yi and the offset vector b uvj can also be output; then the control vector is multiplied with the corresponding feature map channel by channel, and the offset vector is added to the corresponding feature map channel by channel, to realize the control of different component feature maps (see the sketch below).
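A minimal sketch of the fully connected variant follows; the weights are random here (in practice they would be learned), and all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
C = 192  # hypothetical number of feature-map channels

# One-layer fully connected mappings from the two quality factors to a
# per-channel control vector q and offset vector b.
W_q, c_q = rng.standard_normal((C, 2)), rng.standard_normal(C)
W_b, c_b = rng.standard_normal((C, 2)), rng.standard_normal(C)

def control_from_quality(q_y_factor: float, q_uv_factor: float):
    x = np.array([q_y_factor, q_uv_factor])
    control_vector = W_q @ x + c_q   # multiplied with the feature map channel by channel
    offset_vector  = W_b @ x + c_b   # added to the feature map channel by channel
    return control_vector, offset_vector

q, b = control_from_quality(0.8, 0.2)
# modulated = feature_map * q[:, None, None] + b[:, None, None]
```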
  • the code stream of the video signal sent to the decoding end includes the quality factor of the Y component and the quality factor of the UV component.
  • the decoding end performs entropy decoding on the received code stream to obtain a feature map, and the feature map is decomposed into a Y component feature map and a UV component feature map.
  • the feature map obtained by entropy decoding is first input to the joint processing sub-module 2 to obtain the Y component feature map and the UV component feature map.
  • the Y component feature map and the UV component feature map are respectively input to Y component processing module 2 and UV component processing module 2, which output the Y component reconstruction map and the UV component reconstruction map.
  • the quality factor of the Y component and the quality factor of the UV component are input to the quality response module, and the response signal output by this module acts respectively on the Y component feature map of the Y component processing module and the UV component feature map of the UV component processing module, so as to realize the adaptive quality response of the Y and UV components.
  • the quality response could also be called quality control; it is called the quality response at the decoding end simply to distinguish it from the quality control at the encoding end.
  • the decoding end obtains the code stream of the video signal from the encoding end, and performs entropy decoding on the code stream to obtain the feature map of the first signal component (e.g., the Y component) and the feature map of the second signal component (e.g., the UV component) of the video signal.
  • the decoding end also obtains the quality factor information of the first signal component and the quality factor information of the second signal component from the code stream, wherein the quality factor information of the first signal component is the quality factor of the first signal component or the index of the quality factor of the first signal component, and the quality factor information of the second signal component is the quality factor of the second signal component or the index of the quality factor of the second signal component.
  • when the quality factor information of the first signal component is the quality factor of the first signal component, the quality factor of the first signal component takes one of N values; when the quality factor information of the first signal component is the index of the quality factor of the first signal component, the value range of the index of the quality factor of the first signal component is 0 to N−1 or 1 to N, where N is an integer greater than 1.
  • when the quality factor information of the second signal component is the quality factor of the second signal component, the quality factor of the second signal component takes one of M values; when the quality factor information of the second signal component is the index of the quality factor of the second signal component, the value range of the index of the quality factor of the second signal component is 0 to M−1 or 1 to M, where M is an integer greater than 1.
  • when a joint feature map is passed from the encoding end, the decoding end also needs to perform entropy decoding on the joint feature map, and obtain the feature map of the first signal component and the feature map of the second signal component through neural network processing.
  • Step 712 Obtain a response signal of the first signal component according to the quality factor information of the first signal component.
  • Step 713 Obtain a response signal of the second signal component through the quality factor information of the second signal component.
  • the decoding end needs to generate, by learning, the response signal matrix {g y1 , g y2 , ... g yi ... g yN } of the first signal component and the response signal matrix {g uv1 , g uv2 , ... g uvj ... g uvM } of the second signal component; optionally, the response signal matrix of the first signal component is obtained by taking the reciprocal of the control signal matrix {q y1 , q y2 , ... q yi ... q yN } of the first signal component at the encoding end, and the response signal matrix of the second signal component is obtained by taking the reciprocal of the control signal matrix {q uv1 , q uv2 , ... q uvj ... q uvM } of the second signal component at the encoding end, where N and M are integers greater than 1. The response signal g yi of the first signal component is obtained according to the index i of the quality factor of the Y component, and the response signal g uvj of the second signal component is obtained according to the index j of the quality factor of the UV component.
  • the decoding end needs to generate, by learning, the response signal matrix {g c1 , g c2 , ... g ci ... g cN } of the video signal, where c is 2, representing the Y component and the UV component, and N is an integer greater than 1; optionally, the response signal matrix {g c1 , g c2 , ... g ci ... g cN } of the video signal is obtained by taking the reciprocal of the control signal matrix {q c1 , q c2 , ... q ci ... q cN } of the video signal at the encoding end. The response signal g ci , comprising the response signals of the first signal component and the second signal component, is obtained according to the index i of the quality factor of the video signal.
  • the decoding end uses the quality factor of the Y component as the input of a fully connected network and outputs the response signal of the Y component, and uses the quality factor of the UV component as the input of the fully connected network and outputs the response signal of the UV component (see the sketch below).
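One of the options above, deriving decoder-side response vectors as element-wise reciprocals of the encoder-side control vectors, can be sketched as follows (the matrix sizes and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
N, C = 4, 192                             # N candidate quality factors, C channels

Q_y = rng.uniform(0.5, 1.0, size=(N, C))  # encoder-side control signal matrix
G_y = 1.0 / Q_y                           # decoder-side response signal matrix (reciprocal)

i = 1                     # quality-factor index parsed from the code stream
g_yi = G_y[i]             # response signal for the Y component
# reconstruction input: decoded_feature_map * g_yi[:, None, None]
```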
  • Step 714 Obtain a reconstruction map of the first signal component according to the response signal of the first signal component and the feature map of the first signal component.
  • Step 715 Obtain a reconstruction map of the second signal component according to the response signal of the second signal component and the feature map of the second signal component.
  • the reconstruction map of the first signal component is obtained according to the response signal of the first signal component and the feature map of the first signal component, and the reconstruction map of the second signal component is obtained according to the response signal of the second signal component and the feature map of the second signal component.
  • Step 716 Reconstruct the video signal according to the reconstruction map of the first signal component and the reconstruction map of the second signal component.
  • the AI image decoding system 710 for YUV bit rate allocation can be used for a video image encoder and decoder.
  • the image decoding system 710 includes, at the encoding end, a first signal component processing module (e.g., a Y component processing module), a second signal component processing module (e.g., a U component processing module), a third signal component processing module (e.g., a V component processing module), a rate allocation control module, and an entropy encoding module; and, at the decoding end, a first signal component processing module (Y component processing module 2), a second signal component processing module (U component processing module 2), a third signal component processing module (V component processing module 2), and an entropy decoding module.
  • the image decoding system 710 optionally includes a joint processing module, a joint processing module 2, and a quality response module (also referred to as a rate allocation control module 2 or a rate control module 2).
  • the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component are input to the rate allocation control module, and the control signal output by this module acts respectively on the Y component feature map of the Y component processing module, the U component feature map of the U component processing module, and the V component feature map of the V component processing module (these may be called the first feature maps), and the second feature map of each signal component is output, thereby realizing the rate allocation of Y, U, and V. Then, according to the second feature map of each signal component, the code stream of the video signal is obtained.
  • the feature maps output by the Y component processing module, the U component processing module, and the V component processing module are directly concatenated together, or directly added together, to form the final output feature map of the encoder, and entropy coding is performed on that feature map.
  • the feature maps output by the Y component processing module, the U component processing module and the V component processing module are input to the joint processing module to obtain a feature map finally output by the encoder, and entropy coding is performed on the feature map.
  • the control signal of the first signal component may be obtained from N candidate first control signals according to the quality factor of the first signal component (Y component); the control signal of the second signal component may be obtained from M candidate second control signals according to the quality factor of the second signal component (U component); and the control signal of the third signal component may be obtained from L candidate third control signals according to the quality factor of the third signal component (V component).
  • N, M and L are integers greater than 1, which may be equal or unequal, which are not limited in this application.
  • the control signal is generated by network learning and acts on the feature map of at least one network layer in each of the Y component processing module, the U component processing module, and the V component processing module (which may be called the first feature map), and the second feature map is then output. For example, the control signal may act on the output of the last network layer. Since the control signal can act on the first feature map of any network layer in the Y component processing module, the U component processing module, and the V component processing module, neural network processing can continue on the second feature map after it is output.
  • obtaining the code stream of the video signal includes:
  • Entropy coding is performed on the second feature map of the first signal component, the second feature map of the second signal component, and the second feature map of the third signal component to obtain a code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component, the second feature map of the second signal component processed by the neural network, and the second feature map of the third signal component to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component, the second feature map of the second signal component processed by the neural network, and the second feature map of the third signal component processed by the neural network to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component, the second feature map of the second signal component, and the second feature map of the third signal component processed by the neural network to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network, the second feature map of the second signal component, and the second feature map of the third signal component to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network, the second feature map of the second signal component processed by the neural network, and the second feature map of the third signal component to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network, the second feature map of the second signal component processed by the neural network, and the second feature map of the third signal component processed by the neural network to obtain the code stream of the video signal; or,
  • Entropy coding is performed on the second feature map of the first signal component processed by the neural network, the second feature map of the second signal component, and the second feature map of the third signal component processed by the neural network to obtain the code stream of the video signal.
  • joint processing can also be performed on the second feature maps obtained above, or on the processed feature maps and their combinations, to obtain a joint feature map, and entropy coding is performed on the joint feature map to obtain the code stream of the video signal.
  • the rate allocation control module learns to generate N candidate first control signals of the first signal component (such as the control vector matrix {q y1 , q y2 , ... q yi ... q yN }), M candidate second control signals of the second signal component (such as the control vector matrix {q u1 , q u2 , ... q uj ... q uM }), and L candidate third control signals of the third signal component (such as the control vector matrix {q v1 , q v2 , ... q vk ... q vL }). The control signal q yi of the first signal component is obtained according to the index i of the quality factor of the Y component; the control signal q uj of the second signal component is obtained according to the index j of the quality factor of the U component; and the control signal q vk of the third signal component is obtained according to the index k of the quality factor of the V component.
  • the control vector realizes the control of different component feature maps by channel-by-channel multiplication with the corresponding feature maps.
  • the code stream of the video signal sent to the decoding end includes the index i of the quality factor of the Y component, the index j of the quality factor of the U component, and the index k of the quality factor of the V component.
  • the control signal includes the control vectors and the offset vectors of the Y, U, and V feature maps: as in the aforementioned method, the control vector q yi and the offset vector b yi of the first signal component are obtained according to the index i of the quality factor of the Y component; the control vector q uj and the offset vector b uj of the second signal component are obtained according to the index j of the quality factor of the U component; and the control vector q vk and the offset vector b vk of the third signal component are obtained according to the index k of the quality factor of the V component. Then, the control vector is multiplied with the corresponding feature map channel by channel, and the corresponding offset vector is added, to realize the control of different component feature maps.
  • the control signal of the Y component, the control signal of the U component, and the control signal of the V component are used as a triple, and the rate allocation control module learns to generate N candidate control signals of the video signal (such as the control vector matrix {q c1 , q c2 , ... q ci ... q cN }); at this time, c is 3, and each control vector q ci includes the control vector of the first signal component, the control vector of the second signal component, and the control vector of the third signal component.
  • the control signal q ci containing the first signal component, the second signal component and the third signal component is then obtained by the index i of the quality factor of the video signal.
  • each offset vector of the video signal includes the offset component of the first signal component, the offset component of the second signal component, and the offset component of the third signal component.
  • the code stream of the video signal sent to the decoding end includes the index i of the quality factor of the video signal
  • the Y component quality factor, the U component quality factor, and the V component quality factor are used as the input of the fully connected network, and the control vector q yi , the control vector q uj , and the control vector q vk are output; the control vectors are multiplied with the corresponding feature maps channel by channel to realize the control of different component feature maps.
  • the Y component quality factor, the U component quality factor, and the V component quality factor are used as the input of the fully connected network, and the offset vector b yi , the offset vector b uj , and the offset vector b vk can also be output; then the control vector is multiplied with the corresponding feature map channel by channel, and the offset vector is added to the corresponding feature map channel by channel, to realize the control of different component feature maps.
  • the code stream of the video signal sent to the decoding end includes the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component.
  • the decoding end performs entropy decoding on the received code stream to obtain a feature map, and the feature map is decomposed into a Y component feature map, a U component feature map, and a V component feature map.
  • the feature map obtained by entropy decoding is first input to the joint processing sub-module 2 to obtain a Y component feature map, a U component feature map, and a V component feature map.
  • the Y component feature map, the U component feature map, and the V component feature map are respectively input to Y component processing module 2, U component processing module 2, and V component processing module 2, which output the Y component reconstruction map, the U component reconstruction map, and the V component reconstruction map.
  • the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component are input to the quality response module, and the response signal output by this module acts respectively on the Y component feature map of the Y component processing module, the U component feature map of the U component processing module, and the V component feature map of the V component processing module, so as to realize the adaptive quality response of the Y, U, and V components.
  • the generation method of the response signal is similar to that of the control signal; the terms differ only for ease of distinction: at the encoding end it is called the control signal, and at the decoding end it is called the response signal.
  • the decoding end obtains the code stream of the video signal from the encoding end; entropy decoding is performed on the code stream to obtain the feature map of the first signal component (e.g., the Y component), the feature map of the second signal component (e.g., the U component), and the feature map of the third signal component (e.g., the V component) of the video signal.
  • the reconstruction map of the first signal component is obtained according to the response signal of the first signal component and the feature map of the first signal component, wherein the response signal of the first signal component is obtained by learning; the reconstruction map of the second signal component is obtained according to the response signal of the second signal component and the feature map of the second signal component, wherein the response signal of the second signal component is obtained by learning; and the reconstruction map of the third signal component is obtained according to the response signal of the third signal component and the feature map of the third signal component, wherein the response signal of the third signal component is obtained by learning; the video signal is then reconstructed based on the reconstruction map of the first signal component, the reconstruction map of the second signal component, and the reconstruction map of the third signal component.
  • the decoding end further obtains the quality factor information of the first signal component, the quality factor information of the second signal component and the quality factor information of the third signal component from the code stream.
  • the quality factor information of the first signal component and the second signal component is similar to the embodiment of FIG. 7A .
  • the quality factor information of the third signal component may also be the quality factor of the third signal component or an index of the quality factor of the third signal component. Then, a response signal of the third signal component is obtained through the quality factor information of the third signal component.
  • when the quality factor information of the third signal component is the quality factor of the third signal component, the quality factor of the third signal component takes one of L values; when the quality factor information of the third signal component is the index of the quality factor of the third signal component, the value range of the index of the quality factor of the third signal component is 0 to L−1 or 1 to L, where L is an integer greater than 1.
  • L, M, and N may be equal or unequal, which is not limited in this application.
  • when a joint feature map is transmitted from the encoding end, the decoding end also needs to perform entropy decoding on the joint feature map, and obtain the feature map of the first signal component, the feature map of the second signal component, and the feature map of the third signal component through neural network processing.
  • the decoding end needs to generate, by learning, the response signal matrix {g y1 , g y2 , ... g yi ... g yN } of the first signal component, the response signal matrix {g u1 , g u2 , ... g uj ... g uM } of the second signal component, and the response signal matrix {g v1 , g v2 , ... g vk ... g vL } of the third signal component; optionally, the response signal matrix of the first signal component is obtained by taking the reciprocal of the control signal matrix {q y1 , q y2 , ... q yi ... q yN } of the first signal component at the encoding end, the response signal matrix of the second signal component is obtained by taking the reciprocal of the control signal matrix {q u1 , q u2 , ... q uj ... q uM } of the second signal component at the encoding end, and the response signal matrix of the third signal component is obtained by taking the reciprocal of the control signal matrix {q v1 , q v2 , ... q vk ... q vL } of the third signal component at the encoding end. The response signal g yi of the first signal component is obtained according to the index i of the quality factor of the Y component; the response signal g uj of the second signal component is obtained according to the index j of the quality factor of the U component; and the response signal g vk of the third signal component is obtained according to the index k of the quality factor of the V component.
  • the decoding end needs to generate, by learning, the response signal matrix {g c1 , g c2 , ... g ci ... g cN } of the video signal, where c is 3, representing the Y component, the U component, and the V component, and N is an integer greater than 1; optionally, the response signal matrix {g c1 , g c2 , ... g ci ... g cN } of the video signal is obtained by taking the reciprocal of the control signal matrix {q c1 , q c2 , ... q ci ... q cN } of the video signal at the encoding end. The response signal g ci , comprising the response signals of the first signal component, the second signal component, and the third signal component, is obtained according to the index i of the quality factor of the video signal.
  • the decoding end uses the quality factor of the Y component as the input of the fully connected network and outputs the response signal of the Y component; uses the quality factor of the U component as the input of the fully connected network and outputs the response signal of the U component; and uses the quality factor of the V component as the input of the fully connected network and outputs the response signal of the V component.
  • the reconstruction maps of the first signal component and the second signal component are obtained similarly to the embodiment of FIG. 7A, and details are not repeated here.
  • the reconstruction map of the third signal component is obtained according to the response signal of the third signal component and the feature map of the third signal component.
  • the U and V components may be processed jointly, or the three components Y, U, and V may be processed separately as in FIG. 7D; other combinations are also possible, for example, the two groups Y and UV, or other combinations of Y, U, and V.
  • FIG. 8A and FIG. 8B show a specific example.
  • the overall block diagram of the technical solution of this embodiment is shown in FIG. 8A, wherein the quality factor of the Y component and the quality factor of the UV component are input to the rate allocation control module, and the control vector q yi and the control vector q uvi output by this module act respectively on the Y component feature map output by the Y component processing module and the UV component feature map output by the UV component processing module, so as to realize the rate allocation of the YUV components.
  • at the decoding end, the quality factor of the Y component and the quality factor of the UV component are input to the quality response module, which outputs the response vector g yi and the response vector g uvi to act on the feature map of the Y component and the feature map of the UV component respectively, so as to realize the corresponding quality response of each of the YUV components.
  • This embodiment does not constrain the specific network structures of the Y component processing module, the UV component processing module, the joint processing module, the probability estimation module, Y component processing module 2, UV component processing module 2, and joint processing module 2; FIG. 8B gives a concrete example to facilitate understanding.
  • the first step is to obtain the feature maps of the Y and UV components:
  • the Y and UV components are input into the Y component processing module and the UV component processing module respectively, and the network outputs the feature maps of the Y and UV components.
  • the Y-component processing module includes two convolutional layers and two non-linear layers.
  • the horizontal and vertical downsampling factors in the two convolutional layers are both 2, and the Y-component processing module outputs a Y-component feature map.
  • the UV component processing module includes two convolution layers and two non-linear layers.
  • the first convolution layer has a horizontal and vertical downsampling factor of 1, that is, no downsampling operation is performed.
  • the downsampling factor of the second convolutional layer in the UV component processing module is 2 in the horizontal and vertical directions.
  • the UV component processing module outputs the UV component feature map.
  • the Y component feature map and the UV component feature map have the same width and height.
  • the quality factor of the Y component and the quality factor of the UV component are input to the rate allocation module to obtain the control vector q yi and the control vector q uvi; the control vector q yi and the control vector q uvi are multiplied channel by channel with the feature maps of the Y and UV components to obtain the processed feature maps of the Y and UV components; the processed feature maps of the Y and UV components are added or concatenated together, input to the joint processing module, and the feature map to be encoded is output.
  • the rate allocation module is composed of control matrices Q y and Q uv; the Y component quality factor and the UV component quality factor are used as index values into the control matrices Q y and Q uv to obtain the control vector q yi and the control vector q uvi (see the sketch after this list).
  • the control matrices Q y and Q uv are obtained through network learning.
  • the Y component quality factor and the UV component quality factor are set arbitrary values.
  • Taking FIG. 8B as an example, the control matrix Q y is a two-dimensional matrix of size K×N and the control matrix Q uv is a two-dimensional matrix of size L×M; each element of the two matrices is a parameter that can be learned by the network. K denotes the number of Y component feature maps, L denotes the number of UV component feature maps, N denotes N candidate values of the Y component quality factor, and M denotes M candidate values of the UV component quality factor.
  • Taking N = 4 and M = 4 as an example, the candidate values of the Y component quality factor are {0.5, 0.7, 0.8, 1.0} and the candidate values of the UV component quality factor are {0.15, 0.2, 0.25, 0.3}.
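  • The rate allocation module described above can be sketched as follows; the learnable matrices Q y and Q uv are modeled as plain parameter tensors, and the channel counts are assumptions:

```python
import torch
import torch.nn as nn

class RateAllocation(nn.Module):
    def __init__(self, k=192, l=192, n=4, m=4):
        super().__init__()
        # Q_y is K x N and Q_uv is L x M; every element is a learnable parameter.
        self.q_y = nn.Parameter(torch.ones(k, n))
        self.q_uv = nn.Parameter(torch.ones(l, m))

    def forward(self, f_y, f_uv, i, j):
        q_yi = self.q_y[:, i].view(1, -1, 1, 1)    # column i -> per-channel gain
        q_uvj = self.q_uv[:, j].view(1, -1, 1, 1)  # column j -> per-channel gain
        return f_y * q_yi, f_uv * q_uvj            # channel-by-channel multiply

alloc = RateAllocation()
f_y, f_uv = torch.rand(1, 192, 16, 16), torch.rand(1, 192, 16, 16)
g_y, g_uv = alloc(f_y, f_uv, i=1, j=0)            # e.g. factors (0.7, 0.15) above
merged = torch.cat([g_y, g_uv], dim=1)            # or g_y + g_uv, then the joint module
```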
  • In the third step, the feature map to be encoded is input to the entropy encoding module, which outputs the code stream. Taking FIG. 8B as an example, the feature map to be encoded is input to the Hyper Entropy module, which outputs the probability distribution of the symbols to be encoded; arithmetic coding is performed based on this probability distribution, and the code stream is output. At the same time, the Y component quality factor and UV component quality factor information are written into the code stream.
  • The Y component quality factor and UV component quality factor information can be written into the code stream in the following three ways:
  • Scheme 1: Predefine the number of candidate values and the candidate values for the quality factor of the Y component and the quality factor of the UV component, and transmit to the decoding end the index numbers of the Y component quality factor and the UV component quality factor in their respective candidate lists. Taking N = 4 and M = 3 as an example, the candidate values of the Y component quality factor are {0.5, 0.7, 0.8, 1.0} and the candidate values of the UV component quality factor are {0.15, 0.2, 0.25}. The Y component index i and the UV component index j are written into the code stream, where i takes a value in 0, 1, 2, 3 and j takes a value in 0, 1, 2. When i is 1, the quality factor of the Y component is 0.7; when j is 0, the quality factor of the UV component is 0.15.
  • Scheme 2: Predefine the number of candidate values and the candidate values for the combined quality factors of the Y component and the UV component. For example, the number of candidate combined values of the Y and UV component quality factors is 6, and the candidate list is {(0.5, 0.25), (0.7, 0.15), (0.7, 0.25), (0.8, 0.1), (0.8, 0.2), (1.0, 0.2)}. The index number i is written into the code stream, where i takes a value in 0, 1, 2, 3, 4, 5; when i is 1, the quality factors of the Y and UV components are (0.7, 0.15).
  • Scheme 3: Write the Y component quality factor and the UV component quality factor directly into the code stream and transmit them to the decoding end, for example, write (1.0, 0.2) into the code stream.
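  • The three signaling schemes can be sketched as follows; the bit widths and the write_bits helper are hypothetical, since the embodiment does not define an exact bitstream syntax:

```python
Y_LIST = [0.5, 0.7, 0.8, 1.0]             # scheme 1: separate candidate lists
UV_LIST = [0.15, 0.2, 0.25]
PAIR_LIST = [(0.5, 0.25), (0.7, 0.15), (0.7, 0.25),
             (0.8, 0.1), (0.8, 0.2), (1.0, 0.2)]   # scheme 2: combined list

def write_bits(bitstream, value, nbits):
    """Hypothetical writer: append `value` to the stream as `nbits` binary digits."""
    bitstream.extend(format(value, f'0{nbits}b'))

bs = []
# Scheme 1: transmit the two indices i and j.
write_bits(bs, Y_LIST.index(0.7), 2)
write_bits(bs, UV_LIST.index(0.15), 2)
# Scheme 2: transmit one index into the combined candidate list.
write_bits(bs, PAIR_LIST.index((0.7, 0.15)), 3)
# Scheme 3: transmit the factors themselves (8-bit fixed point is an assumption).
write_bits(bs, int(1.0 * 100), 8)
write_bits(bs, int(0.2 * 100), 8)
```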
  • In the fourth step, the code stream is input to the entropy decoding module, and arithmetic decoding is performed to obtain the feature map, the quality factor of the Y component, and the quality factor of the UV component. Taking FIG. 8B as an example, arithmetic decoding is performed based on the probability distribution estimated by the Hyper Entropy module.
  • In the fifth step, the decoded feature map is input to joint processing module 2, which outputs a feature map with M channels; this feature map is split into a Y component feature map with K channels and a UV component feature map with L channels. Any splitting scheme that ensures K ≤ M and L ≤ M suffices; when K = L = M, the Y component feature map and the UV component feature map are the same M-channel feature map. Taking FIG. 8B as an example, joint processing module 2 includes two convolutional layers and one nonlinear layer.
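  • A minimal sketch of this split, assuming the first K of the M channels form the Y component feature map and the remaining L channels form the UV component feature map (the embodiment allows any split with K ≤ M and L ≤ M):

```python
import torch

m, k, l = 384, 192, 192
features = torch.rand(1, m, 16, 16)               # output of joint processing module 2
f_y, f_uv = torch.split(features, [k, l], dim=1)  # K-channel Y map, L-channel UV map
```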
  • In the sixth step, the Y component quality factor and the UV component quality factor are input to the quality response module to obtain the response vector g yi and the response vector g uvi. The response vector g yi and the response vector g uvi are multiplied channel by channel with the feature maps of the Y and UV components to obtain the quality-gained feature maps of the Y and UV components. The quality-gained feature maps of the Y and UV components are then input to Y component processing module 2 and UV component processing module 2, respectively, which output the Y component reconstruction map and the UV component reconstruction map.
  • The quality response module consists of the response matrices G y and G uv. The decoded Y component quality factor and UV component quality factor are used as index values into the response matrices G y and G uv, from which the response vector g yi and the response vector g uvi are obtained. The response matrices G y and G uv are learned by the network.
  • Taking FIG. 8B as an example, the response matrix G y is a two-dimensional matrix of size K×N and the response matrix G uv is a two-dimensional matrix of size L×M; each element of the two matrices is a parameter that can be learned by the network.
  • K denotes the number of Y component feature maps, L denotes the number of UV component feature maps, N denotes N candidate values of the Y component quality factor, and M denotes M candidate values of the UV component quality factor.
  • Optionally, the response matrices G y and G uv are obtained by taking the element-wise reciprocals of the control matrices Q y and Q uv, respectively.
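  • This optional reciprocal relationship, under which the decoder-side gain exactly undoes the encoder-side gain, can be sketched in a few lines (the matrix values shown are placeholders):

```python
import torch

q_y = torch.full((192, 4), 0.7)   # learned K x N control matrix (placeholder values)
g_y = 1.0 / q_y                   # response matrix G_y as the element-wise reciprocal
assert torch.allclose(q_y * g_y, torch.ones_like(q_y))   # gain is exactly undone
```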
  • The network modules and control matrix parameters involved in the above steps are trained and learned. Specifically, the present application uses the adaptive moment estimation (Adam) optimization algorithm to train the neural network, with the ImageNet dataset as the training dataset. Since this network structure is oriented towards image coding, the training objective is to minimize the joint rate-distortion loss function.
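  • The function expression itself appears only as an embedded image in the source; from the symbol definitions in the following bullets it plausibly takes the standard rate-distortion Lagrangian form sketched below, where the distortion measure d(·,·) (e.g., MSE) is an assumption:

```latex
\mathcal{L} \;=\; \mathbb{E}\!\left[-\log_2 p(y)\right]
\;+\; \lambda\,\bigl( w_y\, d(x_y, x_y') \;+\; w_u\, d(x_u, x_u')
\;+\; w_v\, d(x_v, x_v') \bigr)
```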
  • p(y) represents the probability distribution estimated by the probability estimator
  • x y is the original value of the Y component
  • x y ' is the reconstructed image of the Y component
  • x u is the original value of the U component
  • x u ' is the reconstructed image of the U component
  • x v is the original value of the V component
  • x v ′ is the reconstructed image of the V component
  • w y is the quality factor of the Y component
  • w u is the quality factor of the U component
  • w v is the quality factor of the V component.
  • λ is a constant that matches the target code rate.
  • Taking N = 4 as an example, the candidate values of the Y, U, and V component quality factors (w y, w u, w v) are {(0.5, 0.25, 0.25), (0.7, 0.4, 0.4), (0.8, 0.1, 0.1), (1.0, 0.2, 0.2)}.
  • During network training, the weight group index i is randomly selected from {0, 1, 2, 3}; the weight value group (w yi, w ui, w vi) and the control vectors q yi and q uvi to be learned are determined according to i, and the network modules and control matrix parameters are trained according to the optimization objective.
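  • A condensed sketch of this training procedure follows; the model wrapper (returning the three reconstructions and a differentiable bit estimate), the YUV data loader, and the value of λ are assumptions:

```python
import random
import torch
import torch.nn.functional as F

WEIGHT_GROUPS = [(0.5, 0.25, 0.25), (0.7, 0.4, 0.4),
                 (0.8, 0.1, 0.1), (1.0, 0.2, 0.2)]

def train(model, loader, lam=0.01, lr=1e-4):
    """model: full codec returning (y', u', v', estimated bits); lam matches a target rate."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for y, u, v in loader:                        # e.g. ImageNet converted to YUV
        i = random.randrange(len(WEIGHT_GROUPS))  # random weight-group index
        w_y, w_u, w_v = WEIGHT_GROUPS[i]
        y2, u2, v2, bits = model(y, u, v, index=i)  # index also selects q_yi, q_uvi
        dist = (w_y * F.mse_loss(y2, y) + w_u * F.mse_loss(u2, u)
                + w_v * F.mse_loss(v2, v))
        loss = bits + lam * dist                  # joint rate-distortion objective
        opt.zero_grad(); loss.backward(); opt.step()
```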
  • When the joint processing module and/or joint processing module 2 and/or the quality response module is removed from the codec of this embodiment, the solutions of the other embodiments of the present application still apply.
  • This embodiment provides a technical solution for the case where the Y, U, and V components are combined into the two components Y and UV. For other combinations of Y, U, and V, such as {YU, V} or {YV, U}, the solution of the present application still applies. Likewise, if the UV component is further split into a U component and a V component that are processed separately, the solution idea of the present application is still applicable.
  • In existing end-to-end image coding, a given network is optimized with fixed Y, U, and V component weight values, so the rate split among the YUV components is fixed. One could train multiple models with different groups of Y, U, and V component weight values to realize different rate allocations, but this increases the number of models and consumes considerable computing resources and training time. In contrast, the present application derives, from the weight values of the Y, U, and V components, control vectors obtained through network learning and applies different degrees of distortion control to the feature maps of the Y and UV components according to these control vectors, thereby realizing rate allocation between the Y and UV components. The present application therefore has the following advantages: 1) it supports rate allocation among the YUV components, adapting to image content with different color characteristics; and 2) it reduces the time cost of training multiple models and also reduces the number of newly added network parameters.
  • FIGS. 9A and 9B show another specific embodiment. Building on the embodiment of FIGS. 8A and 8B, this embodiment adopts a U component processing module and a V component processing module to process the U component and V component data separately.
  • In this embodiment, the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component are used as the input of the rate allocation control module, and the output control signal processes the feature map of any layer in the Y component processing module, the U component processing module, and the V component processing module, realizing rate allocation among the Y, U, and V components. At the decoding end, the quality factors of the Y, U, and V components are used as the input of the quality response module, and the output control signal applies a quality gain response to the feature map of any layer in the Y component processing module, the U component processing module, and the V component processing module.
  • This application does not constrain the specific network structures of the rate allocation control module, the quality response module, the Y component processing module, the U component processing module, the V component processing module, the joint processing module, Y component processing module 2, U component processing module 2, V component processing module 2, joint processing module 2, the entropy encoding module, or the entropy decoding module. To facilitate understanding, FIG. 9A gives a concrete example.
  • In the first step, the Y component quality factor, U component quality factor, and V component quality factor are input to the rate allocation control module, which consists of a fully connected network; the module outputs the control signals, namely a control vector and an offset vector.
  • The Y, U, and V signals to be encoded are input to the Y component processing module, the U component processing module, and the V component processing module, respectively. The feature map output by each convolutional layer in a module is multiplied channel by channel by its corresponding control vector, and its corresponding offset vector is then added channel by channel. The feature map output by each nonlinear layer in a module is multiplied channel by channel by its corresponding control vector. In this way, the output of every network layer in the Y component processing module is processed by the control signal of the rate allocation control module, and the U and V components are treated similarly to the Y component.
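  • The per-layer modulation just described resembles a feature-wise affine (FiLM-style) transform; a sketch for a single convolutional layer follows, where the fully connected controller's layout is an assumption and, for brevity, one control vector is reused where the embodiment would learn a separate vector per layer:

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Maps (w_y, w_u, w_v) to a gain q and an offset b for one C-channel layer."""
    def __init__(self, c=192):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2 * c))
        self.c = c

    def forward(self, factors):                  # factors: shape (1, 3)
        q, b = self.fc(factors).split(self.c, dim=1)
        return q.view(1, -1, 1, 1), b.view(1, -1, 1, 1)

conv = nn.Conv2d(1, 192, 5, stride=2, padding=2)
ctrl = Controller()
q, b = ctrl(torch.tensor([[0.7, 0.4, 0.4]]))     # quality factors in, (q, b) out
x = torch.rand(1, 1, 64, 64)
out = conv(x) * q + b        # conv output: multiply channel-wise, add offset
out = torch.relu(out) * q    # nonlinear output scaled by a control vector as well
```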
  • FIG. 9B shows a schematic diagram of a specific network structure. The network structures of the Y component processing module, the U component processing module, and the V component processing module are the same except for the first convolutional layer. The first convolutional layer of the Y component processing module has horizontal and vertical downsampling factors of 2, while the first convolutional layers of the U component processing module and the V component processing module have horizontal and vertical downsampling factors of 1, that is, no downsampling is performed. For chroma formats that are subsampled in only one direction, the first convolutional layer of the U and V component processing modules may instead have a horizontal downsampling factor of 1 and a vertical downsampling factor of 2, so that the feature maps remain aligned.
  • In the next step, the Y component feature map, the U component feature map, and the V component feature map are concatenated together to form the feature map to be encoded, which is input to the entropy encoding module, and the code stream is output. The feature map to be encoded is input to the Hyper Entropy module, which outputs the probability distribution of the symbols to be encoded; arithmetic coding is performed based on this probability distribution, and the code stream is output. At the same time, the Y component quality factor, U component quality factor, and V component quality factor information are written into the code stream.
  • In the following step, the code stream is input to the entropy decoding module, and arithmetic decoding is performed to obtain the feature map and the information of the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component; arithmetic decoding is performed based on the probability distribution estimated by the Hyper Entropy module.
  • In the sixth step, the decoded feature map is input to joint processing module 2, which outputs a feature map.
  • In the seventh step, the Y component quality factor, the U component quality factor, and the V component quality factor are input to the quality response module to obtain the response vector g yi, the response vector g ui, and the response vector g vi. The response vector g yi is multiplied channel by channel with the feature map output by the second convolutional layer in Y component processing module 2 to obtain the quality-gained feature map; the U and V components are processed similarly. Y component processing module 2, U component processing module 2, and V component processing module 2 output the reconstructed maps of the Y, U, and V components.
  • In one implementation, the Y component quality factor, U component quality factor, and V component quality factor are input to the quality response module, which consists of a fully connected network, and the module outputs the response vector g yi, the response vector g ui, and the response vector g vi.
  • Alternatively, the quality response module consists of the response matrices G y, G u, and G v. The decoded Y component quality factor, U component quality factor, and V component quality factor are used as index values into G y, G u, and G v, from which the response vector g yi, the response vector g ui, and the response vector g vi are obtained. The response matrices G y, G u, and G v are obtained through network learning.
  • The network training process is similar to that of the embodiment shown in FIGS. 8A and 8B and is not repeated here.
  • In this embodiment, the control signal acts on the output of every network layer in the Y component processing module, the U component processing module, and the V component processing module. Alternatively, the control signal may act only on the outputs of part of the network in these modules; for example, the response signal may act only on the output of a middle layer of the Y component processing module, the U component processing module, and the V component processing module. In general, the control signal may act on the output of any one or more network layers in the Y component processing module, the U component processing module, and the V component processing module.
  • This embodiment provides a technical solution for the case where Y, U, and V are processed separately as three components. For other combinations of Y, U, and V, such as {YU, V}, {YV, U}, and {Y, UV}, the solution of the present application still applies.
  • In summary, in the present invention the quality factors of the different YUV components are input to the rate allocation control module, and the control signal output by this module acts on the feature maps of the different components, thereby realizing rate allocation among the components. The different components may be the three components Y, U, and V, the two components Y and UV, or other combinations of Y, U, and V.
  • In one implementation, the control signal is a control vector q i generated according to the quality factors of the different components: the network learns a weight matrix {q c1, q c2, ..., q cN} for each component, where c is 2 or 3 and represents the number of different components, and N is the number of quality factor candidate values. The control vector q ci corresponding to each component is obtained according to that component's quality factor index.
  • In another implementation, the control signal is a control vector q and an offset vector b: the quality factors of the different components are used as the input of a fully connected network, which outputs the control vector q and the offset vector b corresponding to the different components.
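  • The two generation schemes can be summarized side by side as a sketch; the matrix sizes and the fully connected layout are assumptions:

```python
import torch
import torch.nn as nn

C, N = 192, 4
# Variant 1: a learned per-component matrix, indexed by the quality-factor index.
q_table = nn.Parameter(torch.ones(C, N))   # {q_c1, ..., q_cN} for one component
q_i = q_table[:, 2]                        # control vector selected by index i = 2

# Variant 2: a fully connected network mapping quality factors to q and b.
fc = nn.Linear(2, 2 * C)                   # e.g. two components' quality factors in
q, b = fc(torch.tensor([[0.8, 0.2]])).split(C, dim=1)
```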
  • FIG. 10 is a schematic structural diagram of an encoding apparatus 1000 according to an embodiment of the present application. The encoding apparatus may correspond to video encoder 20. The encoding apparatus 1000 includes a first control module 1001, a second control module 1002, and an encoding module 1003:
  • the first control module 1001 is configured to apply the control signal of the first signal component of the video signal to the first feature map of the first signal component to obtain the second feature map of the first signal component, where the control signal of the first signal component is obtained through learning;
  • the second control module 1002 is configured to apply the control signal of the second signal component of the video signal to the first feature map of the second signal component to obtain the second feature map of the second signal component, where the control signal of the second signal component is obtained through learning;
  • the encoding module 1003 is configured to obtain the code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component.
  • The encoding apparatus 1000 may further include the rate allocation control module described in the previous embodiments. The encoding apparatus 1000 is used to implement the encoding method introduced in the foregoing embodiments; for detailed functions, refer to the descriptions of those embodiments, which are not repeated here.
  • FIG. 11 is a schematic structural diagram of a decoding apparatus 1100 according to an embodiment of the present application. The decoding apparatus 1100 may correspond to video decoder 30. The decoding apparatus 1100 includes a decoding module 1101, a first control module 1102, a second control module 1103, and a reconstruction module 1104:
  • the decoding module 1101 is configured to obtain the code stream of the video signal and perform entropy decoding on the code stream to obtain the feature map of the first signal component of the video signal and the feature map of the second signal component of the video signal;
  • the first control module 1102 is configured to obtain the reconstruction map of the first signal component according to the response signal of the first signal component and the feature map of the first signal component, where the response signal of the first signal component is obtained through learning;
  • the second control module 1103 is configured to obtain the reconstruction map of the second signal component according to the response signal of the second signal component and the feature map of the second signal component, where the response signal of the second signal component is obtained through learning;
  • the reconstruction module 1104 is configured to reconstruct the video signal according to the reconstruction map of the first signal component and the reconstruction map of the second signal component.
  • The decoding apparatus 1100 may further include the quality response module described in the previous embodiments. The decoding apparatus 1100 is configured to implement the decoding methods introduced in the foregoing embodiments; for detailed functions, refer to the descriptions of those embodiments, which are not repeated here.
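  • A high-level sketch of how apparatus 1000 and apparatus 1100 mirror each other follows; the entropy encoder/decoder pair is stubbed out, and the response vectors are taken as the reciprocals of the control vectors per the optional scheme above:

```python
import torch

def encode_1000(f_y, f_uv, q_yi, q_uvj):
    """Apparatus 1000: apply the control signals, merge, (stub) entropy-encode."""
    merged = torch.cat([f_y * q_yi, f_uv * q_uvj], dim=1)
    return merged                       # stands in for the entropy-coded stream

def decode_1100(stream, g_yi, g_uvj, k):
    """Apparatus 1100: (stub) entropy-decode, split, apply the response signals."""
    f_y, f_uv = stream[:, :k], stream[:, k:]
    return f_y * g_yi, f_uv * g_uvj     # feature maps feeding the reconstruction nets

k = l = 192
q_yi = torch.full((1, k, 1, 1), 0.8); g_yi = 1.0 / q_yi
q_uvj = torch.full((1, l, 1, 1), 0.2); g_uvj = 1.0 / q_uvj
stream = encode_1000(torch.rand(1, k, 16, 16), torch.rand(1, l, 16, 16), q_yi, q_uvj)
rec_y, rec_uv = decode_1100(stream, g_yi, g_uvj, k)
```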
  • Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
  • By way of example, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
  • The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described herein to emphasize functional aspects of means configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.


Abstract

The present application provides a method and apparatus for layered encoding and decoding, relating to the field of artificial intelligence (AI) based video or image compression. The encoding method includes: applying a control signal of a first signal component of a video signal to a first feature map of the first signal component to obtain a second feature map of the first signal component, where the control signal of the first signal component is obtained through learning; applying a control signal of a second signal component of the video signal to a first feature map of the second signal component to obtain a second feature map of the second signal component, where the control signal of the second signal component is obtained through learning; and obtaining a code stream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component. The present application can adapt to image content with different color characteristics.

实施例7A对UV分量组合合并处理,7D对Y、U、V三个分量单独进行处理,也可以进行组合合并处理,例如组合成Y、UV两种分量,或Y、U、V的其他组合。
图8A和8B给出一种具体的实施例。本实施例的技术方案整体框图如图8A所示, 其中,Y分量质量因子、U V分量质量因子输入码率分配控制模块,该模块输出控制向量q yi、控制向量q uvi分别作用于Y分量处理模块输出的Y分量特征图、UV分量处理模块输出的UV分量特征图,从而实现YUV分量的码率分配。
解码端通过Y分量质量因子、U V分量质量因子输入质量响应模块,该模块输出控制向量g yi、控制向量g uvi分别作用于Y分量特征图、UV分量特征图,从而实现YUV分量的各自质量增益相应。
本实施例不对Y分量处理模块、UV分量处理模块、联合处理模块、概率估计模块、Y分量处理模块2、UV分量处理模块2、联合处理模块2的具体网络结构做约束,为便于理解图8B给出一种具体示例。
第一步,获取Y、UV分量的特征图:
将Y、UV分量分别输入Y分量处理模块、UV分量处理模块,网络输出Y、UV分量的特征图。以图8B为例,Y分量处理模块包含两层卷积层和两层非线性层,两个卷积层中水平、垂直方向下采样因子均为2,Y分量处理模块输出Y分量特征图。UV分量处理模块包含两层卷积层和两层非线性层,第一层卷积层水平和垂直方向下采样因子为1,即不进行无下采样操作。UV分量处理模块第二层卷积层水平和垂直方向下采样因子为2。UV分量处理模块输出UV分量特征图。经过上述网络处理后,对于YUV420的数据格式,Y分量特征图与UV分量特征图宽、高相同。
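A minimal sketch of the two processing modules for YUV420, assuming stride-2 convolutions implement the downsampling and ReLU serves as the nonlinear layer (kernel sizes and channel counts are illustrative):

```python
import torch
import torch.nn as nn

# Y path for YUV420: two stride-2 convolutions -> total downsampling of 4.
y_module = nn.Sequential(
    nn.Conv2d(1, 192, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(192, 192, kernel_size=5, stride=2, padding=2), nn.ReLU(),
)
# UV path for YUV420: stride 1 then stride 2 -> total downsampling of 2.
uv_module = nn.Sequential(
    nn.Conv2d(2, 192, kernel_size=5, stride=1, padding=2), nn.ReLU(),
    nn.Conv2d(192, 192, kernel_size=5, stride=2, padding=2), nn.ReLU(),
)

y = torch.randn(1, 1, 64, 64)     # Y plane
uv = torch.randn(1, 2, 32, 32)    # UV planes at half resolution (YUV420)
assert y_module(y).shape[-2:] == uv_module(uv).shape[-2:]  # equal W, H
```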
Similarly to the handling of the YUV420 data format, for data formats such as YUV444 and YUV422, the number of convolutional layers and the horizontal and vertical downsampling factors are controlled so that the Y-component feature map and the UV-component feature map have the same width and height.
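One way to pick the first-layer downsampling factors so that the Y and UV feature maps end up the same size is sketched below for the common chroma formats; the factor table is an assumption derived from the subsampling ratios, not prescribed by the patent text:

```python
# Sketch: first-layer (horizontal, vertical) downsampling factors for the UV
# path, given that the Y path always downsamples by (2, 2) in its first layer.
UV_FIRST_LAYER_STRIDE = {
    "yuv444": (2, 2),  # chroma at full resolution: match the Y path
    "yuv422": (1, 2),  # chroma halved horizontally: skip horizontal downsampling
    "yuv420": (1, 1),  # chroma halved in both directions: no downsampling
}

def uv_first_stride(pixel_format: str) -> tuple[int, int]:
    return UV_FIRST_LAYER_STRIDE[pixel_format.lower()]

print(uv_first_stride("YUV420"))  # (1, 1)
```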
Step 2: the Y-component quality factor and the UV-component quality factor are input into the rate allocation module to obtain control vectors q_yi and q_uvi; the control vectors q_yi and q_uvi are multiplied channel-wise with the feature maps of the Y and UV components to obtain processed Y- and UV-component feature maps; the processed Y- and UV-component feature maps are added or concatenated together and input into the joint processing module, which outputs the feature map to be encoded.
The rate allocation module consists of control matrices Q_y and Q_uv. The Y-component quality factor and the UV-component quality factor are used as index values into the control matrices Q_y and Q_uv, and the control vectors q_yi and q_uvi are obtained by indexing Q_y and Q_uv. The control matrices Q_y and Q_uv are obtained through network learning. The Y-component quality factor and the UV-component quality factor may be set to arbitrary values.
Taking FIG. 8B as an example, the control matrix Q_y is a two-dimensional matrix of size KxN, and the control matrix Q_uv is a two-dimensional matrix of size LxM; every element of the two matrices is a network-learnable parameter. Here K is the number of Y-component feature maps, L is the number of UV-component feature maps, N is the number of candidate Y-component quality factor values, and M is the number of candidate UV-component quality factor values. With N = 4 and M = 4, for example, the candidate Y-component quality factor values are {0.5, 0.7, 0.8, 1.0} and the candidate UV-component quality factor values are {0.15, 0.2, 0.25, 0.3}.
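As a sketch, Q_y and Q_uv could be held as learnable parameter matrices and indexed by the quality-factor index to yield one control vector per component; the sizes follow the KxN / LxM layout described above, and the concrete values are illustrative:

```python
import torch
import torch.nn as nn

K, N = 192, 4          # Y feature-map channels, Y quality-factor candidates
L, M = 128, 4          # UV feature-map channels, UV quality-factor candidates

Q_y = nn.Parameter(torch.ones(K, N))   # every element is learnable
Q_uv = nn.Parameter(torch.ones(L, M))

i, j = 1, 0                             # indices of the chosen quality factors
q_yi = Q_y[:, i]                        # K-dimensional control vector
q_uvj = Q_uv[:, j]                      # L-dimensional control vector
```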
Step 3: the feature map to be encoded is input into the entropy coding module, which outputs the bitstream. Taking FIG. 8B as an example, the feature map to be encoded is input into the Hyper Entropy module, which outputs the probability distribution of the symbols to be encoded. Arithmetic coding is performed based on the probability distribution of the symbols to be encoded, and the bitstream is output. At the same time, the Y-component quality factor and UV-component quality factor information are written into the bitstream.
The Y-component quality factor and UV-component quality factor information can be expressed and written into the bitstream in the following three ways:
Scheme 1: predefine the numbers of candidate quality factor values and the candidate values for the Y and UV components, and transmit to the decoder side the index numbers of the Y-component quality factor and the UV-component quality factor in their respective candidate lists. With N = 4 and M = 3, for example, the candidate Y-component quality factor values are {0.5, 0.7, 0.8, 1.0} and the candidate UV-component quality factor values are {0.15, 0.2, 0.25}; the Y-component index number i and the UV-component index number j are written into the bitstream, where i takes a value in 0, 1, 2, 3 and j takes a value in 0, 1, 2. When i is 1, the Y-component quality factor is 0.7; when j is 0, the UV-component quality factor is 0.15.
Scheme 2: predefine the number of candidate values and the candidate values for the combined Y- and UV-component quality factors. For example, the number of candidate combined Y/UV quality factor values is 6 and the candidate list is {(0.5, 0.25), (0.7, 0.15), (0.7, 0.25), (0.8, 0.1), (0.8, 0.2), (1.0, 0.2)}; the index number i is written into the bitstream, where i takes a value in 0, 1, 2, 3, 4, 5. When i is 1, the Y- and UV-component quality factors are (0.7, 0.15).
Scheme 3: write the Y-component quality factor and the UV-component quality factor directly into the bitstream and transmit them to the decoder side; for example, write (1.0, 0.2) into the bitstream.
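The three signaling schemes above could be sketched as follows; the byte-level layout (one byte per index, IEEE-754 floats for Scheme 3) is purely an illustrative assumption, since the patent does not fix a syntax:

```python
import struct

Y_CANDIDATES = [0.5, 0.7, 0.8, 1.0]
UV_CANDIDATES = [0.15, 0.2, 0.25]
JOINT_CANDIDATES = [(0.5, 0.25), (0.7, 0.15), (0.7, 0.25),
                    (0.8, 0.1), (0.8, 0.2), (1.0, 0.2)]

def write_scheme1(i: int, j: int) -> bytes:
    return bytes([i, j])                  # two per-component index numbers

def write_scheme2(i: int) -> bytes:
    return bytes([i])                     # one joint index number

def write_scheme3(w_y: float, w_uv: float) -> bytes:
    return struct.pack("<ff", w_y, w_uv)  # raw quality factor values

# Decoder side, Scheme 1: recover the quality factors from the indices.
payload = write_scheme1(1, 0)
w_y, w_uv = Y_CANDIDATES[payload[0]], UV_CANDIDATES[payload[1]]
assert (w_y, w_uv) == (0.7, 0.15)
```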
Step 4: the bitstream is input into the entropy decoding module, and arithmetic decoding is performed to obtain the feature map together with the Y-component quality factor and the UV-component quality factor. Taking FIG. 8B as an example, arithmetic decoding is performed based on the probability distribution estimated by the Hyper Entropy module.
Step 5: the decoded feature map is input into joint processing module 2, which outputs a feature map with M channels; the M-channel feature map is split into a Y-component feature map with K channels and a UV-component feature map with L channels. Any splitting scheme that guarantees K ≤ M and L ≤ M may be used. In particular, when K = L = M, the Y-component feature map and the UV-component feature map are identical, each being the M-channel feature map. Taking FIG. 8B as an example, joint processing module 2 contains two convolutional layers and one nonlinear layer.
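A sketch of one possible split of the joint feature map; overlapping channel ranges are allowed as long as K ≤ M and L ≤ M, and the particular slicing below is an assumed choice:

```python
import torch

feat = torch.randn(1, 256, 16, 16)   # M = 256 channels from joint module 2
K, L = 192, 128                      # illustrative channel counts, K, L <= M

y_feat = feat[:, :K]                 # first K channels -> Y-component feature map
uv_feat = feat[:, -L:]               # last L channels -> UV-component feature map
# With K = L = M, both slices would simply be the whole M-channel feature map.
```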
Step 6: the Y-component quality factor and the UV-component quality factor are input into the quality response module to obtain response vectors g_yi and g_uvi; the response vectors g_yi and g_uvi are multiplied channel-wise with the feature maps of the Y and UV components to obtain quality-gained Y- and UV-component feature maps. The quality-gained Y- and UV-component feature maps are input into Y-component processing module 2 and UV-component processing module 2, respectively, which output the Y-component reconstructed picture and the UV-component reconstructed picture.
The quality response module consists of response matrices G_y and G_uv. The decoded Y-component quality factor and UV-component quality factor are used as index values into the response matrices G_y and G_uv, and the response vectors g_yi and g_uvi are obtained by indexing G_y and G_uv.
The response matrices G_y and G_uv are obtained through network learning. Taking FIG. 8B as an example, the response matrix G_y is a two-dimensional matrix of size KxN and the response matrix G_uv is a two-dimensional matrix of size LxM; every element of the two matrices is a network-learnable parameter. Here K is the number of Y-component feature maps, L is the number of UV-component feature maps, N is the number of candidate Y-component quality factor values, and M is the number of candidate UV-component quality factor values.
Optionally, the response matrices G_y and G_uv are obtained by taking the element-wise reciprocals of the control matrices Q_y and Q_uv, respectively.
In this embodiment, the network modules and control matrix parameters involved in Steps 1 to 6 above are trained. Specifically, this application uses the adaptive moment estimation (Adam) optimization algorithm to train the neural network, with the ImageNet dataset as the training dataset. Since this network structure targets image coding, the training objective is to minimize a joint rate-distortion loss function, expressed as:
Loss = E[−log₂ p(y)] + λ · ( w_y · D(x_y, x_y′) + w_u · D(x_u, x_u′) + w_v · D(x_v, x_v′) ), where D(·,·) measures the distortion between an original component and its reconstruction (for example, a mean squared error).
Here p(y) denotes the probability distribution estimated by the probability estimator; x_y is the original Y-component value and x_y′ the Y-component reconstructed picture; x_u is the original U-component value and x_u′ the U-component reconstructed picture; x_v is the original V-component value and x_v′ the V-component reconstructed picture; w_y, w_u, and w_v are the quality factors of the Y, U, and V components; and λ is a constant that matches the target bitrate.
With N = 4, for example, the candidate values of the Y-, U-, and V-component quality factors (w_y, w_u, w_v) are {(0.5, 0.25, 0.25), (0.7, 0.4, 0.4), (0.8, 0.1, 0.1), (1.0, 0.2, 0.2)}. During network training, the weight-group index number i is randomly selected from {0, 1, 2, 3}; the weight value group (w_yi, w_ui, w_vi) and the control vectors q_yi and q_uvi to be learned are determined according to i, and the network modules and control matrix parameters are trained according to the optimization objective.
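A compressed sketch of that training scheme follows; the bits-per-symbol rate term and the MSE distortion are stand-in assumptions for whatever the actual network and probability estimator provide:

```python
import random
import torch.nn.functional as F

WEIGHT_GROUPS = [(0.5, 0.25, 0.25), (0.7, 0.4, 0.4),
                 (0.8, 0.1, 0.1), (1.0, 0.2, 0.2)]
LAMBDA = 0.01  # illustrative constant matching some target bitrate

def rd_loss(rate_bits, x, x_hat, weights):
    # rate_bits: estimated -log2 p(y) summed over symbols (from the estimator)
    w_y, w_u, w_v = weights
    dist = (w_y * F.mse_loss(x_hat["y"], x["y"])
            + w_u * F.mse_loss(x_hat["u"], x["u"])
            + w_v * F.mse_loss(x_hat["v"], x["v"]))
    return rate_bits + LAMBDA * dist

# Per training step: draw a random weight-group index i, which also selects
# the control vectors q_yi, q_uvi being learned for that quality point.
i = random.randrange(len(WEIGHT_GROUPS))
weights = WEIGHT_GROUPS[i]
```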
If the joint processing module and/or joint processing module 2 and/or the quality response module are removed from the codec of this embodiment, the approach still applies, as in the other embodiments of this application.
This embodiment presents the technical solution for the case where the Y, U, and V components are combined into the two components Y and UV; the approach of this application also applies to other combinations of Y, U, and V, such as {YU, V} and {YV, U}.
Likewise, the approach of this application still applies when the UV component is further split into a U component and a V component that are processed separately.
In existing end-to-end image coding, a given network is learned and optimized with fixed Y-, U-, and V-component weight values, so the rate split among the YUV components is fixed. Because different images have different color characteristics, a fixed rate allocation leads to poor coding performance on some video and image content. One could simply train multiple models with several groups of different Y-, U-, and V-component weight values to achieve different rate allocations among the YUV components, but this increases the number of models, and training multiple models costs a large amount of computing resources and time. Compared with the prior art, this application derives network-learned control vectors from the Y-, U-, and V-component weight values and applies different degrees of distortion control to the Y- and UV-component feature maps according to the control vectors, thereby achieving rate allocation between the Y and UV components. This application therefore has the following advantages:
1) It supports rate allocation among the YUV components, adapting to image content with different color characteristics.
2) It reduces the time spent training multiple models and also reduces the number of additional network parameters.
FIGS. 9A and 9B show a specific embodiment. On the basis of the embodiment of FIGS. 8A and 8B, this embodiment uses a U-component processing module and a V-component processing module to process the U-component and V-component data separately. In this embodiment, the Y-, U-, and V-component quality factors are used as inputs of the rate allocation control module, whose output control signals process the feature maps of any layer in the Y-, U-, and V-component processing modules, achieving rate allocation among the Y, U, and V components. On the decoder side, the Y-, U-, and V-component quality factors are used as inputs of the quality response module, whose output control signals apply a quality gain response to the feature maps of any layer in the Y-, U-, and V-component processing modules. This application does not constrain the specific network structures of the rate allocation control module, the quality response module, the Y-component processing module, the U-component processing module, the V-component processing module, the joint processing module, Y-component processing module 2, U-component processing module 2, V-component processing module 2, joint processing module 2, the entropy coding module, or the entropy decoding module; for ease of understanding, FIG. 9B shows a specific example.
Taking FIG. 9B as an example, Step 1: the Y-, U-, and V-component quality factors are input into the rate allocation control module, which consists of a fully connected network; the module outputs the control signals: control vectors and offset vectors.
Step 2: the Y, U, and V signals to be encoded are input into the Y-, U-, and V-component processing modules, respectively. Taking the Y-component processing module as an example, the feature map output by each convolutional layer in the module is multiplied channel-wise with its corresponding control vector and then added channel-wise to its corresponding offset vector, and the feature map output by each nonlinear layer in the module is multiplied channel-wise with its corresponding control vector. The output of every network layer in the Y-component processing module is thus processed by the control signals of the rate allocation control module. The U and V components are processed in a manner similar to the Y component.
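A sketch of this per-layer modulation, assuming a set of (control, offset) vectors produced by the fully connected rate allocation control module, one pair per layer; layer sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class ModulatedComponentEncoder(nn.Module):
    """Sketch: conv outputs get scale-and-offset, nonlinear outputs get
    scale only, as described for the Y-component processing module."""
    def __init__(self, in_ch=1, ch=128):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch, 5, stride=2, padding=2)
        self.act1 = nn.ReLU()
        self.conv2 = nn.Conv2d(ch, ch, 5, stride=2, padding=2)
        self.act2 = nn.ReLU()

    def forward(self, x, ctrl):
        # ctrl: dict of per-layer control (q) and offset (b) vectors,
        # each of shape (channels,), output by the FC control module.
        def scale(t, q):
            return t * q.view(1, -1, 1, 1)
        x = scale(self.conv1(x), ctrl["q1"]) + ctrl["b1"].view(1, -1, 1, 1)
        x = scale(self.act1(x), ctrl["q1a"])            # nonlinear: scale only
        x = scale(self.conv2(x), ctrl["q2"]) + ctrl["b2"].view(1, -1, 1, 1)
        x = scale(self.act2(x), ctrl["q2a"])
        return x

enc = ModulatedComponentEncoder()
ctrl = {k: torch.ones(128) for k in ["q1", "q1a", "q2", "q2a", "b1", "b2"]}
out = enc(torch.randn(1, 1, 64, 64), ctrl)  # -> (1, 128, 16, 16)
```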
FIG. 9B shows a schematic diagram of a specific network structure; the network structures of the Y-, U-, and V-component processing modules are identical except for the first convolutional layer. For the YUV422 format, the first convolutional layer of the Y-component processing module has a downsampling factor of 2 in the horizontal and vertical directions, while the first convolutional layers of the U- and V-component processing modules have a horizontal downsampling factor of 1, that is, no horizontal downsampling, and a vertical downsampling factor of 2. For the YUV420 format, the first convolutional layer of the Y-component processing module has a downsampling factor of 2 in the horizontal and vertical directions, while the first convolutional layers of the U- and V-component processing modules have a downsampling factor of 1 in both the horizontal and vertical directions, that is, no downsampling is performed.
Step 3: the Y-, U-, and V-component feature maps are concatenated together to form the feature map to be encoded, which is input into the entropy coding module, which outputs the bitstream. Taking FIG. 9B as an example, the feature map to be encoded is input into the Hyper Entropy module, which outputs the probability distribution of the symbols to be encoded. Arithmetic coding is performed based on the probability distribution of the symbols to be encoded, and the bitstream is output. At the same time, the Y-, U-, and V-component quality factor information is written into the bitstream.
Step 4: the bitstream is input into the entropy decoding module, and arithmetic decoding is performed to obtain the feature map together with the Y-, U-, and V-component quality factor information. Taking FIG. 9B as an example, arithmetic decoding is performed based on the probability distribution estimated by the Hyper Entropy module.
Step 5: the decoded feature map is input into joint processing module 2, which outputs a feature map.
Step 6: the Y-, U-, and V-component quality factors are input into the quality response module to obtain response vectors g_yi, g_ui, and g_vi. The response vector g_yi is multiplied channel-wise with the feature map output by the second convolutional layer in Y-component processing module 2 to obtain the quality-gained feature map; the U and V components are processed similarly. Y-component processing module 2, U-component processing module 2, and V-component processing module 2 output the reconstructed pictures of the Y, U, and V components.
The Y-, U-, and V-component quality factors are input into the quality response module, which consists of a fully connected network; the module outputs the response vectors g_yi, g_ui, and g_vi.
Optionally, similarly to the response-vector derivation of the previous embodiment, the quality response module consists of response matrices G_y, G_u, and G_v; the decoded Y-, U-, and V-component quality factors are used as index values into the response matrices G_y, G_u, and G_v, and the response vectors g_yi, g_ui, and g_vi are obtained by indexing G_y, G_u, and G_v. The response matrices G_y, G_u, and G_v are obtained through network learning.
The network training process is similar to that of the embodiment of FIGS. 8A and 8B and is not repeated here.
In this embodiment, the control signals act on the output of every network layer in the Y-, U-, and V-component processing modules; optionally, the control signals act only on the outputs of some of the network layers in the Y-, U-, and V-component processing modules.
In this embodiment, the response signals act only on the output of one intermediate network layer in the Y-, U-, and V-component processing modules; optionally, the response signals act on the outputs of any one or more network layers in the Y-, U-, and V-component processing modules.
If joint processing module 2 and/or the quality response module are removed from the codec of this embodiment, the technique of this application still applies.
If a joint processing module is added to the codec of this embodiment, the technique of this application still applies.
This embodiment presents the technical solution for the case where Y, U, and V are processed separately as three components; the approach of this application also applies to other combinations of Y, U, and V, such as {YU, V}, {YV, U}, and {Y, UV}.
As described in the foregoing embodiments, in this application the quality factors of the different YUV components are input into the rate allocation control module, which outputs control signals that act on the feature maps of the different components, thereby achieving rate allocation among the components. The different components may be the three components Y, U, and V, the two components Y and UV, or other combinations of Y, U, and V.
Optionally, the control signal is a control vector q_i generated from the quality factors of the different components: the network learns control matrices {q_c1, q_c2, ..., q_cN} for the different components, where c is 2 or 3 and represents the number of components, and N is the number of candidate quality factor values. In use, the control vector q_ci corresponding to each component is obtained by indexing with that component's quality factor.
Optionally, the control signal consists of a control vector q and an offset vector b: the quality factors of the different components are used as inputs of a fully connected network, which outputs the control vector q and offset vector b corresponding to each component.
Therefore, the embodiments provided in this application can:
1) adapt to image content with different color characteristics, supporting rate allocation among the YUV components through the control vectors; and
2) reduce the time spent training multiple models, while also reducing the number of additional network parameters.
FIG. 10 is a schematic structural diagram of an encoding apparatus 1000 according to an embodiment of this application. The encoding apparatus may correspond to the video encoder 20. The encoding apparatus 1000 includes a first control module 1001, a second control module 1002, and an encoding module 1003. The first control module 1001 is configured to apply a control signal of a first signal component of the video signal to a first feature map of the first signal component to obtain a second feature map of the first signal component, where the control signal of the first signal component is obtained through learning. The second control module 1002 is configured to apply a control signal of a second signal component of the video signal to a first feature map of the second signal component to obtain a second feature map of the second signal component, where the control signal of the second signal component is obtained through learning. The encoding module 1003 is configured to obtain a bitstream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component. The encoding apparatus 1000 may further include the rate allocation control module described in the foregoing embodiments. The encoding apparatus 1000 is configured to implement the encoding method described in the foregoing embodiments; for detailed functions, refer to the descriptions of the foregoing embodiments, which are not repeated here.
FIG. 11 is a schematic structural diagram of a decoding apparatus 1100 according to an embodiment of this application. The decoding apparatus 1100 may correspond to the video decoder 30. The decoding apparatus 1100 includes a decoding module 1101, a first control module 1102, a second control module 1103, and a reconstruction module 1104. The decoding module 1101 is configured to obtain a bitstream of the video signal and entropy-decode the bitstream to obtain a feature map of a first signal component of the video signal and a feature map of a second signal component of the video signal. The first control module 1102 is configured to obtain a reconstructed picture of the first signal component according to a response signal of the first signal component and the feature map of the first signal component, where the response signal of the first signal component is obtained through learning. The second control module 1103 is configured to obtain a reconstructed picture of the second signal component according to a response signal of the second signal component and the feature map of the second signal component, where the response signal of the second signal component is obtained through learning. The reconstruction module 1104 is configured to reconstruct the video signal according to the reconstructed pictures of the first and second signal components. The decoding apparatus 1100 may further include the quality response module described in the foregoing embodiments. The decoding apparatus 1100 is configured to implement the decoding method described in the foregoing embodiments; for detailed functions, refer to the descriptions of the foregoing embodiments, which are not repeated here.
Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chip set). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by interoperative hardware units, including one or more processors as described above.
The foregoing descriptions are merely exemplary specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (41)

  1. A method for encoding a video signal, comprising:
    applying a control signal of a first signal component of the video signal to a first feature map of the first signal component to obtain a second feature map of the first signal component, wherein the control signal of the first signal component is obtained through learning;
    applying a control signal of a second signal component of the video signal to a first feature map of the second signal component to obtain a second feature map of the second signal component, wherein the control signal of the second signal component is obtained through learning; and
    obtaining a bitstream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component.
  2. The method according to claim 1, further comprising:
    obtaining the control signal of the first signal component from N candidate first control signals according to a quality factor of the first signal component, wherein N is an integer greater than 1; and
    obtaining the control signal of the second signal component from M candidate second control signals according to a quality factor of the second signal component, wherein M is an integer greater than 1.
  3. The method according to claim 1 or 2, wherein obtaining the bitstream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component comprises:
    entropy-encoding the second feature map of the first signal component and the second feature map of the second signal component to obtain the bitstream of the video signal; or
    entropy-encoding the second feature map of the first signal component and the second feature map of the second signal component processed by a neural network to obtain the bitstream of the video signal; or
    entropy-encoding the second feature map of the first signal component processed by a neural network and the second feature map of the second signal component to obtain the bitstream of the video signal; or
    entropy-encoding the second feature map of the first signal component processed by a neural network and the second feature map of the second signal component processed by a neural network to obtain the bitstream of the video signal.
  4. The method according to claim 1 or 2, wherein obtaining the bitstream of the video signal according to the second feature map of the first signal component and the second feature map of the second signal component comprises:
    jointly processing the second feature map of the first signal component and the second feature map of the second signal component to obtain a joint feature map, and entropy-encoding the joint feature map to obtain the bitstream of the video signal; or
    jointly processing the second feature map of the first signal component and the second feature map of the second signal component processed by a neural network to obtain a joint feature map, and entropy-encoding the joint feature map to obtain the bitstream of the video signal; or
    jointly processing the second feature map of the first signal component processed by a neural network and the second feature map of the second signal component to obtain a joint feature map, and entropy-encoding the joint feature map to obtain the bitstream of the video signal; or
    jointly processing the second feature map of the first signal component processed by a neural network and the second feature map of the second signal component processed by a neural network to obtain a joint feature map, and entropy-encoding the joint feature map to obtain the bitstream of the video signal.
  5. The method according to any one of claims 1 to 4, wherein the first signal component is a Y component, and the second signal component is a UV component, a U component, or a V component.
  6. The method according to claim 5, wherein when the second signal component is a UV component, the method comprises:
    generating, through learning, a control signal matrix {q_y1, q_y2, ..., q_yi, ..., q_yN} of the Y component and a control signal matrix {q_uv1, q_uv2, ..., q_uvj, ..., q_uvM} of the UV component, wherein N and M are integers greater than 1;
    obtaining the control signal q_yi of the first signal component according to an index i of a quality factor of the Y component; and
    obtaining the control signal q_uvj of the second signal component according to an index j of a quality factor of the UV component.
  7. The method according to claim 5, wherein when the second signal component is a UV component, the method comprises:
    generating, through learning, a control signal matrix {q_c1, q_c2, ..., q_ci, ..., q_cN} of the video signal, wherein c is 2 and represents the Y component and the UV component, and N is an integer greater than 1; and
    obtaining, according to an index i of a quality factor of the video signal, the control signal q_ci covering the first signal component and the second signal component.
  8. The method according to claim 6 or 7, wherein the bitstream of the video signal includes the index i of the quality factor of the Y component and the index j of the quality factor of the UV component; or includes the index i of the quality factor of the video signal.
  9. The method according to claim 5, wherein when the second signal component is a UV component, the method comprises:
    using the quality factor of the Y component as an input of a fully connected network that outputs the control signal of the Y component; and
    using the quality factor of the UV component as an input of a fully connected network that outputs the control signal of the UV component.
  10. The method according to claim 9, wherein the bitstream of the video signal includes the quality factor of the Y component and the quality factor of the UV component.
  11. The method according to claim 5, wherein when the second signal component is a U component or a V component, the method further comprises:
    applying a control signal of a third signal component of the video signal to a first feature map of the third signal component to obtain a second feature map of the third signal component, wherein the control signal of the third signal component is obtained through learning, and wherein when the second signal component is the U component, the third signal component is the V component, or when the second signal component is the V component, the third signal component is the U component.
  12. The method according to claim 11, comprising:
    generating, through learning, a control signal matrix {q_y1, q_y2, ..., q_yi, ..., q_yN} of the Y component, a control signal matrix {q_u1, q_u2, ..., q_uj, ..., q_uM} of the U component, and a control signal matrix {q_v1, q_v2, ..., q_vk, ..., q_vL} of the V component, wherein N, M, and L are integers greater than 1;
    obtaining the control signal q_yi of the first signal component according to an index i of the Y-component quality factor;
    obtaining the control signal q_uj of the second signal component according to an index j of the U-component quality factor; and
    obtaining the control signal q_vk of the third signal component according to an index k of the V-component quality factor.
  13. The method according to claim 11, comprising:
    generating, through learning, a control signal matrix {q_c1, q_c2, ..., q_ci, ..., q_cN} of the video signal, wherein c is 3 and represents the Y component, the U component, and the V component, and N is an integer greater than 1; and
    obtaining, according to an index i of a quality factor of the video signal, the control signal q_ci covering the first signal component, the second signal component, and the third signal component.
  14. The method according to claim 12 or 13, wherein the bitstream of the video signal includes the index i of the Y-component quality factor, the index j of the U-component quality factor, and the index k of the V-component quality factor; or includes the index i of the quality factor of the video signal.
  15. The method according to claim 11, comprising:
    using the quality factor of the Y component as an input of a fully connected network that outputs the control signal of the Y component;
    using the quality factor of the U component as an input of a fully connected network that outputs the control signal of the U component; and
    using the quality factor of the V component as an input of a fully connected network that outputs the control signal of the V component.
  16. The method according to claim 15, wherein the bitstream of the video signal includes the quality factor of the Y component, the quality factor of the U component, and the quality factor of the V component.
  17. The method according to any one of claims 1 to 16, wherein when the control signal includes a control vector, the method comprises:
    multiplying the control vector of the first signal component by the first feature map of the first signal component to obtain the second feature map of the first signal component; and
    multiplying the control vector of the second signal component by the first feature map of the second signal component to obtain the second feature map of the second signal component.
  18. The method according to any one of claims 1 to 16, wherein when the control signal includes a control vector and an offset vector, the method comprises:
    multiplying the control vector of the first signal component by the first feature map of the first signal component and then adding the offset vector of the first signal component to obtain the second feature map of the first signal component; and
    multiplying the control vector of the second signal component by the first feature map of the second signal component and then adding the offset vector of the second signal component to obtain the second feature map of the second signal component.
  19. The method according to any one of claims 1 to 18, wherein the first feature map of the first signal component is obtained from the first signal component through at least one convolutional layer and/or at least one nonlinear processing layer; and
    the first feature map of the second signal component is obtained from the second signal component through at least one convolutional layer and/or at least one nonlinear processing layer.
  20. The method according to claim 19, wherein the first feature map of the first signal component is obtained from the first signal component through two convolutional layers, each with a downsampling factor of 2, and two nonlinear layers; and
    the first feature map of the second signal component is obtained from the second signal component through one convolutional layer without downsampling, one convolutional layer with a downsampling factor of 2, and two nonlinear layers.
  21. A method for decoding a video signal, comprising:
    obtaining a bitstream of the video signal;
    entropy-decoding the bitstream to obtain a feature map of a first signal component of the video signal and a feature map of a second signal component of the video signal;
    obtaining a reconstructed picture of the first signal component according to a response signal of the first signal component and the feature map of the first signal component, wherein the response signal of the first signal component is obtained through learning;
    obtaining a reconstructed picture of the second signal component according to a response signal of the second signal component and the feature map of the second signal component, wherein the response signal of the second signal component is obtained through learning; and
    reconstructing the video signal according to the reconstructed picture of the first signal component and the reconstructed picture of the second signal component.
  22. The method according to claim 21, wherein the bitstream further includes quality factor information of the first signal component and quality factor information of the second signal component, wherein the quality factor information of the first signal component is a quality factor of the first signal component or an index of the quality factor of the first signal component, and the quality factor information of the second signal component is a quality factor of the second signal component or an index of the quality factor of the second signal component; and the method comprises:
    obtaining the response signal of the first signal component through the quality factor information of the first signal component; and
    obtaining the response signal of the second signal component through the quality factor information of the second signal component.
  23. The method according to claim 22, wherein when the quality factor information of the first signal component is the quality factor of the first signal component, the quality factor of the first signal component takes one of N values; when the quality factor information of the first signal component is the index of the quality factor of the first signal component, the index of the quality factor of the first signal component ranges from 0 to N-1 or from 1 to N, wherein N is an integer greater than 1; and
    when the quality factor information of the second signal component is the quality factor of the second signal component, the quality factor of the second signal component takes one of M values; when the quality factor information of the second signal component is the index of the quality factor of the second signal component, the index of the quality factor of the second signal component ranges from 0 to M-1 or from 1 to M, wherein M is an integer greater than 1.
  24. The method according to any one of claims 21 to 23, wherein the bitstream includes a joint feature map; and the method comprises:
    entropy-decoding the joint feature map and processing it with a neural network to obtain the feature map of the first signal component and the feature map of the second signal component.
  25. The method according to any one of claims 21 to 24, wherein the first signal component is a Y component, and the second signal component is a UV component, a U component, or a V component.
  26. The method according to claim 25, wherein when the second signal component is a UV component and the bitstream includes an index i of a quality factor of the Y component and an index j of a quality factor of the UV component, the method comprises:
    generating, through learning, a response signal matrix {g_y1, g_y2, ..., g_yi, ..., g_yN} of the first signal component and a response signal matrix {g_uv1, g_uv2, ..., g_uvj, ..., g_uvM} of the second signal component, wherein N and M are integers greater than 1;
    obtaining the response signal g_yi of the first signal component according to the index i of the quality factor of the Y component; and
    obtaining the response signal g_uvj of the second signal component according to the index j of the quality factor of the UV component.
  27. The method according to claim 25, wherein when the second signal component is a UV component and the bitstream includes an index i of a quality factor of the video signal, the method comprises:
    generating, through learning, a response signal matrix {g_c1, g_c2, ..., g_ci, ..., g_cN} of the video signal, wherein c is 2 and represents the Y component and the UV component, and N is an integer greater than 1; and
    obtaining, according to the index i of the quality factor of the video signal, the response signal g_ci covering the first signal component and the second signal component.
  28. The method according to claim 25, wherein when the second signal component is a UV component and the bitstream includes the quality factor of the first signal component and the quality factor of the second signal component, the method comprises:
    using the quality factor of the Y component as an input of a fully connected network that outputs the response signal of the Y component; and
    using the quality factor of the UV component as an input of a fully connected network that outputs the response signal of the UV component.
  29. The method according to claim 25, wherein when the second signal component is a U component or a V component, the method further comprises:
    entropy-decoding the bitstream to obtain a feature map of a third signal component of the video signal; and
    obtaining a reconstructed picture of the third signal component according to a response signal of the third signal component and the feature map of the third signal component, wherein the response signal of the third signal component is obtained through learning, and wherein when the second signal component is the U component, the third signal component is the V component, or when the second signal component is the V component, the third signal component is the U component;
    wherein reconstructing the video signal comprises:
    reconstructing the video signal according to the reconstructed picture of the first signal component, the reconstructed picture of the second signal component, and the reconstructed picture of the third signal component.
  30. The method according to claim 29, wherein the bitstream further includes quality factor information of the third signal component, wherein the quality factor information of the third signal component is a quality factor of the third signal component or an index of the quality factor of the third signal component, the quality factor of the third signal component takes one of L values, and the index of the quality factor of the third signal component ranges from 0 to L-1 or from 1 to L, wherein L is an integer greater than 1; and the method comprises:
    obtaining the response signal of the third signal component through the quality factor information of the third signal component.
  31. The method according to claim 30, wherein if the bitstream includes the index i of the Y-component quality factor, the index j of the U-component quality factor, and the index k of the V-component quality factor, the method comprises:
    generating, through learning, a response signal matrix {g_y1, g_y2, ..., g_yi, ..., g_yN} of the first signal component, a response signal matrix {g_u1, g_u2, ..., g_uj, ..., g_uM} of the second signal component, and a response signal matrix {g_v1, g_v2, ..., g_vk, ..., g_vL} of the third signal component, wherein N, M, and L are integers greater than 1;
    obtaining the response signal g_yi of the first signal component according to the index i of the Y-component quality factor;
    obtaining the response signal g_uj of the second signal component according to the index j of the U-component quality factor; and
    obtaining the response signal g_vk of the third signal component according to the index k of the V-component quality factor.
  32. The method according to claim 29, wherein if the bitstream includes an index i of a quality factor of the video signal, the method comprises:
    generating, through learning, a response signal matrix {g_c1, g_c2, ..., g_ci, ..., g_cN} of the video signal, wherein c is 3 and represents the Y component, the U component, and the V component, and N is an integer greater than 1; and
    obtaining, according to the index i of the quality factor of the video signal, the response signal g_ci covering the first signal component, the second signal component, and the third signal component.
  33. The method according to claim 29, wherein if the bitstream includes the quality factor of the first signal component, the quality factor of the second signal component, and the quality factor of the third signal component, the method comprises:
    using the quality factor of the Y component as an input of a fully connected network that outputs the response signal of the Y component;
    using the quality factor of the U component as an input of a fully connected network that outputs the response signal of the U component; and
    using the quality factor of the V component as an input of a fully connected network that outputs the response signal of the V component.
  34. The method according to any one of claims 21 to 33, wherein when the response signal includes a response vector, obtaining the reconstructed picture of the first signal component according to the response signal of the first signal component and the feature map of the first signal component comprises:
    multiplying the response vector of the first signal component by the feature map of the first signal component to obtain the reconstructed picture of the first signal component; or multiplying the response vector of the first signal component by the feature map of the first signal component and then performing neural network processing to obtain the reconstructed picture of the first signal component; and
    obtaining the reconstructed picture of the second signal component according to the response signal of the second signal component and the feature map of the second signal component comprises:
    multiplying the response vector of the second signal component by the feature map of the second signal component to obtain the reconstructed picture of the second signal component; or multiplying the response vector of the second signal component by the feature map of the second signal component and then performing neural network processing to obtain the reconstructed picture of the second signal component.
  35. The method according to any one of claims 21 to 33, wherein when the response signal includes a response vector and an offset vector, obtaining the reconstructed picture of the first signal component according to the response signal of the first signal component and the feature map of the first signal component comprises:
    multiplying the response vector of the first signal component by the feature map of the first signal component and then adding the offset vector of the first signal component to obtain the reconstructed picture of the first signal component; or multiplying the response vector of the first signal component by the feature map of the first signal component, adding the offset vector of the first signal component, and then performing neural network processing to obtain the reconstructed picture of the first signal component; and
    obtaining the reconstructed picture of the second signal component according to the response signal of the second signal component and the feature map of the second signal component comprises:
    multiplying the response vector of the second signal component by the feature map of the second signal component and then adding the offset vector of the second signal component to obtain the reconstructed picture of the second signal component; or multiplying the response vector of the second signal component by the feature map of the second signal component, adding the offset vector of the second signal component, and then performing neural network processing to obtain the reconstructed picture of the second signal component.
  36. An encoder, comprising processing circuitry configured to perform the method according to any one of claims 1 to 20.
  37. A decoder, comprising processing circuitry configured to perform the method according to any one of claims 21 to 35.
  38. A computer program product, comprising program code that, when executed on a computer or a processor, performs the method according to any one of claims 1 to 35.
  39. An encoder, comprising:
    one or more processors; and
    a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the encoder to perform the method according to any one of claims 1 to 20.
  40. A decoder, comprising:
    one or more processors; and
    a non-transitory computer-readable storage medium coupled to the processors and storing a program for execution by the processors, wherein the program, when executed by the processors, causes the decoder to perform the method according to any one of claims 21 to 35.
  41. A non-transitory computer-readable storage medium, comprising program code that, when executed by a computer device, performs the method according to any one of claims 1 to 35.
PCT/CN2022/072627 2021-01-19 2022-01-19 分层编解码的方法及装置 WO2022156688A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
MX2023008449A MX2023008449A (es) 2021-01-19 2022-01-19 Metodo y aparato de codificacion y decodificacion escalables.
KR1020237026723A KR20230129068A (ko) 2021-01-19 2022-01-19 확장 가능한 인코딩 및 디코딩 방법 및 장치
JP2023543103A JP2024503712A (ja) 2021-01-19 2022-01-19 スケーラブルな符号化及び復号方法及び装置
EP22742167.4A EP4277274A1 (en) 2021-01-19 2022-01-19 Layered encoding and decoding methods and apparatuses
US18/223,126 US20240007658A1 (en) 2021-01-19 2023-07-18 Scalable encoding and decoding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110071775.8A 2021-01-19 2021-01-19 Layered encoding and decoding methods and apparatuses
CN202110071775.8 2021-01-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/223,126 Continuation US20240007658A1 (en) 2021-01-19 2023-07-18 Scalable encoding and decoding method and apparatus

Publications (1)

Publication Number Publication Date
WO2022156688A1 true WO2022156688A1 (zh) 2022-07-28

Family

ID=82524694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072627 WO2022156688A1 (zh) 2021-01-19 2022-01-19 分层编解码的方法及装置

Country Status (7)

Country Link
US (1) US20240007658A1 (zh)
EP (1) EP4277274A1 (zh)
JP (1) JP2024503712A (zh)
KR (1) KR20230129068A (zh)
CN (1) CN114827622A (zh)
MX (1) MX2023008449A (zh)
WO (1) WO2022156688A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116723333B * 2023-08-02 2023-10-31 Tsinghua University Scalable video coding method, apparatus and product based on semantic information

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534432A (zh) * 2009-04-09 2009-09-16 Central Research Institute of SVA (Group) Co., Ltd. Bit rate control method based on a human visual perception model
CN102158712A (zh) * 2011-03-22 2011-08-17 Ningbo University Vision-based multi-view video signal coding method
CN107277520A (zh) * 2017-07-11 2017-10-20 University of Science and Technology of China Bit rate control method for intra-frame prediction
CN108134937A (zh) * 2017-12-21 2018-06-08 Northwestern Polytechnical University HEVC-based compressed-domain saliency detection method
US20200177898A1 (en) * 2018-10-19 2020-06-04 Samsung Electronics Co., Ltd. Methods and apparatuses for performing encoding and decoding on image
US20200160565A1 (en) * 2018-11-19 2020-05-21 Zhan Ma Methods And Apparatuses For Learned Image Compression

Also Published As

Publication number Publication date
MX2023008449A (es) 2023-07-27
CN114827622A (zh) 2022-07-29
EP4277274A1 (en) 2023-11-15
KR20230129068A (ko) 2023-09-05
US20240007658A1 (en) 2024-01-04
JP2024503712A (ja) 2024-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22742167; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: MX/A/2023/008449; Country of ref document: MX)
WWE Wipo information: entry into national phase (Ref document number: 2023543103; Country of ref document: JP)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112023014502; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 20237026723; Country of ref document: KR; Kind code of ref document: A)
WWE Wipo information: entry into national phase (Ref document number: 1020237026723; Country of ref document: KR)
ENP Entry into the national phase (Ref document number: 2022742167; Country of ref document: EP; Effective date: 20230811)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 112023014502; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20230719)
WWE Wipo information: entry into national phase (Ref document number: 11202305224X; Country of ref document: SG)