WO2023051223A1 - Filtering and encoding/decoding method and apparatus, computer-readable medium, and electronic device - Google Patents

Filtering and encoding/decoding method and apparatus, computer-readable medium, and electronic device

Info

Publication number
WO2023051223A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
deep learning
component
input
unit
Prior art date
Application number
PCT/CN2022/118321
Other languages
English (en)
French (fr)
Inventor
王力强
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023051223A1
Priority to US18/472,484 (published as US20240015336A1)

Classifications

    • H04N 19/80: Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N 19/82: Filtering operations involving filtering within a prediction loop
    • H04N 19/103: Adaptive coding; selection of coding mode or of prediction mode
    • H04N 19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/117: Filters, e.g. for pre-processing or post-processing
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or a macroblock
    • H04N 19/184: Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N 19/186: Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N 19/30: Coding using hierarchical techniques, e.g. scalability

Definitions

  • The present application relates to the field of computer and communication technologies, and in particular to filtering and encoding/decoding methods and apparatuses, a computer-readable medium, and an electronic device.
  • Embodiments of the present application provide a filtering method, encoding/decoding methods, apparatuses, a computer-readable medium, and an electronic device, which can improve the filtering effect at least to a certain extent and thus help improve video encoding and decoding efficiency.
  • A filtering method based on deep learning, including: obtaining a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; generating input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; and inputting the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • A video encoding method, including: obtaining a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; generating input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; inputting the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image; and generating a luminance component predicted image corresponding to the next frame based on the filtered image, and encoding the next video frame based on the luminance component predicted image.
  • A video decoding method, including: obtaining a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; generating input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; inputting the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image; and generating a luminance component predicted image corresponding to the next frame based on the filtered image, and decoding the video bitstream based on the luminance component predicted image.
  • A filtering device based on deep learning, including: an acquisition unit configured to acquire a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; a generating unit configured to generate input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; and a processing unit configured to input the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • A video encoding device, including: an acquisition unit configured to acquire a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; a generating unit configured to generate input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; a processing unit configured to input the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image; and an encoding unit configured to generate a luminance component predicted image corresponding to the next frame based on the filtered image and to encode the next video frame based on the luminance component predicted image.
  • A video decoding device, including: an acquisition unit configured to acquire a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; a generating unit configured to generate input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; a processing unit configured to input the input parameters into the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image; and a decoding unit configured to generate a luminance component predicted image corresponding to the next frame based on the filtered image and to decode the video bitstream based on the luminance component predicted image.
  • A computer-readable medium on which a computer program is stored; when the computer program is executed by a processor, the deep-learning-based filtering method, video encoding method, or video decoding method described in the above embodiments is implemented.
  • An electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the deep-learning-based filtering method, video encoding method, or video decoding method described in the above embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the deep-learning-based filtering method, video encoding method, or video decoding method provided in the various embodiments above.
  • In the technical solutions of the embodiments of the present application, the input parameters of a deep learning filter are generated according to the luminance component reconstructed image and the chrominance component information, and these input parameters are then input into the deep learning filter to obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image. In this way, when the luminance component of an image is filtered, the information of the chrominance components can be fully utilized, so that the existing chrominance component information improves the performance of the deep learning filter for the luminance component, thereby improving the filtering effect and helping to improve video encoding and decoding efficiency.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied;
  • Fig. 2 shows a schematic diagram of placement of a video encoding device and a video decoding device in a streaming system
  • Figure 3 shows a basic flow diagram of a video encoder
  • FIG. 4 shows a schematic diagram of filtering processing based on CNNLF
  • Fig. 5 shows a schematic diagram of performing filtering processing on a luma component and a chrominance component
  • Fig. 6 shows a schematic diagram of performing filtering processing on luminance components according to an embodiment of the present application
  • Fig. 7 shows a flow chart of a filtering method based on deep learning according to an embodiment of the present application
  • FIG. 8A shows a schematic diagram of generating input parameters of a deep learning filter based on a reconstructed image of a luminance component and information of a chrominance component according to an embodiment of the present application
  • Fig. 8B shows a schematic diagram of generating input parameters of a deep learning filter based on a reconstructed image of a luminance component and information of a chrominance component according to another embodiment of the present application;
  • FIG. 9 shows a schematic diagram of generating input parameters of a deep learning filter based on a brightness component reconstruction image and chrominance component information according to another embodiment of the present application.
  • FIG. 10 shows a schematic structural diagram of a residual block according to an embodiment of the present application.
  • Fig. 11 shows a schematic structural diagram of a residual block according to another embodiment of the present application.
  • Fig. 12 shows a schematic structural diagram of a residual block according to another embodiment of the present application.
  • Fig. 13 shows a schematic structural diagram of a residual block according to another embodiment of the present application.
  • FIG. 14 shows a flowchart of a video coding method according to an embodiment of the present application.
  • Fig. 15 shows a flowchart of a video decoding method according to an embodiment of the present application
  • Fig. 16 shows a block diagram of a filtering device based on deep learning according to an embodiment of the present application
  • FIG. 17 shows a block diagram of a video encoding device according to an embodiment of the present application.
  • Fig. 18 shows a block diagram of a video decoding device according to an embodiment of the present application.
  • FIG. 19 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiment of the present application.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.
  • The "plurality" mentioned herein means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • the system architecture 100 includes a plurality of end devices that can communicate with each other through, for example, a network 150 .
  • the system architecture 100 may include a first terminal device 110 and a second terminal device 120 interconnected by a network 150 .
  • the first terminal device 110 and the second terminal device 120 perform unidirectional data transmission.
  • The first terminal device 110 can encode video data (such as a video picture stream collected by the terminal device 110) and transmit it to the second terminal device 120 through the network 150 in the form of one or more encoded video bitstreams; the second terminal device 120 can receive the encoded video data from the network 150, decode it to recover the video data, and display video pictures according to the recovered video data.
  • the system architecture 100 may include a third terminal device 130 and a fourth terminal device 140 performing bidirectional transmission of encoded video data, such as may occur during a video conference.
  • Each of the third terminal device 130 and the fourth terminal device 140 can encode video data (such as a stream of video pictures captured by that terminal device) for transmission via the network 150 to the other of the third terminal device 130 and the fourth terminal device 140.
  • Each of the third terminal device 130 and the fourth terminal device 140 can also receive the encoded video data transmitted by the other terminal device, decode the encoded video data to recover the video data, and display video pictures on an accessible display device based on the recovered video data.
  • The first terminal device 110, the second terminal device 120, the third terminal device 130, and the fourth terminal device 140 may be servers, personal computers, or smart phones, but the principles disclosed in this application are not limited thereto. The embodiments disclosed herein are also suitable for laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment.
  • the network 150 represents any number of networks, including for example wired and/or wireless communication networks, that communicate encoded video data between the first terminal device 110, the second terminal device 120, the third terminal device 130 and the fourth terminal device 140.
  • Communication network 150 may exchange data in circuit-switched and/or packet-switched channels.
  • the network may include a telecommunications network, a local area network, a wide area network and/or the Internet. For purposes of this application, unless explained below, the architecture and topology of network 150 may be immaterial to the operation of the present disclosure.
  • FIG. 2 shows how a video encoding device and a video decoding device are placed in a streaming environment.
  • the subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, videoconferencing, digital TV (television), storing compressed video on digital media including CDs, DVDs, memory sticks, and the like.
  • the streaming transmission system may include an acquisition subsystem 213 , the acquisition subsystem 213 may include a video source 201 such as a digital camera, and the video source creates an uncompressed video picture stream 202 .
  • The video picture stream 202 includes samples captured by the digital camera. Compared with the encoded video data 204 (or encoded video bitstream 204), the video picture stream 202 is depicted as a thick line to emphasize its high data volume.
  • The video picture stream 202 can be processed by the electronic device 220, which comprises a video encoding device 203 coupled to the video source 201.
  • The video encoding device 203 may include hardware, software, or a combination of hardware and software to enable or implement aspects of the disclosed subject matter as described in more detail below.
  • The encoded video data 204 (or encoded video bitstream 204) is depicted as a thin line to emphasize its lower data volume compared with the video picture stream 202, and it may be stored on the streaming server 205 for future use.
  • One or more streaming client subsystems, such as the client subsystem 206 and the client subsystem 208 in FIG. 2, may access the streaming server 205 to retrieve copies 207 and 209 of the encoded video data 204.
  • Client subsystem 206 may include, for example, video decoding device 210 in electronic device 230 .
  • a video decoding device 210 decodes an incoming copy 207 of encoded video data and produces an output video picture stream 211 that may be presented on a display 212, such as a display screen, or another presentation device.
  • encoded video data 204, video data 207, and video data 209 may be encoded according to certain video encoding/compression standards.
  • the electronic device 220 and the electronic device 230 may include other components not shown in the figure.
  • For example, the electronic device 220 may also include a video decoding device, and the electronic device 230 may also include a video encoding device.
  • HEVC: High Efficiency Video Coding.
  • VVC: Versatile Video Coding.
  • AVS: China's national video coding standard. For example, when a video frame image is input, it is divided into several non-overlapping processing units according to a block size, and a similar compression operation is performed on each processing unit. Such a processing unit is called a CTU (Coding Tree Unit), also called an LCU (Largest Coding Unit). A CTU can be further divided more finely to obtain one or more basic coding units, called CUs (Coding Units). A CU is the most basic element in the coding process.
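To make the block-partitioning step concrete, here is a minimal sketch (not taken from the application) of splitting a frame into non-overlapping CTUs; the CTU size of 128 and the NumPy representation are assumptions made for illustration, and the further recursive CTU-to-CU split is omitted.

```python
import numpy as np

def split_into_ctus(frame: np.ndarray, ctu_size: int = 128) -> list[np.ndarray]:
    """Split a frame (H x W array of samples) into non-overlapping CTUs.

    CTUs at the right/bottom border may be smaller when the frame size is
    not a multiple of ctu_size.
    """
    h, w = frame.shape[:2]
    ctus = []
    for y in range(0, h, ctu_size):
        for x in range(0, w, ctu_size):
            ctus.append(frame[y:y + ctu_size, x:x + ctu_size])
    return ctus
```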
  • Predictive Coding: includes intra-frame prediction and inter-frame prediction. After the original video signal is predicted by a selected reconstructed video signal, a residual video signal is obtained. The encoder needs to decide which predictive coding mode to use for the current CU and inform the decoder. Intra-frame prediction means that the predicted signal comes from an area that has already been coded and reconstructed in the same image; inter-frame prediction means that the predicted signal comes from another, already-encoded image (called a reference image) that is different from the current image.
  • Transform & Quantization: after the residual video signal undergoes a transform operation such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), the signal is converted into the transform domain; the resulting values are called transform coefficients.
  • The transform coefficients are further subjected to a lossy quantization operation, in which some information is lost, so that the quantized signal is better suited to compressed representation.
  • There may be more than one transform method to choose from, so the encoder also needs to select one of them for the current CU and inform the decoder.
  • The fineness of quantization is usually determined by the quantization parameter (QP).
  • A larger QP value means that coefficients over a larger range of values are quantized to the same output, which usually brings greater distortion and a lower bit rate; conversely, a smaller QP value means that coefficients over a smaller range of values are quantized to the same output, which usually brings smaller distortion and corresponds to a higher bit rate.
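As a rough illustration of the QP trade-off just described, the sketch below uses an HEVC-style mapping Qstep ≈ 2^((QP-4)/6) from QP to quantization step; this particular mapping is an assumption made for illustration and is not asserted to be the exact rule of any standard discussed here.

```python
import numpy as np

def quantize(coeffs: np.ndarray, qp: int) -> np.ndarray:
    """Uniform scalar quantization of transform coefficients.

    A larger QP gives a larger step, so a wider range of coefficient values
    collapses to the same level: more distortion, fewer bits.
    """
    step = 2.0 ** ((qp - 4) / 6.0)  # illustrative HEVC-style step size
    return np.round(coeffs / step).astype(np.int64)

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    """Inverse quantization: map levels back to approximate coefficients."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return levels.astype(np.float64) * step
```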
  • Entropy Coding or Statistical Coding: the quantized transform-domain signal is statistically compressed and encoded according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed bitstream is output. At the same time, other information generated during encoding, such as the selected coding mode and motion vector data, also needs entropy coding to reduce the bit rate.
  • Statistical coding is a lossless coding method that can effectively reduce the bit rate required to express the same signal. Common statistical coding methods include Variable Length Coding (VLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC).
  • the context-based binary arithmetic coding (CABAC) process mainly includes three steps: binarization, context modeling and binary arithmetic coding.
  • The binarized data can be encoded in the regular coding mode or the bypass coding mode.
  • The bypass coding mode does not need to assign a specific probability model to each bin; the input bin value is encoded directly by a simple bypass coder to speed up the overall encoding and decoding.
  • Different syntax elements are not completely independent of each other, and the same syntax element itself has a certain memory.
  • The previously encoded information used as a condition is called the context.
  • The bins of a syntax element enter the context coder sequentially, and the coder assigns an appropriate probability model to each input bin according to the values of previously encoded syntax elements or bins; this process is called context modeling.
  • ctxIdxInc: context index increment.
  • ctxIdxStart: context start index.
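For orientation only, these two quantities are typically combined into the index of the probability model used for a regular-mode bin; the exact derivation of ctxIdxInc is syntax-element specific, so the helper below is a hedged sketch rather than the normative rule.

```python
def context_index(ctx_idx_start: int, ctx_idx_inc: int) -> int:
    """Select the context (probability model) index for a regular-mode bin."""
    return ctx_idx_start + ctx_idx_inc
```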
  • Loop Filtering: the transformed and quantized signal is reconstructed through inverse quantization, inverse transform, and prediction compensation operations. Compared with the original image, due to the influence of quantization, some information of the reconstructed image differs from the original image; that is, the reconstructed image exhibits distortion. Therefore, filtering operations, such as a deblocking filter (DB), SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter), can be performed on the reconstructed image to effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images are used as references for subsequently encoded images to predict future image signals, the above filtering operation is also called in-loop filtering, i.e., a filtering operation inside the encoding loop.
  • DB: Deblocking Filter.
  • SAO: Sample Adaptive Offset (adaptive pixel compensation).
  • ALF: Adaptive Loop Filter.
  • FIG. 3 shows a basic flow chart of a video encoder, in which intra prediction is taken as an example for illustration.
  • A difference operation is performed between the original image signal s_k[x,y] and the predicted image signal ŝ_k[x,y] to obtain the residual signal u_k[x,y], and the residual signal u_k[x,y] is transformed and quantized to obtain the quantized coefficients.
  • The quantized coefficients are entropy-coded to obtain the coded bits.
  • The reconstructed residual signal u'_k[x,y] is obtained through inverse quantization and inverse transform processing, and the predicted image signal ŝ_k[x,y] is superimposed with the reconstructed residual signal u'_k[x,y] to generate the reconstructed image signal s*_k[x,y].
  • On the one hand, the reconstructed image signal is input into the intra-frame mode decision module and the intra-frame prediction module for intra-frame prediction processing; on the other hand, it is filtered by loop filtering, and the filtered image signal s'_k[x,y] is output.
  • The filtered image signal s'_k[x,y] can be used as the reference image of the next frame for motion estimation and motion compensation prediction. The predicted image signal of the next frame is then obtained based on the motion compensation prediction result s'_r[x+m_x, y+m_y] and the intra prediction result, and the above process is repeated until the encoding is completed.
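In compact form, using the notation above, the encoder loop of Fig. 3 computes:

  u_k[x,y]  = s_k[x,y] - ŝ_k[x,y]          (residual)
  s*_k[x,y] = ŝ_k[x,y] + u'_k[x,y]         (reconstruction before loop filtering)
  s'_k[x,y] = loop-filtered s*_k[x,y]      (filtered reference for the next frame)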
  • the above loop filtering can be realized based on CNNLF (Convolutional Neural Network Loop Filter, a filter based on deep learning).
  • The model structure of the CNNLF includes basic modules such as convolutional layers, activation functions, fully connected layers, and pooling layers, and the model parameters need to be obtained through training.
  • the pre-filtered image can be input into the trained CNNLF, and finally the filtered image is output.
  • different filtering models can be trained separately for the luminance component (Y) and the chroma components (Cb and Cr) to improve the filtering performance for the luminance component and the chroma component.
  • Y: luminance component.
  • Cb and Cr: chrominance components.
  • As shown in mode A of Fig. 5, two deep learning filters can be used to filter the Y component and the {Cb, Cr} components respectively; or, as shown in mode B, three deep learning filters can be used to filter the Y component, the Cb component, and the Cr component respectively.
  • In either mode A or mode B shown in Figure 5, since the Y component contains more texture information, the information of the Y component is often introduced when filtering the chrominance components Cb and/or Cr to improve the accuracy of the filter, thereby improving the final filtering performance.
  • the embodiment of the present application proposes to use the existing information to improve the performance of the deep learning filter for the luma component.
  • the performance of the deep learning filter for the luminance component Y can be improved by using the information of the chrominance components ⁇ Cb, Cr ⁇ .
  • the information of the chroma component may be one or more of an image of the chroma component before filtering (that is, a reconstructed image of the chroma component), a predicted image of the chroma component, and block division information of the chroma component.
  • deep learning belongs to the category of artificial intelligence (AI for short), while machine learning (ML for short) is the core of artificial intelligence and the fundamental way to make computers intelligent.
  • the deep learning filter in the embodiment of the present application is a filter based on machine learning/deep learning.
  • Fig. 7 shows a flowchart of a filtering method based on deep learning according to an embodiment of the present application.
  • The deep-learning-based filtering method can be performed by a device with processing functions such as computation and storage, for example, by a terminal device or a server.
  • the deep learning-based filtering method includes at least step S710 to step S730, which are described in detail as follows:
  • In step S710, the luminance component reconstructed image corresponding to the encoded image and the chrominance component information corresponding to the encoded image are obtained.
  • The reconstructed image corresponding to the encoded image is the image generated by superimposing the reconstructed residual image, obtained after inverse quantization and inverse transform processing, with the predicted image. For example, in the process shown in Figure 3, the reconstructed image is the image signal s*_k[x,y] generated by superimposing the predicted image signal ŝ_k[x,y] with the reconstructed residual signal u'_k[x,y]. The luminance component reconstructed image is the luminance part of the reconstructed image corresponding to the encoded image.
  • the chroma component information includes at least one of the following: a pre-filtered chroma component image corresponding to the encoded image, a predicted chroma component image corresponding to the encoded image, and chroma component block division information corresponding to the encoded image.
  • The chroma component block division information may be an image generated according to the chroma component block division result; for example, it may include at least one of the following images: a binary image generated from the chroma component block division boundaries; a binary image generated from the filtering boundaries applied by the deblocking filter to the chroma component reconstructed image; and, for the pre-filter chroma component reconstructed image, a chroma component block-division mean image obtained by taking, according to the block division result, the mean value within each block as the value of all samples in that block (see the sketch below).
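A minimal sketch of the block-division mean image just described, assuming the block division result is available as a per-sample map of block ids; that representation, and the use of NumPy, are assumptions made here for illustration.

```python
import numpy as np

def block_mean_image(recon: np.ndarray, block_ids: np.ndarray) -> np.ndarray:
    """Replace every sample of a block by the mean value of that block.

    `recon` is the pre-filter reconstructed component and `block_ids` gives,
    for every sample position, the id of the block it belongs to (same shape).
    """
    out = np.empty_like(recon, dtype=np.float64)
    for bid in np.unique(block_ids):
        mask = block_ids == bid
        out[mask] = recon[mask].mean()
    return out
```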
  • In step S720, the input parameters of the deep learning filter are generated according to the luminance component reconstructed image and the chrominance component information.
  • The chroma component information can be up-sampled to obtain chroma component information with the same size as the luminance component reconstructed image; then the luminance component reconstructed image and the up-sampled chroma component information are subjected to layer merging processing, and the result of the layer merging processing is used as the input parameter of the deep learning filter.
  • the upsampling process may be implemented by using an upsampling filter or by a deep learning module.
  • The upsampling filter can be based on Lanczos (Lanczos resampling), nearest (nearest-neighbor interpolation), bilinear (bilinear interpolation), bicubic (bicubic interpolation), and other algorithms.
  • The up-sampled chroma component information may be the chroma component reconstructed images. As shown in FIG. 8A, the luminance component reconstructed image (i.e., the Y component) and the up-sampled chroma component reconstructed images (i.e., the Cb and Cr components) are merged as layers and then input into the deep learning filter, finally obtaining the filtered luminance image Y' (a sketch of this input construction is given below).
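A minimal PyTorch-style sketch of this input construction, assuming 4:2:0 video (each chroma plane is half the luma size) and bilinear up-sampling as one of the interpolation choices listed above; the tensor shapes and the choice of bilinear are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def build_filter_input(y_rec: torch.Tensor,
                       cb_rec: torch.Tensor,
                       cr_rec: torch.Tensor) -> torch.Tensor:
    """Merge the luma reconstruction with up-sampled chroma reconstructions.

    Expected shapes: y_rec is N x 1 x H x W, cb_rec and cr_rec are
    N x 1 x H/2 x W/2. The chroma planes are up-sampled to the luma size
    and the three planes are merged as channels (layer merging).
    """
    target_size = y_rec.shape[-2:]
    cb_up = F.interpolate(cb_rec, size=target_size, mode="bilinear", align_corners=False)
    cr_up = F.interpolate(cr_rec, size=target_size, mode="bilinear", align_corners=False)
    return torch.cat([y_rec, cb_up, cr_up], dim=1)  # N x 3 x H x W filter input
```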
  • the deep learning filter includes a sequentially connected convolution unit, residual unit and rearrangement unit.
  • the convolution unit includes a convolutional layer (Convolutional layer, conv for short) and a parametric rectified linear unit (Parametric Rectified Linear Unit, prelu for short);
  • the residual unit includes N residual blocks connected in sequence;
  • The rearrangement unit is a shuffle unit; the shuffle unit rearranges the elements of the input array, and the upsampling of the image is realized through this rearrangement.
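One plausible reading of the rearrangement (shuffle) unit is a sub-pixel rearrangement that trades channels for spatial resolution, as in PyTorch's PixelShuffle; treating it this way is an assumption made here for illustration, not a statement of the exact unit defined by the application.

```python
import torch
import torch.nn as nn

shuffle = nn.PixelShuffle(upscale_factor=2)  # rearranges (r*r*C, H, W) -> (C, r*H, r*W)
x = torch.randn(1, 4 * 16, 32, 32)           # N x (2*2*16) x 32 x 32
y = shuffle(x)                                # -> N x 16 x 64 x 64: upsampling via rearrangement
```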
  • the upsampled chroma component information may be a chroma component reconstructed image, a chroma component predicted image, and chroma component block division information.
  • In this case, the luminance component reconstructed image, the luminance component predicted image, and the luminance component block division information (i.e., the Y component), together with the up-sampled chroma component reconstructed images, chroma component predicted images, and chroma component block division information (i.e., the Cb and Cr components), can be merged as layers and then input into the deep learning filter, finally obtaining the filtered luminance image Y'.
  • the deep learning filter includes a sequentially connected convolution unit, residual unit and rearrangement unit.
  • the convolution unit includes a convolution layer conv and a parameterized corrected linear unit prelu;
  • the residual unit includes N residual blocks connected in sequence;
  • the rearrangement unit is the shuffle unit.
  • The luminance component block division information in the embodiments of the present application may be an image generated according to the luminance component block division result; for example, it may include at least one of the following images: a binary image generated from the luminance component block division boundaries; a binary image generated from the filtering boundaries applied by the deblocking filter to the luminance component reconstructed image; and, for the pre-filter luminance component reconstructed image, a luminance component block-division mean image obtained by taking, according to the block division result, the mean value within each block as the value of all samples in that block.
  • The chroma component information can be up-sampled to obtain chroma component information with the same size as the luminance component reconstructed image; here the chroma component information includes the pre-filter chroma component images, the chroma component predicted images, and the chroma component block division information. The luminance component reconstructed image is then merged with the up-sampled pre-filter chroma component images, and features of the merged image are extracted to obtain the first feature; the luminance component predicted image corresponding to the encoded image is merged with the up-sampled chroma component predicted images, and features of the merged image are extracted to obtain the second feature; the luminance component block division information corresponding to the encoded image is merged with the up-sampled chroma component block division information, and features of the merged image are extracted to obtain the third feature; finally, the input parameters of the deep learning filter are generated according to the first feature, the second feature, and the third feature (a sketch of this three-branch structure is given below).
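A hedged PyTorch sketch of this three-branch feature extraction and fusion; the channel counts, kernel sizes, and the fusion by channel concatenation are assumptions made for illustration rather than the structure mandated by the application.

```python
import torch
import torch.nn as nn

class ThreeBranchInput(nn.Module):
    """Extract a feature from each of the three merged inputs, then fuse them."""

    def __init__(self, feat: int = 16):
        super().__init__()
        def branch() -> nn.Sequential:
            # each branch sees Y + up-sampled Cb + up-sampled Cr as 3 channels
            return nn.Sequential(nn.Conv2d(3, feat, kernel_size=3, padding=1), nn.PReLU())
        self.rec_branch = branch()    # reconstructed images  -> first feature
        self.pred_branch = branch()   # predicted images      -> second feature
        self.part_branch = branch()   # block-division images -> third feature

    def forward(self, rec: torch.Tensor, pred: torch.Tensor, part: torch.Tensor) -> torch.Tensor:
        f1 = self.rec_branch(rec)
        f2 = self.pred_branch(pred)
        f3 = self.part_branch(part)
        return torch.cat([f1, f2, f3], dim=1)  # fused input for the deep learning filter
```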
  • the deep learning filter includes a convolution unit, a residual unit and a rearrangement unit connected in sequence.
  • The residual unit includes N residual blocks connected in sequence; the rearrangement unit is the shuffle unit.
  • The description here takes, as an example, chroma component information that includes the pre-filter chroma component image (i.e., the chroma component reconstructed image), the chroma component predicted image, and the chroma component block division information.
  • The chroma component information may also include only some of the pre-filter chroma component image (i.e., the chroma component reconstructed image), the chroma component predicted image, and the chroma component block division information. If the chroma component information does not include the chroma component predicted image, then in the embodiment shown in FIG. 9 the feature extraction part that processes the predicted images through the convolution unit can be removed; similarly, if the chroma component information does not include the chroma component block division information, then in the embodiment shown in FIG. 9 the feature extraction part that processes the block division information (Y, Cb, Cr) through the convolution unit can be removed.
  • The N residual blocks included in the residual unit may be any positive integer number of residual blocks, e.g., 1, 2, 3, and so on.
  • The residual block structure in one embodiment of the present application may include, connected in sequence: a first convolutional layer (whose convolution kernel size may be 1×1), a parameterized rectified linear unit, and a second convolutional layer (whose convolution kernel size may be 3×3); the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the second convolutional layer serves as the output of the residual block.
  • The residual block structure in one embodiment of the present application may include, connected in sequence: a first convolutional layer (whose convolution kernel size may be 1×1), a parameterized rectified linear unit, a second convolutional layer (whose convolution kernel size may be 3×3), and a convolutional block attention unit (Convolutional Block Attention Module, CBAM); the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the convolutional block attention unit serves as the output of the residual block.
  • CBAM: Convolutional Block Attention Module.
  • The residual block structure in one embodiment of the present application may include, connected in sequence: a first convolutional layer (whose convolution kernel size may be 1×1), a parameterized rectified linear unit, a third convolutional layer (whose convolution kernel size may be 1×1), and a second convolutional layer (whose convolution kernel size may be 3×3); the convolution kernel of the first convolutional layer has the same size as the convolution kernel of the third convolutional layer, the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the second convolutional layer serves as the output of the residual block.
  • The residual block structure in one embodiment of the present application may include, connected in sequence: a first convolutional layer (whose convolution kernel size may be 1×1), a parameterized rectified linear unit, a third convolutional layer (whose convolution kernel size may be 1×1), a second convolutional layer (whose convolution kernel size may be 3×3), and a convolutional block attention unit; the convolution kernel of the first convolutional layer has the same size as the convolution kernel of the third convolutional layer, the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the convolutional block attention unit serves as the output of the residual block.
  • the residual unit may include one or more residual blocks, and the structure of each residual block may be any one of the above.
  • the number of convolutional layers and prelu layers included in the convolution unit of the deep learning filter can be set according to actual needs, and the number of convolutional layers and prelu layers included in the residual block can also be set according to actual needs. At the same time, the number of channels of different convolutional layers can be the same or different.
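A minimal sketch of the first residual-block variant described above (1×1 conv, PReLU, 3×3 conv, plus the skip connection), with an optional attention-module slot for the CBAM-based variants; the channel count is an assumption, and the CBAM internals are not restated here.

```python
import torch
import torch.nn as nn
from typing import Optional

class ResidualBlock(nn.Module):
    """1x1 conv -> PReLU -> 3x3 conv, with the block input added to the result."""

    def __init__(self, channels: int = 64, attention: Optional[nn.Module] = None):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attention = attention   # e.g. a CBAM module for the attention variants

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv2(self.act(self.conv1(x)))
        if self.attention is not None:
            out = self.attention(out)
        return x + out               # superposition of block input and branch output
```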
  • In step S730, the generated input parameters are input into the deep learning filter to obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • The deep learning filter needs to be trained with the same kinds of input parameters as are used at inference time. Specifically, in the training phase, a luminance component sample reconstructed image and the corresponding chrominance component information are obtained (the chrominance component information is chosen to match the usage scenario of the deep learning filter, i.e., to match the parameters used when the deep learning filter is applied); the training input parameters are then generated from the luminance component sample reconstructed image and the chrominance component information and input into the deep learning filter; the parameters of the deep learning filter are adjusted according to the loss value between the output of the deep learning filter and the expected filtering result image corresponding to the luminance component sample reconstructed image; and this process is repeated until the deep learning filter meets the convergence condition (a minimal training-step sketch follows).
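A hedged sketch of one training step matching the procedure above, assuming a mean-squared-error loss between the filter output and the expected (target) filtered luma image; the loss choice and the optimizer interface are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module,
               optimizer: torch.optim.Optimizer,
               filter_input: torch.Tensor,
               target_luma: torch.Tensor) -> float:
    """Run one optimization step of the deep learning filter.

    `filter_input` must be built exactly as at inference time (same luma and
    chroma layers); `target_luma` is the expected filtering result for the
    luma sample reconstruction.
    """
    optimizer.zero_grad()
    output = model(filter_input)
    loss = nn.functional.mse_loss(output, target_luma)
    loss.backward()
    optimizer.step()
    return loss.item()
```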
  • a video encoding method is also proposed in the embodiment of the present application.
  • The video encoding method can be performed by a device with processing functions such as computation and storage, for example, by a terminal device or a server.
  • the specific process is shown in Figure 14, including the following steps S1410 to S1440:
  • In step S1410, the luminance component reconstructed image corresponding to the encoded image and the chrominance component information corresponding to the encoded image are obtained.
  • For the specific implementation details of this step, reference may be made to the aforementioned step S710, and details are not repeated here.
  • In step S1420, the input parameters of the deep learning filter are generated according to the luminance component reconstructed image and the chrominance component information.
  • For the specific implementation details of this step, reference may be made to the aforementioned step S720, and details are not repeated here.
  • In step S1430, the generated input parameters are input into the deep learning filter to obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • In step S1440, a luminance component predicted image corresponding to the next frame is generated based on the filtered image of the luminance component reconstructed image, and the next video frame is encoded based on the luminance component predicted image.
  • Reference may be made to the process shown in FIG. 3: the filtered image of the luminance component reconstructed image is used as the luminance component reference image of the next frame for motion estimation and motion compensation prediction; the luminance component predicted image of the next frame is then obtained based on the result of the motion compensation prediction and the result of the intra-frame prediction, and the process shown in FIG. 3 is repeated until the encoding of the video images is completed.
  • A video decoding method is also proposed in the embodiments of the present application; the video decoding method can be executed by a device with processing functions such as computation and storage, for example, by a terminal device or a server.
  • the specific process is shown in Figure 15, including the following steps S1510 to S1540:
  • In step S1510, the luminance component reconstructed image corresponding to the encoded image and the chrominance component information corresponding to the encoded image are obtained.
  • For the specific implementation details of this step, reference may be made to the aforementioned step S710, and details are not repeated here.
  • In step S1520, the input parameters of the deep learning filter are generated according to the luminance component reconstructed image and the chrominance component information.
  • For the specific implementation details of this step, reference may be made to the aforementioned step S720, and details are not repeated here.
  • In step S1530, the generated input parameters are input into the deep learning filter to obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • In step S1540, a luminance component predicted image corresponding to the next frame is generated based on the filtered image of the luminance component reconstructed image, and the video bitstream is decoded based on the luminance component predicted image.
  • The filtered image of the luminance component reconstructed image may be used as the luminance component reference image of the next frame for motion estimation and motion compensation prediction; the luminance component predicted image of the next frame is then obtained based on the result of the motion compensation prediction and the result of the intra-frame prediction, and the luminance component predicted image is superimposed with the luminance component reconstructed residual signal obtained through inverse quantization and inverse transform processing to generate the luminance component reconstructed image of the next frame. This process is repeated, thereby realizing the decoding of the video bitstream.
  • the technical solution of the embodiment of the present application makes it possible to make full use of the information of the chroma component when filtering the luminance component of the image, and then use the existing chroma component information to improve the performance of the deep learning filter for the luminance component, Therefore, the filtering effect can be improved, which is beneficial to improving the encoding and decoding efficiency of the video.
  • FIG. 16 shows a block diagram of a deep learning-based filtering device according to an embodiment of the present application.
  • the deep learning-based filtering device can be set in a device with processing functions such as calculation and storage, for example, it can be set in a terminal device or inside the server.
  • a filtering device 1600 based on deep learning includes: an acquiring unit 1602 , a generating unit 1604 and a processing unit 1606 .
  • The acquiring unit 1602 is configured to acquire the luminance component reconstructed image corresponding to the encoded image and the chrominance component information corresponding to the encoded image; the generating unit 1604 is configured to generate the input parameters of the deep learning filter according to the luminance component reconstructed image and the chrominance component information; and the processing unit 1606 is configured to input the input parameters into the deep learning filter to obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image.
  • The chroma component information includes at least one of the following: a pre-filter chroma component image corresponding to the encoded image, a chroma component predicted image corresponding to the encoded image, and chroma component block division information corresponding to the encoded image.
  • The generating unit 1604 is configured to: perform upsampling processing on the chrominance component information to obtain chrominance component information having the same size as the luminance component reconstructed image; and perform layer merging processing on the luminance component reconstructed image and the up-sampled chrominance component information, using the result of the layer merging processing as the input parameter of the deep learning filter.
  • The generating unit 1604 is configured to: perform upsampling processing on the chrominance component information to obtain chrominance component information having the same size as the luminance component reconstructed image, where the chroma component information includes the pre-filter chroma component image, the chroma component predicted image, and the chroma component block division information; merge the luminance component reconstructed image with the up-sampled pre-filter chroma component image, and extract features of the merged image to obtain the first feature; merge the luminance component predicted image corresponding to the encoded image with the up-sampled chroma component predicted image, and extract features of the merged image to obtain the second feature; merge the luminance component block division information corresponding to the encoded image with the up-sampled chroma component block division information, and extract features of the merged image to obtain the third feature; and generate the input parameters of the deep learning filter according to the first feature, the second feature, and the third feature.
  • the deep learning filter includes a convolution unit, a residual unit, and a rearrangement unit connected in sequence, and the residual unit includes at least one residual block.
  • The residual unit includes multiple residual blocks; the numbers of channels of the multiple residual blocks are the same, or the numbers of channels of the residual blocks in the multiple residual blocks are not exactly the same.
  • One residual block includes: a first convolutional layer, a parameterized rectified linear unit, and a second convolutional layer connected in sequence; wherein the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the second convolutional layer serves as the output of the residual block.
  • One residual block includes: a first convolutional layer, a parameterized rectified linear unit, a second convolutional layer, and a convolutional block attention unit connected in sequence; wherein the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the convolutional block attention unit serves as the output of the residual block.
  • one residual block includes: a sequentially connected first convolutional layer, a parameterized rectified linear unit, a third convolutional layer, and a second convolutional layer; wherein , the convolution kernel of the first convolution layer is the same size as the convolution kernel of the third convolution layer, the input of the first convolution layer is used as the input of the residual block, and the first convolution The superposition result of the input of the convolutional layer and the output of the second convolutional layer is used as the output of the residual block.
  • One residual block includes: a first convolutional layer, a parameterized rectified linear unit, a third convolutional layer, a second convolutional layer, and a convolutional block attention unit connected in sequence; wherein the convolution kernel of the first convolutional layer has the same size as the convolution kernel of the third convolutional layer, the input of the first convolutional layer serves as the input of the residual block, and the superposition of the input of the first convolutional layer and the output of the convolutional block attention unit serves as the output of the residual block.
  • FIG. 17 shows a block diagram of a video encoding device according to an embodiment of the present application.
  • the video encoding device may be set in a device having processing functions such as calculation and storage, such as a terminal device or a server.
  • a video encoding device 1700 includes: an acquiring unit 1602, a generating unit 1604, a processing unit 1606, and an encoding unit 1702.
  • the acquiring unit 1602 is configured to acquire the reconstructed image of the luminance component corresponding to the encoded image and the chrominance component information corresponding to the encoded image;
  • the generating unit 1604 is configured to generate the input parameters of the deep learning filter according to the luminance component reconstructed image and the chrominance component information;
  • the processing unit 1606 is configured to input the input parameters to the deep learning filter, and obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image;
  • the encoding unit 1702 is configured to generate a luminance component prediction image corresponding to the next frame image based on the filtered image, and encode the next video frame based on the luminance component prediction image.
  • FIG. 18 shows a block diagram of a video decoding device according to an embodiment of the present application.
  • the video decoding device may be set in a device having processing functions such as calculation and storage, such as a terminal device or a server.
  • a video decoding device 1800 includes: an acquiring unit 1602, a generating unit 1604, a processing unit 1606, and a decoding unit 1802.
  • the acquiring unit 1602 is configured to acquire the reconstructed image of the luminance component corresponding to the encoded image and the chrominance component information corresponding to the encoded image;
  • the generating unit 1604 is configured to generate the input parameters of the deep learning filter according to the luminance component reconstructed image and the chrominance component information;
  • the processing unit 1606 is configured to input the input parameters to the deep learning filter, and obtain the filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image;
  • the decoding unit 1802 is configured to generate a luminance component prediction image corresponding to the next frame image based on the filtered image, and decode the video bitstream based on the luminance component prediction image.
  • FIG. 19 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiment of the present application.
  • a computer system 1900 includes a central processing unit (Central Processing Unit, CPU) 1901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 1902 or a program loaded from a storage portion 1908 into a random access memory (Random Access Memory, RAM) 1903, for example, performing the methods described in the above embodiments.
  • the RAM 1903 also stores various programs and data necessary for system operation.
  • the CPU 1901, ROM 1902, and RAM 1903 are connected to each other through a bus 1904.
  • An input/output (Input/Output, I/O) interface 1905 is also connected to the bus 1904 .
  • the following components are connected to the I/O interface 1905: an input portion 1906 including a keyboard, a mouse, etc.; an output portion 1907 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (Liquid Crystal Display, LCD), etc., and a speaker; a storage portion 1908 including a hard disk, etc.; and a communication portion 1909 including a network interface card such as a LAN (Local Area Network) card, a modem, etc. The communication portion 1909 performs communication processing via a network such as the Internet.
  • a drive 1910 is also connected to the I/O interface 1905 as needed.
  • a removable medium 1911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1910 as needed, so that a computer program read from it can be installed into the storage portion 1908 as needed.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program contains the program for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication portion 1909 and/or installed from removable media 1911 .
  • when the computer program is executed by the central processing unit (CPU) 1901, the various functions defined in the system of the present application are executed.
  • the computer-readable medium shown in the embodiment of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which a computer-readable computer program is carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer program embodied on a computer readable medium can be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
  • the present application also provides a computer-readable medium.
  • the computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist independently without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to implement the methods described in the above-mentioned embodiments.

Abstract

Embodiments of this application provide a filtering method, an encoding/decoding method, an apparatus, a computer-readable medium, and an electronic device. The deep-learning-based filtering method includes: acquiring a luminance component reconstructed image corresponding to an encoded image and chrominance component information corresponding to the encoded image; generating input parameters of a deep learning filter according to the luminance component reconstructed image and the chrominance component information; and inputting the input parameters to the deep learning filter to obtain a filtered image, output by the deep learning filter, corresponding to the luminance component reconstructed image. The technical solutions of the embodiments of this application can improve the filtering effect, which in turn helps to improve video encoding and decoding efficiency.

Description

滤波及编解码方法、装置、计算机可读介质及电子设备
本申请要求于2021年9月28日提交中国专利局、申请号为202111144705.7、名称为“滤波及编解码方法、装置、计算机可读介质及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机及通信技术领域,具体而言,涉及一种滤波及编解码方法、装置、计算机可读介质及电子设备。
背景技术
在视频编解码领域中,预测图像与重构残差图像叠加生成重建图像之后,由于重建图像会产生失真,因此为了获取较优质量的图像,通常需要对重建图像进行环路滤波处理(Loop Filtering),而在环路滤波处理中,如何能够提高滤波效果,以提升编解码效率是亟待解决的技术问题。
发明内容
本申请的实施例提供了一种滤波及编解码方法、装置、计算机可读介质及电子设备,进而至少在一定程度上可以提高滤波效果,进而有利于提升视频的编解码效率。
本申请的其他特性和优点将通过下面的详细描述变得显然,或部分地通过本申请的实践而习得。
根据本申请实施例的一个方面,提供了一种基于深度学习的滤波方法,包括:获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像。
根据本申请实施例的一个方面,提供了一种视频编码方法,包括:获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对所述下一帧视频图像进行编码处理。
根据本申请实施例的一个方面,提供了一种视频解码方法,包括:获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已 滤波图像;基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对视频码流进行解码处理。
根据本申请实施例的一个方面,提供了一种基于深度学习的滤波装置,包括:获取单元,配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元,配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元,配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像。
根据本申请实施例的一个方面,提供了一种视频编码装置,包括:获取单元,配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元,配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元,配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;编码单元,配置为基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对所述下一帧视频图像进行编码处理。
根据本申请实施例的一个方面,提供了一种视频解码装置,包括:获取单元,配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元,配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元,配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;解码单元,配置为基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对视频码流进行解码处理。
根据本申请实施例的一个方面,提供了一种计算机可读介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如上述实施例中所述的基于深度学习的滤波方法、视频编码方法或视频解码方法。
根据本申请实施例的一个方面,提供了一种电子设备,包括:一个或多个处理器;存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述电子设备实现如上述实施例中所述的基于深度学习的滤波方法、视频编码方法或视频解码方法。
根据本申请实施例的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述各种实施例中提供的基于深度学习的滤波方法、视频编码方法或视频解码方法。
在本申请的一些实施例所提供的技术方案中,通过获取已编码图像对应的亮度分量重建图像和已编码图像对应的色度分量信息,根据亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数,进而将该输入参数输入至深度学习滤波器,得到深度学习滤波器输出的对应于亮度分量重建图像的已滤波图像,使得在对图像的亮度分量进行滤波处理时,能够充分利用色度分量的信息,进而可以利用已有的色度分量信息提升针 对亮度分量的深度学习滤波器的性能,从而可以提高滤波效果,有利于提升视频的编解码效率。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图简要说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。在附图中:
图1示出了可以应用本申请实施例的技术方案的示例性系统架构的示意图;
图2示出了视频编码装置和视频解码装置在流式传输系统中的放置方式示意图;
图3示出了一个视频编码器的基本流程图;
图4示出了基于CNNLF进行滤波处理的示意图;
图5示出了对亮度分量和色度分量进行滤波处理的示意图;
图6示出了根据本申请的一个实施例的对亮度分量进行滤波处理的示意图;
图7示出了根据本申请的一个实施例的基于深度学习的滤波方法的流程图;
图8A示出了根据本申请的一个实施例的基于亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数的示意图;
图8B示出了根据本申请的另一个实施例的基于亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数的示意图;
图9示出了根据本申请的另一个实施例的基于亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数的示意图;
图10示出了根据本申请的一个实施例的残差块的结构示意图;
图11示出了根据本申请的另一个实施例的残差块的结构示意图;
图12示出了根据本申请的另一个实施例的残差块的结构示意图;
图13示出了根据本申请的另一个实施例的残差块的结构示意图;
图14示出了根据本申请的一个实施例的视频编码方法的流程图;
图15示出了根据本申请的一个实施例的视频解码方法的流程图;
图16示出了根据本申请的一个实施例的基于深度学习的滤波装置的框图;
图17示出了根据本申请的一个实施例的视频编码装置的框图;
图18示出了根据本申请的一个实施例的视频解码装置的框图;
图19示出了适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。
具体实施方式
现在参考附图以更全面的方式描述示例实施方式。然而,示例的实施方式能够以各种形式实施,且不应被理解为仅限于这些范例;相反,提供这些实施方式的目的是使得 本申请更加全面和完整,并将示例实施方式的构思全面地传达给本领域的技术人员。
此外,本申请所描述的特征、结构或特性可以以任何合适的方式结合在一个或更多实施例中。在下面的描述中,有许多具体细节从而可以充分理解本申请的实施例。然而,本领域技术人员应意识到,在实施本申请的技术方案时可以不需用到实施例中的所有细节特征,可以省略一个或更多特定细节,或者可以采用其它的方法、元件、装置、步骤等。
附图中所示的方框图仅仅是功能实体,不一定必须与物理上独立的实体相对应。即,可以采用软件形式来实现这些功能实体,或在一个或多个硬件模块或集成电路中实现这些功能实体,或在不同网络和/或处理器装置和/或微控制器装置中实现这些功能实体。
附图中所示的流程图仅是示例性说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解,而有的操作/步骤可以合并或部分合并,因此实际执行的顺序有可能根据实际情况改变。
需要说明的是:在本文中提及的“多个”是指两个或两个以上。“和/或”描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
图1示出了可以应用本申请实施例的技术方案的示例性系统架构的示意图。
如图1所示,系统架构100包括多个终端装置,所述终端装置可通过例如网络150彼此通信。举例来说,系统架构100可以包括通过网络150互连的第一终端装置110和第二终端装置120。在图1的实施例中,第一终端装置110和第二终端装置120执行单向数据传输。
举例来说,第一终端装置110可对视频数据(例如由终端装置110采集的视频图片流)进行编码以通过网络150传输到第二终端装置120,已编码的视频数据以一个或多个已编码视频码流形式传输,第二终端装置120可从网络150接收已编码视频数据,对已编码视频数据进行解码以恢复视频数据,并根据恢复的视频数据显示视频图片。
在本申请的一个实施例中,系统架构100可以包括执行已编码视频数据的双向传输的第三终端装置130和第四终端装置140,所述双向传输比如可以发生在视频会议期间。对于双向数据传输,第三终端装置130和第四终端装置140中的每个终端装置可对视频数据(例如由终端装置采集的视频图片流)进行编码,以通过网络150传输到第三终端装置130和第四终端装置140中的另一终端装置。第三终端装置130和第四终端装置140中的每个终端装置还可接收由第三终端装置130和第四终端装置140中的另一终端装置传输的已编码视频数据,且可对已编码视频数据进行解码以恢复视频数据,并可根据恢复的视频数据在可访问的显示装置上显示视频图片。
在图1的实施例中,第一终端装置110、第二终端装置120、第三终端装置130和第四终端装置140可为服务器、个人计算机和智能电话,但本申请公开的原理可不限于此。本申请公开的实施例适用于膝上型计算机、平板电脑、媒体播放器和/或专用视频会议设备。网络150表示在第一终端装置110、第二终端装置120、第三终端装置130和第四终端装置140之间传送已编码视频数据的任何数目的网络,包括例如有线和/或无线通信网络。通信网络150可在电路交换和/或分组交换信道中交换数据。该网络可包括电信网络、 局域网、广域网和/或互联网。出于本申请的目的,除非在下文中有所解释,否则网络150的架构和拓扑对于本申请公开的操作来说可能是无关紧要的。
在本申请的一个实施例中,图2示出了视频编码装置和视频解码装置在流式传输环境中的放置方式。本申请所公开主题可同等地适用于其它支持视频的应用,包括例如视频会议、数字TV(television,电视机)、在包括CD、DVD、存储棒等的数字介质上存储压缩视频等等。
流式传输系统可包括采集子系统213,采集子系统213可包括数码相机等视频源201,视频源创建未压缩的视频图片流202。在实施例中,视频图片流202包括由数码相机拍摄的样本。相较于已编码的视频数据204(或已编码的视频码流204),视频图片流202被描绘为粗线以强调高数据量的视频图片流,视频图片流202可由电子装置220处理,电子装置220包括耦接到视频源201的视频编码装置203。视频编码装置203可包括硬件、软件或软硬件组合以实现或实施如下文更详细地描述的所公开主题的各方面。相较于视频图片流202,已编码的视频数据204(或已编码的视频码流204)被描绘为细线以强调较低数据量的已编码的视频数据204(或已编码的视频码流204),其可存储在流式传输服务器205上以供将来使用。一个或多个流式传输客户端子系统,例如图2中的客户端子系统206和客户端子系统208,可访问流式传输服务器205以检索已编码的视频数据204的副本207和副本209。客户端子系统206可包括例如电子装置230中的视频解码装置210。视频解码装置210对已编码的视频数据的传入副本207进行解码,且产生可在显示器212(例如显示屏)或另一呈现装置上呈现的输出视频图片流211。在一些流式传输系统中,可根据某些视频编码/压缩标准对已编码的视频数据204、视频数据207和视频数据209(例如视频码流)进行编码。
应注意,电子装置220和电子装置230可包括图中未示出的其它组件。举例来说,电子装置220可包括视频解码装置,且电子装置230还可包括视频编码装置。
在本申请的一个实施例中,以国际视频编解码标准HEVC(High Efficiency Video Coding,高效率视频编解码)、VVC(Versatile Video Coding,多功能视频编解码),以及中国国家视频编解码标准AVS为例,当输入一个视频帧图像之后,会根据一个块大小,将视频帧图像划分成若干个不重叠的处理单元,每个处理单元将进行类似的压缩操作。这个处理单元被称作CTU(Coding Tree Unit,编码树单元),或者称之为LCU(Largest Coding Unit,最大编码单元)。CTU再往下可以继续进行更加精细的划分,得到一个或多个基本的编码单元CU(Coding Unit,编码单元),CU是一个编码环节中最基本的元素。
以下介绍对CU进行编码时的一些概念:
预测编码(Predictive Coding):预测编码包括了帧内预测和帧间预测等方式,原始视频信号经过选定的已重建视频信号的预测后,得到残差视频信号。编码端需要为当前CU决定选择哪一种预测编码模式,并告知解码端。其中,帧内预测是指预测的信号来自于同一图像内已经编码重建过的区域;帧间预测是指预测的信号来自已经编码过的、不同于当前图像的其它图像(称之为参考图像)。
变换及量化(Transform&Quantization):残差视频信号经过DFT(Discrete Fourier  Transform,离散傅里叶变换)、DCT(Discrete Cosine Transform,离散余弦变换)等变换操作后,将信号转换到变换域中,称之为变换系数。变换系数进一步进行有损的量化操作,丢掉一定的信息,使得量化后的信号有利于压缩表达。在一些视频编码标准中,可能有多于一种变换方式可以选择,因此编码端也需要为当前CU选择其中的一种变换方式,并告知解码端。量化的精细程度通常由量化参数(Quantization Parameter,简称QP)来决定,QP取值较大,表示更大取值范围的系数将被量化为同一个输出,因此通常会带来更大的失真及较低的码率;相反,QP取值较小,表示较小取值范围的系数将被量化为同一个输出,因此通常会带来较小的失真,同时对应较高的码率。
熵编码(Entropy Coding)或统计编码:量化后的变换域信号将根据各个值出现的频率进行统计压缩编码,最后输出二值化(0或者1)的压缩码流。同时,编码产生的其他信息,例如选择的编码模式、运动矢量数据等,也需要进行熵编码以降低码率。统计编码是一种无损的编码方式,可以有效的降低表达同样信号所需要的码率,常见的统计编码方式有变长编码(Variable Length Coding,简称VLC)或者基于上下文的二值化算术编码(Content Adaptive Binary Arithmetic Coding,简称CABAC)。
基于上下文的二值化算术编码(CABAC)过程主要包含3个步骤:二值化、上下文建模和二进制算术编码。在对输入的语法元素进行二值化处理后,可以通过常规编码模式和旁路编码模式(Bypass Coding Mode)对二元数据进行编码。旁路编码模式无须为每个二元位分配特定的概率模型,输入的二元位bin值直接用一个简单的旁路编码器进行编码,以加快整个编码以及解码的速度。一般情况下,不同的语法元素之间并不是完全独立的,且相同语法元素自身也具有一定的记忆性。因此,根据条件熵理论,利用其他已编码的语法元素进行条件编码,相对于独立编码或者无记忆编码能够进一步提高编码性能。这些用来作为条件的已编码信息称为上下文。在常规编码模式中,语法元素的二元位顺序地进入上下文编码器,编码器根据先前编码过的语法元素或二元位的值,为每一个输入的二元位分配合适的概率模型,该过程即为上下文建模。通过ctxIdxInc(context index increment,上下文索引增量)和ctxIdxStart(context index Start,上下文起始索引)即可定位到语法元素所对应的上下文模型。将bin值和分配的概率模型一起送入二元算术编码器进行编码后,需要根据bin值更新上下文模型,也就是编码中的自适应过程。
环路滤波(Loop Filtering):经过变换及量化的信号会通过反量化、反变换及预测补偿的操作获得重建图像。重建图像与原始图像相比由于存在量化的影响,部分信息与原始图像有所不同,即重建图像会产生失真(Distortion)。因此,可以对重建图像进行滤波操作,例如去块效应滤波(Deblocking filter,简称DB)、SAO(Sample Adaptive Offset,自适应像素补偿)或者ALF(Adaptive Loop Filter,自适应环路滤波)等,可以有效降低量化所产生的失真程度。由于这些经过滤波后的重建图像将作为后续编码图像的参考来对将来的图像信号进行预测,因此上述的滤波操作也被称为环路滤波,即在编码环路内的滤波操作。
在本申请的一个实施例中，图3示出了一个视频编码器的基本流程图，在该流程中以帧内预测为例进行说明。其中，原始图像信号 s_k[x,y] 与预测图像信号 ŝ_k[x,y] 做差值运算，得到残差信号 u_k[x,y]；残差信号 u_k[x,y] 经过变换及量化处理之后得到量化系数，量化系数一方面通过熵编码得到编码后的比特流，另一方面通过反量化及反变换处理得到重构残差信号 u'_k[x,y]；预测图像信号 ŝ_k[x,y] 与重构残差信号 u'_k[x,y] 叠加生成重建图像信号 s*_k[x,y]。重建图像信号 s*_k[x,y] 一方面输入至帧内模式决策模块和帧内预测模块进行帧内预测处理，另一方面通过环路滤波进行滤波处理，并输出滤波后的图像信号 s'_k[x,y]，滤波后的图像信号 s'_k[x,y] 可以作为下一帧的参考图像进行运动估计及运动补偿预测。然后基于运动补偿预测的结果 s'_r[x+m_x, y+m_y] 和帧内预测结果，得到下一帧的预测图像信号，并继续重复上述过程，直至编码完成。
上述的环路滤波可以基于CNNLF(Convolutional Neural Network Loop Filter,基于深度学习的滤波器)来实现,CNNLF的模型结构包括卷积层、激活函数、全连接层和池化层等基本模块,模型参数需要通过训练得到。如图4所示,在CNNLF经过训练完成之后,可以将滤波前的图像输入至训练好的CNNLF中,最后输出滤波后的图像。
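As a minimal illustration of how a trained CNN loop filter of this kind is applied, the sketch below runs an already-trained model on a pre-filtering image in inference mode. The function name and the use of PyTorch are assumptions for illustration only, not part of the application itself.

```python
import torch

def apply_cnnlf(trained_cnnlf, pre_filter_image):
    """Apply an already-trained CNN loop filter to a pre-filtering image (inference only)."""
    trained_cnnlf.eval()                        # disable any training-time behaviour
    with torch.no_grad():                       # no gradients are needed at inference time
        return trained_cnnlf(pre_filter_image)  # the filtered image output by the network
```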
针对图像的滤波任务,可以针对亮度分量(Y)和色度分量(Cb和Cr)单独训练不同的滤波模型来提升针对亮度分量和色度分量的滤波性能。比如,对于图5所示的方式A而言,可以通过2个深度学习滤波器分别针对Y分量和{Cb,Cr}分量进行滤波处理;或者可以如方式B所示,通过3个深度学习滤波器分别针对Y分量、Cb分量和Cr分量进行滤波处理。不管是图5中所示的方式A还是方式B,由于Y分量包含更多的纹理信息,因此在对色度分量Cb和/或Cr进行滤波时往往引入Y分量的信息以提升滤波器的分类精度,从而提升最终的滤波性能。
而为了提升亮度分量滤波器的性能,本申请实施例中提出了利用已有的信息提升针对亮度分量的深度学习滤波器的性能。比如如图6所示,可以利用色度分量{Cb,Cr}的信息来提升针对亮度分量Y的深度学习滤波器的性能。在实施例中,色度分量的信息可以是色度分量滤波前的图像(即色度分量重建图像)、色度分量的预测图像、色度分量的块划分信息中的一种或多种。
其中,深度学习属于人工智能(Artificial Intelligence,简称AI)的范畴,而机器学习(Machine Learning,简称ML)是人工智能的核心,是使计算机具有智能的根本途径,其应用遍及人工智能的各个领域。本申请实施例中的深度学习滤波器即是基于机器学习/深度学习的一种滤波器。
以下对本申请实施例的技术方案的实现细节进行详细阐述:
图7示出了根据本申请的一个实施例的基于深度学习的滤波方法的流程图,该基于深度学习的滤波方法可以由具有计算、存储等处理功能的设备来执行,比如可以由终端设备或服务器来执行。参照图7所示,该基于深度学习的滤波方法至少包括步骤S710至步骤S730,详细介绍如下:
在步骤S710中,获取已编码图像对应的亮度分量重建图像和已编码图像对应的色度分量信息。
在本申请的一个实施例中，已编码图像对应的重建图像即是通过反量化、反变换处理后得到的重构残差图像与预测图像叠加生成的图像，比如在图3所示的流程中，重建图像即为预测图像信号 ŝ_k[x,y] 与重构残差信号 u'_k[x,y] 叠加生成的图像信号 s*_k[x,y]。亮度分量重建图像即为已编码图像对应的重建图像中的亮度部分。
在实施例中,色度分量信息包括以下至少一个:已编码图像对应的色度分量滤波前图像、已编码图像对应的色度分量预测图像、已编码图像对应的色度分量块划分信息。
在实施例中,色度分量块划分信息可以是根据色度分量块划分结果生成的图像,比如可以包括以下图像中的至少一个:根据色度分量块划分边界生成的二值图像;根据去块滤波器对色度分量重建图像的滤波边界所生成的二值图像;对于滤波前的色度分量重建图像,根据块划分结果,同一个块内取平均作为块内所有样点的值所得到的色度分量块划分均值图像。
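To make the block-partition inputs described above concrete, the following hedged NumPy sketch builds a boundary binary image and a block-mean image from a hypothetical list of block rectangles. The (y, x, height, width) block layout and the array shapes are assumptions for illustration, not part of the application.

```python
import numpy as np

def partition_images(recon, blocks):
    """recon: HxW pre-filtering reconstruction; blocks: assumed list of (y, x, h, w) rectangles."""
    h, w = recon.shape
    boundary = np.zeros((h, w), dtype=np.uint8)     # binary image marking partition boundaries
    mean_img = np.zeros((h, w), dtype=np.float32)   # block-mean image: each sample takes its block average
    for y, x, bh, bw in blocks:
        boundary[y, x:x + bw] = 1                   # top edge of the block
        boundary[y:y + bh, x] = 1                   # left edge of the block
        mean_img[y:y + bh, x:x + bw] = recon[y:y + bh, x:x + bw].mean()
    return boundary, mean_img
```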
在步骤S720中,根据亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数。
在本申请的一个实施例中,可以将色度分量信息进行上采样处理,得到与亮度分量重建图像尺寸相同的色度分量信息,然后将亮度分量重建图像与上采样处理后的色度分量信息进行图层合并处理,将图层合并处理的结果作为深度学习滤波器的输入参数。
在实施例中,上采样处理可以采用上采样滤波器实现或者通过深度学习模块实现。上采样滤波器比如可以是基于Lanczos(一种将对称矩阵通过正交相似变换变成对称三对角矩阵的算法)、nearest(最邻近插值算法)、bilinear(双线性插值算法)、bicubic(双立方插值算法)等算法实现的。
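A minimal sketch of the up-sampling and layer-merging step described above, assuming 4:2:0 content with PyTorch tensors of shape (N, 1, H, W) for the luma reconstruction and (N, 1, H/2, W/2) for each chroma plane, and bicubic interpolation as the up-sampling filter; any of the other interpolation methods mentioned, or a learned up-sampling module, could be substituted.

```python
import torch
import torch.nn.functional as F

def build_filter_input(y_recon, cb_recon, cr_recon):
    """Up-sample the chroma planes to the luma size and merge all planes as input layers."""
    size = y_recon.shape[-2:]                                   # target spatial size = luma size
    cb_up = F.interpolate(cb_recon, size=size, mode='bicubic', align_corners=False)
    cr_up = F.interpolate(cr_recon, size=size, mode='bicubic', align_corners=False)
    return torch.cat([y_recon, cb_up, cr_up], dim=1)            # layer merge: (N, 3, H, W)
```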
在本申请的一个实施例中,上采样处理后的色度分量信息可以是色度分量重建图像,如图8A所示,可以将亮度分量重建图像(即Y分量)和上采样处理后的色度分量重建图像(即Cb和Cr分量)分别作为一个图层进行合并处理,之后输入至深度学习滤波器中,最后得到亮度分量滤波后的图像Y'。
在实施例中,该深度学习滤波器包括顺次相连的卷积单元、残差单元和重排单元。其中,卷积单元包括卷积层(Convolutional layer,简称conv)和参数化修正线性单元(Parametric Rectified Linear Unit,简称prelu);残差单元包括顺次相连的N个残差块;重排单元即为shuffle单元。Shuffle单元能够执行shuffle()函数把数组中的元素按随机顺序重新排列,此处是通过重排来实现图像的上采样处理。在图8A中所示的实施例中,残差单元之前的卷积单元中卷积层的步长s=2;在残差单元与重排单元(shuffle单元)之间还可以设置另一个卷积层,该卷积层的步长s=1。
在本申请的一个实施例中,上采样处理后的色度分量信息可以是色度分量重建图像、色度分量预测图像和色度分量块划分信息。如图8B所示,可以将亮度分量重建图像、亮度分量预测图像和亮度分量块划分信息(即Y分量)和上采样处理后的色度分量重建图像、色度分量预测图像和色度分量块划分信息(即Cb和Cr分量)分别作为一个图层进行合并处理,之后输入至深度学习滤波器中,最后得到亮度分量滤波后的图像Y'。
在实施例中,该深度学习滤波器包括顺次相连的卷积单元、残差单元和重排单元。其中,卷积单元包括卷积层conv和参数化修正线性单元prelu;残差单元包括顺次相连的N个残差块;重排单元即为shuffle单元。在图8B中所示的实施例中,残差单元之前的卷积单元中卷积层的步长s=2;在残差单元与重排单元(shuffle单元)之间还可以设置另一个卷积层,该卷积层的步长s=1。
需要说明的是,与色度分量块划分信息类似,本申请实施例中的亮度分量块划分信息可以是根据亮度分量块划分结果生成的图像,比如可以包括以下图像中的至少一个:根据亮度分量块划分边界生成的二值图像;根据去块滤波器对亮度分量重建图像的滤波边界所生成的二值图像;对于滤波前的亮度分量重建图像,根据块划分结果,同一个块内取平均作为块内所有样点的值所得到的亮度分量块划分均值图像。
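The sketch below gives a rough PyTorch rendering of the filter topology described for 图8A/图8B: a convolution unit (stride-2 convolution plus a parametric ReLU), a chain of residual blocks, a stride-1 convolution, and a shuffle (pixel-shuffle) unit that restores the luma resolution. The channel width, the number of residual blocks, and the simple residual block used here are illustrative assumptions only; for the 图8B variant, `in_ch` would simply grow to cover the additional prediction-image and block-partition layers.

```python
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    """1x1 conv -> PReLU -> 3x3 conv, with the block input superposed on the branch output."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=1),
            nn.PReLU(),
            nn.Conv2d(ch, ch, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class LumaCNNLF(nn.Module):
    """Assumed topology: conv(s=2)+PReLU -> N residual blocks -> conv(s=1) -> pixel shuffle."""
    def __init__(self, in_ch=3, ch=64, num_blocks=8):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.PReLU())
        self.blocks = nn.Sequential(*[SimpleResidualBlock(ch) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(ch, 4, 3, stride=1, padding=1)  # 4 = 1 output plane x (2*2) shuffle factor
        self.shuffle = nn.PixelShuffle(2)                      # rearrangement back to full luma resolution

    def forward(self, x):
        return self.shuffle(self.tail(self.blocks(self.head(x))))
```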
在本申请的一个实施例中,可以将色度分量信息进行上采样处理,得到与亮度分量重建图像尺寸相同的色度分量信息,该色度分量信息包括:色度分量滤波前图像、色度分量预测图像、色度分量块划分信息;然后对亮度分量重建图像与上采样处理后的色度分量滤波前图像进行合并,并提取合并后的图像特征,得到第一特征;对已编码图像对应的亮度分量预测图像与上采样处理后的色度分量预测图像进行合并,并提取合并后的图像特征,得到第二特征;对已编码图像对应的亮度分量块划分信息与上采样处理后的色度分量块划分信息进行合并,并提取合并后的图像特征,得到第三特征;然后根据第一特征、第二特征和第三特征生成深度学习滤波器的输入参数。
具体可以如图9所示,亮度分量预测图像与上采样处理后的色度分量预测图像合并后经过卷积单元(卷积单元包括卷积层conv和参数化修正线性单元prelu,该卷积层的步长s=1)来提取特征;亮度分量块划分信息与上采样处理后的色度分量块划分信息合并后经过卷积单元(卷积单元包括卷积层conv和参数化修正线性单元prelu,该卷积层的步长s=1)来提取特征;亮度分量重建图像与上采样处理后的色度分量重建图像合并后经过卷积单元(卷积单元包括卷积层conv和参数化修正线性单元prelu,该卷积层的步长s=1)来提取特征;然后将这些特征输入至深度学习滤波器中,最后得到亮度分量滤波后的图像Y'。
在图9所示的实施例中,深度学习滤波器包括顺次相连的卷积单元、残差单元和重排单元。其中,卷积单元包括步长s=1的卷积层、参数化修正线性单元prelu、以及步长s=2的卷积层;残差单元包括顺次相连的N个残差块;重排单元即为shuffle单元。在图9中所示的实施例中,在残差单元与重排单元(shuffle单元)之间还可以设置一个卷积层,该卷积层的步长s=1。
在图9所示的实施例中,是以色度分量信息包括:色度分量滤波前图像(即色度分量重建图像)、色度分量预测图像、色度分量块划分信息为例进行的说明。在本申请的其它实施例中,色度分量信息也可以包括色度分量滤波前图像(即色度分量重建图像)、色度分量预测图像、色度分量块划分信息中的部分。如果色度分量信息不包括色度分量预测图像,那么图9所示实施例中可以去掉通过卷积单元对预测图像(Y,Cb,Cr)进行特征提取的部分;如果色度分量信息不包括色度分量块划分信息,那么图9所示实施例中可以去掉通过卷积单元对块划分信息(Y,Cb,Cr)进行特征提取的部分。
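The following is a hedged sketch of the 图9-style front end: one convolution unit (stride-1 convolution plus PReLU) per input group, with the three extracted features then merged. Channel widths are illustrative, and each branch is assumed to receive three channels (the luma plane plus the two up-sampled chroma planes); if a branch's inputs are omitted, as discussed above, that branch would simply be dropped and the concatenation narrowed accordingly.

```python
import torch
import torch.nn as nn

class ThreeBranchFrontEnd(nn.Module):
    """Extract the first/second/third features from reconstructions, predictions, and partition info."""
    def __init__(self, ch=32):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(3, ch, kernel_size=3, stride=1, padding=1), nn.PReLU())
        self.recon_branch = branch()   # first feature: reconstructed images (Y + up-sampled Cb/Cr)
        self.pred_branch = branch()    # second feature: prediction images
        self.part_branch = branch()    # third feature: block-partition information

    def forward(self, recon, pred, part):
        f1 = self.recon_branch(recon)
        f2 = self.pred_branch(pred)
        f3 = self.part_branch(part)
        return torch.cat([f1, f2, f3], dim=1)   # merged features form the filter's input parameters
```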
在本申请的一个实施例中,残差单元中包含的N个残差块可以是1个、2个、3个等任意正整数数量个残差块。
如图10所示,本申请一个实施例中的残差块结构可以包括:顺次相连的第一卷积层(其卷积核大小可以为1×1)、参数化修正线性单元和第二卷积层(其卷积核大小可以为3×3);其中,第一卷积层的输入作为残差块的输入,第一卷积层的输入与第二卷积 层的输出的叠加结果作为残差块的输出。
如图11所示,本申请一个实施例中的残差块结构可以包括:顺次相连的第一卷积层(其卷积核大小可以为1×1)、参数化修正线性单元、第二卷积层(其卷积核大小可以为3×3)和卷积块注意力单元(Convolutional Block Attention Module,简称CBAM);其中,第一卷积层的输入作为残差块的输入,第一卷积层的输入与卷积块注意力单元的输出的叠加结果作为残差块的输出。
如图12所示,本申请一个实施例中的残差块结构可以包括:顺次相连的第一卷积层(其卷积核大小可以为1×1)、参数化修正线性单元、第三卷积层(其卷积核大小可以为1×1)和第二卷积层(其卷积核大小可以为3×3);其中,第一卷积层的卷积核与第三卷积层的卷积核大小相同,第一卷积层的输入作为残差块的输入,第一卷积层的输入与第二卷积层的输出的叠加结果作为残差块的输出。
如图13所示,本申请一个实施例中的残差块结构可以包括:顺次相连的第一卷积层(其卷积核大小可以为1×1)、参数化修正线性单元、第三卷积层(其卷积核大小可以为1×1)、第二卷积层(其卷积核大小可以为3×3)和卷积块注意力单元;其中,第一卷积层的卷积核与第三卷积层的卷积核大小相同,第一卷积层的输入作为残差块的输入,第一卷积层的输入与卷积块注意力单元的输出的叠加结果作为残差块的输出。
需要说明的是,在本申请的实施例中,残差单元中可以包含有一个或多个残差块,每个残差块的结构可以是上述的任意一种。深度学习滤波器的卷积单元包含的卷积层和prelu层的数量可以根据实际需要进行设定,残差块中包含的卷积层和prelu层的数量也可以根据实际需要进行设定。同时,不同卷积层的通道数可以相同,也可以不相同。
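As a non-authoritative sketch of the attention-augmented residual blocks (图11–图13), the code below places a simplified convolutional block attention module after the 3x3 convolution and, optionally, an extra 1x1 convolution as the third convolutional layer. The reduction ratio, the 7x7 spatial kernel, and the exact CBAM formulation are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Simplified convolutional block attention: channel attention followed by spatial attention."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
                                 nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))            # channel attention weights
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat([x.mean(1, keepdim=True),
                                                   x.amax(1, keepdim=True)], dim=1)))  # spatial weights
        return x * sa

class AttentionResidualBlock(nn.Module):
    """1x1 conv -> PReLU -> (optional extra 1x1 conv) -> 3x3 conv -> CBAM, with a skip connection."""
    def __init__(self, ch, extra_1x1=False):
        super().__init__()
        layers = [nn.Conv2d(ch, ch, 1), nn.PReLU()]
        if extra_1x1:                                 # the third convolutional layer of 图12/图13
            layers.append(nn.Conv2d(ch, ch, 1))
        layers += [nn.Conv2d(ch, ch, 3, padding=1), CBAM(ch)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)                       # block input superposed with the attention output
```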
在步骤S730中,将生成的输入参数输入至深度学习滤波器,得到深度学习滤波器输出的对应于亮度分量重建图像的已滤波图像。
在本申请的一个实施例中,深度学习滤波器在进行训练时需要采用与应用时相同的参数作为输入。具体而言,在训练阶段,需要获取到亮度分量的样本重建图像和相应的色度分量信息(该色度分量信息根据深度学习滤波器的使用场景进行调整,即与深度学习滤波器在使用时所用到的参数相匹配),然后根据亮度分量的样本重建图像和色度分量信息生成训练用的输入参数,然后将得到的输入参数输入至深度学习滤波器中,根据深度学习滤波器的输出与亮度分量的样本重建图像对应的期望滤波结果图像之间的损失值来调整深度学习滤波器的参数,并重复这个过程,直至深度学习滤波器满足收敛条件为止。
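A hedged training-loop sketch for the procedure described above; `loader` is assumed to yield (filter_input, target_luma) pairs built exactly as at inference time, and the L2 loss, the Adam optimizer, and the fixed epoch count stand in for whatever loss, optimizer, and convergence test are actually used.

```python
import torch
import torch.nn as nn

def train_filter(model, loader, epochs=10, lr=1e-4, device='cpu'):
    """Fit the deep learning filter on sample reconstructions and their expected filtering results."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):                            # a convergence criterion could replace the fixed count
        for filter_input, target_luma in loader:
            filter_input = filter_input.to(device)
            target_luma = target_luma.to(device)
            filtered = model(filter_input)             # filtered luma output of the network
            loss = loss_fn(filtered, target_luma)      # distance to the expected filtering result
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```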
在图7所示的基于深度学习的滤波方法的基础上,本申请实施例中还提出了一种视频编码方法,该视频编码方法可以由具有计算、存储等处理功能的设备来执行,比如可以由终端设备或服务器来执行。具体流程如图14所示,包括如下步骤S1410至步骤S1440:
在步骤S1410中,获取已编码图像对应的亮度分量重建图像和已编码图像对应的色度分量信息。
该步骤的具体实施细节可以参照前述步骤S710,不再赘述。
在步骤S1420中,根据亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数。
该步骤的具体实施细节可以参照前述步骤S720,不再赘述。
在步骤S1430中,将生成的输入参数输入至深度学习滤波器,得到深度学习滤波器输出的对应于亮度分量重建图像的已滤波图像。
在步骤S1440中,基于亮度分量重建图像的已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于该亮度分量预测图像对下一帧视频图像进行编码处理。
在实施例中,在生成已滤波图像之后,可以参照图3所示的流程,即亮度分量重建图像的已滤波图像作为下一帧图像的亮度分量参考图像进行运动估计及运动补偿预测,然后基于运动补偿预测的结果和帧内预测结果得到下一帧图像的亮度分量预测图像,并继续重复图3中所示的流程,直至对视频图像编码完成。
相应的,在图7所示的基于深度学习的滤波方法的基础上,本申请实施例中还提出了一种视频解码方法,该视频解码方法可以由具有计算、存储等处理功能的设备来执行,比如可以由终端设备或服务器来执行。具体流程如图15所示,包括如下步骤S1510至步骤S1540:
在步骤S1510中,获取已编码图像对应的亮度分量重建图像和已编码图像对应的色度分量信息。
该步骤的具体实施细节可以参照前述步骤S710,不再赘述。
在步骤S1520中,根据亮度分量重建图像和色度分量信息生成深度学习滤波器的输入参数。
该步骤的具体实施细节可以参照前述步骤S720,不再赘述。
在步骤S1530中,将生成的输入参数输入至深度学习滤波器,得到深度学习滤波器输出的对应于亮度分量重建图像的已滤波图像。
在步骤S1540中,基于亮度分量重建图像的已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于该亮度分量预测图像对视频码流进行解码处理。
在实施例中,在生成亮度分量重建图像的已滤波图像之后,可以将亮度分量重建图像的已滤波图像作为下一帧的亮度分量参考图像进行运动估计及运动补偿预测,然后基于运动补偿预测的结果和帧内预测结果得到下一帧图像的亮度分量预测图像,亮度分量预测图像与进行反量化和反变换处理得到的亮度分量重构残差信号再次叠加生成下一帧的亮度分量重建图像,并重复这个过程,以实现对视频码流的解码处理。
本申请实施例的技术方案使得在对图像的亮度分量进行滤波处理时,能够充分利用色度分量的信息,进而可以利用已有的色度分量信息提升针对亮度分量的深度学习滤波器的性能,从而可以提高滤波效果,有利于提升视频的编解码效率。
以下介绍本申请的装置实施例,可以用于执行本申请上述实施例中的基于深度学习的滤波方法。对于本申请装置实施例中未披露的细节,请参照本申请上述的基于深度学习的滤波方法的实施例。
图16示出了根据本申请的一个实施例的基于深度学习的滤波装置的框图,该基于深度学习的滤波装置可以设置在具有计算、存储等处理功能的设备内,比如可以设置在终端设备或服务器内。
参照图16所示,根据本申请的一个实施例的基于深度学习的滤波装置1600,包括: 获取单元1602、生成单元1604和处理单元1606。
其中,获取单元1602配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元1604配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元1606配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像。
在本申请的一些实施例中,基于前述方案,所述色度分量信息包括以下至少一个:所述已编码图像对应的色度分量滤波前图像、所述已编码图像对应的色度分量预测图像、所述已编码图像对应的色度分量块划分信息。
在本申请的一些实施例中,基于前述方案,所述生成单元1604配置为:将所述色度分量信息进行上采样处理,得到与所述亮度分量重建图像尺寸相同的色度分量信息;将所述亮度分量重建图像与上采样处理后的色度分量信息进行图层合并处理,将图层合并处理的结果作为所述深度学习滤波器的输入参数。
在本申请的一些实施例中,基于前述方案,所述生成单元1604配置为:将所述色度分量信息进行上采样处理,得到与所述亮度分量重建图像尺寸相同的色度分量信息,所述色度分量信息包括:色度分量滤波前图像、色度分量预测图像、色度分量块划分信息;对所述亮度分量重建图像与上采样处理后的色度分量滤波前图像进行合并,并提取合并后的图像特征,得到第一特征;对所述已编码图像对应的亮度分量预测图像与上采样处理后的色度分量预测图像进行合并,并提取合并后的图像特征,得到第二特征;对所述已编码图像对应的亮度分量块划分信息与上采样处理后的色度分量块划分信息进行合并,并提取合并后的图像特征,得到第三特征;根据所述第一特征、所述第二特征和所述第三特征生成所述输入参数。
在本申请的一些实施例中,基于前述方案,所述深度学习滤波器包括顺次相连的卷积单元、残差单元和重排单元,所述残差单元中包含有至少一个残差块。
在本申请的一些实施例中,基于前述方案,若所述残差单元包括多个残差块,则所述多个残差块的通道数相同,或者所述多个残差块中各个残差块的通道数不完全相同。
在本申请的一些实施例中,基于前述方案,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元和第二卷积层;其中,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述第二卷积层的输出的叠加结果作为所述残差块的输出。
在本申请的一些实施例中,基于前述方案,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第二卷积层和卷积块注意力单元;其中,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述卷积块注意力单元的输出的叠加结果作为所述残差块的输出。
在本申请的一些实施例中,基于前述方案,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第三卷积层和第二卷积层;其中,所述第一卷积层的卷积核与所述第三卷积层的卷积核大小相同,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述第二卷积层的输出的叠加结果作为所述残差块的输出。
在本申请的一些实施例中,基于前述方案,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第三卷积层、第二卷积层和卷积块注意力单元;其中,所述第一卷积层的卷积核与所述第三卷积层的卷积核大小相同,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述卷积块注意力单元的输出的叠加结果作为所述残差块的输出。
图17示出了根据本申请的一个实施例的视频编码装置的框图,该视频编码装置可以设置在具有计算、存储等处理功能的设备内,比如可以设置在终端设备或服务器内。
参照图17所示,根据本申请的一个实施例的视频编码装置1700,包括:获取单元1602、生成单元1604、处理单元1606和编码单元1702。
其中,获取单元1602配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元1604配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元1606配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;编码单元1702配置为基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对所述下一帧视频图像进行编码处理。
图18示出了根据本申请的一个实施例的视频解码装置的框图,该视频解码装置可以设置在具有计算、存储等处理功能的设备内,比如可以设置在终端设备或服务器内。
参照图18所示,根据本申请的一个实施例的视频解码装置1800,包括:获取单元1602、生成单元1604、处理单元1606和解码单元1802。
其中,获取单元1602配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;生成单元1604配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;处理单元1606配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;解码单元1802配置为基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对视频码流进行解码处理。
图19示出了适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。
需要说明的是,图19示出的电子设备的计算机系统1900仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图19所示,计算机系统1900包括中央处理单元(Central Processing Unit,CPU)1901,其可以根据存储在只读存储器(Read-Only Memory,ROM)1902中的程序或者从存储部分1908加载到随机访问存储器(Random Access Memory,RAM)1903中的程序而执行各种适当的动作和处理,例如执行上述实施例中所述的方法。在RAM 1903中,还存储有系统操作所需的各种程序和数据。CPU 1901、ROM 1902以及RAM 1903通过总线1904彼此相连。输入/输出(Input/Output,I/O)接口1905也连接至总线1904。
以下部件连接至I/O接口1905:包括键盘、鼠标等的输入部分1906;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分1907;包括硬盘等的存储部分1908;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分1909。通信部分1909经 由诸如因特网的网络执行通信处理。驱动器1910也根据需要连接至I/O接口1905。可拆卸介质1911,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1910上,以便于从其上读出的计算机程序根据需要被安装入存储部分1908。
特别地,根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的计算机程序。在这样的实施例中,该计算机程序可以通过通信部分1909从网络上被下载和安装,和/或从可拆卸介质1911被安装。在该计算机程序被中央处理单元(CPU)1901执行时,执行本申请的系统中限定的各种功能。
需要说明的是,本申请实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的计算机程序。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的计算机程序可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。其中,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。
作为另一方面,本申请还提供了一种计算机可读介质,该计算机可读介质可以是上 述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备实现上述实施例中所述的方法。
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。

Claims (16)

  1. 一种基于深度学习的滤波方法,包括:
    获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;
    根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;
    将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像。
  2. 根据权利要求1所述的基于深度学习的滤波方法,其中,所述色度分量信息包括以下至少一个:
    所述已编码图像对应的色度分量滤波前图像、所述已编码图像对应的色度分量预测图像、所述已编码图像对应的色度分量块划分信息。
  3. 根据权利要求1所述的基于深度学习的滤波方法,其中,根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数,包括:
    将所述色度分量信息进行上采样处理,得到与所述亮度分量重建图像尺寸相同的色度分量信息;
    将所述亮度分量重建图像与上采样处理后的色度分量信息进行图层合并处理,将图层合并处理的结果作为所述深度学习滤波器的输入参数。
  4. 根据权利要求1所述的基于深度学习的滤波方法,其中,根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数,包括:
    将所述色度分量信息进行上采样处理,得到与所述亮度分量重建图像尺寸相同的色度分量信息,所述色度分量信息包括:色度分量滤波前图像、色度分量预测图像、色度分量块划分信息;
    对所述亮度分量重建图像与上采样处理后的色度分量滤波前图像进行合并,并提取合并后的图像特征,得到第一特征;
    对所述已编码图像对应的亮度分量预测图像与上采样处理后的色度分量预测图像进行合并,并提取合并后的图像特征,得到第二特征;
    对所述已编码图像对应的亮度分量块划分信息与上采样处理后的色度分量块划分信息进行合并,并提取合并后的图像特征,得到第三特征;
    根据所述第一特征、所述第二特征和所述第三特征生成所述输入参数。
  5. 根据权利要求1至4中任一项所述的基于深度学习的滤波方法,其中,所述深度学习滤波器包括顺次相连的卷积单元、残差单元和重排单元,所述残差单元中包含有至少一个残差块。
  6. 根据权利要求5所述的基于深度学习的滤波方法,其中,若所述残差单元包括多个残差块,则所述多个残差块的通道数相同,或者所述多个残差块中各个残差块的通道数不完全相同。
  7. 根据权利要求5所述的基于深度学习的滤波方法,其中,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元和第二卷积层;
    其中,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述第二卷积层的输出的叠加结果作为所述残差块的输出。
  8. 根据权利要求5所述的基于深度学习的滤波方法,其中,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第二卷积层和卷积块注意力单元;
    其中,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述卷积块注意力单元的输出的叠加结果作为所述残差块的输出。
  9. 根据权利要求5所述的基于深度学习的滤波方法,其中,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第三卷积层和第二卷积层;
    其中,所述第一卷积层的卷积核与所述第三卷积层的卷积核大小相同,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述第二卷积层的输出的叠加结果作为所述残差块的输出。
  10. 根据权利要求5所述的基于深度学习的滤波方法,其中,一个所述残差块包括:顺次相连的第一卷积层、参数化修正线性单元、第三卷积层、第二卷积层和卷积块注意力单元;
    其中,所述第一卷积层的卷积核与所述第三卷积层的卷积核大小相同,所述第一卷积层的输入作为所述残差块的输入,所述第一卷积层的输入与所述卷积块注意力单元的输出的叠加结果作为所述残差块的输出。
  11. 一种视频编码方法,包括:
    获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;
    根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;
    将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;
    基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对所述下一帧视频图像进行编码处理。
  12. 一种视频解码方法,包括:
    获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;
    根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;
    将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像;
    基于所述已滤波图像生成下一帧图像对应的亮度分量预测图像,并基于所述亮度分量预测图像对视频码流进行解码处理。
  13. 一种基于深度学习的滤波装置,包括:
    获取单元,配置为获取已编码图像对应的亮度分量重建图像和所述已编码图像对应的色度分量信息;
    生成单元,配置为根据所述亮度分量重建图像和所述色度分量信息生成深度学习滤波器的输入参数;
    处理单元,配置为将所述输入参数输入至所述深度学习滤波器,得到所述深度学习滤波器输出的对应于所述亮度分量重建图像的已滤波图像。
  14. 一种计算机可读介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至10中任一项所述的基于深度学习的滤波方法,或实现如权利要求11所述的视频编码方法,或实现如权利要求12所述的视频解码方法。
  15. 一种电子设备,包括:
    一个或多个处理器;
    存储装置,用于存储一个或多个程序,当所述一个或多个程序被所述一个或多个处理器执行时,使得所述电子设备实现如权利要求1至10中任一项所述的基于深度学习的滤波方法,或实现如权利要求11所述的视频编码方法,或实现如权利要求12所述的视频解码方法。
  16. 一种计算机程序产品或计算机程序,所述计算机程序产品或计算机程序包括计算机指令,所述计算机指令存储在计算机可读存储介质中,计算机设备的处理器从所述计算机可读存储介质读取所述计算机指令,所述处理器执行所述计算机指令,使得所述计算机设备执行权利要求1至10中任一项所述的基于深度学习的滤波方法,或执行权利要求11所述的视频编码方法,或执行权利要求12所述的视频解码方法。
PCT/CN2022/118321 2021-09-28 2022-09-13 滤波及编解码方法、装置、计算机可读介质及电子设备 WO2023051223A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/472,484 US20240015336A1 (en) 2021-09-28 2023-09-22 Filtering method and apparatus, computer-readable medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111144705.7A CN115883842A (zh) 2021-09-28 2021-09-28 滤波及编解码方法、装置、计算机可读介质及电子设备
CN202111144705.7 2021-09-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/472,484 Continuation US20240015336A1 (en) 2021-09-28 2023-09-22 Filtering method and apparatus, computer-readable medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2023051223A1 true WO2023051223A1 (zh) 2023-04-06

Family

ID=85763623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118321 WO2023051223A1 (zh) 2021-09-28 2022-09-13 滤波及编解码方法、装置、计算机可读介质及电子设备

Country Status (3)

Country Link
US (1) US20240015336A1 (zh)
CN (1) CN115883842A (zh)
WO (1) WO2023051223A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040034611A1 (en) * 2002-08-13 2004-02-19 Samsung Electronics Co., Ltd. Face recognition method using artificial neural network and apparatus thereof
CN1816149A (zh) * 2005-02-06 2006-08-09 腾讯科技(深圳)有限公司 去除视频图像中块效应的滤波方法及环路滤波器
WO2017222140A1 (ko) * 2016-06-24 2017-12-28 한국과학기술원 Cnn 기반 인루프 필터를 포함하는 부호화 방법과 장치 및 복호화 방법과 장치
CN111194555A (zh) * 2017-08-28 2020-05-22 交互数字Vc控股公司 用模式感知深度学习进行滤波的方法和装置
WO2020177134A1 (zh) * 2019-03-07 2020-09-10 Oppo广东移动通信有限公司 环路滤波实现方法、装置及计算机存储介质
WO2020177133A1 (zh) * 2019-03-07 2020-09-10 Oppo广东移动通信有限公司 环路滤波实现方法、装置及计算机存储介质

Also Published As

Publication number Publication date
CN115883842A (zh) 2023-03-31
US20240015336A1 (en) 2024-01-11

Similar Documents

Publication Publication Date Title
CN111711824B (zh) 视频编解码中的环路滤波方法、装置、设备及存储介质
CN113766249B (zh) 视频编解码中的环路滤波方法、装置、设备及存储介质
WO2022078163A1 (zh) 视频解码方法、视频编码方法及相关装置
CN108881913B (zh) 图像编码的方法和装置
WO2022062880A1 (zh) 视频解码方法、装置、计算机可读介质及电子设备
WO2022078304A1 (zh) 视频解码方法、装置、计算机可读介质、程序及电子设备
WO2022063033A1 (zh) 视频解码方法、视频编码方法、装置、计算机可读介质及电子设备
US20120263225A1 (en) Apparatus and method for encoding moving picture
WO2022174660A1 (zh) 视频编解码方法、装置、计算机可读介质及电子设备
WO2022116836A1 (zh) 视频解码方法、视频编码方法、装置及设备
CN111669588A (zh) 一种超低时延的超高清视频压缩编解码方法
CN115956363A (zh) 用于后滤波的内容自适应在线训练方法及装置
WO2022063035A1 (zh) 上下文模型的选择方法、装置、设备及存储介质
CN117528074A (zh) 用于从多个交叉分量进行预测的视频编码、解码方法和设备
CN111182310A (zh) 视频处理方法、装置、计算机可读介质及电子设备
WO2023051223A1 (zh) 滤波及编解码方法、装置、计算机可读介质及电子设备
WO2022105678A1 (zh) 视频解码方法、视频编码方法及相关装置
WO2023051222A1 (zh) 滤波及编解码方法、装置、计算机可读介质及电子设备
CN106954074B (zh) 一种视频数据处理方法和装置
CN111212288B (zh) 视频数据的编解码方法、装置、计算机设备和存储介质
CN115209157A (zh) 视频编解码方法、装置、计算机可读介质及电子设备
WO2023202097A1 (zh) 环路滤波方法、视频编解码方法、装置、介质、程序产品及电子设备
US20240144439A1 (en) Filtering method and apparatus, computer-readable medium
WO2023130899A1 (zh) 环路滤波方法、视频编解码方法、装置、介质及电子设备
WO2022174701A1 (zh) 视频编解码方法、装置、计算机可读介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874610

Country of ref document: EP

Kind code of ref document: A1