WO2021196087A1 - Video enhancement method and device - Google Patents

Video enhancement method and device - Download PDF

Info

Publication number
WO2021196087A1
WO2021196087A1 (PCT/CN2020/082815; CN2020082815W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
image block
enhancement
block
enhanced
Prior art date
Application number
PCT/CN2020/082815
Other languages
English (en)
French (fr)
Inventor
陈亮
孙凤宇
兰传骏
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN202080001816.XA priority Critical patent/CN113767626B/zh
Priority to PCT/CN2020/082815 priority patent/WO2021196087A1/zh
Publication of WO2021196087A1 publication Critical patent/WO2021196087A1/zh

Links

Images

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N11/00 - Colour television systems
    • H04N11/04 - Colour television systems using pulse code modulation
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/114 - Adapting the group of pictures [GOP] structure, e.g. number of B-frames between two anchor frames

Definitions

  • This application relates to the technical field of video image processing, and in particular to a method and device for video enhancement.
  • For example, the computing power requirement of the video efficient sub-pixel convolutional neural network (VESPCN) is 2.0T, the computing power requirement of frame recurrent video super-resolution (FRVSR) is 12.30T, and the computing power requirement of the super-resolution generative adversarial network (SRGAN) is close to 40T.
  • This kind of computing overhead can hardly meet real-time processing needs on current chip platforms. Even if other methods, such as model compression and quantization, are used so that the model barely runs on the chip platform in real time, the resulting power consumption overhead is huge, and the device becomes hot after the user watches the video for only a few minutes.
  • The present application provides a method and device for video enhancement, which can improve the efficiency of video enhancement and meet the demand for real-time video enhancement.
  • In a first aspect, a method for video enhancement is provided, which includes: decoding a video to be processed to obtain syntax elements of multiple frames; determining an I frame based on the syntax elements of the I frame in the multiple frames; performing frame enhancement on the I frame to obtain an enhanced I frame; determining a first part of image blocks among multiple image blocks in a non-I frame based on the syntax elements of the non-I frame in the multiple frames, where the non-I frame includes a P frame or a B frame; performing block enhancement on the first part of image blocks to obtain an enhanced first part of image blocks; and obtaining an enhanced non-I frame according to the enhanced first part of image blocks.
  • the video to be processed may be an encoded video (stream) or a compressed video (stream).
  • Based on the syntax elements, the current frame can be distinguished as an I frame or a non-I frame: if the syntax elements include difference information of the current frame, the current frame is a P frame/B frame; if the syntax elements do not include difference information of the current frame, the current frame is an I frame.
  • Enhancement includes, but is not limited to, noise reduction processing, sharpening processing, super-division (super-resolution) processing, demosaicing, contrast adjustment, saturation adjustment, etc.
  • performing frame enhancement on the I frame may include: inputting the I frame into the first enhancement model to perform frame enhancement on the I frame.
  • Frame enhancement can also be referred to as global enhancement.
  • the first enhancement model may be a traditional enhancement model.
  • the PQ pipeline is used to enhance I frames.
  • the first enhancement model may also be an artificial intelligence (AI) enhancement model.
  • The first enhancement model may be built based on a convolutional neural network (CNN) structure.
  • the first part of image blocks may include one image block or multiple image blocks.
  • performing block enhancement on the first partial image block may include: inputting the first partial image block into the second enhancement model to perform block enhancement on the first partial image block.
  • the second enhancement model may be a traditional enhancement model.
  • the PQ pipeline is used to enhance the first image block.
  • the second enhanced model may also be an AI enhanced model.
  • The second enhancement model may be built based on a convolutional neural network (CNN) structure.
  • The solution is convenient to implement, which is conducive to real-time online processing, with a fast enhancement speed, low power consumption and good enhancement effects, and can greatly improve the user-side video viewing experience, especially for old videos with poor picture quality and low resolution.
  • The syntax elements of the non-I frame include a first reference image block of a first image block in the first part of image blocks and first difference information corresponding to the first image block, where the first difference information is used to indicate the difference between the first image block and the first reference image block, and the information amount of the first difference information is greater than or equal to a first threshold; and determining the first part of image blocks among the multiple image blocks in the non-I frame based on the syntax elements of the non-I frame in the multiple frames includes: performing differential compensation on the first reference image block by using the first difference information to obtain the first image block.
  • the information amount of the difference information can be obtained by calculating the variance of the pixel values in the difference information.
  • The first image block, which differs greatly from its reference image block, is determined based on the difference information between the image block and the reference image block, and the first image block is enhanced, which can ensure the video enhancement effect while reducing the amount of computation.
  • The method further includes: determining, based on the syntax elements of the non-I frame, a second part of image blocks other than the first part of image blocks among the multiple image blocks; and obtaining the enhanced non-I frame according to the enhanced first part of image blocks includes: obtaining the enhanced non-I frame based on the enhanced first part of image blocks and the second part of image blocks, where the second part of image blocks does not undergo block enhancement.
  • the syntax element of the non-I frame includes the second reference image block of the second image block in the second partial image block and the second difference information corresponding to the second image block.
  • the second difference information is used to indicate the difference between the second image block and the second reference image block.
  • the second reference image block is an image block in a reference frame other than the I frame.
  • Determining a second partial image block of the plurality of image blocks other than the first partial image block based on the syntax element of the non-I frame may include: performing differential compensation on the second reference image block according to the second differential information, To get the second part of the image block.
  • block enhancement is not performed on the second part of the image blocks in the non-I frame, which can further reduce the amount of calculation and improve the processing efficiency.
  • The syntax elements of the non-I frame include the second reference image block of the second image block in the second part of image blocks, and the second reference image block is an enhanced image block.
  • the position of the reference image block of the second image block in the non-I frame can be obtained by decoding, and the reference image block of the second image block is located in the reference frame of the non-I frame.
  • the enhanced result of the reference frame of the non-I frame may be referred to as the enhanced reference frame.
  • the image block at the position in the enhanced reference frame may be determined, and the image block may be used as the second reference image block. That is to say, the second reference image block may be an image block in the enhanced reference frame.
  • The syntax elements of the non-I frame further include second difference information corresponding to the second image block, where the second difference information is used to indicate the difference between the second image block and the second reference image block, and the information amount of the second difference information is less than the first threshold; and determining the second part of image blocks other than the first part of image blocks based on the syntax elements of the non-I frame includes: determining the second reference image block as the second image block.
  • In this way, the enhanced image block is directly used as the second image block: there is no need to reconstruct the second image block in the non-I frame, and accordingly there is no need to perform block enhancement on it; the enhanced image block is directly reused as the second image block in the enhanced non-I frame, which can further improve the efficiency of video enhancement processing.
  • the first enhancement model used for frame enhancement of the I frame is the same as the second enhancement model used for block enhancement of the first partial image block.
  • using the same enhancement model can reduce the training cost and storage space of the enhancement model, and improve processing efficiency.
  • At least one of the first enhancement model and the second enhancement model is a neural network enhancement model.
  • The use of a neural network enhancement model can make full use of the various nonlinear operations in the model structure, learn the mapping relationship between (input, label) sample pairs, and improve the generalization ability of the model for image quality problems, so as to obtain a better enhancement effect.
  • frame enhancement or block enhancement includes at least one of the following: noise reduction processing, sharpening processing, super-division processing, de-mosaic, contrast adjustment, or saturation adjustment.
  • the method further includes: removing blocking effects on the enhanced non-I frame.
  • In this way, the abrupt appearance of image block boundaries can be eliminated.
  • In a second aspect, a video enhancement device is provided, including: a decoding module, configured to decode a video to be processed to obtain syntax elements of multiple frames; and an enhancement module, configured to: determine an I frame based on the syntax elements of the I frame in the multiple frames; perform frame enhancement on the I frame to obtain an enhanced I frame; determine a first part of image blocks among multiple image blocks in a non-I frame based on the syntax elements of the non-I frame in the multiple frames, where the non-I frame includes a P frame or a B frame; perform block enhancement on the first part of image blocks to obtain an enhanced first part of image blocks; and obtain an enhanced non-I frame according to the enhanced first part of image blocks.
  • The solution is convenient to implement, which is conducive to real-time online processing, with a fast enhancement speed, low power consumption and good enhancement effects, and can greatly improve the user-side video viewing experience, especially for old videos with poor picture quality and low resolution.
  • The syntax elements of the non-I frame include a first reference image block of a first image block in the first part of image blocks and first difference information corresponding to the first image block, where the first difference information is used to indicate the difference between the first image block and the first reference image block, and the information amount of the first difference information is greater than or equal to the first threshold; and the enhancement module is specifically configured to: perform differential compensation on the first reference image block by using the first difference information to obtain the first image block.
  • The enhancement module is further configured to: determine, based on the syntax elements of the non-I frame, a second part of image blocks other than the first part of image blocks among the multiple image blocks; and the enhancement module is specifically configured to: obtain an enhanced non-I frame based on the enhanced first part of image blocks and the second part of image blocks, where the second part of image blocks has not undergone block enhancement.
  • The syntax elements of the non-I frame include the second reference image block of the second image block in the second part of image blocks, and the second reference image block is an enhanced image block.
  • the syntax element of the non-I frame further includes second difference information corresponding to the second image block, and the second difference information is used to indicate the second image block and the second reference The difference between the image blocks, the information amount of the second difference information is less than the first threshold, and the enhancement module is specifically configured to determine the second reference image block as the second image block.
  • the first enhancement model used for frame enhancement of the I frame is the same as the second enhancement model used for block enhancement of the first partial image block.
  • At least one of the first enhancement model and the second enhancement model is a neural network enhancement model.
  • frame enhancement or block enhancement includes at least one of the following: noise reduction processing, sharpening processing, super-division processing, de-mosaic, contrast adjustment, or saturation adjustment.
  • the enhancement module is further configured to: remove the blocking effect on the enhanced non-I frame.
  • In a third aspect, a video enhancement device is provided, including a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to execute part or all of the steps of any one of the methods of the first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided, which stores program code, where the program code includes instructions for executing part or all of the steps of any one of the methods of the first aspect.
  • In a fifth aspect, a computer program product is provided, which, when run on a computer, causes the computer to execute part or all of the steps of any one of the methods of the first aspect.
  • In a sixth aspect, a chip is provided, and the chip includes a processor and a data interface.
  • the processor reads instructions stored in a memory through the data interface and executes part or all of the steps of any method of the first aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is configured to execute part or all of the steps of any one of the methods in the first aspect.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a video encoding and decoding method.
  • FIG. 5 is a schematic flowchart of a video enhancement method provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another video enhancement method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a video enhancement method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of global enhancement provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of differential information provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a differential information calculation method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the effect of blocking effect removal provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a method for video super division provided by an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of another video super-division method provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the enhancement effect of one frame provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of the enhancement effect of another frame provided by an embodiment of the present application.
  • FIG. 16 is a schematic block diagram of a video enhancement device provided by an embodiment of the present application.
  • FIG. 17 is a schematic block diagram of another video enhancement device provided by an embodiment of the present application.
  • the technical solution in this application will be described below in conjunction with the accompanying drawings.
  • the technical solutions involved in the embodiments of the present application can be applied to video enhancement scenarios.
  • The video enhancement method of the embodiments of the present application can be applied not only to applications based on existing video coding standards (such as H.264 and HEVC), but also to applications based on future video coding standards (such as the H.266 standard).
  • end-side video enhancement can improve video quality and enhance user experience.
  • With the video enhancement method of the embodiments of the present application, video enhancement can be combined with video codec, and online terminal-side video enhancement can be realized while ensuring the video enhancement effect.
  • Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
  • For block-based video coding, a video slice (that is, a video frame or a part of a video frame) can be partitioned into image blocks, which can also be called tree blocks, coding units (CU) and/or coding nodes.
  • the image block in the to-be-intra-coded (I) slice of the image is encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
  • The image blocks in a to-be-inter-coded (P or B) slice of an image may use spatial prediction relative to reference samples in adjacent blocks in the same image, or temporal prediction relative to reference samples in other reference images.
  • the image may be referred to as a frame, and the reference image may be referred to as a reference frame.
  • Video coding generally refers to processing a sequence of pictures that form a video or video sequence.
  • the terms "picture”, "frame” or “image” can be used as synonyms.
  • Video encoding is performed on the source side, and usually includes processing (for example, by compressing) the original video picture to reduce the amount of data required to represent the video picture, so as to store and/or transmit more efficiently.
  • Video decoding is performed on the destination side and usually involves inverse processing relative to the encoder to reconstruct the video picture.
  • the combination of the encoding part and the decoding part is also called codec (encoding and decoding).
  • a video sequence includes a series of pictures, the pictures are further divided into slices, and the slices are further divided into blocks.
  • Video coding is performed in units of blocks.
  • the concept of blocks is further expanded.
  • For example, macroblocks (MB) are used, and a macroblock can be further divided into multiple prediction blocks (partitions) that can be used for predictive coding.
  • In high-efficiency video coding (HEVC), basic concepts such as coding unit (CU), prediction unit (PU) and transform unit (TU) are adopted; a variety of block units are divided functionally, and a new tree-based structure is used to describe them.
  • the CU can be divided into smaller CUs according to the quadtree, and the smaller CUs can be further divided to form a quadtree structure.
  • the CU is the basic unit for dividing and encoding the coded image.
  • the PU and TU also have a similar tree structure.
  • the PU can correspond to the prediction block and is the basic unit of prediction coding.
  • the CU is further divided into multiple PUs according to the division mode.
  • TU can correspond to the transform block and is the basic unit for transforming the prediction residual.
  • No matter whether it is a CU, PU or TU, they all belong to the concept of a block (or image block) in nature.
  • the image block to be encoded in the currently encoded image may be referred to as the current block.
  • the decoded image block used to predict the current block in the reference image is called a reference block, that is, a reference block is a block that provides a reference signal for the current block, where the reference signal represents the pixel value in the image block.
  • the block in the reference image that provides the prediction signal for the current block may be a prediction block, where the prediction signal represents a pixel value or a sample value or a sample signal in the prediction block. For example, after traversing multiple reference blocks, the best reference block is found. This best reference block will provide prediction for the current block, and this block is called a prediction block.
  • Video coding standards since H.261 belong to "lossy hybrid video coding and decoding" (that is, spatial and temporal prediction in the sample domain is combined with 2D transform coding for applying quantization in the transform domain).
  • Each picture of a video sequence is usually divided into a set of non-overlapping blocks, and is usually coded at the block level.
  • the encoder side usually processes the video at the block (video block) level, that is, encodes the video.
  • the prediction block is generated through spatial (intra-picture) prediction and temporal (inter-picture) prediction.
  • The prediction block is subtracted from the processed block to obtain a residual block; the residual block is transformed in the transform domain and quantized to reduce the amount of data to be transmitted (compressed); and the decoder side applies the inverse processing of the encoder to the encoded or compressed block to reconstruct the current block for representation.
  • the encoder duplicates the decoder processing loop, so that the encoder and the decoder generate the same prediction (for example, intra prediction and inter prediction) and/or reconstruction for processing, that is, encoding subsequent blocks.
  • a picture can be regarded as a two-dimensional array or matrix of picture elements.
  • the pixel points in the array can also be called sampling points.
  • the number of sampling points of the array or picture in the horizontal and vertical directions (or axis) defines the size and/or resolution of the picture.
  • An encoder (or called a video encoder) can be used to receive preprocessed picture data, and process the preprocessed picture data using a relevant prediction mode, thereby providing encoded picture data.
  • the decoder can be used to receive encoded picture data and provide decoded picture data or decoded pictures.
  • Both the encoder and the decoder can be implemented as any of various suitable circuits, for example, one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
  • The device can store the instructions of the software in a suitable non-transitory computer-readable storage medium, and can use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be regarded as one or more processors.
  • The encoders and decoders in the embodiments of the present application may be encoders/decoders corresponding to video standard protocols such as H.263, H.264, HEVC, MPEG-2, MPEG-4, VP8 and VP9, or to next-generation video standard protocols (such as H.266).
  • The coded bitstream of video data may include the data, indicators, index values, mode selection data, etc. related to the coded video frames discussed herein, such as data related to coded partitions (for example, transform coefficients or quantized transform coefficients, optional indicators as discussed, and/or data defining the coded partitions).
  • the decoder can be used to decode the encoded bitstream.
  • the decoder can be used to receive and parse such syntax elements, and decode related video data accordingly.
  • the encoder may entropy encode the syntax elements into an encoded video bitstream. In such instances, the decoder can parse such syntax elements and decode the related video data accordingly.
  • the encoder and decoder are introduced below.
  • the encoder receives a picture or image block of a picture, for example, a picture in a picture sequence that forms a video or video sequence.
  • the image block can also be called the current picture block or the picture block to be coded.
  • The picture can be called the current picture or the picture to be coded (especially when distinguishing the current picture from other pictures in video coding, the other pictures being, for example, previously coded and/or decoded pictures in the same video sequence, that is, the video sequence that also includes the current picture).
  • the encoder may include a dividing unit for dividing the picture into a plurality of blocks such as image blocks, usually into a plurality of non-overlapping blocks.
  • The segmentation unit can be used to use the same block size and the corresponding grid that defines the block size for all pictures in the video sequence, or to change the block size between pictures or subsets or groups of pictures, and to divide each picture into the corresponding blocks.
  • the image block is also or can be regarded as a two-dimensional array or matrix of sampling points with sample values, although its size is smaller than that of the picture.
  • The image block may include, for example, one sampling array (for example, a luminance array in the case of black-and-white pictures), three sampling arrays (for example, one luminance array and two chrominance arrays in the case of color pictures), or any other number and/or type of arrays according to the applied color format.
  • the number of sampling points in the horizontal and vertical directions (or axis) of the image block defines the size of the image block.
  • the encoder can encode pictures block by block, for example, perform encoding and prediction on each image block.
  • the encoder may calculate the residual block based on the picture image block and the prediction block, for example, by subtracting the sample value of the picture image block from the sample value of the prediction block sample by sample (pixel by pixel) to obtain the residual in the sample domain.
  • a transform such as discrete cosine transform (DCT) or discrete sine transform (DST) is applied to the sample values of the residual block to obtain transform coefficients in the transform domain.
  • Transform coefficients can also be referred to as transform residual coefficients, and represent residual blocks in the transform domain.
  • the transform coefficients are quantized, for example, by applying scalar quantization or vector quantization to obtain quantized transform coefficients.
  • the quantized transform coefficients can also be referred to as quantized residual coefficients.
  • the quantization process can reduce the bit depth associated with some or all of the transform coefficients.
  • The encoder can be used to determine a prediction mode based on rate-distortion optimization (RDO), that is, to select a prediction mode that provides the minimum rate-distortion cost, or to select a prediction mode whose related rate distortion at least meets the prediction mode selection criteria.
  • the encoder can determine or select the best or optimal prediction mode from a set of (predetermined) prediction modes.
  • the prediction mode set may include, for example, an intra prediction mode and/or an inter prediction mode.
  • The set of intra prediction modes may include a variety of different intra prediction modes, for example, non-directional modes such as the DC (or mean) mode and the planar mode, or directional modes as defined in H.265; or it may include 67 different intra prediction modes, for example, non-directional modes such as the DC (or mean) mode and the planar mode, or directional modes as defined in H.266 under development.
  • The set of inter prediction modes depends on the available reference pictures (that is, for example, at least part of the aforementioned decoded pictures stored in the decoded picture buffer (DPB)) and other inter prediction parameters, for example, on whether the entire reference picture or only a part of the reference picture (such as the search window area surrounding the area of the current block) is used to search for the best matching reference block, and/or, for example, on whether pixel interpolation such as half-pixel and/or quarter-pixel interpolation is applied.
  • the set of inter prediction modes may include, for example, an advanced motion vector prediction (AMVP) mode and a merge mode.
  • the encoder may include an inter prediction unit and an intra prediction unit.
  • the inter prediction unit may include a motion estimation (ME) unit and a motion compensation (MC) unit.
  • The motion estimation unit is used to receive or obtain a picture image block (the current picture image block of the current picture) and a decoded picture, or at least one or more previously reconstructed blocks (for example, reconstructed blocks of one or more other/different previously decoded pictures), and the reconstructed blocks are used for motion estimation.
  • a video sequence may include a current picture and a previously decoded picture, or in other words, the current picture and a previously decoded picture may be part of, or form a sequence of pictures that form the video sequence.
  • The encoder can be used to select a reference block from multiple reference blocks of the same picture or of different pictures among multiple other pictures, and to provide the reference picture and/or the offset (spatial offset) between the position (X, Y coordinates) of the reference block and the position of the current block to the motion estimation unit as inter prediction parameters. This offset is also called a motion vector (MV).
  • the motion compensation unit is used to obtain inter-frame prediction parameters, and perform inter-frame prediction based on or using the inter-frame prediction parameters to obtain an inter-frame prediction block.
  • the motion compensation performed by the motion compensation unit may include fetching or generating a prediction block based on a motion/block vector determined by motion estimation (interpolation of sub-pixel accuracy may be performed). Interpolation filtering can generate additional pixel samples from known pixel samples, thereby potentially increasing the number of candidate prediction blocks that can be used to encode picture blocks.
  • the motion compensation unit can locate the prediction block pointed to by the motion vector in a reference picture list.
  • the motion compensation unit may also generate syntax elements associated with the block and the video slice for the decoder to use when decoding the picture block of the video slice.
  • the intra prediction unit is used to obtain, for example, receive a picture block (current picture block) of the same picture and one or more previously reconstructed blocks, for example reconstructed adjacent blocks, for intra estimation.
  • The encoder can apply an entropy coding algorithm or scheme, for example, a variable length coding (VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique, to the quantized coefficients and the syntax elements to obtain encoded picture data that can be output in the form of an encoded bitstream.
  • the encoded bitstream can be transmitted to the video decoder, or it can be archived for later transmission or retrieval by the video decoder. Further, it is also possible to entropy encode other syntax elements of the current video slice being encoded.
  • the video decoder is used to receive, for example, encoded picture data (for example, an encoded bit stream) encoded by an encoder to obtain decoded pictures.
  • the video decoder receives video data from the video encoder, such as an encoded video bitstream and associated syntax elements that represent the picture blocks of the encoded video slice.
  • The decoder may perform entropy decoding on the encoded picture data to obtain, for example, quantized coefficients and/or decoded encoding parameters, such as any one or all of (decoded) inter prediction parameters, intra prediction parameters, loop filter parameters, and/or other syntax elements.
  • the video decoder may receive syntax elements at the video slice level and/or the video block level.
  • the decoder may include an inter prediction unit and an intra prediction unit, where the inter prediction unit may be functionally similar to the inter prediction unit of the encoder, and the intra prediction unit may be functionally similar to the intra prediction unit of the encoder.
  • When a video slice is encoded as an intra-coded (I) slice, the intra prediction unit generates the prediction block for the picture block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture.
  • The inter prediction unit (for example, a motion compensation unit) generates the prediction block for the video block of the current video slice based on the received motion vector and other syntax elements.
  • a prediction block can be generated from a reference picture in a reference picture list.
  • The prediction information for the video block of the current video slice is determined from some of the received syntax elements, and the prediction information is used to generate the prediction block for the current video block being decoded. For example, the received syntax elements can be used to determine the prediction mode (for example, intra or inter prediction) used to encode the video blocks of the video slice, the inter prediction slice type (for example, B slice, P slice or GPB slice), construction information for one or more of the reference picture lists for the slice, the motion vector of each inter-coded video block of the slice, and other information for decoding the video blocks in the current video slice.
  • The syntax elements received by the video decoder from the bitstream include one or more of the syntax elements in an adaptive parameter set (APS), a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header.
  • the decoder can inversely quantize (ie, inversely quantize) the quantized transform coefficients provided in the bitstream.
  • the inverse quantization process may include using the quantization parameter calculated by the video encoder for each video block in the video slice to determine the degree of quantization that should be applied and also determine the degree of inverse quantization that should be applied.
  • An inverse transform (for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) is applied to the transform coefficients to obtain the inverse transformed block (that is, the reconstructed residual block), and the reconstructed residual block is added to the prediction block to obtain the reconstructed block in the sample domain, for example, by adding the sample values of the reconstructed residual block and the sample values of the prediction block.
  • The decoder filters the reconstructed block to obtain the filtered block, so as to smooth pixel transitions or otherwise improve video quality.
  • the decoder may include one or more loop filters, such as deblocking filters, sample-adaptive offset (SAO) filters or other filters, such as bilateral filters, adaptive loop filters (adaptive loop filter, ALF), or sharpening or smoothing filter, or collaborative filter.
  • the loop filter unit may be an in-loop filter, and in other configurations, the loop filter unit may also be implemented as a post-loop filter.
  • the decoder can output the decoded picture for presentation to the user or for the user to view.
  • the video decoder can be used to decode the compressed bitstream.
  • the decoder can generate an output video stream without a loop filter unit.
  • Video enhancement refers to actions that can be performed on the video that can improve the video quality.
  • video enhancement includes: super-division, noise reduction, sharpening, demosaicing, contrast adjustment or saturation adjustment, etc.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs.
  • The output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to perform a nonlinear transformation on the features obtained in the neural network, so as to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) is also known as a multi-layer neural network.
  • the DNN is divided according to the positions of different layers.
  • the neural network inside the DNN can be divided into three categories: input layer, hidden layer, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the number of layers in the middle are all hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated. Simply put, each layer computes the following linear relationship expression: y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer simply performs such an operation on the input vector x to obtain the output vector y. Since a DNN has many layers, the number of coefficients W and offset vectors b is also relatively large.
  • These parameters are defined in the DNN as follows. Taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}.
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can only be connected to a part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Sharing weight can be understood as the way of extracting image information has nothing to do with location.
  • the convolution kernel can be formalized as a matrix of random size. In the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • Recurrent neural networks (RNN) are used to process sequence data.
  • In the traditional neural network model, from the input layer to the hidden layer and then to the output layer, the layers are fully connected, while the nodes within each layer are not connected to each other.
  • The specific form of expression is that the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as the training of traditional CNN or DNN.
  • The error back-propagation algorithm is also used, but with a difference: if the RNN is unrolled over time, the parameters, such as W, are shared, which is not the case with the traditional neural network described above.
  • the output of each step depends not only on the current step of the network, but also on the state of the previous steps of the network.
  • The neural network can use the back propagation (BP) algorithm to modify the values of the parameters in the neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forwarding the input signal to the output produces an error loss, and the parameters in the neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation motion dominated by error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • an embodiment of the present application provides a system architecture 100.
  • a data collection device 160 is used to collect training data.
  • the training data may include training video.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • the above-mentioned target model/rule 101 can be used to implement the video enhancement method of the embodiment of the present application.
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 does not necessarily perform the training of the target model/rule 101 completely based on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training.
  • The above description should not be construed as a limitation on the embodiments of the present application.
  • The target model/rule 101 obtained by training with the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and can also be a server or a cloud.
  • The execution device 110 is configured with an input/output (I/O) interface 112, which is used for data interaction with external devices.
  • The user can input data to the I/O interface 112 through the client device 140; in this embodiment of the present application, the input data may include the to-be-processed video input from the client device.
  • The preprocessing module 113 is used to perform preprocessing according to the input data (such as the video to be processed) received by the I/O interface 112. In this embodiment of the present application, the preprocessing module 113 may not be provided, and the calculation module 111 may be used directly to process the input data. In this embodiment of the present application, the preprocessing module 113 may be used to decode the input video to be processed.
  • The execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing.
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the enhanced video obtained as described above, to the client device 140 to provide it to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete The above tasks provide users with the desired results.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send the input data and the user's authorization is required, the user can set the corresponding authority in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data and store it in the database 130 as shown in the figure.
  • Alternatively, the I/O interface 112 may directly use the input data input to the I/O interface 112 and the output result output from the I/O interface 112, as shown in the figure, as new sample data and store it in the database 130.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • The data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training according to the training device 120.
  • the target model/rule 101 may be a neural network in the embodiment of the application.
  • Specifically, the neural network in the embodiments of the present application may be a convolutional neural network (CNN), a deep convolutional neural network (DCNN), a recurrent neural network (RNN), or the like.
  • FIG. 2 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50.
  • the chip may be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101.
  • the video enhancement model in the embodiment of the present application can be implemented in the chip as shown in FIG. 2.
  • a neural network processor (neural-network processing unit, NPU) 50 is mounted on a main central processing unit (central processing unit, CPU) (host CPU) as a coprocessor, and the main CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • The arithmetic circuit 503 includes multiple processing engines (PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit takes the matrix A data and matrix B from the input memory 501 to perform matrix operations, and the partial or final result of the obtained matrix is stored in an accumulator 508 (accumulator).
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • The vector calculation unit 507 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 507 can store the processed output vector in the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network.
  • the unified memory 506 is used to store input data and output data.
  • The direct memory access controller (DMAC) 505 is used to transfer the input data in the external memory to the input memory 501 and/or the unified memory 506, to store the weight data in the external memory into the weight memory 502, and to store the data in the unified memory 506 into the external memory.
  • the bus interface unit 510 (bus interface unit, BIU) is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504;
  • the controller 504 is used to call the instructions cached in the memory 509 to control the working process of the computing accelerator.
  • the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip (On-Chip) memories.
  • the external memory is a memory external to the NPU.
  • The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • the related operations of the video enhancement model in the embodiment of the present application may be executed by the arithmetic circuit 503 or the vector calculation unit 507.
  • an embodiment of the present application provides a system architecture 300.
  • the system architecture includes a local device 301, a local device 302, an execution device 310, and a data storage system 350.
  • the local device 301 and the local device 302 are connected to the execution device 310 through a communication network.
  • the execution device 310 may be implemented by one or more servers.
  • the execution device 310 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 310 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 310 may use the data in the data storage system 350 or call the program code in the data storage system 350 to implement the video enhancement method of the embodiment of the present application.
  • the execution device 310 may perform the following process: decode the video to be processed to obtain the syntax elements of multiple frames; determine the I frame based on the syntax elements of the I frame in the multiple frames; Frame enhancement to obtain an enhanced I frame; determining the first part of the image blocks in the multiple image blocks in the non-I frame based on the syntax elements of the non-I frame in the multiple frames, the non-I frame including a P frame or a B frame ; Perform block enhancement on the first partial image block to obtain an enhanced first partial image block; obtain an enhanced non-I frame according to the enhanced first partial image block.
  • corresponding processing can be performed on I frames and non-I frames to obtain enhanced I frames and non-I frames.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • Each user's local device can interact with the execution device 310 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
  • The local device 301 and the local device 302 obtain the relevant parameters of the video enhancement model from the execution device 310, deploy the video enhancement model on the local device 301 and the local device 302, and use the video enhancement model for video enhancement. That is to say, the local device 301 or the local device 302 can execute the steps performed by the aforementioned execution device 310.
  • the video enhancement model can be directly deployed on the execution device 310, and the execution device 310 obtains the video to be processed from the local device 301 and the local device 302, and uses the video enhancement model to perform video enhancement on the video to be processed.
  • the foregoing execution device 310 may also be a cloud device. In this case, the execution device 310 may be deployed in the cloud; or, the foregoing execution device 310 may also be a terminal device. In this case, the execution device 310 may be deployed on the user terminal side. This is not limited.
  • For I frames, an intra-frame encoding method can be used, similar to image encoding, for example, joint photographic experts group (JPEG) encoding; at the decoding end, almost lossless recovery of I frames can be achieved.
  • P-frames and B-frames are significantly different from I-frames.
  • the reference frame of the P frame may be an I frame.
  • the I frame and the P frame are divided into multiple macro blocks. Macroblock matching is performed on the macroblocks in the I frame and the P frame, and the degree of matching between the macroblocks of the two frames is compared. The higher the matching degree of the two macroblocks, the higher the similarity of the two macroblocks.
  • A matched macroblock in the I frame can be used as the source macroblock of the corresponding macroblock in the P frame. By traversing all the macroblocks in the P frame in this way, the corresponding source macroblock of each macroblock can be found in the I frame.
  • an approximate P frame can be obtained, which is called a P' frame.
  • the difference information can be estimated by the optical flow network.
  • the difference information may also be referred to as motion difference information, inter-frame motion information, or difference compensation information, etc. Therefore, for P-frames, the encoder can use differential information as an encoding object instead of P-frames as an encoding object, and then encode the differential information and other syntax elements to obtain an encoded video bitstream. Syntax elements may include: block size, block matching, macro block source position (source position), macro block target position (target position), difference information, and so on.
  • P frames can be obtained from I frames.
  • FIG. 5 shows a video enhancement method 600 provided by an embodiment of the present application.
  • the method 600 includes step S610 to step S660. Steps S610 to S660 will be described in detail below.
  • the method 600 may be specifically executed by the execution device 110 shown in FIG. 1.
  • the video data in the method 600 may be the video data received by the client device 140 as shown in FIG. 1, and the preprocessing module 113 in the execution device 110 may be used to preprocess the video data in the method 600.
  • the calculation module 111 in the execution device 110 may be used to execute step S620 and step S660.
  • the method 600 may be processed by a CPU, or may be processed by a CPU and a GPU together, or a GPU may not be used, and other processors suitable for neural network calculation may be used, which is not limited in this application.
  • the video to be processed may be an encoded video (stream), that is, a compressed video (stream). An encoded video can also be understood as an encoded bitstream. Specifically, the video to be processed can be received from the encoding terminal or other sources through the communication interface.
  • S620 Determine an I frame based on the syntax elements of the I frame in the multiple frames.
  • I frame is also called intra-picture.
  • According to the syntax elements, the current frame can be distinguished as an I frame or a non-I frame. For example, if the syntax elements include the difference information of the current frame, the current frame is a P frame or a B frame; if the syntax elements do not include the difference information of the current frame, the current frame is an I frame.
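  • As an illustration of this frame-type judgment, the following is a minimal sketch; the SyntaxElements structure and its field names are hypothetical and not part of any particular codec or of this embodiment:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class SyntaxElements:
    """Illustrative per-frame syntax elements recovered by the decoder (hypothetical)."""
    frame_index: int
    difference_info: Optional[np.ndarray] = None  # residual data; absent for I frames
    block_matches: Optional[list] = None          # (source_pos, target_pos) pairs; absent for I frames

def is_i_frame(elems: SyntaxElements) -> bool:
    # If the syntax elements carry no difference information for the current frame,
    # the frame is an I frame; otherwise it is a P frame or a B frame.
    return elems.difference_info is None
```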
  • performing frame enhancement on the I frame may include: inputting the I frame into the first enhancement model to perform frame enhancement on the I frame.
  • Frame enhancement can also be referred to as global enhancement.
  • the I frame can be frame-enhanced by means of image enhancement.
  • Enhancing an image refers to actions that can improve the quality of the image. Enhancement includes, but is not limited to, noise reduction processing, sharpening processing, super-resolution processing, de-mosaic processing, contrast adjustment or saturation adjustment, etc.
  • the first enhancement model may be a traditional enhancement model. That is, the traditional method can be used to enhance the I frame.
  • For example, the picture quality (PQ) pipeline is used to enhance the I frame.
  • the PQ pipeline can include one or more modules. Each module is independent of the others, that is, each module can process the image separately, for example: dithering, high dynamic range (HDR) adjustment (also called "high dynamic contrast adjustment"), color transient improvement (CTI), dynamic contrast improvement (DCI, also called automatic contrast enhancement), noise reduction (NR), super-division, motion estimation and compensation, and so on.
  • the various modules of the PQ pipeline adapt to each other and adjust each other to obtain the corresponding parameters of each module.
  • the final effect of image processing using PQ pipeline is obtained by equalizing the processing results of the image by each module.
  • the first enhanced model may be an AI enhanced model.
  • the AI enhancement model can be a black box model or a white box model.
  • the AI enhancement model can be built based on the CNN structure.
  • the AI enhancement model can also be built based on the RNN structure.
  • the AI enhancement model based on CNN may include: a super-resolution convolutional neural network (SRCNN), a fast super-resolution convolutional neural network (FSRCNN), a super-resolution generative adversarial network (SRGAN), and so on.
  • AI enhancement models built based on RNN may include: frame recurrent video super-resolution (FRVSR), etc.
  • the embodiments of the present application only take the CNN structure and the RNN structure as examples, and do not constitute a limitation on the specific structure of the AI enhancement model.
  • the AI enhancement model can make full use of various nonlinear operations in the model structure, learn the mapping relationship between (input, label) sample pairs, improve the generalization ability of the model, and achieve a better enhancement effect in terms of image quality.
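  • As one concrete possibility for such an AI enhancement model, a minimal SRCNN-style convolutional network is sketched below in PyTorch; the framework, layer sizes, and residual formulation are assumptions of this illustration, not prescribed by the embodiment:

```python
import torch
import torch.nn as nn

class SimpleEnhancementCNN(nn.Module):
    """Small SRCNN-like network: feature extraction, non-linear mapping, reconstruction."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Predict a correction and add it to the input, so the network only has to
        # learn the enhancement (e.g. denoising/sharpening) rather than the whole image.
        return x + self.net(x)

# Frame enhancement: a complete frame (1 x C x H x W tensor) is fed through the model, e.g.
# model = SimpleEnhancementCNN(); enhanced_i_frame = model(i_frame_tensor)
```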
  • S640 Determine a first part of the image blocks in the multiple image blocks in the non-I frame based on the syntax elements of the non-I frames in the multiple frames, where the non-I frame includes a P frame or a B frame.
  • the P frame is also called a forward predictive-frame
  • the B frame is also called a bi-directional interpolation prediction frame.
  • the non-I frame includes multiple image blocks, and the multiple image blocks may not overlap each other.
  • the size of the image block can be set as needed.
  • In the video encoding and decoding process, a frame can be divided into multiple macroblocks, and encoding and decoding are then performed in units of macroblocks.
  • the multiple image blocks may be determined according to the division manner of the multiple macro blocks, that is, the multiple macro blocks are regarded as the multiple image blocks. It should be understood that, in the embodiments of the present application, only a macro block is used as an image block as an example for description, and does not limit the division method of the image block.
  • the first part of image blocks may include one image block or multiple image blocks.
  • the syntax element of the non-I frame includes the first reference image block of the first image block in the first partial image block and the first difference information corresponding to the first image block, and the first difference information is used to indicate that the first image block and the The difference between the first reference image blocks.
  • Determining the first partial image block of the multiple image blocks in the non-I frame based on the syntax elements of the non-I frame in the multiple frames includes: performing differential compensation on the first reference image block according to the first differential information to obtain the first image block.
  • the differential compensation may include: adding the first reference image block and the first differential information to obtain the first image block.
  • the first reference image block may refer to a reference image block corresponding to the first image block.
  • the first reference image block may be located in a reference frame other than the non-I frame.
  • the reference frame can be an I frame or other P frames.
  • the image block may be a macro block in the video encoding and decoding process.
  • the reference image block may also be referred to as a source macro block.
  • the first reference image block may also be understood as the source macro block corresponding to the first image block.
  • the first difference information may be obtained by performing a subtraction operation on the first image block and the first reference image block in the video encoding end.
  • the first difference information may refer to the aforementioned reconstructed residual block.
  • Taking the first image block being the first macroblock as an example, the target position (x1, y1) of the first macroblock (an image block of the first partial image block) and the position (x1', y1') of the source macroblock corresponding to the first macroblock can be obtained from the syntax elements obtained after decoding; the position of the source macroblock corresponding to the first macroblock refers to the source position of the macroblock in the aforementioned syntax elements.
  • the source macroblock at the position (x1', y1') (that is, the source macroblock corresponding to the first macroblock) is overlaid on the target position (x1, y1) of the first macroblock. Then, the source macro block corresponding to the first macro block and the difference information corresponding to the first macro block are added to obtain the first macro block.
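  • The reconstruction just described (overlay the source macroblock at the target position, then add the difference information) can be sketched as follows; the (x, y) indexing, block size, and function name are illustrative assumptions:

```python
import numpy as np

def reconstruct_block(reference_frame: np.ndarray, target_frame: np.ndarray,
                      src_pos: tuple, dst_pos: tuple, block_size: int,
                      difference_info: np.ndarray) -> None:
    """Differential compensation for one macroblock of a non-I frame (in place)."""
    sx, sy = src_pos   # position (x1', y1') of the source macroblock in the reference frame
    tx, ty = dst_pos   # target position (x1, y1) of the macroblock in the current frame
    # 1) overlay the source macroblock at the target position,
    # 2) add the difference information to obtain the first image block.
    source_block = reference_frame[sy:sy + block_size, sx:sx + block_size]
    target_frame[ty:ty + block_size, tx:tx + block_size] = source_block + difference_info
```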
  • the first image block may be preset. For example, if the position A in the non-I frame is preset, the image block at the position is the first image block.
  • the information amount of the first difference information is greater than or equal to the first threshold.
  • the greater the information amount of the difference information, the greater the difference between an image block and its corresponding reference image block.
  • the greater the information amount of the first difference information, the greater the difference between the first image block and the first reference image block.
  • the information amount of the difference information can be obtained by calculating the variance of the pixel values in the difference information.
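  • A minimal sketch of this measure and the resulting classification, assuming the information amount is taken as the pixel-value variance of the difference information and T1 stands for the first threshold (the threshold value here is purely illustrative):

```python
import numpy as np

def information_amount(difference_info: np.ndarray) -> float:
    # Variance of the pixel values in the residual: a larger variance means a
    # larger difference between the image block and its reference image block.
    return float(np.var(difference_info))

def is_first_part_block(difference_info: np.ndarray, T1: float = 25.0) -> bool:
    # Blocks whose difference information carries enough information are block-enhanced
    # (first partial image blocks); the others reuse the enhanced reference block.
    return information_amount(difference_info) >= T1
```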
  • performing block enhancement on the first partial image block may include: inputting the first partial image block into the second enhancement model to perform block enhancement on the first partial image block.
  • the second enhancement model may be a traditional enhancement model.
  • the second enhanced model may be an AI enhanced model.
  • the data to be enhanced is changed from the frame level to the image block level, for example, the macroblock level, which can effectively reduce the amount of calculation.
  • the first enhanced model and the second enhanced model may be the same or different. Using the same enhanced model can reduce the training cost and storage space of the enhanced model, and improve processing efficiency.
  • the method 600 further includes: determining a second partial image block in the plurality of image blocks except for the first partial image block based on the syntax elements of the non-I frame in the plurality of frames.
  • the second part of image blocks may include one image block or multiple image blocks.
  • the syntax element of the non-I frame includes a second reference image block of the second image block in the second partial image block and second difference information corresponding to the second image block, and the second difference information is used to indicate the second image block The difference with the second reference image block.
  • Determining the second part of the image blocks of the multiple image blocks in the non-I frame based on the syntax elements of the non-I frames in the multiple frames includes: performing differential compensation on the second reference image block according to the second differential information to obtain the second Image block.
  • the difference compensation may include: adding the second reference image block and the second difference information to obtain the second image block.
  • the second reference image block may refer to a reference image block corresponding to the second image block.
  • the second reference image block may be located in a reference frame other than the non-I frame.
  • the reference frame can be an I frame or other P frames.
  • the image block may be a macro block in the video encoding and decoding process.
  • the reference image block may also be referred to as a source macro block.
  • the second reference image block may also be understood as the source macro block corresponding to the second image block.
  • the second difference information may be obtained by performing a subtraction operation on the second image block and the second reference image block at the video encoding end.
  • the second difference information may refer to the aforementioned reconstructed residual block.
  • the second image block may be preset.
  • the position B in the non-I frame is preset, and the image block corresponding to the position B is the second image block.
  • the information amount of the second difference information is less than the first threshold, and the information amount of the first difference information is greater than or equal to the first threshold.
  • the information amount of the second difference information is less than or equal to the first threshold, and the information amount of the first difference information is greater than the first threshold.
  • In a non-I frame, if the information amount of the difference information corresponding to an image block is greater than or equal to the first threshold, the image block is the first image block; if the information amount of the difference information corresponding to an image block is less than the first threshold, the image block is the second image block. Or, in a non-I frame, if the information amount of the difference information corresponding to an image block is greater than the first threshold, the image block is the first image block; if the information amount of the difference information corresponding to an image block is less than or equal to the first threshold, the image block is the second image block.
  • the second reference image block is an enhanced image block.
  • the second reference image block may refer to a reference image block corresponding to the second image block.
  • the enhanced result of the reference frame of the non-I frame may be referred to as the enhanced reference frame.
  • the reference frame can be an I frame or other P frames.
  • the second reference image block may be located in the enhanced reference frame.
  • the position of the reference image block of the second image block in the non-I frame can be obtained by decoding, and the reference image block of the second image block is located in the reference frame of the non-I frame.
  • the enhanced result of the reference frame of the non-I frame may be referred to as the enhanced reference frame.
  • the image block at the position in the enhanced reference frame may be determined, and the image block may be used as the second reference image block. That is to say, the second reference image block may be an image block in the enhanced reference frame.
  • determining the second partial image block of the multiple image blocks in the non-I frame based on the syntax elements of the non-I frame in the multiple frames includes: determining the second reference image block as the second image block.
  • the image block may be a macro block in the video encoding and decoding process.
  • Taking the second image block being the second macroblock in a P frame as an example, the position (x2, y2) of the second macroblock and the position (x2', y2') of the source macroblock corresponding to the second macroblock are obtained from the syntax elements obtained after decoding.
  • The enhanced source macroblock at the position (x2', y2') (that is, the enhancement result of the source macroblock corresponding to the second macroblock) is taken as the second reference image block and overlaid at the position (x2, y2) of the second macroblock in the P enh frame (the enhanced P frame); that is, the second reference image block is determined as the second image block in the P enh frame. In this case, it is not necessary to obtain the second image block in the non-I frame; instead, the enhanced image block in the enhanced reference frame is directly multiplexed as the second image block in the enhanced non-I frame.
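  • The multiplexing described above can be sketched as follows (array layout, positions, and block size are simplified assumptions of this illustration):

```python
import numpy as np

def multiplex_enhanced_block(enhanced_reference_frame: np.ndarray,
                             enhanced_target_frame: np.ndarray,
                             src_pos: tuple, dst_pos: tuple, block_size: int) -> None:
    """Reuse the enhancement result of the source macroblock as the enhanced second image block."""
    sx, sy = src_pos   # position (x2', y2') of the source macroblock
    tx, ty = dst_pos   # position (x2, y2) of the second macroblock
    enhanced_target_frame[ty:ty + block_size, tx:tx + block_size] = \
        enhanced_reference_frame[sy:sy + block_size, sx:sx + block_size]
```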
  • S660 Obtain an enhanced non-I frame according to the enhanced first partial image block.
  • an enhanced non-I frame is obtained according to the enhanced first partial image block and the second partial image block.
  • the second partial image block in the enhanced non-I frame is not obtained by performing block enhancement on the second partial image block, but is obtained by multiplexing the second reference image block.
  • step S660 further includes performing differential compensation on the second reference image block according to the second difference information, and determining the compensated result block as the second image block in the enhanced non-I frame.
  • the second reference image block may be added to the second difference information, and the obtained result may be used as the second image block in the enhanced non-I frame.
  • step S660 may further include: removing the blocking effect on the enhanced non-I frame. Since the enhanced non-I frame is obtained by combining multiple image blocks, the enhanced non-I frame may have blocking effects at the edges of each image block, such as a sense of abruptness at the border of the image block.
  • the blocking effect can be removed by the adaptive filter.
  • the adaptive filter can be used to determine whether a filtering operation needs to be used at the edge of the image block, and the strength of the filtering operation.
  • the adaptive filter can refer to the adaptive filter in the standard decoder.
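  • As a highly simplified illustration of such boundary filtering (a standard decoder's adaptive deblocking filter is considerably more elaborate), the following sketch smooths a vertical block boundary only where the jump across it is small enough to be a coding artifact rather than a real image edge; the threshold is an illustrative choice:

```python
import numpy as np

def deblock_vertical_edge(frame: np.ndarray, x: int, strength_threshold: float = 12.0) -> None:
    """Filter the vertical block boundary at column x in place (illustrative only)."""
    left = frame[:, x - 1].astype(np.float32)
    right = frame[:, x].astype(np.float32)
    step = np.abs(left - right)
    # Only filter where the jump across the boundary is small: a large jump is
    # likely a real edge in the image and should be preserved.
    mask = step < strength_threshold
    avg = (left + right) / 2.0
    frame[:, x - 1] = np.where(mask, (left + avg) / 2.0, left)
    frame[:, x] = np.where(mask, (right + avg) / 2.0, right)
```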
  • In the embodiments of the present application, different enhancement processing is performed on the I frame and the non-I frame: the first part of the image blocks in the non-I frame is block-enhanced, so that the data subjected to enhancement processing is changed from the frame level to the image block level.
  • In addition, the enhanced image block is reused as the enhancement result, which further improves the efficiency of the enhancement process while ensuring the video enhancement effect.
  • The solution is convenient to implement, which is conducive to real-time online processing, with fast enhancement speed, low power consumption and good enhancement effects, and can greatly improve the video viewing experience on the user side, especially for old videos with poor picture quality and low resolution.
  • If the video is enhanced frame by frame, although each processed frame can obtain a better enhancement result, ignoring the temporal consistency between frames may cause jitter and affect the quality of the output video.
  • The solution of the embodiment of the present application reuses the information in the reference image block to ensure the correlation between frames; using any enhancement model, such as a CNN model, will not affect the correlation between frames, which solves the problem of temporal consistency.
  • FIG. 6 shows a video enhancement method 700 provided by an embodiment of the present application.
  • the method 700 includes step S710 to step S7100.
  • the method 700 may be used as an example of the method 600, and for related description, refer to the method 600.
  • FIG. 7 shows a schematic flow chart of video enhancement, and the method 700 will be described below in conjunction with FIG. 6 and FIG. 7.
  • video decoding is performed on the compressed video to obtain a code stream.
  • the manner of obtaining the compressed video is as shown in step S610 in the foregoing, and will not be repeated here.
  • the relevant information of the target frame can be obtained from the code stream, for example, syntax elements.
  • the syntax elements of the non-I frame may include: macroblock size, macroblock matching, macroblock source location (x', y'), macroblock target location (x, y), difference information, and so on.
  • For P frames, the difference information is forward difference information; for B frames, the difference information includes forward difference information and backward difference information.
  • the method 700 is described only by taking the image block as a macro block as an example. It should be understood that the image block in the method 700 may also be an image block of other sizes in the process of video encoding and decoding. That is to say, the method of dividing the frame into image blocks in the embodiment of the present application may be the same as the method of dividing the frame into image blocks in the encoding and decoding process.
  • the target frame may also be referred to as the "current frame”.
  • the target macroblock may also be referred to as the "current macroblock”.
  • the source macroblock may also be referred to as a "reference macroblock”.
  • the macroblock source position refers to the position of the source macroblock, and the macroblock target position refers to the position of the target macroblock.
  • Step S720: Judge the target frame. If the target frame is an I frame, step S730 is executed; if the target frame is a P frame or a B frame, step S750 is executed.
  • the relevant information of the target frame can be obtained according to the code stream, and then the target frame can be judged by the difference information of the target frame.
  • the difference information refers to the difference information of the target frame in the video encoding process, that is, the difference between the target frame and the reference frame.
  • the difference information can be estimated through an optical flow network.
  • If the relevant information of the target frame includes the difference information of the target frame, the target frame is a P frame or a B frame; if there is no difference information of the target frame in the relevant information of the target frame, the target frame is an I frame.
  • S730 Determine an I frame. Specifically, the I frame is determined according to the syntax element of the I frame.
  • S740 Perform global enhancement on the I frame to obtain an enhanced I frame I enh .
  • the I frame is input into the first enhancement model, and the enhanced I frame I enh is obtained .
  • global enhancement refers to frame-level image enhancement. That is, the complete frame information is input into the enhanced model, and the enhanced I frame I enh is obtained .
  • the first enhancement model is as described in step S630 above, and will not be repeated here.
  • the first enhanced model is an AI enhanced model. Input the complete frame information into the AI enhancement model to obtain the enhanced result.
  • Step S750: Determine the difference information corresponding to the target macroblock in the target frame. Specifically, if the information amount of the difference information corresponding to the target macroblock in the target frame is greater than or equal to the first threshold, for example, if the difference information corresponding to the target macroblock in FIG. 6 has a large amount of information, step S770 is executed; if the information amount of the difference information corresponding to the target macroblock in the target frame is less than the first threshold, for example, if the difference information corresponding to the target macroblock in FIG. 6 has a small amount of information, step S760 is executed. In this case, if the information amount of the difference information corresponding to the target macroblock in the target frame is greater than or equal to the first threshold, the target macroblock corresponds to the first image block in the method 600; if the information amount of the difference information corresponding to the target macroblock in the target frame is less than the first threshold, the target macroblock corresponds to the second image block in the method 600.
  • Or, if the information amount of the difference information corresponding to the target macroblock in the target frame is greater than the first threshold, step S770 is executed; if the information amount of the difference information corresponding to the target macroblock in the target frame is less than or equal to the first threshold, for example, if the information amount of the difference information corresponding to the target macroblock in FIG. 6 is small, step S760 is executed. In this case, if the information amount of the difference information corresponding to the target macroblock in the target frame is greater than the first threshold, the target macroblock corresponds to the first image block in the method 600; if the information amount of the difference information corresponding to the target macroblock in the target frame is less than or equal to the first threshold, the target macroblock corresponds to the second image block in the method 600.
  • the difference information corresponding to the target macro block is used to indicate the degree of matching or similarity between the target macro block and the source macro block.
  • the greater the information amount of the difference information, the lower the degree of matching between the target macroblock and the source macroblock, that is, the greater the difference between the two.
  • the pixel value variance of the difference information corresponding to the target macroblock may be used as the information amount of the difference information corresponding to the target macroblock.
  • the source macro block corresponding to the target macro block can be determined according to the macro block matching obtained in the code stream information. That is to perform macro block matching and determine the source macro block corresponding to the target macro block.
  • the target frame is a P frame
  • the difference information corresponding to the target macro block in the P frame is determined, that is, the similarity between the target macro block and the reference macro block in the P frame is determined.
  • the target frame may be the P frame in FIG. 7
  • the reference frame of the target frame may be the I frame in FIG. 7, and the source macroblock corresponding to the target macroblock in the target frame is located in the reference frame.
  • the difference information corresponding to the target frame is shown in Figure 9.
  • An object or part of an object corresponds to the target macroblock 1 in the target frame and corresponds to the source macroblock 1' in the reference frame, and the similarity between the target macroblock 1 and the source macroblock 1' is relatively high.
  • In this case, the difference information corresponding to the target macroblock 1 has a small amount of information, or in other words, the information in the difference information corresponding to the target macroblock 1 is relatively simple.
  • In contrast, the difference information corresponding to the target macroblock 2 is rich in information. Generally, the difference information is caused by the motion of an object; the more intense the motion of the object, the greater the information amount of the difference information corresponding to the target macroblock.
  • the difference information corresponding to the target macroblock in the B frame is determined, that is, the similarity between the target macroblock and the reference macroblock in the B frame is determined.
  • the similarity between the target macroblock and the reference macroblock in the B frame is determined according to the similarity between the target macroblock in the B frame and the two reference macroblocks. For example, the average value of the information amount of the two difference information corresponding to the target macroblock in the B frame is taken as the information amount of the difference information corresponding to the target macroblock in the B frame.
  • Step S760: the macroblock is moved. Specifically, the enhanced source macroblock is moved/overlaid to the position of the target macroblock in the target frame. That is, the enhanced source macroblock is used as the enhanced target macroblock. Step S7100 can be executed afterwards.
  • the target frame is a P frame
  • the reference frame of the target frame is an I frame
  • the source macroblock 1' corresponding to the target macroblock 1 is located in the reference frame.
  • the macroblock at the position (x1', y1') in the enhanced I frame I enh is overlaid at the position (x1, y1) in the target frame.
  • The macroblock at (x1', y1') in I enh is the enhancement result of the source macroblock 1' at (x1', y1') in the I frame, that is, the enhanced source macroblock.
  • For example, the difference information corresponding to the target macroblock 1 in FIG. 7 has a small amount of information, so the macroblock at the position (x1', y1') in I enh is overlaid at the position (x1, y1) in the P enh frame.
  • the P enh frame refers to an enhanced P frame.
  • the description here only takes the source macroblock as the macroblock in the I frame as an example, and does not limit the source macroblock in the embodiment of the present application.
  • the source macro block of the target macro block can be determined by the information obtained after the target frame is decoded.
  • the reference frame may also be another decoded P frame, and the source macro block may be a macro block in the decoded P frame.
  • the target frame is a B frame
  • the reference frames of the target frame are frame 1' and frame 1".
  • the target macroblock is located in the target frame, the source macroblock 1' corresponding to the target macroblock is located in frame 1', and the source macroblock 1" corresponding to the target macroblock is located in frame 1".
  • For example, if the information amount of the difference information between the source macroblock 1' and the target macroblock is less than or equal to the information amount of the difference information between the source macroblock 1" and the target macroblock, that is, the similarity between the source macroblock 1' and the target macroblock is higher, the macroblock at the position (x1', y1') in the enhanced frame 1' is overlaid at the position (x1, y1) in the target frame.
  • The macroblock at (x1', y1') in the enhanced frame 1' is the enhancement result of the source macroblock at (x1', y1') in frame 1', that is, the enhanced source macroblock.
  • Alternatively, according to the position (x1', y1') of the source macroblock 1' and the position (x1, y1) of the target macroblock, the macroblock at the position (x1', y1') in the enhanced frame 1' is overlaid at the position (x1, y1) in the target frame; and, according to the position (x1", y1") of the source macroblock 1" and the position (x1, y1) of the target macroblock, the macroblock at the position (x1", y1") in the enhanced frame 1" is overlaid at the position (x1, y1) in the target frame.
  • For example, a weighted average of the pixel values in the macroblock at the position (x1', y1') in the enhanced frame 1' and in the macroblock at the position (x1", y1") in the enhanced frame 1" is taken, and the weighted average is overlaid at the position (x1, y1) of the target macroblock.
  • The macroblock at (x1', y1') in the enhanced frame 1' is the enhancement result of the source macroblock at (x1', y1') in frame 1', and the macroblock at (x1", y1") in the enhanced frame 1" is the enhancement result of the source macroblock at (x1", y1") in frame 1".
  • the reference frame can be a decoded I frame or other decoded P frames.
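  • The two options described above for a B-frame target macroblock can be sketched as follows, assuming the two enhanced reference blocks and the information amounts of the two pieces of difference information are already available (the 0.5/0.5 weights are an illustrative choice):

```python
import numpy as np

def pick_or_blend_reference(block_fwd: np.ndarray, block_bwd: np.ndarray,
                            info_fwd: float, info_bwd: float,
                            blend: bool = False, w_fwd: float = 0.5) -> np.ndarray:
    """Choose the better-matching enhanced reference block, or blend both."""
    if blend:
        # Option 2: weighted average of the two enhanced source macroblocks.
        return w_fwd * block_fwd + (1.0 - w_fwd) * block_bwd
    # Option 1: reuse the enhanced source macroblock whose difference information
    # carries less information, i.e. the reference that is more similar to the target.
    return block_fwd if info_fwd <= info_bwd else block_bwd
```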
  • Step S770: the macroblock is moved. Specifically, the source macroblock is moved/overlaid to the position of the target macroblock in the target frame.
  • the target frame is a P frame
  • the reference frame of the target frame is an I frame
  • the source macroblock 2' corresponding to the target macroblock 2 is located in the reference frame.
  • the macroblock at the position (x2', y2') in the reference frame is overlaid at the position (x2, y2) in the target frame.
  • the source macro block of the target macro block can be determined by the information obtained after the target frame is decoded.
  • the reference frame may also be another decoded P frame
  • the source macro block may be a macro block in the decoded P frame.
  • the target frame is frame 1
  • frame 1 is a B frame, and the reference frames of frame 1 are frame 1' and frame 2'.
  • the source macroblock 1' and the source macroblock 2' corresponding to the target macroblock are determined according to the matching relationship between the decoded macroblocks.
  • the information amount of the difference information between the source macroblock 1' and the target macroblock and the information amount of the difference information between the source macroblock 2' and the target macroblock are determined.
  • For example, if the information amount of the difference information between the source macroblock 1' and the target macroblock is less than or equal to the information amount of the difference information between the source macroblock 2' and the target macroblock, that is, the similarity between the source macroblock 1' and the target macroblock is higher, the macroblock at the position (x2, y2) in the frame 1' is overlaid at the position (x2', y2') in the target frame.
  • Alternatively, the macroblock at the position (x2, y2) in the frame 1' is overlaid at the position (x2', y2') in the target frame; and, according to the position (x4, y4) of the source macroblock 2' and the position (x2', y2') of the target macroblock, the macroblock at the position (x4, y4) in the frame 2' is overlaid at the position (x2', y2') in the target frame.
  • For example, a weighted average of the pixel values in the macroblock at the position (x2, y2) in the frame 1' and in the macroblock at the position (x4, y4) in the frame 2' may be taken, and the weighted average is overlaid at the position of the target macroblock.
  • Step S780: differential compensation. The differential compensation is the reverse operation of the differential calculation performed at the encoding end.
  • the difference compensation may be adding the difference information corresponding to the source macro block and the target macro block.
  • the target frame is a P frame
  • the difference information of the target macroblock is forward difference information, that is, the reference frame of the target frame is located before the target frame.
  • the pixel value at the position of the target macroblock obtained in step S770 is added to the difference information of the target macroblock.
  • an approximate target frame can be obtained after the macroblocks in the reference frame are reorganized according to the position of the source macroblock and the position of the target macroblock.
  • the difference between the approximate target frame and the target frame is the difference information.
  • the target frame is a P frame
  • the reference frame of the target frame is an I frame
  • the target macro block is located in the target frame
  • the source macro block is located in the reference frame.
  • the source macroblocks corresponding to the macroblocks in the P frame can be obtained in the I frame through macroblock matching, and the macroblocks in the I frame can be reorganized through macroblock splicing to obtain the P' frame.
  • the difference between the P frame and the P' frame is the difference information corresponding to the P frame. Therefore, when performing differential compensation, the target macroblock 2 is located in the target frame, and the source macroblock 2' is located in the reference frame.
  • the source macroblock 2' and the difference information corresponding to the target macroblock 2 are added to obtain the target macroblock 2.
  • FIG. 10 shows the difference information corresponding to the target frame, which does not mean that it is necessary to perform difference compensation for each image block in the target frame.
  • the target frame is a B frame
  • the difference information of the target macroblock includes forward difference information and backward difference information, that is, the reference frame of the target frame is located before and after the target frame, respectively.
  • the forward difference information is the difference information between the target macro block and the source macro block 1'
  • the backward difference information is the difference information between the target macroblock and the source macroblock 1". For example, the information amounts of the forward difference information and the backward difference information are compared.
  • If the information amount of the forward difference information is less than the information amount of the backward difference information, the pixel value at the position of the target macroblock is added to the forward difference information; if the information amount of the backward difference information is less than or equal to the information amount of the forward difference information, the pixel value at the position of the target macroblock is added to the backward difference information.
  • Alternatively, a weighted average of the forward difference information and the backward difference information can be taken, and the weighted average is added to the pixel value at the position of the target macroblock.
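  • A sketch of this compensation step, covering the P-frame case (forward difference information only) and the two B-frame options just described; the function and parameter names are illustrative:

```python
from typing import Optional
import numpy as np

def compensate(block: np.ndarray,
               diff_fwd: np.ndarray,
               diff_bwd: Optional[np.ndarray] = None,
               blend: bool = False, w_fwd: float = 0.5) -> np.ndarray:
    """Add the appropriate difference information to the moved macroblock."""
    if diff_bwd is None:
        # P frame: only forward difference information is available.
        return block + diff_fwd
    if blend:
        # B frame, option 2: weighted average of the two pieces of difference information.
        return block + (w_fwd * diff_fwd + (1.0 - w_fwd) * diff_bwd)
    # B frame, option 1: use the difference information with the smaller
    # information amount (measured here as the pixel-value variance).
    use_fwd = np.var(diff_fwd) < np.var(diff_bwd)
    return block + (diff_fwd if use_fwd else diff_bwd)
```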
  • step S770 and step S780 may be reversed. That is, step S780 may be executed first, and then step S770 may be executed. Specifically, differential compensation may be performed first, and the macroblock obtained after compensation may be overlaid on the position of the target macroblock in the target frame.
  • Step S790: macroblock enhancement. Specifically, block enhancement is performed on the target macroblock obtained after differential compensation. As shown in FIG. 7, the target macroblock can be input into the second enhancement model to obtain the enhanced target macroblock. For the enhancement in step S790, refer to the aforementioned step S650, which will not be repeated here.
  • FIG. 11 shows the comparison of image effects before and after the blocking effect is removed.
  • FIG. 11(a) is the target frame before the blocking effect is removed
  • FIG. 11(b) is the target frame after the blocking effect is removed. After the blocking effect is removed, the sense of abruptness at the block boundaries is obviously reduced.
  • steps S750 to S790 may be repeated for each macroblock in the target frame to obtain an enhanced target frame obtained by combining the respective macroblocks, and then the enhanced target frame is removed from blocking effects.
  • Alternatively, steps S750 to S770 can be performed for each macroblock in the target frame to obtain the macroblock at the position of each macroblock in the target frame, and then steps S780 to S7100 are performed, that is, differential compensation is performed, the macroblocks obtained after the differential compensation are enhanced to obtain the enhanced target frame, and then the blocking effect is removed from the enhanced target frame.
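  • Putting steps S750 to S7100 together, one possible per-macroblock loop for a non-I frame is sketched below; the macroblock record, the block_enhance and deblock callables, and the variance-based threshold are assumptions of this illustration rather than a prescribed implementation:

```python
import numpy as np

def enhance_non_i_frame(target_syntax, reference_frame, enhanced_reference_frame,
                        block_enhance, deblock, T1=25.0):
    """Illustrative processing of one P/B frame, macroblock by macroblock."""
    out = np.zeros_like(enhanced_reference_frame)
    for mb in target_syntax.macroblocks:          # each mb: src_pos, dst_pos, size, difference_info
        info = float(np.var(mb.difference_info))  # information amount of the difference information
        (sx, sy), (tx, ty), n = mb.src_pos, mb.dst_pos, mb.size
        if info < T1:
            # S760: multiplex the already-enhanced source macroblock as the enhanced result.
            out[ty:ty + n, tx:tx + n] = enhanced_reference_frame[sy:sy + n, sx:sx + n]
        else:
            # S770 + S780: move the source macroblock and apply differential compensation,
            # then S790: block-enhance the reconstructed macroblock.
            block = reference_frame[sy:sy + n, sx:sx + n] + mb.difference_info
            out[ty:ty + n, tx:tx + n] = block_enhance(block)
    return deblock(out)                           # S7100: remove blocking artifacts
```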
  • the method 700 further includes determining whether to perform blocking removal.
  • the adaptive filter in the decoder can be used to determine whether block artifact removal is required, that is, whether filtering is required.
  • the strength of the filtering can also be determined by the adaptive filter.
  • In the case that the information amount of the inter-frame difference information is large, for example, in an area where there is obvious relative motion between frames, difference compensation can be performed and the area can be enhanced; in the case that the information amount of the inter-frame difference information is small, for example, in an area where there is almost no relative motion between frames, the enhancement result of this area in the reference frame is directly multiplexed. In this way, the redundant information between frames can be fully utilized, without the need for frame-by-frame global enhancement, which greatly improves the efficiency of video enhancement.
  • The solution is convenient to implement, which is conducive to real-time online processing, with fast enhancement speed, low power consumption and good enhancement effects, and can greatly improve the video viewing experience on the user side, especially for old videos with poor picture quality and low resolution.
  • In addition, the correlation between frames can be guaranteed, and using any enhancement model, such as a CNN model, will not affect the correlation between frames, which solves the problem of temporal consistency.
  • Fig. 12 shows a video enhancement method according to an embodiment of the present application.
  • the video enhancement method of the embodiment of the present application is described by taking video super-resolution (super-division) as an example.
  • the syntax elements of the target frame may include: the size of the macroblock in the target frame, the matching of the macroblock, the location of the source macroblock (x', y'), the location of the target macroblock (x, y), and difference information, etc.
  • the image block is a macroblock as an example for description.
  • the macroblock in the syntax element can also be replaced with an image block of another size in the process of video encoding and decoding. That is to say, the method of dividing the frame into image blocks in the embodiment of the present application may be the same as the method of dividing the frame into image blocks in the encoding and decoding process.
  • I frames in a low-resolution video can be input into the first enhancement model.
  • the first enhancement model can be used for video super-division.
  • For specific steps, refer to the aforementioned step S630.
  • If the information amount of the difference information corresponding to the target macroblock is less than the first threshold, the macroblock is moved according to the syntax elements obtained in step (A); specifically, the enhanced source macroblock is moved/overlaid to the position of the target macroblock in the target frame.
  • If the information amount of the difference information corresponding to the target macroblock is greater than or equal to the first threshold, macroblock movement and difference compensation are performed according to the syntax elements obtained in step (A), and the super-division operation is performed on the obtained target macroblock.
  • For specific steps, refer to the aforementioned step S650.
  • Figures 14 and 15 show the experimental results of super-dividing videos using the video enhancement method provided in the embodiments of the present application.
  • Figures 14 and 15 are the super-divided results of the second frame (the first P frame) and the 15th frame (the 15th P frame) in a video stream obtained by continuous decoding, respectively.
  • Figure 14(a) and Figure 15(a) are the results of directly performing the super-division operation
  • FIG. 14(b) and FIG. 15(b) are the result obtained by using the video enhancement method of the embodiment of the present application. It can be seen from FIG. 14 and FIG. 15 that the video enhancement method provided by the embodiment of the present application hardly loses the image quality effect.
  • FIG. 16 is a schematic block diagram of a video enhancement device 800 provided by an embodiment of the present application.
  • the video enhancement device 800 shown in FIG. 16 includes a decoding module 810 and an enhancement module 820.
  • the apparatus 800 may be an embodiment of the execution device 110 in FIG. 1.
  • At least one of the decoding module 810 and the enhancement module 820 may be implemented in hardware, or in the form of software, or a combination of software and hardware.
  • a module When a module is implemented by hardware, it may be included in the execution device 110, specifically included in at least one of the main CPU and the neural network processor 50 in FIG. 2.
  • the module When the module is implemented in software, it may be a software module executed by the execution device 110 or the main CPU and the neural network processor 50 in FIG. 2.
  • the decoding module 810 and the enhancement module 820 may be used to execute the video enhancement method of the embodiment of the present application. Specifically, the method 600 shown in FIG. 5 or the method 700 shown in FIG. 6 may be executed.
  • the decoding module 810 is configured to decode the to-be-processed video to obtain syntax elements of multiple frames.
  • the enhancement module 820 is configured to: determine an I frame based on the syntax elements of the I frame in the multiple frames; perform frame enhancement on the I frame to obtain an enhanced I frame; determine a first part of image blocks among the multiple image blocks in a non-I frame based on the syntax elements of the non-I frame in the multiple frames, where the non-I frame includes a P frame or a B frame; perform block enhancement on the first part of image blocks to obtain an enhanced first part of image blocks; and obtain an enhanced non-I frame according to the enhanced first part of image blocks.
  • the syntax element of the non-I frame includes the first reference image block of the first image block in the first partial image block and the first difference information corresponding to the first image block, where the first difference information is used to indicate the difference between the first image block and the first reference image block, and the information amount of the first difference information is greater than or equal to the first threshold; the enhancement module 820 is specifically configured to: perform differential compensation on the first reference image block by using the first difference information to obtain the first image block.
  • the enhancement module 820 is further configured to: determine a second part of image blocks other than the first part of image blocks in the multiple image blocks based on the syntax elements of the non-I frame; and the enhancement module 820 is specifically configured to: obtain the enhanced non-I frame according to the enhanced first part of image blocks and the second part of image blocks, where the second part of image blocks has not undergone block enhancement.
  • the syntax element of the non-I frame includes a second reference image block of the second image block in the second partial image block, and the second reference image block is an enhanced image block.
  • the syntax element of the non-I frame further includes second difference information corresponding to the second image block, and the second difference information is used to indicate the difference between the second image block and the second reference image block.
  • the information amount of the second difference information is less than the first threshold, and the enhancement module 820 is specifically configured to determine the second reference image block as the second image block.
  • the first enhancement model used for frame enhancement of the I frame is the same as the second enhancement model used for block enhancement of the first partial image block.
  • At least one of the first enhancement model and the second enhancement model is a neural network enhancement model.
  • frame enhancement or block enhancement includes at least one of the following: noise reduction processing, sharpening processing, super-division processing, de-mosaic, contrast adjustment or saturation adjustment.
  • the enhancement module is also used to: remove the blocking effect on the enhanced non-I frame.
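  • A purely illustrative structural sketch of how the decoding module 810 and the enhancement module 820 could be composed in software follows; the embodiment leaves the hardware/software split open, and the reference-frame handling here is simplified to reusing the previously enhanced frame:

```python
class VideoEnhancementDevice:
    """Apparatus 800 sketch: a decoding module (810) feeding an enhancement module (820)."""
    def __init__(self, decoder, enhance_i_frame, enhance_non_i_frame):
        self.decoder = decoder                          # decoding module 810 (injected callable/object)
        self.enhance_i_frame = enhance_i_frame          # frame enhancement, e.g. the first enhancement model
        self.enhance_non_i_frame = enhance_non_i_frame  # block-level processing of P/B frames

    def run(self, bitstream):
        enhanced_frames = []
        for frame_syntax in self.decoder.decode(bitstream):   # syntax elements of each frame
            if frame_syntax.is_i_frame():
                enhanced_frames.append(self.enhance_i_frame(frame_syntax))
            else:
                # Non-I frames reuse the previously enhanced frame as the enhanced reference frame
                # (a simplification: in general the reference frame is given by the syntax elements).
                enhanced_frames.append(self.enhance_non_i_frame(frame_syntax, enhanced_frames[-1]))
        return enhanced_frames
```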
  • Fig. 17 is a schematic diagram of the hardware structure of a video enhancement device provided by an embodiment of the present application.
  • the video enhancement device 3000 shown in FIG. 17 includes a memory 3001, a processor 3002, a communication interface 3003, and a bus 3004.
  • the memory 3001, the processor 3002, and the communication interface 3003 implement communication connections between each other through the bus 3004.
  • the apparatus 3000 may be an embodiment of the execution device 110 in FIG. 1.
  • the memory 3001 may be located in the data storage system 150 in FIG. 1, and the processor 3002 and the communication interface 3003 are located in the execution device 110 through the bus 3004.
  • the memory 3001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 3001 may store a program.
  • the processor 3002 is configured to execute each step of the video enhancement method in the embodiment of the present application. Specifically, the processor 3002 may execute the method 600 shown in FIG. 5 or the method 700 shown in FIG. 6 above.
  • the processor 3002 may be a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the video enhancement method in the method embodiments of the present application.
  • the processor 3002 may also be an integrated circuit chip with signal processing capability. For example, it may be the chip shown in FIG. 2.
  • each step of the video enhancement method of the present application can be completed by hardware integrated logic circuits in the processor 3002 or instructions in the form of software.
  • The above-mentioned processor 3002 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 3001, and the processor 3002 reads the information in the memory 3001 and, in combination with its hardware, completes the functions required by the units included in the video enhancement device of the embodiment of the present application, or performs the video enhancement method of the embodiment of the present application.
  • the communication interface 3003 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 3000 and other devices or communication networks.
  • the video to be processed can be obtained through the communication interface 3003.
  • the bus 3004 may include a path for transferring information between various components of the device 3000 (for example, the memory 3001, the processor 3002, and the communication interface 3003).
  • the processor in the embodiment of the present application may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • Many forms of RAM are available, for example, static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-mentioned embodiments may be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions or computer programs.
  • When the computer instructions or computer programs are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (such as infrared, wireless, microwave, etc.) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium.
  • the semiconductor medium may be a solid state drive.
  • the computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or a communication medium that includes any medium that facilitates the transfer of a computer program from one place to another (for example, according to a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this application.
  • the computer program product may include a computer readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other temporary media, but are actually directed to non-transitory tangible storage media.
  • magnetic disks and optical disks include compact disks (CDs), laser disks, optical disks, digital versatile disks (DVD) and Blu-ray disks, where disks usually reproduce data magnetically, while optical disks use lasers to reproduce data optically. Combinations of the above should also be included in the scope of computer-readable media.
  • DSP digital signal processors
  • ASIC application-specific integrated circuits
  • FPGA field programmable logic arrays
  • processor may refer to any of the foregoing structure or any other structure suitable for implementing the techniques described herein.
  • the functions described by the various illustrative logical blocks, modules, and steps described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the technology may be fully implemented in one or more circuits or logic elements.
  • the technology of this application can be implemented in a variety of devices or devices, including wireless handsets, integrated circuits (ICs), or a set of ICs (for example, chipsets).
  • Various components, modules, or units are described in this application to emphasize the functional aspects of the device for implementing the disclosed technology, but they do not necessarily need to be implemented by different hardware units.
  • various units can be combined, in conjunction with suitable software and/or firmware, in a codec hardware unit, or provided by interoperating hardware units (including one or more processors as described above).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

本申请公开了人工智能(AI)领域中的一种视频增强的方法及装置,该视频增强的方法包括:对待处理视频进行解码得到多个帧的语法元素;基于多个帧中的I帧的语法元素确定I帧;对I帧进行帧增强,以得到增强后的I帧;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;对第一部分图像块进行块增强,以得到增强后的第一部分图像块;根据增强后的第一部分图像块得到增强后的非I帧。本申请的方法能够提高视频增强的效率,满足视频增强实时性的需求。

Description

视频增强的方法及装置 技术领域
本申请涉及视频图像处理技术领域,尤其涉及一种视频增强的方法及装置。
背景技术
随着诸如手机、平板电脑、相机等各种视频播放终端设备的迅速发展,终端侧在线视频增强的应用也越来越广泛。目前,图像、视频增强方法通常是以逐帧处理的方式实现的,但该方法需要很大的算力开销,导致处理速度缓慢,严重影响了处理效率。例如,对于视频超分任务,若原始分辨率为720p,目标分辨率为4k,视频的帧率为30fps,采用视频高效亚像素卷积神经网络(video efficient sub-pixel convolutional neural network,VESPCN)的算力需求为2.0T,帧递归视频超分(frame recurrent video super-resolution,FRVSR)的算力需求为12.30T,超分生成对抗网络(super-resolution generative adversarial network,SRGAN)的算力需求接近40T。这种算力开销在目前的芯片平台上几乎无法满足实时处理的需求。即使采用其他手段,例如,模型压缩、量化等,勉强在芯片平台上实时运行,其带来的功耗开销也是巨大的。这样会导致用户在观看几分钟视频后,设备就出现发热发烫的问题。
因此,如何在保证视频增强效果的情况下,快速实现视频增强成为一个亟待解决的问题。
发明内容
本申请提供一种视频增强的方法及装置,能够提高视频增强的效率,满足视频增强实时性的需求。
第一方面,提供了一种视频增强的方法,包括:对待处理视频进行解码得到多个帧的语法元素;基于多个帧中的I帧的语法元素确定I帧;对I帧进行帧增强,以得到增强后的I帧;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;对第一部分图像块进行块增强,以得到增强后的第一部分图像块;根据增强后的第一部分图像块得到增强后的非I帧。
其中,该待处理视频可以为经编码视频(流)或者说是压缩的视频(流)。
根据语法元素可以区别当前帧为I帧或非I帧。
例如,若语法元素中包括当前帧的差分信息,则当前帧为P帧/B帧,若语法元素中不包括当前帧的差分信息,则该当前帧为I帧。
本申请实施例中,增强包括但不限于降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节等。
具体地,对I帧进行帧增强,可以包括:将I帧输入第一增强模型中对I帧进行帧增强。帧增强也可以称为全局增强。其中,第一增强模型可以为传统增强模型。例如,利用 PQ pipeline对I帧进行增强。或者,第一增强模型也可以为人工智能(artificial intelligence,AI)增强模型。例如,第一增强模型可以是基于卷积神经网络CNN结构搭建的。
第一部分图像块可以包括一个图像块,也可以包括多个图像块。
具体地,对第一部分图像块进行块增强,可以包括:将第一部分图像块输入第二增强模型中对第一部分图像块进行块增强。其中,第二增强模型可以为传统增强模型。例如,利用PQ pipeline对第一图像块进行增强。或者,第二增强模型也可以为AI增强模型。例如,第二增强模型可以是基于卷积神经网络CNN结构搭建的。
根据本申请实施例的方案,分别对I帧和非I帧进行不同的增强处理,对非I帧中的第一部分图像块进行块增强,将进行增强处理的数据由帧级变为图像块级,能够大大减少计算量,提高增强处理的效率。
此外,通过将解码过程与增强过程进行融合,充分利用编解码过程中的码流信息,使方案实现更便捷,有利于实时在线处理,增强速度快、功耗低,增强效果好,能够大大改善用户端侧观看视频体验,尤其是对于画质较差、分辨率较低的老旧视频。
结合第一方面,在第一方面的某些实现方式中,非I帧的语法元素包括第一部分图像块中第一图像块的第一参考图像块以及第一图像块对应的第一差分信息,第一差分信息用于指示第一图像块与第一参考图像块之间的差异,第一差分信息的信息量大于或等于第一阈值;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块包括:利用第一差分信息对第一参考图像块进行差分补偿以得到第一图像块。
差分信息的信息量越大,图像块与参考图像块之间的差异越大。
示例性地,差分信息的信息量可以通过计算该差分信息内的像素值的方差得到。该差分信息内的像素值的方差越大,则差分信息的信息量越大。
根据本申请实施例的方案,基于图像块与参考图像块中的差分信息确定与参考图像块之间的差异较大的第一图像块,并对第一图像块进行增强处理,能够保证视频增强的效果,同时减少计算量。
结合第一方面,在第一方面的某些实现方式中,方法还包括:基于非I帧的语法元素确定多个图像块中除了所述第一部分图像块外的第二部分图像块,以及所述根据所述增强后的第一部分图像块得到增强后的非I帧,包括:根据增强后的第一部分图像块和第二部分图像块得到增强后的非I帧,其中,第二部分图像块没有经过块增强。
例如,非I帧的语法元素包括第二部分图像块中的第二图像块的第二参考图像块和第二图像块对应的第二差分信息。第二差分信息用于指示第二图像块和第二参考图像块之间的差异。该第二参考图像块为非I帧的参考帧中的图像块。
基于所述非I帧的语法元素确定所述多个图像块中除了所述第一部分图像块外的第二部分图像块,可以包括:根据第二差分信息对第二参考图像块进行差分补偿,以得到第二部分图像块。
根据本申请实施例的方案,对非I帧中的第二部分图像块不进行块增强,能够进一步减少计算量,提高处理的效率。
结合第一方面,在第一方面的某些实现方式中,非I帧的语法元素包括第二部分图像块中的第二图像块的第二参考图像块,第二参考图像块是增强后的图像块。
例如,通过解码可以得到非I帧中的第二图像块的参考图像块的位置,该第二图像块 的参考图像块位于非I帧的参考帧中。非I帧的参考帧的增强后的结果可以称为增强后的参考帧。基于该第二图像块的参考图像块的位置,可以确定在增强后的参考帧中该位置上的图像块,将该图像块作为第二参考图像块。也就是说第二参考图像块可以为增强后的参考帧中的图像块。
结合第一方面,在第一方面的某些实现方式中,非I帧的语法元素还包括第二图像块对应的第二差分信息,第二差分信息用于指示第二图像块与第二参考图像块之间的差异,第二差分信息的信息量小于第一阈值;基于所述非I帧的语法元素确定多个图像块中除了第一部分图像块外的第二部分图像块包括:将第二参考图像块确定为第二图像块。
根据本申请实施例的方案,对于与参考图像块差异较小的图像块,将增强后的图像块作为第二图像块,无需得到非I帧中的第二图像块,相应地,也无需对非I帧中的第二图像块进行块增强,直接复用增强后的图像块作为增强后的非I帧中的第二图像块,这样可以进一步提高视频增强处理的效率。
结合第一方面,在第一方面的某些实现方式中,用于对I帧进行帧增强的第一增强模型与用于对第一部分图像块进行块增强的第二增强模型相同。
根据本申请实施例的方案,采用相同的增强模型,能够减少增强模型的训练成本以及存储空间,提高处理效率。
结合第一方面,在第一方面的某些实现方式中,所述第一增强模型和所述第二增强模型中的至少一个增强模型是神经网络增强模型。
根据本申请实施例的方案,采用神经网络增强模型能够充分利用模型结构中的各种非线性操作,学习隐藏在输入、标签(input,label)样本对之间的映射关系,提高模型的范化能力,解决画质效果问题,得到更好的增强效果。
结合第一方面,在第一方面的某些实现方式中,帧增强或块增强包括如下至少一项:降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节。
结合第一方面,在第一方面的某些实现方式中,方法还包括:对增强后的非I帧进行块效应去除。
根据本申请实施例的方案,通过对增强后的非I帧进行块效应去除,能够消除图像块边界的突兀感。
第二方面,提供了一种视频增强的装置,包括:解码模块,用于对待处理视频进行解码得到多个帧的语法元素;增强模块,用于:基于多个帧中的I帧的语法元素确定I帧;对I帧进行帧增强,以得到增强后的I帧;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;对第一部分图像块进行块增强,以得到增强后的第一部分图像块;根据增强后的第一部分图像块得到增强后的非I帧。
根据本申请实施例的方案,分别对I帧和非I帧进行不同的增强处理,对非I帧中的第一部分图像块进行块增强,将进行增强处理的数据由帧级变为图像块级,能够大大减少计算量,提高增强处理的效率。
此外,通过将解码过程与增强过程进行融合,充分利用编解码过程中的码流信息,使方案实现更便捷,有利于实时在线处理,增强速度快、功耗低,增强效果好,能够大大改善用户端侧观看视频体验,尤其是对于画质较差、分辨率较低的老旧视频。
结合第二方面,在第二方面的某些实现方式中,非I帧的语法元素包括第一部分图像块中第一图像块的第一参考图像块以及第一图像块对应的第一差分信息,第一差分信息用于指示第一图像块与第一参考图像块之间的差异,第一差分信息的信息量大于或等于第一阈值;增强模块具体用于:利用第一差分信息对第一参考图像块进行差分补偿以得到第一图像块。
结合第二方面,在第二方面的某些实现方式中,增强模块还用于:基于的非I帧的语法元素确定多个图像块中除了第一部分图像块外的第二部分图像块;以及,增强模块具体用于:根据增强后的第一部分图像块和第二部分图像块得到增强后的非I帧,其中,第二部分图像块没有经过块增强。
结合第二方面,在第二方面的某些实现方式中,非I帧的语法元素包括第二部分图像块中的第二图像块的第二参考图像块,第二参考图像块是增强后的图像块。
结合第二方面,在第二方面的某些实现方式中,非I帧的语法元素还包括第二图像块对应的第二差分信息,第二差分信息用于指示第二图像块与第二参考图像块之间的差异,第二差分信息的信息量小于第一阈值,以及增强模块具体用于:将第二参考图像块确定为第二图像块。
结合第二方面,在第二方面的某些实现方式中,用于对I帧进行帧增强的第一增强模型与用于对第一部分图像块进行块增强的第二增强模型相同。
结合第二方面,在第二方面的某些实现方式中,所述第一增强模型和所述第二增强模型中的至少一个增强模型是神经网络增强模型。
结合第二方面,在第二方面的某些实现方式中,帧增强或块增强包括如下至少一项:降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节。
结合第二方面,在第二方面的某些实现方式中,增强模块还用于:对增强后的非I帧进行块效应去除。
应理解,在上述第一方面中对相关内容的扩展、限定、解释和说明也适用于第二方面中相同的内容。
第三方面,提供一种视频增强装置,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行第一方面的任意一种方法的部分或全部步骤。
第四方面,提供一种计算机可读存储介质,所述计算机可读存储介质存储了程序代码,其中,所述程序代码包括用于执行第一方面的任意一种方法的部分或全部步骤的指令。
第五方面,提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行第一方面的任意一种方法的部分或全部步骤。
第六方面,提供一种芯片,该芯片包括处理器与数据接口,所述处理器通过所述数据接口读取存储器上存储的指令,执行第一方面的任意一种方法的部分或全部步骤。
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面的任意一种方法的部分或全部步骤。
应当理解的是,本申请的第二至六方面与本申请的第一方面的技术方案一致,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
附图说明
图1是本申请实施例提供的系统架构的结构示意图;
图2是本申请实施例提供的一种芯片硬件结构示意图;
图3是本申请实施例提供的系统架构的结构示意图;
图4是一种视频编解码方法示意性流程图;
图5是本申请实施例提供的一种视频增强的方法的示意性流程图;
图6是本申请实施例提供的另一种视频增强的方法的示意性流程图;
图7是本申请实施例提供的视频增强的方法的示意图;
图8是本申请实施例提供的全局增强的示意图;
图9是本申请实施例提供的差分信息的示意图;
图10是本申请实施例提供的差分信息计算方法的示意图;
图11是本申请实施例提供的块效应去除的效果示意图;
图12是本申请实施例提供的一种视频超分的方法的示意性流程图;
图13是本申请实施例提供的另一种视频超分的方法的示意性流程图;
图14是本申请实施例提供的一帧的增强效果示意图;
图15是本申请实施例提供的另一帧的增强效果示意图;
图16是本申请实施例提供的一种视频增强的装置的示意性框图;
图17是本申请实施例提供的另一种视频增强的装置的示意性框图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。本申请实施例所涉及的技术方案能够应用于视频增强的场景中。具体而言,本申请实施例的视频增强的方法不仅可能应用于基于现有的视频编码标准中(如H.264、HEVC等标准)的应用,还可能应用于基于未来的视频编码标准中(如H.266标准)的应用。
目前,当用户在终端设备(例如,手机)等各种视频播放设备上播放视频时,通过端侧视频增强能够提高视频质量,提升用户体验。利用本申请实施例的视频增强的方法,能够将视频增强与视频编解码相结合,在保证视频增强效果的情况下,实现在线终端侧视频增强。
本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。下面先对本申请实施例可能涉及的一些概念进行简单介绍。
(1)视频编码和视频解码
视频压缩技术执行空间(图像内)预测和/或时间(图像间)预测以减少或去除视频序列中固有的冗余。对于基于块的视频编码,视频条带(即,视频帧或视频帧的一部分)可分割成若干图像块,所述图像块也可被称作树块、编码单元(coding unit,CU)和/或编码节点。使用关于同一图像中的相邻块中的参考样本的空间预测来编码图像的待帧内编码(I)条带中的图像块。图像的待帧间编码(P或B)条带中的图像块可使用相对于同一图像中的相邻块中的参考样本的空间预测或相对于其它参考图像中的参考样本的时间预测。图像可被称作帧,且参考图像可被称作参考帧。
视频编码通常是指处理形成视频或视频序列的图片序列。在视频编码领域,术语“图片(picture)”、“帧(frame)”或“图像(image)”可以用作同义词。视频编码在源侧执行,通常包括处理(例如,通过压缩)原始视频图片以减少表示该视频图片所需的数据量,从而更高效地存储和/或传输。视频解码在目的地侧执行,通常包括相对于编码器作逆处理,以重构视频图片。编码部分和解码部分的组合也称为编解码(编码和解码)。
视频序列包括一系列图像(picture),图像被进一步划分为切片(slice),切片再被划分为块(block)。视频编码以块为单位进行编码处理,在一些新的视频编码标准中,块的概念被进一步扩展。比如,在H.264标准中有宏块(macroblock,MB),宏块可进一步划分成多个可用于预测编码的预测块(partition)。在高性能视频编码(high efficiency video coding,HEVC)标准中,采用编码单元(coding unit,CU),预测单元(prediction unit,PU)和变换单元(transform unit,TU)等基本概念,从功能上划分了多种块单元,并采用全新的基于树结构进行描述。比如CU可以按照四叉树进行划分为更小的CU,而更小的CU还可以继续划分,从而形成一种四叉树结构,CU是对编码图像进行划分和编码的基本单元。对于PU和TU也有类似的树结构,PU可以对应预测块,是预测编码的基本单元。对CU按照划分模式进一步划分成多个PU。TU可以对应变换块,是对预测残差进行变换的基本单元。然而,无论CU,PU还是TU,本质上都属于块(或称图像块)的概念。
本申请实施例中,为了便于描述和理解,可将当前编码图像中待编码的图像块称为当前块,例如在编码中,指当前正在编码的块;在解码中,指当前正在解码的块。将参考图像中用于对当前块进行预测的已解码的图像块称为参考块,即参考块是为当前块提供参考信号的块,其中,参考信号表示图像块内的像素值。可将参考图像中为当前块提供预测信号的块为预测块,其中,预测信号表示预测块内的像素值或者采样值或者采样信号。例如,在遍历多个参考块以后,找到了最佳参考块,此最佳参考块将为当前块提供预测,此块称为预测块。
H.261的几个视频编码标准属于“有损混合型视频编解码”(即,将样本域中的空间和时间预测与变换域中用于应用量化的2D变换编码结合)。视频序列的每个图片通常分割成不重叠的块集合,通常在块层级上进行编码。换句话说,编码器侧通常在块(视频块)层级处理亦即编码视频,例如,通过空间(图片内)预测和时间(图片间)预测来产生预测块,从当前块(当前处理或待处理的块)减去预测块以获取残差块,在变换域变换残差块并量化残差块,以减少待传输(压缩)的数据量,而解码器侧将相对于编码器的逆处理部分应用于经编码或经压缩块,以重构用于表示的当前块。另外,编码器复制解码器处理循环,使得编码器和解码器生成相同的预测(例如帧内预测和帧间预测)和/或重构,用于处理亦即编码后续块。
图片可以视为像素点(picture element)的二维阵列或矩阵。阵列中的像素点也可以称为采样点。阵列或图片在水平和垂直方向(或轴线)上的采样点数目定义图片的尺寸和/或分辨率。
编码器(或称视频编码器)可以用于接收经预处理的图片数据,采用相关预测模式对经预处理的图片数据进行处理,从而提供经编码图片数据。
解码器可以用于接收经编码图片数据并提供经解码图片数据或经解码图片。
编码器和解码器都可以实施为各种合适电路中的任一个,例如,一个或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件实施所述技术,则设备可将软件的指令存储于合适的非暂时性计算机可读存储介质中,且可使用一或多个处理器以硬件执行指令从而执行本公开的技术。前述内容(包含硬件、软件、硬件与软件的组合等)中的任一者可视为一或多个处理器。
本申请实施例中的编码器和解码器可以是例如H.263、H.264、HEVC、MPEG-2、MPEG-4、VP8、VP9等视频标准协议或者下一代视频标准协议(如H.266等)对应的编/解码器。
视频数据的经编码比特流可以包含本文所论述的与编码视频帧相关的数据、指示符、索引值、模式选择数据等,例如与编码分割相关的数据(例如,变换系数或经量化变换系数,(如所论述的)可选指示符,和/或定义编码分割的数据)。解码器可以用于解码经编码比特流。
关于信令语法元素,解码器可以用于接收并解析这种语法元素,相应地解码相关视频数据。在一些例子中,编码器可以将语法元素熵编码成经编码视频比特流。在此类实例中,解码器可以解析这种语法元素,并相应地解码相关视频数据。
下面对编码器和解码器进行介绍。
首先,编码器接收图片或图片的图像块,例如,形成视频或视频序列的图片序列中的图片。该图像块也可以称为当前图片块或待编码图片块,该图片可以称为当前图片或待编码图片(尤其是在视频编码中将当前图片与其它图片区分开时,其它图片例如同一视频序列亦即也包括当前图片的视频序列中的先前经编码和/或经解码图片)。
编码器可以包括分割单元,用于将图片分割成多个例如图像块的块,通常分割成多个不重叠的块。分割单元可以用于对视频序列中所有图片使用相同的块大小以及定义块大小的对应栅格,或用于在图片或子集或图片群组之间更改块大小,并将每个图片分割成对应的块。
图像块也是或可以视为具有采样值的采样点的二维阵列或矩阵,虽然其尺寸比图片小。换句话说,图像块可以包括,例如,一个采样阵列(例如黑白图片情况下的亮度阵列)或三个采样阵列(例如,彩色图片情况下的一个亮度阵列和两个色度阵列)或依据所应用的色彩格式的任何其它数目和/或类别的阵列。图像块的水平和垂直方向(或轴线)上采样点的数目定义图像块的尺寸。
编码器可以逐块编码图片,例如,对每个图像块执行编码和预测。
具体地,编码器可以基于图片图像块和预测块计算残差块,例如,通过逐样本(逐像素)将图片图像块的样本值减去预测块的样本值,以在样本域中获取残差块。然后,在残差块的样本值上应用例如离散余弦变换(discrete cosine transform,DCT)或离散正弦变换(discrete sine transform,DST)的变换,以在变换域中获取变换系数。变换系数也可以称为变换残差系数,并在变换域中表示残差块。进一步地,例如通过应用标量量化或向量量化来量化变换系数,以获取经量化变换系数。经量化变换系数也可以称为经量化残差系数。量化过程可以减少与部分或全部变换系数有关的位深度。
编码器可以用于基于码率失真优化(rate distortion optimization,RDO)确定预测模式,即选择提供最小码率失真优化的预测模式,或选择相关码率失真至少满足预测模式选择标准的预测模式。例如,编码器可以从(预先确定的)预测模式集合中确定或选择最好或最优的预测模式。预测模式集合可以包括例如帧内预测模式和/或帧间预测模式。
帧内预测模式集合可以包括多种不同的帧内预测模式,例如,如DC(或均值)模式和平面模式的非方向性模式,或如H.265中定义的方向性模式,或者可以包括67种不同的帧内预测模式,例如,如DC(或均值)模式和平面模式的非方向性模式,或如正在发展中的H.266中定义的方向性模式。
在可能的实现中,帧间预测模式集合取决于可用参考图片(即,例如前述存储在DBP230中的至少部分经解码图片)和其它帧间预测参数,例如取决于是否使用整个参考图片或只使用参考图片的一部分,例如围绕当前块的区域的搜索窗区域,来搜索最佳匹配参考块,和/或例如取决于是否应用如半像素和/或四分之一像素内插的像素内插,帧间预测模式集合例如可包括先进运动矢量(advanced motion vector prediction,AMVP)模式和融合(merge)模式。
编码器可以包括帧间预测单元和帧内预测单元。
帧间预测单元可以包含运动估计(motion estimation,ME)单元和运动补偿(motion compensation,MC)单元。运动估计单元用于接收或获取图片图像块(当前图片的当前图片图像块)和经解码图片,或至少一个或多个先前经重构块,例如,一个或多个其它/不同先前经解码图片的经重构块,来进行运动估计。例如,视频序列可以包括当前图片和先前经解码图片,或换句话说,当前图片和先前经解码图片可以是形成视频序列的图片序列的一部分,或者形成该图片序列。
例如,编码器可以用于从多个其它图片中的同一或不同图片的多个参考块中选择参考块,并向运动估计单元提供参考图片和/或提供参考块的位置(X、Y坐标)与当前块的位置之间的偏移(空间偏移)作为帧间预测参数。该偏移也称为运动向量(motion vector,MV)。
运动补偿单元用于获取帧间预测参数,并基于或使用帧间预测参数执行帧间预测来获取帧间预测块。由运动补偿单元执行的运动补偿可以包含基于通过运动估计(可能执行对子像素精确度的内插)确定的运动/块向量取出或生成预测块。内插滤波可从已知像素样本产生额外像素样本,从而潜在地增加可用于编码图片块的候选预测块的数目。一旦接收到用于当前图片块的PU的运动向量,运动补偿单元可以在一个参考图片列表中定位运动向量指向的预测块。运动补偿单元还可以生成与块和视频条带相关联的语法元素,以供解码器在解码视频条带的图片块时使用。
帧内预测单元用于获取，例如接收同一图片的图片块(当前图片块)和一个或多个先前经重构块，例如经重构相邻块，以进行帧内估计。
编码器可以将熵编码算法或方案(例如,可变长度编码(variable length coding,VLC)方案、上下文自适应VLC(context adaptive VLC,CAVLC)方案、算术编码方案、上下文自适应二进制算术编码(context adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应二进制算术编码(syntax-based context-adaptive binary arithmetic coding,
SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)编码或其它熵 编码方法或技术)应用于经量化残差系数、帧间预测参数、帧内预测参数和/或环路滤波器参数中的单个或所有上(或不应用),以获取可以通过输出以例如经编码比特流的形式输出的经编码图片数据。可以将经编码比特流传输到视频解码器,或将其存档稍后由视频解码器传输或检索。进一步地,还可以熵编码正被编码的当前视频条带的其它语法元素。
视频解码器用于接收例如由编码器编码的经编码图片数据(例如,经编码比特流),以获取经解码图片。在解码过程期间,视频解码器从视频编码器接收视频数据,例如表示经编码视频条带的图片块的经编码视频比特流及相关联的语法元素。
解码器可以对经编码图片数据执行熵解码,以获取例如经量化系数和/或经解码的编码参数,例如,帧间预测、帧内预测参数、环路滤波器参数和/或其它语法元素中(经解码)的任意一个或全部。视频解码器可接收视频条带层级和/或视频块层级的语法元素。
解码器可以包括帧间预测单元和帧内预测单元,其中帧间预测单元功能上可以类似于编码器的帧间预测单元,帧内预测单元功能上可以类似于编码器的帧内预测单元。
当视频条带经编码为经帧内编码(I)条带时,帧内预测单元基于信号表示的帧内预测模式及来自当前帧或图片的先前经解码块的数据来产生用于当前视频条带的图片块的预测块。当视频帧经编码为经帧间编码(即B或P)条带时,帧间预测单元(例如,运动补偿单元)基于运动向量及接收的其它语法元素生成用于当前视频条带的视频块的预测块。对于帧间预测,可从一个参考图片列表内的一个参考图片中产生预测块。
通过解析运动向量和其它语法元素,确定用于当前视频条带的视频块的预测信息,并使用预测信息产生用于正经解码的当前视频块的预测块。在本申请一实例中,通过接收到的一些语法元素可以确定用于编码视频条带的视频块的预测模式(例如,帧内或帧间预测)、帧间预测条带类型(例如,B条带、P条带或GPB条带)、用于条带的参考图片列表中的一个或多个的建构信息、用于条带的每个经帧间编码视频块的运动向量、条带的每个经帧间编码视频块的帧间预测状态以及其它信息,以解码当前视频条带的视频块。在本申请的另一实例中,视频解码器从比特流接收的语法元素包含接收自适应参数集(adaptive parameter set,APS)、序列参数集(sequence parameter set,SPS)、图片参数集(picture parameter set,PPS)或条带标头中的一个或多个中的语法元素。
解码器可以逆量化(即,反量化)在比特流中提供的经量化变换系数。逆量化过程可包含使用由视频编码器针对视频条带中的每一视频块所计算的量化参数来确定应该应用的量化程度并同样确定应该应用的逆量化程度。然后,将逆变换(例如,逆DCT、逆整数变换或概念上类似的逆变换过程)应用于变换系数,以便在像素域中产生残差块。进一步地,将逆变换块(即经重构残差块)添加到预测块,以在样本域中获取经重构块,例如通过将经重构残差块的样本值与预测块的样本值相加。
进一步地,解码器对经重构块进行滤波以获取经滤波块,从而顺利进行像素转变或提高视频质量。解码器可以包括一个或多个环路滤波器,例如去块滤波器、样本自适应偏移(sample-adaptive offset,SAO)滤波器或其它滤波器,例如双边滤波器、自适应环路滤波器(adaptive loop filter,ALF),或锐化或平滑滤波器,或协同滤波器。环路滤波器单元可以为环内滤波器,在其它配置中,环路滤波器单元也可实施为环后滤波器。
解码器可以输出经解码图片,以向用户呈现或供用户查看。
视频解码器的其它变型可用于对压缩的比特流进行解码。例如，解码器可以在没有环路滤波器单元的情况下生成输出视频流。
(2)视频增强
视频增强指的是对视频所做的能够提高视频质量的动作,例如,视频增强包括:超分、降噪、锐化、去马赛克、对比度调节或饱和度调节等。
(3)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x s和截距1为输入的运算单元,该运算单元的输出可以为:
$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$$
其中,s=1、2、……n,n为大于1的自然数,W s为x s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于对神经网络中获取到的特征进行非线性变换,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
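为便于理解上述神经单元的运算过程，下面给出一段极简的示意性代码（仅为帮助理解的假设性示例，并非本申请的具体实现，其中的输入数值与默认激活函数均为示意）：

```python
import numpy as np

def neuron_output(x, w, b, activation=lambda t: 1.0 / (1.0 + np.exp(-t))):
    # 计算单个神经单元的输出：f(sum_s W_s * x_s + b)，默认激活函数取sigmoid
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # 输入x_s
w = np.array([0.1, 0.4, -0.2])   # 权重W_s
b = 0.3                          # 偏置b
print(neuron_output(x, w, b))
```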
(4)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂，但是就每一层的工作来说，其实并不复杂，简单来说就是如下线性关系表达式：
$$\vec{y}=\alpha(W\vec{x}+\vec{b})$$
其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，$W$是权重矩阵（也称系数），$\alpha(\cdot)$是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多，系数$W$和偏移向量$\vec{b}$的数量也比较多。这些参数在DNN中的定义如下所述：以系数$W$为例，假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W_{24}^{3}$，上标3代表系数$W$所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。综上，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W_{jk}^{L}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(5)卷积神经网络
卷积神经网络（convolutional neuron network，CNN）是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器，该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中，一个神经元可以只与部分邻层神经元连接。一个卷积层中，通常包含若干个特征平面，每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重，这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化，在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外，共享权重带来的直接好处是减少卷积神经网络各层之间的连接，同时又降低了过拟合的风险。
(6)循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题却无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数,如W,是共享的;而如上举例上述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。
在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。
(7)损失函数
在训练深度神经网络的过程中，因为希望深度神经网络的输出尽可能的接近真正想要预测的值，所以可以通过比较当前网络的预测值和真正想要的目标值，再根据两者之间的差异情况来更新每一层神经网络的权重向量（当然，在第一次更新之前通常会有初始化的过程，即为深度神经网络中的各层预先配置参数），比如，如果网络的预测值高了，就调整权重向量让它预测低一些，不断地调整，直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此，就需要预先定义“如何比较预测值和目标值之间的差异”，这便是损失函数（loss function）或目标函数（objective function），它们是用于衡量预测值和目标值的差异的重要方程。其中，以损失函数举例，损失函数的输出值（loss）越高表示差异越大，那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(8)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正的神经网络模型中参数的数值,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在 得到最优的神经网络模型的参数,例如权重矩阵。
如图1所示,本申请实施例提供了一种系统架构100。在图1中,数据采集设备160用于采集训练数据。针对本申请实施例的视频增强的方法来说,训练数据可以包括训练视频。
在采集到训练数据之后,数据采集设备160将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。
上述目标模型/规则101能够用于实现本申请实施例的视频增强的方法。本申请实施例中的目标模型/规则101具体可以为神经网络。需要说明的是,在实际的应用中,所述数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图1所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图1中,执行设备110配置输入/输出
(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:客户设备输入的待处理视频。
预处理模块113用于根据I/O接口112接收到的输入数据(如待处理视频)进行预处理,在本申请实施例中,也可以没有预处理模块113,而直接采用计算模块111对输入数据进行处理。在本申请实施例中,预处理模块113可以用于对输入的待处理视频进行解码。
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果,如上述得到的增强后的视频返回给客户设备140,从而提供给用户。
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在图1中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
值得注意的是,图1仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图1中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
如图1所示,根据训练设备120训练得到目标模型/规则101,该目标模型/规则101在本申请实施例中可以是神经网络,具体的,本申请实施例的神经网络可以为CNN,深度卷积神经网络(deep convolutional neural networks,DCNN),循环神经网络(recurrent neural network,RNN)等等。
图2为本申请实施例提供的一种芯片的硬件结构,该芯片包括神经网络处理器50。该芯片可以被设置在如图1所示的执行设备110中,用以完成计算模块111的计算工作。该芯片也可以被设置在如图1所示的训练设备120中,用以完成训练设备120的训练工作并输出目标模型/规则101。本申请实施例中的视频增强模型可在如图2所示的芯片中得以实现。
神经网络处理器(neural-network processing unit,NPU)50作为协处理器挂载到主中央处理器(central processing unit,CPU)(host CPU)上,由主CPU分配任务。NPU的核心部分为运算电路503,控制器504控制运算电路503提取存储器(权重存储器或输入存储器)中的数据并进行运算。
在一些实现中,运算电路503内部包括多个处理模块(process engine,PE)。在一些实现中,运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路503是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器502中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器508(accumulator)中。
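下面用一段示意性代码说明上述“取矩阵B、与矩阵A做矩阵运算、将部分结果累加”的过程（仅为帮助理解的简化示意，并不代表运算电路503与累加器508的实际硬件实现，分块大小等参数均为假设）：

```python
import numpy as np

def tiled_matmul(a, b, tile=16):
    # 按K维分块：每次取矩阵B的一块（对应从权重存储器502取数），与矩阵A的对应块相乘，
    # 部分结果累加到acc（对应累加器508），最终得到 C = A @ B
    m, k = a.shape
    _, n = b.shape
    acc = np.zeros((m, n))
    for s in range(0, k, tile):
        e = min(s + tile, k)
        acc += a[:, s:e] @ b[s:e, :]
    return acc

a = np.random.rand(4, 32)
b = np.random.rand(32, 8)
print(np.allclose(tiled_matmul(a, b), a @ b))   # True
```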
向量计算单元507可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元507可以用于神经网络中非卷积/非FC层的网络计算,如池化(pooling),批归一化(batch normalization),局部响应归一化(local response normalization)等。
在一些实现中，向量计算单元507将经处理的输出的向量存储到统一缓存器506。例如，向量计算单元507可以将非线性函数应用到运算电路503的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元507生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作运算电路503的激活输入，例如用于在神经网络中的后续层中使用。
统一存储器506用于存放输入数据以及输出数据。
通过存储单元访问控制器505（direct memory access controller，DMAC）直接将外部存储器中的输入数据搬运到输入存储器501和/或统一存储器506、将外部存储器中的权重数据存入权重存储器502，以及将统一存储器506中的数据存入外部存储器。
总线接口单元510(bus interface unit,BIU),用于通过总线实现主CPU、DMAC和 取指存储器509之间进行交互。
与控制器504连接的取指存储器509(instruction fetch buffer),用于存储控制器504使用的指令;
控制器504，用于调用取指存储器509中缓存的指令，实现控制该运算加速器的工作过程。
一般地,统一存储器506,输入存储器501,权重存储器502以及取指存储器509均为片上(On-Chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。
其中,本申请实施例中的视频增强模型的相关运算可以由运算电路503或向量计算单元507执行。
如图3所示,本申请实施例提供了一种系统架构300。该系统架构包括本地设备301、本地设备302以及执行设备310和数据存储系统350,其中,本地设备301和本地设备302通过通信网络与执行设备310连接。
执行设备310可以由一个或多个服务器实现。可选的,执行设备310可以与其它计算设备配合使用,例如:数据存储器、路由器、负载均衡器等设备。执行设备310可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备310可以使用数据存储系统350中的数据,或者调用数据存储系统350中的程序代码来实现本申请实施例的视频增强的方法。
具体地,在一种实现方式中,执行设备310可以执行以下过程:对待处理视频进行解码得到多个帧的语法元素;基于多个帧中的I帧的语法元素确定I帧;对I帧进行帧增强,以得到增强后的I帧;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;对第一部分图像块进行块增强,以得到增强后的第一部分图像块;根据增强后的第一部分图像块得到增强后的非I帧。通过上述过程执行设备110能够针对I帧和非I帧进行相应的处理,得到增强后的I帧和非I帧。
用户可以操作各自的用户设备(例如本地设备301和本地设备302)与执行设备310进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备310进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。
在一种实现方式中,本地设备301、本地设备302从执行设备310获取到视频增强模型的相关参数,将视频增强模型部署在本地设备301、本地设备302上,利用该视频增强模型进行视频增强,也就是说本地设备301或本地设备302可以执行前述执行设备310执行的步骤。
在另一种实现中,执行设备310上可以直接部署视频增强模型,执行设备310通过从本地设备301和本地设备302获取待处理视频,并采用视频增强模型对待处理视频进行视频增强。
上述执行设备310也可以为云端设备,此时,执行设备310可以部署在云端;或者,上述执行设备310也可以为终端设备,此时,执行设备310可以部署在用户终端侧,本申请实施例对此并不限定。
下面结合图4对视频编解码进行说明。对于I帧,在编码端,可以采用帧内编码方式,类似于图像编码,例如,联合图像专家组(joint photographic experts group,JPEG)编码;在解码端,几乎可以实现对I帧的无损恢复。
P帧和B帧的编解码方式与I帧有显著不同。下面以P帧为例进行说明。
如图4所示,P帧的参考帧可以为I帧。在编码端,将I帧和P帧划分成多个宏块。对I帧和P帧中的宏块进行宏块匹配,比较两帧的宏块之间的匹配程度。两个宏块的匹配程度越高,则该两个宏块的相似度越高。当P帧中的某一个宏块与I帧中的某一个宏块最匹配时,则I帧中的该宏块可以作为P帧中的该宏块的源宏块。如此遍历所有P帧中的宏块,在I帧中均可以找到对应的源宏块。这些源宏块进行宏块重拼/宏块组合之后可以得到近似的P帧,称为P’帧。通过比较P帧和P’帧,得到两者之间的差异,该差异可以由差分信息指示。例如,可以由光流网络估计得到差分信息。差分信息的本质来源于帧间的物体运动。在本申请实施例中,差分信息也可以称为动作差分信息或帧间运动信息或差分补偿信息等。因此,对于P帧,编码端可以以差分信息作为编码对象,而非以P帧作为编码对象,进而对差分信息以及其他语法元素进行编码得到经编码视频比特流。语法元素可以包括:宏块尺寸(block size)、宏块匹配(block matching)、宏块源位置(source position)、宏块目标位置(target position)和差分信息等。在解码端,根据语法元素等,就可以由I帧得到P帧。
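作为一个便于理解的示意，宏块匹配与差分信息的计算大致可以用如下假设性代码表示（匹配准则此处采用绝对误差和SAD，仅用于说明原理，并非编码标准规定的具体算法）：

```python
import numpy as np

def best_match(ref_frame, target_block, search_window, block_size=16):
    # 在参考帧的搜索窗内，以绝对误差和(SAD)作为匹配准则，寻找与目标宏块最匹配的源宏块位置
    x0, y0, x1, y1 = search_window
    best_pos, best_sad = None, np.inf
    tb = target_block.astype(np.int32)
    for y in range(y0, y1 - block_size + 1):
        for x in range(x0, x1 - block_size + 1):
            cand = ref_frame[y:y + block_size, x:x + block_size].astype(np.int32)
            sad = np.abs(cand - tb).sum()
            if sad < best_sad:
                best_pos, best_sad = (x, y), sad
    return best_pos

def residual(target_block, src_block):
    # 差分信息：目标宏块与源宏块逐像素相减
    return target_block.astype(np.int32) - src_block.astype(np.int32)
```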
应理解,以上仅以宏块为例对视频编解码过程进行说明,即以宏块为单位进行编解码,不构成对视频编解码处理所采用的单位构成限制,本申请实施例还可以以其他图像块为单位进行编解码。
下面结合图5至图15详细阐述本申请实施例的方案。
图5是示出了本申请实施例提供的一种视频增强方法600。方法600包括步骤S610至步骤S660。下面对步骤S610至步骤S660进行详细说明。所述方法600具体可以由如图1所示的执行设备110执行。示例性地,所述方法600中的视频数据可以是如图1所示的客户设备140接收到的视频数据,所述执行设备110中的预处理模块113可以用来执行所述方法600中的步骤S610,所述执行设备110中的计算模块111可以用于执行步骤S620和步骤S660。可选的,所述方法600可以由CPU处理,也可以由CPU和GPU共同处理,也可以不用GPU,而使用其他适合用于神经网络计算的处理器,本申请不做限制。
S610,对待处理视频进行解码得到多个帧(frame)的语法元素。该待处理视频可以为经编码视频(流)或者说是压缩的视频(流)(compressed video)。经编码视频也可以理解为经编码比特流。具体地,可以通过通信接口从编码端或者其他源接收待处理视频。
S620,基于多个帧中的I帧的语法元素确定I帧。其中,I帧也称为帧内编码帧(intra-picture)。根据语法元素可以区别当前帧为I帧或非I帧。例如,若语法元素中包括当前帧的差分信息,则当前帧为P帧/B帧,若语法元素中不包括当前帧的差分信息,则该当前帧为I帧。
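作为一个便于理解的示意（假设性代码，其中语法元素的字段名均为示意，并非编解码标准中的实际字段），可以按语法元素中是否包含差分信息来区分帧类型：

```python
def classify_frame(syntax_elements):
    # 语法元素中不包含差分信息 -> I帧；包含差分信息 -> P帧或B帧
    if syntax_elements.get("residual") is None:
        return "I"
    # 进一步用是否存在后向参考来区分P帧与B帧，字段名仅为示意
    return "B" if syntax_elements.get("backward_ref") is not None else "P"

print(classify_frame({"frame_id": 0}))                                          # I
print(classify_frame({"frame_id": 1, "residual": [3, -2], "forward_ref": 0}))   # P
```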
S630,对I帧进行帧增强,以得到增强后的I帧。具体地,对I帧进行帧增强,可以 包括:将I帧输入第一增强模型中对I帧进行帧增强。帧增强也可以称为全局增强。
示例性地,可以通过对图像增强的方式对I帧进行帧增强。对图像进行增强指的是对图像所做的可以提高图像质量的动作。增强包括但不局限于降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节等。该第一增强模型可以为传统增强模型。也就是可以利用传统方法对I帧进行增强。
例如,利用PQ pipeline对I帧进行增强。PQ pipeline可以包括一个或多个模块。各个模块彼此独立,即各个模块可以分别对图像进行处理,例如:抖色(Dither)、高动态范围(high dynamic range,HDR)调节(也可以称为“高动态对比度调节”)、色度瞬态改进(color transient improvement,CTI)、自动对比度增强(dynamic contrast improvement,DCI)、降噪(noise reduction,NR)、超分、动作估计和补偿等。
PQ pipeline的各个模块之间彼此适配,互相调节,得到各个模块对应的参数。利用PQ pipeline对图像进行处理的最终效果是通过各个模块对图像的处理结果进行均衡得到的。
再如,该第一增强模型可以为AI增强模型。AI增强模型可以是一个黑盒模型,也可以是一个白盒模型。例如,该AI增强模型可以是基于CNN结构搭建的。再如,该AI增强模型也可以是基于RNN结构搭建的。
示例性地,当第一增强模型用于超分,基于CNN搭建的AI增强模型可以包括:超分卷积神经网络(super-resolution convolutional neural network,SRCNN)、快速超分卷积神经网络(fast super-resolution convolutional neural network,FSRCNN)或超分生成对抗网络(super-resolution generative adversarial network,SRGAN)等。基于RNN搭建的AI增强模型可以包括:帧递归视频超分(frame recurrent video super-resolution,FRVSR)等。
应理解,本申请实施例仅以CNN结构和RNN结构作为示例,不构成对AI增强模型的具体结构的限定。采用AI增强模型能够充分利用模型结构中的各种非线性操作,学习隐藏在输入、标签(input,label)样本对之间的映射关系,提高模型的范化能力,解决画质效果问题,得到更好的增强效果。
S640,基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧。其中,P帧也称为前向预测编码帧(predictive-frame),B帧也称为双向预测内插编码帧(bi-directional interpolated prediction frame)。
非I帧中包括多个图像块,该多个图像块之间可以是互相不重叠的。图像块的尺寸可以根据需要自行设定。例如,在视频编解码过程中,可以将帧划分为多个宏块,进而以宏块为单位进行编解码。可以根据该多个宏块的划分方式确定多个图像块,也就是说将该多个宏块作为该多个图像块。应理解,本申请实施例中仅以宏块作为图像块为例进行说明,不对图像块的划分方式构成限制。
第一部分图像块可以包括一个图像块,也可以包括多个图像块。
可选地,非I帧的语法元素包括第一部分图像块中第一图像块的第一参考图像块以及第一图像块对应的第一差分信息,第一差分信息用于指示第一图像块与第一参考图像块之间的差异。基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,包括:根据第一差分信息对第一参考图像块进行差分补偿,得到第一图像块。
例如,差分补偿可以包括:将第一参考图像块与第一差分信息相加,得到第一图像块。
第一参考图像块可以指的是第一图像块对应的参考图像块。该第一参考图像块可以位于非I帧的参考帧中。该参考帧可以为I帧,也可以为其他P帧。如前所述,图像块可以为视频编解码过程中的宏块,在该情况下,参考图像块也可以称为源宏块。第一参考图像块也可以理解为第一图像块对应的源宏块。
例如,第一差分信息可以是在视频编码端中对第一图像块与第一参考图像块做减法操作得到的。或者说,第一差分信息可以指的是前述重构残差块。这样,可以复用解码过程中的差分信息,而不必重复提取差分信息,提高了处理效率。
示例性地,从解码后得到的语法元素中获取第一宏块(第一部分图像块的一个图像块)的目标位置(x1,y1)、第一宏块对应的源宏块的位置(x1’,y1’)和第一宏块对应的差分信息。第一宏块对应的源宏块的位置(x1’,y1’)指的是前述语法元素中的宏块的源位置。将位置(x1’,y1’)上的源宏块(即为第一宏块对应的源宏块)覆盖在第一宏块的目标位置(x1,y1)上。然后将第一宏块对应的源宏块与第一宏块对应的差分信息相加,得到第一宏块。
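上述“宏块搬移+差分补偿”的过程可以用如下极简的示意性代码表示（假设性代码，仅说明把源宏块取出、与差分信息相加并覆盖到目标位置的流程，宏块尺寸与像素取值范围均为示意）：

```python
import numpy as np

def reconstruct_block(ref_frame, src_pos, residual, block_size=16):
    # 从参考帧中取出源宏块，并与差分信息相加（差分补偿），得到目标宏块
    xs, ys = src_pos
    src = ref_frame[ys:ys + block_size, xs:xs + block_size].astype(np.int32)
    return np.clip(src + residual, 0, 255).astype(np.uint8)

def paste_block(frame, block, dst_pos):
    # 将得到的宏块覆盖到目标帧中的目标位置上（宏块搬移）
    x, y = dst_pos
    h, w = block.shape[:2]
    frame[y:y + h, x:x + w] = block
```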
为了便于描述，下面以第一部分图像块中的第一图像块为例进行说明。第一图像块可以是预先设定的。例如，预先设定非I帧中的位置A，位置A处的图像块即为第一图像块。
可选地,第一差分信息的信息量大于或等于第一阈值。在本申请实施例中,差分信息的信息量越大,图像块与其对应的参考图像块之间的差异越大。例如,第一差分信息越大,第一图像块与第一参考图像块之间的差异越大。
示例性地,差分信息的信息量可以通过计算该差分信息内的像素值的方差得到。该差分信息内的像素值的方差越大,则差分信息的信息量越大。
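例如，可以按如下示意方式以差分信息内像素值的方差衡量其信息量，并与第一阈值比较（假设性代码，阈值取值仅为示意）：

```python
import numpy as np

FIRST_THRESHOLD = 25.0  # 第一阈值，取值仅为示意

def residual_information(residual):
    # 以差分信息内像素值的方差作为其信息量
    return float(np.var(residual.astype(np.float64)))

def is_first_part_block(residual):
    # 信息量大于或等于第一阈值的图像块归入第一部分图像块（需要进行块增强）
    return residual_information(residual) >= FIRST_THRESHOLD
```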
S650,对第一部分图像块进行块增强,以得到增强后的第一部分图像块。具体地,对第一部分图像块进行块增强,可以包括:将第一部分图像块输入第二增强模型中对第一部分图像块进行块增强。第二增强模型可以为传统增强模型。或者,该第二增强模型可以为AI增强模型。这样,将进行增强处理的数据由帧级变为图像块级,例如块(block)级,能够有效减少运算量。可选地,第一增强模型与第二增强模型可以相同,也可以不同。采用相同的增强模型,能够减少增强模型的训练成本以及存储空间,提高处理效率。
可选地,方法600还包括:基于多个帧中的非I帧的语法元素确定多个图像块中除了第一部分图像块外的第二部分图像块。第二部分图像块可以包括一个图像块,也可以包括多个图像块。
可选地,非I帧的语法元素包括第二部分图像块中第二图像块的第二参考图像块以及第二图像块对应的第二差分信息,第二差分信息用于指示第二图像块与第二参考图像块之间的差异。基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第二部分图像块,包括:根据第二差分信息对第二参考图像块进行差分补偿,得到第二图像块。
例如,差分补偿可以包括:将第二参考图像块与第二差分信息相加,得到第二图像块。第二参考图像块可以指的是第二图像块对应的参考图像块。该第二参考图像块可以位于非I帧的参考帧中。该参考帧可以为I帧,也可以为其他P帧。
如前所述,图像块可以为视频编解码过程中的宏块,在该情况下,参考图像块也可以称为源宏块。第二参考图像块也可以理解为第二图像块对应的源宏块。
例如，第二差分信息可以是在视频编码端中对第二图像块与第二参考图像块做减法操作得到的。或者说，第二差分信息可以指的是前述重构残差块。这样，可以复用解码过程中的差分信息，而不必重复提取差分信息，提高了处理效率。
为了便于描述,下面以第二部分图像块中的第二图像块为例进行说明。第二图像块可以是预先设定的。例如,预先设定非I帧中的位置B,位置B对应的图像块即为第二图像块。
可选地,第二差分信息的信息量小于第一阈值,第一差分信息的信息量大于或等于第一阈值。或者,第二差分信息的信息量小于或等于第一阈值,第一差分信息的信息量大于第一阈值。
示例性地,在非I帧中,若一个图像块对应的差分信息的信息量大于或等于第一阈值,则该图像块为第一图像块。若一个图像块对应的差分信息的信息量小于第一阈值,则该图像块为第二图像块。或者,在非I帧中,若一个图像块对应的差分信息的信息量大于第一阈值,则该图像块为第一图像块。若一个图像块对应的差分信息的信息量小于或等于第一阈值,则该图像块为第二图像块。
可选地,第二参考图像块是增强后的图像块。
第二参考图像块可以指的是第二图像块对应的参考图像块。非I帧的参考帧的增强后的结果可以称为增强后的参考帧。该参考帧可以为I帧,也可以为其他P帧。该第二参考图像块可以位于增强后的参考帧中。
例如,通过解码可以得到非I帧中的第二图像块的参考图像块的位置,该第二图像块的参考图像块位于非I帧的参考帧中。非I帧的参考帧的增强后的结果可以称为增强后的参考帧。基于该第二图像块的参考图像块的位置,可以确定在增强后的参考帧中该位置上的图像块,将该图像块作为第二参考图像块。也就是说第二参考图像块可以为增强后的参考帧中的图像块。
可选地,基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第二部分图像块,包括:将第二参考图像块确定为第二图像块。
示例性地，图像块可以为视频编解码过程中的宏块。从解码后得到的语法元素中获取第二宏块(以P帧中的第二图像块为例)的位置(x2,y2)和第二宏块对应的源宏块的位置(x2’,y2’)。将位置(x2’,y2’)上的增强后的源宏块(即为第二宏块对应的源宏块增强后的结果)作为第二参考图像块，覆盖在P enh(增强后的P帧)中的第二宏块的位置(x2,y2)上，即将第二参考图像块确定为P enh中的第二图像块。在该情况下，可以无需得到非I帧中的第二图像块，而是在增强后的非I帧中直接复用增强后的图像块作为增强后的非I帧中的第二图像块。
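下面用一段示意性代码说明“直接复用增强后的参考图像块”的做法（假设性代码，这里以等尺寸增强为例；若增强为超分，源/目标位置与宏块尺寸需按放大倍数换算）：

```python
def reuse_enhanced_block(enhanced_ref_frame, src_pos, enhanced_frame, dst_pos, block_size=16):
    # 把增强后的参考帧中源宏块位置上的图像块，直接覆盖到增强后的非I帧的目标位置上
    xs, ys = src_pos
    x, y = dst_pos
    enhanced_frame[y:y + block_size, x:x + block_size] = \
        enhanced_ref_frame[ys:ys + block_size, xs:xs + block_size]
```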
S660,根据增强后的第一部分图像块得到增强后的非I帧。
可选地,根据增强后的第一部分图像块和第二部分图像块得到增强后的非I帧。示例性地,该第二部分图像块不是通过对第二部分图像块进行块增强得到的,而是复用第二参考图像块得到的。
可替换地,步骤S660还包括,根据第二差分信息对第二参考图像块进行差分补偿,将补偿后的结果块确定为增强后的非I帧中的第二图像块。
例如,可以将第二参考图像块与第二差分信息相加,将得到的结果作为增强后的非I帧中的第二图像块。
可选地,步骤S660还可以包括:对增强后的非I帧进行块效应去除。由于该增强后的非I帧由多个图像块组合得到,该增强后的非I帧在各个图像块的边缘可能出现块状效应,例如图像块边界的突兀感。
通过自适应滤波器可以进行块效应去除。具体地,可以通过自适应滤波器判断是否需要在图像块边缘采用滤波操作,以及滤波操作的强度。自适应滤波器可以参考标准解码器中的自适应滤波器。通过对增强后的非I帧进行块效应去除,能够消除图像块边界的突兀感。
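作为理解用的极简示意（并非标准解码器中自适应滤波器的实际算法，是否滤波的判断条件与滤波强度均为假设，且以单通道图像为例），块边界的平滑大致可以表示为：

```python
import numpy as np

def simple_deblock_vertical(frame, x, alpha=20):
    # 对列坐标x处的垂直块边界做简单平滑：
    # 仅当边界两侧像素差较小（更可能是块效应而非真实边缘）时，向两侧均值方向拉近
    left = frame[:, x - 1].astype(np.float32)
    right = frame[:, x].astype(np.float32)
    mask = np.abs(left - right) < alpha
    avg = (left + right) / 2.0
    frame[:, x - 1] = np.where(mask, (left + avg) / 2.0, left).astype(frame.dtype)
    frame[:, x] = np.where(mask, (right + avg) / 2.0, right).astype(frame.dtype)
```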
根据本申请实施例的方案,分别对I帧和非I帧进行不同的增强处理,对非I帧中的第一部分图像块进行块增强,将进行增强处理的数据由帧级变为图像块级,能够大大减少计算量,提高增强处理的效率。例如,利用图像块与参考图像块中的差分信息,仅对差分信息的信息量较大的图像块进行增强,能够保证视频增强的效果,同时减少计算量。
此外,利用图像块与参考图像块中的差分信息,对于差分信息的信息量较小的图像块,复用增强后的图像块作为增强结果,进一步提高了增强处理的效率,同时保证了视频增强的效果。
此外,通过将解码过程与增强过程进行融合,充分利用编解码过程中的码流信息,使方案实现更便捷,有利于实时在线处理,增强速度快、功耗低,增强效果好,能够大大改善用户端侧观看视频体验,尤其是对于画质较差、分辨率较低的老旧视频。
此外,传统的视频增强方案中对视频进行逐帧增强,虽然处理后的每一帧均能得到较好的视频增强结果,但由于忽略了帧与帧之间的时间一致性可能造成抖动,影响输出的视频的质量。本申请实施例的方案,复用参考图像块中的信息,能够保证帧间的相关性,采用任意增强模型,例如,CNN模型,均不会影响帧间的相关性,解决时间一致性的问题。
图6示出了本申请实施例提供的一种视频增强方法700。方法700包括步骤S710至步骤S7100。方法700可以作为方法600的一例,相关描述参见方法600。图7中示出了视频增强的流程示意图,下面结合图6和图7对方法700进行说明。
S710,视频解码。具体地,对压缩的视频进行视频解码,得到码流。获取压缩的视频的方式如前文中的步骤S610所示,此处不再赘述。从码流中可以获取目标帧的相关信息,例如,语法元素。如图7所示,非I帧的语法元素可以包括:宏块尺寸、宏块匹配、宏块源位置(x’,y’)、宏块目标位置(x,y)以及差分信息等。对于P帧,该差分信息为前向差分信息,对于B帧,该差分信息为前向差分信息和后向差分信息。
本申请实施例中仅以图像块为宏块为例对方法700进行说明,应理解,方法700中的图像块也可以为视频编解码的过程中的其他尺寸的图像块。也就是说本申请实施例中将帧划分为图像块的划分方式可以与编解码过程中将帧划分为图像块的方式相同。
本申请实施例中,目标帧也可以称为“当前帧”。目标宏块也可以称为“当前宏块”。源宏块也可以称为“参考宏块”。宏块源位置指的是源宏块的位置,宏块目标位置指的是目标宏块的位置。
S720,对目标帧进行判断。若目标帧为I帧,则执行步骤S730。若目标帧为P帧或B帧,则执行步骤S740。
具体地,可以根据码流获取目标帧的相关信息,进而通过目标帧的差分信息对目标帧进行判断。该差分信息指的是在视频编码过程中目标帧的差分信息,即目标帧与参考帧之 间的差异。例如,在编码过程中,可以通过光流网络估计得到该差分信息。示例性地,若目标帧的相关信息中包括目标帧的差分信息,则该目标帧为P帧或B帧,若目标帧的相关信息中没有目标帧的差分信息,则该目标帧为I帧。
S730,确定I帧。具体地,根据I帧的语法元素确定I帧。
S740,对I帧进行全局增强,得到增强后的I帧I enh。例如,将I帧输入第一增强模型中,得到增强后的I帧I enh。在本申请实施例中,全局增强指的是帧级的图像增强。也就是将完整的帧信息输入增强模型中,得到增强后的I帧I enh。第一增强模型如上文中的步骤S630中所述,此处不再赘述。例如,如图8所示,第一增强模型为AI增强模型。将完整的帧信息输入AI增强模型中,得到增强后的结果。
S750,判断目标帧中的目标宏块对应的差分信息。具体地,若目标帧中的目标宏块对应的差分信息的信息量大于或等于第一阈值,例如,图6中的目标宏块对应的差分信息的信息量多,则执行步骤S770;若目标帧中的目标宏块对应的差分信息的信息量小于第一阈值,如图6中的目标宏块对应的差分信息的信息量少,则执行步骤S760。若目标帧中的目标宏块对应的差分信息的信息量大于或等于第一阈值,该目标宏块对应方法600中的第一图像块。若目标帧中的目标宏块对应的差分信息的信息量小于第一阈值,则该目标宏块对应方法600中的第二图像块。
可替换地,若目标帧中的目标宏块对应的差分信息的信息量大于第一阈值,例如,图6中的目标宏块对应的差分信息的信息量多,则执行步骤S770;若目标帧中的目标宏块对应的差分信息的信息量小于或等于第一阈值,例如,图6中的目标宏块对应的差分信息的信息量少,则执行步骤S760。若目标帧中的目标宏块对应的差分信息的信息量大于第一阈值,该目标宏块对应方法600中的第一图像块。若目标帧中的目标宏块对应的差分信息的信息量小于或等于第一阈值,则该目标宏块对应方法600中的第二图像块。
目标宏块对应的差分信息用于指示目标宏块与源宏块之间的匹配度或者说是相似度,差分信息的信息量越大,则目标宏块与源宏块之间的匹配度越低,即两者之间的差异越大。例如,可以将目标宏块对应的差分信息的像素值方差作为该目标宏块对应的差分信息的信息量。目标宏块对应的源宏块可以根据码流信息中得到的宏块匹配确定。也就是进行宏块匹配,确定目标宏块对应的源宏块。
若目标帧为P帧,判断P帧中的目标宏块对应的差分信息,也就是判断P帧中的目标宏块与参考宏块之间的相似度。例如,目标帧可以为图7中的P帧,该目标帧的参考帧可以为图7中的I帧,目标帧中的目标宏块对应的源宏块位于参考帧中。目标帧对应的差分信息如图9所示。目标帧中相对于参考帧保持静止的物体,例如图7中的天空,该物体或该物体的一部分在目标帧中对应目标宏块1,该物体或该物体的一部分在参考帧中对应源宏块1’,该目标宏块1和源宏块1’之间的相似度较高,如图9所示,该目标宏块1对应的差分信息的信息量较小,或者说目标宏块1对应的差分信息的信息量单一。目标帧中相对于参考帧发生运动的物体,例如图7中的车,该物体或该物体的一部分在目标帧中对应目标宏块2,该物体或该物体的一部分在参考帧中对应源宏块2’,若该目标宏块2和源宏块2’之间的相似度较低,如图9所示,该目标宏块2对应的差分信息的信息量较大,或者说该目标宏块2对应的差分信息的信息量丰富。通常,该差分信息是由于物体运动导致的,运动越剧烈的物体,目标宏块对应的差分信息的信息量越大。
若目标帧为B帧,判断B帧中的目标宏块对应的差分信息,也就是判断B帧中的目标宏块与参考宏块之间的相似度。具体地,根据B帧中的目标宏块与两个参考宏块之间的相似度,确定B帧中的目标宏块与参考宏块之间的相似度。例如,将B帧中的目标宏块对应的两个差分信息的信息量的平均值作为B帧中的目标宏块对应的差分信息的信息量。
S760,宏块搬移。具体地,将增强后的源宏块搬移/覆盖在目标帧中的目标宏块的位置上。也就是将增强后的源宏块作为增强后的目标宏块。之后可以执行步骤S7100。
示例性地,如图7所示,目标帧为P帧,目标帧的参考帧为I帧,目标宏块1对应的源宏块1’位于参考帧中。根据解码得到的源宏块1’位置(x1’,y1’)和目标宏块1位置(x1,y1),将增强后的I帧I enh中的位置(x1’,y1’)上的宏块覆盖在目标帧中的位置(x1,y1)上。I enh中的(x1’,y1’)上的宏块即为I帧中的(x1,y1)上的源宏块1’增强后的结果,也就是增强后的源宏块。例如,图7中的目标宏块1对应的差分信息的信息量较小,则将I enh中的位置(x1’,y1’)上的宏块覆盖在P enh帧中的位置(x1,y1)上。P enh帧指的是为增强后的P帧。
应理解,此处仅以源宏块为I帧中的宏块为例进行说明,不对本申请实施例中的源宏块构成限制。目标宏块的源宏块可以由目标帧解码后得到的信息确定。例如,参考帧还可以为其他已解码的P帧,源宏块可以为该已解码的P帧中的宏块。
示例性地,目标帧为B帧,目标帧的参考帧为帧1’和帧1”。目标宏块位于目标帧中,目标宏块对应的源宏块1’位于帧1’中,目标宏块对应的源宏块1’位于帧1”中。根据解码得到的源宏块1’的位置(x1’,y1’)和目标宏块的位置(x1,y1),将增强后的帧1’中的位置(x1’,y1’)上的宏块覆盖在目标帧中的位置(x1,y1)上。其中,源宏块1’与目标宏块之间的差分信息的信息量小于或等于源宏块1”与目标宏块之间的差分信息的信息量,即源宏块1’与目标宏块之间的相似度更高。增强后的帧1’中的(x1’,y1’)上的宏块即为帧1’中的(x1’,y1’)上的源宏块增强后的结果,也就是增强后的源宏块。或者,根据源宏块1’的位置(x1’,y1’)和目标宏块的位置(x1,y1),将增强后的帧1’中的位置(x1’,y1’)上的宏块覆盖在目标帧中的位置(x1,y1)上;并且,根据源宏块1”的位置(x1”,y1”)和目标宏块的位置(x1,y1),将增强后的帧1”中的位置(x1”,y1”)上的宏块覆盖在目标帧中的位置(x1,y1)上。例如,可以为对增强后的帧1’中的位置(x1’,y1’)上的宏块和增强后的帧1”中的位置(x1”,y1”)上的宏块中的像素值取加权平均值,将该加权平均值覆盖至目标宏块的位置(x1,y1)。增强后的帧1’中的(x1’,y1’)上的宏块即为帧1’中的(x1’,y1’)上的源宏块增强后的结果,增强后的帧1”中的(x1”,y1”)上的宏块即为帧1”中的(x1”,y1”)上的源宏块增强后的结果。参考帧可以为已解码的I帧,也可以为其他已解码的P帧。
S770,宏块搬移。具体地,将源宏块搬移/覆盖在目标帧中的目标宏块的位置上。
示例性地,如图7所示,目标帧为P帧,目标帧的参考帧为I帧,目标宏块2对应的源宏块2’位于参考帧中。根据源宏块2’位置(x2’,y2’)和目标宏块2位置(x2,y2),将参考帧中的位置(x2’,y2’)上的宏块覆盖在目标帧中的位置(x2,y2)上。应理解,此处仅以源宏块为I帧中的宏块为例进行说明,不对本申请实施例中的宏块构成限制。目标宏块的源宏块可以由目标帧解码后得到的信息确定。例如,参考帧还可以为其他已解码的P帧,源宏块可以为该已解码的P帧中的宏块。
示例性地,目标帧为帧1,帧1为B帧,帧1的参考帧为帧1’和帧2’。根据解码得到的宏块之间的匹配关系确定目标宏块对应的源宏块1’和源宏块2’。确定源宏块1’和源宏块2’与目标宏块之间的差分信息的信息量。根据源宏块1’的位置(x2,y2)和目标宏块的位置(x2’,y2’),将帧1’中的位置(x2,y2)上的宏块覆盖在目标帧中的位置(x2’,y2’)上。其中,源宏块1’与目标宏块之间的差分信息的信息量小于或等于源宏块2’与目标宏块之间的差分信息的信息量,即源宏块1’与目标宏块之间的相似度更高。或者,根据源宏块1’的位置(x2,y2)和目标宏块的位置(x2’,y2’),将帧1’中的位置(x2,y2)上的宏块覆盖在目标帧中的位置(x2’,y2’)上;并且,根据源宏块2’的位置(x4,y4)和目标宏块的位置(x2’,y2’),将帧2’中的位置(x3,y3)上的宏块覆盖在目标帧中的位置(x2’,y2’)上。例如,可以为对帧1’中的位置(x2,y2)上的宏块和帧2’中的位置(x4,y4)上的宏块中的像素值取加权平均值,将该加权平均值覆盖至目标宏块的位置。
S780,差分补偿。其中,差分补偿是对应编码端差分计算的反向操作。例如,差分补偿可以为,将源宏块与目标宏块对应的差分信息相加。
示例性地,目标帧为P帧,目标宏块的差分信息为前向差分信息,即该目标帧的参考帧位于目标帧之前。例如,将步骤S770得到的目标宏块的位置上的像素值与目标宏块的差分信息相加。如前所述,在视频解码的过程中,由参考帧中的宏块根据源宏块的位置和目标宏块的位置重组后,能够得到近似的目标帧,该近似的目标帧和目标帧之间的差值即为差分信息。图10中差分信息的计算方式的示意图。如图10所示,目标帧为P帧,目标帧的参考帧为I帧,目标宏块位于目标帧中,源宏块位于参考帧中。如前所述,通过宏块匹配可以在I帧中得到P帧中的宏块对应的源宏块,通过宏块拼接可以将I帧中的宏块重组得到P’帧。P帧与P’帧的差值即为P帧对应的差分信息。因此,在进行差分补偿时,目标宏块2位于目标帧中,源宏块2’位于参考帧中,将源宏块2’与该目标宏块2对应的差分信息相加,即可得到目标宏块2。应理解,为了便于理解,图10中示出了目标帧对应的差分信息,这不意味需要对目标帧中的每一个图像块均进行差分补偿。
示例性地,目标帧为B帧,目标宏块的差分信息包括前向差分信息和后向差分信息,即该目标帧的参考帧分别位于目标帧之前和目标帧之后。前向差分信息为目标宏块与源宏块1’之间的差分信息,后向差分信息为目标宏块与源宏块1”之间的差分信息。例如,判断前向差分信息和后向差分信息的信息量。若前向差分信息的信息量小于或等于后向差分信息的信息量,则将目标宏块的位置上的像素值与前向差分信息相加;若后向差分信息的信息量小于或等于前向差分信息的信息量,则将目标宏块的位置上的像素值与后向差分信息相加。或者,可以对前向差分信息和后向差分信息取加权平均值,并将该加权平均值与目标宏块的位置上的像素值相加。以上仅为示例,不对本申请实施例中的差分补偿方式构成限制。
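对于B帧的差分补偿，“按信息量选择前向/后向差分信息”或“取加权平均”这两种做法可以用如下示意性代码表示（假设性代码，权重取值仅为示意）：

```python
import numpy as np

def compensate_b_block(block, fwd_residual, bwd_residual, mode="select"):
    # 对B帧中的目标宏块做差分补偿
    if mode == "select":
        # 选择信息量（方差）较小的一路差分信息
        res = fwd_residual if np.var(fwd_residual) <= np.var(bwd_residual) else bwd_residual
    else:
        # 对前向与后向差分信息取加权平均（此处等权，权重仅为示意）
        res = 0.5 * fwd_residual + 0.5 * bwd_residual
    return np.clip(block.astype(np.float64) + res, 0, 255).astype(np.uint8)
```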
可替换地,步骤S770和步骤S780的顺序可以相反。即可以先执行步骤S780,再执行步骤S770。具体地,可以先进行差分补偿,并将补偿后得到的宏块覆盖在目标帧中的目标宏块的位置上。
S790,宏块增强。具体地,将步骤S770得到的目标宏块进行块增强。也就是对差分补偿后得到的目标宏块进行块增强。如图7所示,可以将该目标宏块输入第二视频增强模型中,得到增强后的目标宏块。步骤S790可以参考前述步骤S650进行增强,此处不再赘 述。
S7100,块效应去除。示例性地,重复步骤S750至步骤S790,直至得到该目标帧中所有位置上的宏块,再执行步骤S7100。图11示出了块效应去除前后图像效果对比,图11的(a)为块效应去除前的目标帧,图11的(b)为块效应去除后的目标帧。块效应消除后,块边界的突兀感明显消除。
具体地,可以对目标帧中的每个宏块重复执行步骤S750至步骤S790,得到由各个宏块组合得到的增强后的目标帧,然后对增强后的目标帧进行块效应去除。或者,可以对目标帧中的每个宏块先执行步骤S750至步骤S770,得到由目标帧中的每个宏块位置上的宏块,然后执行步骤S780至步骤S7100,即进行差分补偿,然后对差分补偿后得到的宏块进行增强,得到增强后的目标帧,然后对增强后的目标帧进行块效应去除。
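将步骤S750至步骤S7100串起来，一个非I帧的整体处理流程大致可以用如下示意性代码概括（假设性代码，enhance_block、deblock等为示意接口，宏块描述的字段名为示意，且以等尺寸的块增强为例）：

```python
import numpy as np

def enhance_non_i_frame(macroblocks, ref_frame, enhanced_ref_frame,
                        threshold, enhance_block, deblock):
    # 逐宏块处理非I帧：差分信息的信息量大的宏块做“搬移+差分补偿+块增强”，
    # 信息量小的宏块直接复用增强后的参考帧中对应源位置的结果；最后做块效应去除
    out = enhanced_ref_frame.copy()
    for mb in macroblocks:   # 每项含目标位置、源位置、差分信息与宏块尺寸
        (x, y), (xs, ys), res, bs = mb["dst"], mb["src"], mb["residual"], mb["size"]
        if np.var(res) >= threshold:
            block = np.clip(ref_frame[ys:ys + bs, xs:xs + bs].astype(np.int32) + res,
                            0, 255).astype(np.uint8)
            out[y:y + bs, x:x + bs] = enhance_block(block)
        else:
            out[y:y + bs, x:x + bs] = enhanced_ref_frame[ys:ys + bs, xs:xs + bs]
    return deblock(out)
```

其中，对信息量小于阈值的宏块完全跳过了块增强，这正是将增强数据由帧级降为图像块级、减少计算量的来源。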
可选地,在步骤S7100之前,方法700还包括,判断是否进行块效应去除。例如,可以通过解码器中的自适应滤波器判断是否需要块效应去除,即是否需要滤波。进一步地,还可以通过自适应编码器确定滤波的强弱程度。
根据本申请实施例的方案,能够在帧间差分信息的信息量较大的情况下,例如,帧间发生剧烈运动的区域,进行差分补偿,并对该区域进行增强;在帧间差分信息的信息量较小的情况下,例如,帧间几乎没有发生相对运动的区域,直接复用参考帧中该区域的增强结果。这样能够充分利用帧间冗余信息,无需逐帧进行全局增强,大大提升视频增强的效率。
此外,通过将解码过程与增强过程进行融合,充分利用编解码过程中的码流信息,使方案实现更便捷,有利于实时在线处理,增强速度快、功耗低,增强效果好,能够大大改善用户端侧观看视频体验,尤其是对于画质较差、分辨率较低的老旧视频。
此外,能够保证帧间的相关性,采用任意增强模型,例如,CNN模型,均不会影响帧间的相关性,解决时间一致性的问题。
图12示出了本申请实施例的一种视频增强的方法。在图12中,以视频超分为例对本申请实施例的视频增强的方法进行说明。
(A)对压缩的视频进行视频解码,解码后可以得到低分辨率的视频(low res video)和语法元素(syntax elements)。
目标帧的语法元素可以包括:目标帧中的宏块尺寸、宏块匹配、源宏块的位置(x’,y’)、目标宏块的位置(x,y)以及差分信息等。本申请实施例中仅以图像块为宏块为例进行说明,应理解,语法元素中的宏块也可以替换为视频编解码的过程中的其他尺寸的图像块。也就是说本申请实施例中将帧划分为图像块的划分方式可以与编解码过程中将帧划分为图像块的方式相同。
(B)若目标帧为低分辨率的视频中的I帧,执行超分操作,得到高分辨率(high resolution)的I帧。
具体地，可以将低分辨率的视频中的I帧输入第一增强模型中，例如，该第一增强模型可以用于视频超分。具体步骤详见前述步骤S630。
(C)若目标帧为低分辨率的视频中的P帧或B帧,则跳过超分操作。
进一步地，若目标宏块对应的差分信息的信息量小于第一阈值，则根据步骤(A)得到的语法元素进行宏块搬移。具体地，将增强后的源宏块搬移/覆盖在目标帧中的目标宏块的位置上。
若目标宏块对应的差分信息的信息量大于或等于第一阈值,则根据步骤(A)得到的语法元素进行宏块搬移以及差分补偿,并对得到的目标宏块执行超分操作。具体步骤详见前述步骤S650。
如图13所示,在图12所示的视频增强的方法中,仅对I帧执行超分操作,对于B帧或P帧,则跳过超分操作,基于参考帧和语法元素得到增强后的P帧或B帧。
图14和图15示出了采用本申请实施例提供的视频增强的方法对视频进行超分的实验结果。图14和图15分别为一段连续解码得到的视频流中的第2帧(第1个P帧)和第15帧(第15个P帧)的超分结果,图14的(a)和图15的(a)为直接执行超分操作的结果,图14的(b)和图15的(b)为采用本申请实施例的视频增强的方法得到的结果。从图14和图15可以看出,本申请实施例提供的视频增强的方法几乎不损失画质效果。
上文结合图1至图15详细的描述了本申请实施例的方法实施例,下面结合图16至图17,详细描述本申请实施例的装置实施例。应理解,方法实施例的描述与装置实施例的描述相互对应,因此,未详细描述的部分可以参见前面方法实施例。
图16是本申请实施例提供的视频增强的装置800的示意性框图。图16所示的视频增强的装置800包括解码模块810和增强模块820。例如,该装置800可以是图1中的执行设备110的一个实施例。解码模块810和增强模块820中至少一个可以硬件实现,也可以软件形式实现或者软件与硬件结合实现。当一个模块以硬件实现,可以包括在执行设备110内部,具体包括在图2的主CPU和神经网络处理器50中的至少一个内。当该模块以软件实现,可以是执行设备110或图2中主CPU和神经网络处理器50执行的软件模块。
解码模块810和增强模块820可以用于执行本申请实施例的视频增强的方法,具体地,可以执行图5所示的方法600或图6所示的方法700。
具体地,解码模块810,用于对待处理视频进行解码得到多个帧的语法元素。增强模块820,用于:基于多个帧中的I帧的语法元素确定I帧;对I帧进行帧增强,以得到增强后的I帧;基于多个帧中的非I帧的语法元素确定非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;对第一部分图像块进行块增强,以得到增强后的第一部分图像块;根据增强后的第一部分图像块得到增强后的非I帧。
可选地,非I帧的语法元素包括第一部分图像块中第一图像块的第一参考图像块以及第一图像块对应的第一差分信息,第一差分信息用于指示第一图像块与第一参考图像块之间的差异,第一差分信息的信息量大于或等于第一阈值;增强模块820具体用于:利用第一差分信息对第一参考图像块进行差分补偿以得到第一图像块。
可选地,增强模块820还用于:基于的非I帧的语法元素确定多个图像块中除了第一部分图像块外的第二部分图像块;以及,增强模块820具体用于:根据增强后的第一部分图像块和第二部分图像块得到增强后的非I帧,其中,第二部分图像块没有经过块增强。
可选地,非I帧的语法元素包括第二部分图像块中的第二图像块的第二参考图像块,第二参考图像块是增强后的图像块。
可选地,非I帧的语法元素还包括第二图像块对应的第二差分信息,第二差分信息用于指示第二图像块与第二参考图像块之间的差异,第二差分信息的信息量小于第一阈值,以及增强模块820具体用于:将第二参考图像块确定为第二图像块。
可选地,用于对I帧进行帧增强的第一增强模型与用于对第一部分图像块进行块增强的第二增强模型相同。
可选地,所述第一增强模型和所述第二增强模型中的至少一个增强模型是神经网络增强模型。
可选地,帧增强或块增强包括如下至少一项:降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节。
可选地,增强模块还用于:对增强后的非I帧进行块效应去除。
图17是本申请实施例提供的视频增强的装置的硬件结构示意图。图17所示的视频增强的装置3000(该装置3000具体可以是一种计算机设备)包括存储器3001、处理器3002、通信接口3003以及总线3004。其中,存储器3001、处理器3002、通信接口3003通过总线3004实现彼此之间的通信连接。例如,该装置3000可以是图1中的执行设备110的一个实施例。或者存储器3001可以位于图1中的数据存储系统150内,而处理器3002、通信接口3003通过总线3004位于执行设备110内。
存储器3001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器3001可以存储程序,当存储器3001中存储的程序被处理器3002执行时,处理器3002用于执行本申请实施例的视频增强的方法的各个步骤。具体地,处理器3002可以执行上文中图5所示的方法600或图6所示的方法700。
处理器3002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的视频增强的方法。
处理器3002还可以是一种集成电路芯片,具有信号的处理能力,例如,可以是图2所示的芯片。在实现过程中,本申请的视频增强的方法的各个步骤可以通过处理器3002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器3002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器3001,处理器3002读取存储器3001中的信息,结合其硬件完成本申请实施例的视频增强的装置中包括的单元所需执行的功能,或者执行本申请实施例的视频增强的方法。
通信接口3003使用例如但不限于收发器一类的收发装置,来实现装置3000与其他设备或通信网络之间的通信。例如,可以通过通信接口3003获取待处理的视频。
总线3004可包括在装置3000各个部件(例如,存储器3001、处理器3002、通信接口3003)之间传送信息的通路。
应理解,本申请实施例中的处理器可以为中央处理模块(central processing unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
还应理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的随机存取存储器(random access memory,RAM)可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
上述实施例,可以全部或部分地通过软件、硬件、固件或其他任意组合来实现。当使用软件实现时,上述实施例可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令或计算机程序。在计算机上加载或执行所述计算机指令或计算机程序时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以为通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集合的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质。半导体介质可以是固态硬盘。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理模块执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒 体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
在上述实施例中,对各个实施例的描述各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (19)

  1. 一种视频增强的方法,其特征在于,包括:
    对待处理视频进行解码得到多个帧的语法元素;
    基于所述多个帧中的I帧的语法元素确定所述I帧;
    对所述I帧进行帧增强,以得到增强后的I帧;
    基于所述多个帧中的非I帧的语法元素确定所述非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;
    对所述第一部分图像块进行块增强,以得到增强后的第一部分图像块;
    根据所述增强后的第一部分图像块得到增强后的非I帧。
  2. 如权利要求1所述的方法,其特征在于,所述非I帧的语法元素包括所述第一部分图像块中第一图像块的第一参考图像块以及所述第一图像块对应的第一差分信息,所述第一差分信息用于指示所述第一图像块与所述第一参考图像块之间的差异,所述第一差分信息的信息量大于或等于第一阈值;
    所述基于所述多个帧中的非I帧的语法元素确定所述非I帧中的多个图像块中的第一部分图像块包括:
    利用所述第一差分信息对所述第一参考图像块进行差分补偿以得到第一图像块。
  3. 如权利要求1或2所述的方法,其特征在于,所述方法还包括:基于所述非I帧的语法元素确定所述多个图像块中除了所述第一部分图像块外的第二部分图像块,以及
    所述根据所述增强后的第一部分图像块得到增强后的非I帧,包括:
    根据所述增强后的第一部分图像块和所述第二部分图像块得到增强后的非I帧,其中,所述第二部分图像块没有经过块增强。
  4. 如权利要求3所述的方法,其特征在于,所述非I帧的语法元素包括所述第二部分图像块中的第二图像块的第二参考图像块,所述第二参考图像块是增强后的图像块。
  5. 如权利要求4所述的方法,其特征在于,所述非I帧的语法元素还包括所述第二图像块对应的第二差分信息,所述第二差分信息用于指示所述第二图像块与所述第二参考图像块之间的差异,所述第二差分信息的信息量小于所述第一阈值;
    所述基于所述非I帧的语法元素确定所述多个图像块中除了所述第一部分图像块外的第二部分图像块包括:
    将所述第二参考图像块确定为所述第二图像块。
  6. 如权利要求1至5中任一项所述的方法,其特征在于,用于对所述I帧进行帧增强的第一增强模型与用于对所述第一部分图像块进行块增强的第二增强模型相同。
  7. 如权利要求6所述的方法,其特征在于,所述第一增强模型和所述第二增强模型中的至少一个增强模型是神经网络增强模型。
  8. 如权利要求1至7中任一项所述的方法,其特征在于,所述帧增强或块增强包括如下至少一项:降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节。
  9. 如权利要求1至8中任一项所述的方法,其特征在于,所述方法还包括:
    对所述增强后的非I帧进行块效应去除。
  10. 一种视频增强的装置,其特征在于,包括:
    解码模块,用于对待处理视频进行解码得到多个帧的语法元素;
    增强模块,用于:
    基于所述多个帧中的I帧的语法元素确定所述I帧;
    对所述I帧进行帧增强,以得到增强后的I帧;
    基于所述多个帧中的非I帧的语法元素确定所述非I帧中的多个图像块中的第一部分图像块,该非I帧包括P帧或B帧;
    对所述第一部分图像块进行块增强,以得到增强后的第一部分图像块;
    根据所述增强后的第一部分图像块得到增强后的非I帧。
  11. 如权利要求10所述的装置,其特征在于,所述非I帧的语法元素包括所述第一部分图像块中第一图像块的第一参考图像块以及所述第一图像块对应的第一差分信息,所述第一差分信息用于指示所述第一图像块与所述第一参考图像块之间的差异,所述第一差分信息的信息量大于或等于第一阈值;
    所述增强模块具体用于:
    利用所述第一差分信息对所述第一参考图像块进行差分补偿,以得到第一图像块。
  12. 如权利要求10或11所述的装置,其特征在于,所述增强模块还用于:
    基于所述多个帧中的非I帧的语法元素确定所述多个图像块中除了所述第一部分图像块外的第二部分图像块;
    以及,所述增强模块具体用于:
    根据所述增强后的第一部分图像块和所述第二部分图像块得到增强后的非I帧,其中,所述第二部分图像块没有经过块增强。
  13. 如权利要求12所述的装置,其特征在于,所述非I帧的语法元素包括所述第二部分图像块中的第二图像块的第二参考图像块,所述第二参考图像块是增强后的图像块。
  14. 如权利要求13所述的装置,其特征在于,所述非I帧的语法元素还包括所述第二图像块对应的第二差分信息,所述第二差分信息用于指示所述第二图像块与所述第二参考图像块之间的差异,所述第二差分信息的信息量小于所述第一阈值;
    以及,所述增强模块具体用于:
    将所述第二参考图像块确定为所述第二图像块。
  15. 如权利要求10至14中任一项所述的装置,用于对所述I帧进行帧增强的第一增强模型与用于对所述第一部分图像块进行块增强的第二增强模型相同。
  16. 如权利要求15所述的装置,其特征在于,所述第一增强模型和所述第二增强模型中的至少一个增强模型是神经网络增强模型。
  17. 如权利要求10至16中任一项所述的装置,其特征在于,所述帧增强或块增强包括如下至少一项:降噪处理、锐化处理、超分处理、去除马赛克、对比度调节或饱和度调节。
  18. 如权利要求10至17中任一项所述的装置,其特征在于,所述增强模块还用于:
    对所述增强后的非I帧进行块效应去除。
  19. 一种视频增强的装置,其特征在于,包括:存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如权利要求1-9任一项所述的方法。
PCT/CN2020/082815 2020-04-01 2020-04-01 视频增强的方法及装置 WO2021196087A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080001816.XA CN113767626B (zh) 2020-04-01 2020-04-01 视频增强的方法及装置
PCT/CN2020/082815 WO2021196087A1 (zh) 2020-04-01 2020-04-01 视频增强的方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/082815 WO2021196087A1 (zh) 2020-04-01 2020-04-01 视频增强的方法及装置

Publications (1)

Publication Number Publication Date
WO2021196087A1 true WO2021196087A1 (zh) 2021-10-07

Family

ID=77930288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082815 WO2021196087A1 (zh) 2020-04-01 2020-04-01 视频增强的方法及装置

Country Status (2)

Country Link
CN (1) CN113767626B (zh)
WO (1) WO2021196087A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082971A1 (zh) * 2022-10-19 2024-04-25 腾讯科技(深圳)有限公司 一种视频处理方法及相关装置

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115643407A (zh) * 2022-12-08 2023-01-24 荣耀终端有限公司 视频处理方法及其相关设备
CN118590073A (zh) * 2023-03-02 2024-09-03 华为技术有限公司 数据处理方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060268981A1 (en) * 2005-05-31 2006-11-30 James Owens Method of enhancing images extracted from video
CN101005622A (zh) * 2007-01-12 2007-07-25 清华大学 一种支持视频帧随机读取的视频编解码方法
CN101383964A (zh) * 2007-09-07 2009-03-11 华为技术有限公司 一种对压缩视频流进行编辑、解码的方法、装置和系统
CN104301727A (zh) * 2014-09-22 2015-01-21 中国人民解放军重庆通信学院 基于cabac的质量可控的h.264视频感知加密算法
CN108989802A (zh) * 2018-08-14 2018-12-11 华中科技大学 一种利用帧间关系的hevc视频流的质量估计方法及系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7567617B2 (en) * 2003-09-07 2009-07-28 Microsoft Corporation Predicting motion vectors for fields of forward-predicted interlaced video frames
US8064520B2 (en) * 2003-09-07 2011-11-22 Microsoft Corporation Advanced bi-directional predictive coding of interlaced video

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060268981A1 (en) * 2005-05-31 2006-11-30 James Owens Method of enhancing images extracted from video
CN101005622A (zh) * 2007-01-12 2007-07-25 清华大学 一种支持视频帧随机读取的视频编解码方法
CN101383964A (zh) * 2007-09-07 2009-03-11 华为技术有限公司 一种对压缩视频流进行编辑、解码的方法、装置和系统
CN104301727A (zh) * 2014-09-22 2015-01-21 中国人民解放军重庆通信学院 基于cabac的质量可控的h.264视频感知加密算法
CN108989802A (zh) * 2018-08-14 2018-12-11 华中科技大学 一种利用帧间关系的hevc视频流的质量估计方法及系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. BOYCE, V. DRUGEON, G. J. SULLIVAN, Y.-K. WANG: "Supplemental enhancement information messages for coded video bitstreams (Draft 3)", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), 18 March 2020 (2020-03-18), XP030285393 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082971A1 (zh) * 2022-10-19 2024-04-25 腾讯科技(深圳)有限公司 一种视频处理方法及相关装置

Also Published As

Publication number Publication date
CN113767626A (zh) 2021-12-07
CN113767626B (zh) 2024-03-26

Similar Documents

Publication Publication Date Title
CN114650419B (zh) 进行帧内预测的编码器、解码器和对应方法
WO2021196087A1 (zh) 视频增强的方法及装置
WO2022068716A1 (zh) 熵编/解码方法及装置
WO2019184639A1 (zh) 一种双向帧间预测方法及装置
CN117440170A (zh) 编码器、解码器及对应方法
WO2021109978A1 (zh) 视频编码的方法、视频解码的方法及相应装置
TWI806212B (zh) 視訊編碼器、視訊解碼器及相應方法
WO2020114394A1 (zh) 视频编解码方法、视频编码器和视频解码器
CN113196783B (zh) 去块效应滤波自适应的编码器、解码器及对应方法
WO2022063265A1 (zh) 帧间预测方法及装置
WO2021249290A1 (zh) 环路滤波方法和装置
CN112534808A (zh) 视频处理方法、视频处理装置、编码器、解码器、介质和计算机程序
WO2023279961A1 (zh) 视频图像的编解码方法及装置
JP2022535859A (ja) Mpmリストを構成する方法、クロマブロックのイントラ予測モードを取得する方法、および装置
CN116235496A (zh) 编码方法、解码方法、编码器、解码器以及编码系统
US20160050431A1 (en) Method and system for organizing pixel information in memory
WO2023011420A1 (zh) 编解码方法和装置
WO2023020320A1 (zh) 熵编解码方法和装置
CN114731406A (zh) 编码方法、解码方法和编码装置、解码装置
WO2022171042A1 (zh) 一种编码方法、解码方法及设备
WO2022063267A1 (zh) 帧内预测方法及装置
CN114830665B (zh) 仿射运动模型限制
WO2020253681A1 (zh) 融合候选运动信息列表的构建方法、装置及编解码器
WO2023092256A1 (zh) 一种视频编码方法及其相关装置
WO2021027799A1 (zh) 视频编码器及qp设置方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20929125

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20929125

Country of ref document: EP

Kind code of ref document: A1