WO2024078148A1 - Video decoding method, video processing device, medium, and product - Google Patents
Video decoding method, video processing device, medium, and product
- Publication number
- WO2024078148A1; PCT/CN2023/114456; CN2023114456W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- decoded
- frame
- video
- list
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
Definitions
- the embodiments of the present application relate to the field of video processing technology, and in particular to a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product.
- loop filtering based on neural network is one of the effective ways to improve the performance of video encoding and decoding.
- the selection and correction of neural network filtering mode and traditional filtering mode are all based on blocks.
- the overall performance of the image obtained by the above method may be good, but there are situations where local performance deteriorates and leads to local distortion, and the distortion of the reference frame is easy to cause error transmission; in addition, for the inter-frame prediction process, there are repeated reference frames in the reference image list of some video frames, which lacks diversity and leads to poor inter-frame prediction effect. Therefore, in the process of video processing, how to further improve the quality of video images is an issue that needs to be discussed and solved urgently.
- the embodiments of the present application provide a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product, aiming to improve the quality of video images.
- an embodiment of the present application provides a video decoding method, the method comprising: obtaining reference frame information of a video frame to be decoded; obtaining a reference image list of the video frame to be decoded based on the reference frame information and supplementary frame information; decoding the video frame to be decoded based on the reference image list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.
- an embodiment of the present application provides a video decoding method applied to a video frame that includes a temporal layer identifier, the method comprising: when the temporal layer identifier of the video frame to be decoded is less than a preset threshold, performing the video decoding method as described in any one of the first aspect on the video frame to be decoded.
- an embodiment of the present application provides a video processing device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the video decoding method as described in any one of the first aspect or the second aspect.
- an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the video decoding method as described in any one of the first aspect or the second aspect.
- an embodiment of the present application provides a computer program product, comprising a computer program or computer instructions, wherein the computer program or the computer instructions are stored in a computer-readable storage medium, a processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium, and the processor executes the computer program or the computer instructions, so that the computer device performs the video decoding method as described in any one of the first aspect or the second aspect.
- FIG. 1 is a schematic diagram of a video encoding process provided by the related art.
- FIG. 2 is a schematic diagram of a video decoding process provided by the related art.
- FIG. 3 is a schematic diagram of a loop filtering process provided by the related art.
- FIG. 4 is a schematic diagram of a random access video coding configuration provided by the related art.
- FIG. 5 is a schematic diagram of a low-delay B video encoding configuration provided by the related art.
- FIG. 6 is a schematic diagram of a video encoding process based on NN coding provided by the related art.
- FIG. 7 is a schematic diagram of a video decoding process based on NN coding provided by the related art.
- FIG. 8 is a schematic diagram of a system architecture of an application scenario of a video decoding method provided by an embodiment of the present application.
- FIG. 9 is a flowchart of a video decoding method provided by an embodiment of the present application.
- FIG. 10 is a flowchart of a video decoding method provided by an embodiment of the present application.
- FIG. 11 is a schematic diagram of inserting reference frames into different positions provided in this example.
- FIG. 12 is a flowchart of replacing a reference frame provided by this example.
- FIG. 13 is a flowchart of storing a luminance component into a decoded image buffer provided by this example.
- FIG. 14 is a flowchart of video decoding provided by this example.
- FIG. 15 is a flowchart of video decoding provided by this example.
- FIG. 16 is a schematic diagram of the structure of a video processing device provided in an embodiment of the present application.
- FIG. 1 is a schematic diagram of the video encoding process provided by the related art.
- the H.266/VVC coding framework of the new generation of video coding standards developed by the ITU-T and ISO/IEC joint video project includes functional modules such as intra-frame prediction, inter-frame prediction, transformation, quantization, loop filtering, and entropy coding.
- the video coding process includes at least the following steps:
- Step S101 Divide the input video frame image into blocks to form a coding tree unit (CTU).
- Step S102 Send the divided CTU to the intra-frame/inter-frame prediction module for prediction coding.
- the intra-frame prediction module is mainly used to remove the spatial correlation of the image, and predicts the current pixel block through the encoded reconstructed block information to remove spatial redundant information;
- the inter-frame prediction module is mainly used to remove the temporal correlation of the image, and obtains the motion information of each block by using the encoded image as the reference image of the current frame, thereby removing temporal redundancy.
- Step S103 Subtract the predicted value from the original block to obtain the residual value, and then transform and quantize the residual to remove the frequency domain correlation and perform lossy compression on the data.
- Transform coding transforms the image from the spatial domain signal to the frequency domain and concentrates the energy in the low-frequency area.
- the quantization module can reduce the dynamic range of image coding.
- Step S104 Finally, all the coding parameters and residuals are entropy-coded to form a binary stream for storage or transmission.
- the output data of the entropy coding module is the compressed code stream of the original video.
- Step S105 Add the predicted value and the residual after inverse quantization and inverse transformation to obtain a block reconstruction value, and finally form a reconstructed image.
- Step S106 The reconstructed image is filtered through a loop filter and stored in an image cache as a reference image for the future.
- the loop filtering technology in H.266/VVC includes Luma Mapping With Chroma Scaling (LMCS), Deblocking Filter (DBF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF).
- LMCS improves compression efficiency by reallocating codewords across the dynamic range; DBF reduces blocking artifacts; SAO reduces ringing artifacts; ALF reduces the distortion of the decoded image.
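Steps S103 and S105 above can be illustrated with a minimal Python sketch. The transform is omitted, and a uniform scalar quantizer with step `qstep` stands in for the real quantizer; the function names and sample values are hypothetical and are not taken from H.266/VVC.

```python
def encode_block(original, predicted, qstep):
    """Return quantized residual levels for one block (transform omitted)."""
    residual = [o - p for o, p in zip(original, predicted)]
    # Uniform scalar quantization: the rounding is where information is lost.
    return [round(r / qstep) for r in residual]


def reconstruct_block(levels, predicted, qstep, bit_depth=8):
    """Inverse-quantize the levels, add the prediction, and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    recon = [p + lvl * qstep for p, lvl in zip(predicted, levels)]
    return [min(max(v, 0), max_val) for v in recon]


original = [100, 104, 98, 97]
predicted = [101, 101, 101, 101]
levels = encode_block(original, predicted, qstep=4)
recon = reconstruct_block(levels, predicted, qstep=4)
```

The reconstruction differs from the original samples because of the quantization step, which is why the encoder runs the same reconstruction as the decoder before storing a reference image.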
- FIG2 is a schematic diagram of a video decoding process provided by the related art. As shown in FIG2, the video decoding process includes at least the following steps:
- Step S201 Parse the bitstream to obtain the prediction mode and derive the predicted value.
- Step S202 Perform inverse quantization and inverse transformation on the residual obtained by parsing the bitstream.
- Step S203 Add the predicted value and the residual after inverse quantization and inverse transformation to obtain a block reconstruction value, and finally form a reconstructed image.
- Step S204 The reconstructed image is filtered by a loop filter and stored in an image cache as a reference image for future frames.
- FIG. 3 is a schematic diagram of the loop filtering process provided by the related technology.
- the reconstructed image passes through four modules in turn. First, the LMCS module piecewise-linearly remaps the dynamic range of the input video signal amplitude distribution before encoding to improve coding efficiency, and the mapping is inverted at the decoding end. Second, the DBF module adds a longer filter and a luma-adaptive filtering mode designed for high-dynamic-range video. Third, the SAO module reduces the loss of high-frequency components without reducing coding efficiency and compensates ringing regions in the pixel domain. Finally, the ALF module uses diamond-shaped filters for luma and chroma respectively, and selects one filter for each block from the multiple transmitted groups of filters.
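The stage order described above (LMCS, then DBF, then SAO, then ALF) can be sketched as a simple pipeline. This is only an illustration of the ordering: each stage is a hypothetical placeholder that copies the image without filtering it, and the helper names are assumptions.

```python
from typing import Callable, List

Image = List[List[int]]
Stage = Callable[[Image], Image]


def make_stage(name: str, log: List[str]) -> Stage:
    """Build a placeholder stage that records its name and returns a copy."""
    def stage(img: Image) -> Image:
        log.append(name)
        return [row[:] for row in img]  # placeholder: no actual filtering
    return stage


def run_loop_filter(img: Image, stages: List[Stage]) -> Image:
    """Apply the stages in order, feeding each output into the next stage."""
    for stage in stages:
        img = stage(img)
    return img


applied: List[str] = []
chain = [make_stage(n, applied) for n in ("LMCS", "DBF", "SAO", "ALF")]
filtered = run_loop_filter([[1, 2], [3, 4]], chain)
```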
- NN: neural network
- SPS: sequence parameter set
- the input video frame is the basic unit of video.
- a video sequence can include multiple video frames.
- the input video frame can be a video frame acquired in real time, for example, a video frame acquired in real time by a terminal camera, or a video frame corresponding to a stored video.
- the input video frame can be an I frame, a P frame, or a B frame, where an I frame is an intra-frame prediction frame, a P frame is a forward prediction frame, and a B frame is a bidirectional prediction frame.
- a reference frame is a video frame that is referenced when encoding or decoding the current video frame.
- a reference frame is a video frame reconstructed from the encoded data of a video frame that can serve as a reference.
- the reference frame corresponding to the current video frame may be a forward reference frame or a bidirectional reference frame, depending on the type of inter-frame prediction, and the current video frame may have one or more reference frames.
- for example, when the current video frame is a P frame, the number of corresponding reference frames may be 1.
- when the current video frame is a B frame, the number of corresponding reference frames may be 2.
- the reference frame corresponding to the video frame to be encoded may be obtained based on a reference relationship, and the reference relationship may be different according to different video encoding and decoding standards.
- Reference frame management is a key technology in video coding. It is mainly responsible for managing the decoded picture buffer (DPB), selecting the best reference scheme from it, and creating a reference picture queue.
- DPB is a cache used to store decoded images in video coding. In order to remove temporal redundancy, the current coded frame can use the decoded image in DPB as a reference frame, and only transmit the inter-frame prediction residual through inter-frame prediction, thereby improving coding efficiency.
- the list storing the forward reference frames of the current frame is called the forward reference image list L0, also referred to as the first reference image list in the embodiment of the present application; the list storing the backward reference frames of the current frame is called the backward prediction reference image list L1, also referred to as the second reference image list in the embodiment of the present application.
- if the video frame to be decoded is unidirectionally predicted, the candidate list contains only L0; if the video frame to be decoded is bidirectionally predicted, the candidate list contains both L0 and L1.
- reference_pictures_L0/L1 indicates the distance between each reference frame in L0/L1 and the current frame; ref_pics_L0/L1 indicates the maximum number of reference frames that can be used in L0/L1 of the current frame.
- ref_pics_active_L0 and ref_pics_active_L1 indicate the number of reference frames allowed to be used in L0 and L1, respectively.
- the encoder writes the POC of each reference frame used by each frame into the bitstream.
- at the decoding end, the bitstream is parsed to obtain the reference frames of the current frame.
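A minimal sketch of how a decoder might turn the signalled distances into reference pictures follows. It assumes that each distance equals the current POC minus the reference POC (the sign convention is an assumption), that `num_active` plays the role of ref_pics_active_L0/L1, and that the decoded picture buffer is a plain dictionary keyed by POC; all function names are hypothetical.

```python
def resolve_reference_pocs(current_poc, poc_deltas):
    """Convert signalled distances to absolute POCs (assumed: delta = current - reference)."""
    return [current_poc - d for d in poc_deltas]


def build_list(current_poc, poc_deltas, num_active, dpb):
    """Return the pictures for the first num_active entries that are present in the DPB."""
    refs = []
    for poc in resolve_reference_pocs(current_poc, poc_deltas)[:num_active]:
        if poc in dpb:
            refs.append(dpb[poc])
    return refs


dpb = {0: "pic@0", 32: "pic@32"}            # toy decoded picture buffer
l0 = build_list(16, [16, -16], 2, dpb)      # forward reference first
l1 = build_list(16, [-16, 16], 2, dpb)      # backward reference first
```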
- Video encoding configurations include the random access (RA) configuration and the low-delay B configuration.
- FIG4 is a schematic diagram of random access video coding configuration provided by the related art. As shown in FIG4, each rectangle in the figure represents a frame, carrying reference frame information and encoding/decoding sequence number.
- the reference frame information can be a picture order count (POC), which indicates the playback order of the decoded video frames, and the encoding/decoding sequence number indicates the order of the video frames during the encoding/decoding process.
- for example, the P frame with a POC of 32 is a forward prediction frame whose reference frame is the I frame with a POC of 0; the B frame with a POC of 16 is a bidirectional prediction frame with two reference frames, namely the I frame with a POC of 0 and the P frame with a POC of 32; and so on.
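The relationship between POC and coding order in this example can be sketched as a recursive bisection. This is a simplified illustration under assumptions: the depth-first order and the choice of the two enclosing frames as references are not taken from the source, which only gives the first three frames, and the function name is hypothetical.

```python
def random_access_order(first_poc, last_poc):
    """Return (poc, reference_pocs) pairs in an assumed hierarchical coding order."""
    order = [(first_poc, []), (last_poc, [first_poc])]

    def bisect(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, [lo, hi]))  # bidirectional: one past and one future reference
        bisect(lo, mid)
        bisect(mid, hi)

    bisect(first_poc, last_poc)
    return order


order = random_access_order(0, 32)
```

The first three entries reproduce the example in the text: POC 0 without references, POC 32 referencing POC 0, and POC 16 referencing POC 0 and POC 32.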
- FIG5 is a schematic diagram of a low-delay B video encoding configuration provided by the related art.
- the first encoded image is an I image
- the remaining encoded images are B images or P images that only have past reference images in the display order.
- the display order of the images is the same as the decoding order.
- the arrows indicate the reference relationship between the images, and the arrows point to the reference images.
- each frame refers only to reconstructed frames that precede the current image in playback order.
- the video sequence is encoded and decoded in playback order, so there is no need to wait for the encoding and decoding of images that come later in playback order.
- the delay is therefore relatively small, which is why this is called a low-delay structure; it is suitable for latency-sensitive scenarios such as live broadcasts and video calls.
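As a minimal sketch of the low-delay constraint, the helper below lists candidate references for a frame when only earlier frames in playback order may be used. The limit `max_refs` and the nearest-first ordering are assumptions made for the illustration, and the function name is hypothetical.

```python
def low_delay_references(poc, max_refs=4):
    """Return up to max_refs earlier POCs, nearest first; never a future frame."""
    return [p for p in range(poc - 1, poc - 1 - max_refs, -1) if p >= 0]


refs_for_first = low_delay_references(0)   # the I image has no references
refs_for_sixth = low_delay_references(6)
```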
- FIG6 is a schematic diagram of a process of video encoding based on NN encoding provided by the related art. As shown in FIG6 , the video encoding process includes at least the following steps:
- Step S301 Construct a reference frame list.
- the encoder constructs reference lists L0 and L1 according to the reference frame POC differences indicated in cfg.
- the reference frames can be marked as "short-term reference frames", "non-reference frames" or "long-term reference frames".
- Step S302 Taking the coding unit (CU) as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed to obtain a reconstructed block.
- Step S303 After the whole frame prediction is completed, the LMCS step is performed.
- Step S304 Perform conventional filtering on the reconstructed image after LMCS.
- Step S305 Perform NN filtering on the reconstructed image after LMCS.
- Step S306 Adapt and modify the NN and traditional filtering according to the original image, and determine the modification-related syntax elements.
- Step S307 Perform ALF filtering operation on the NN filtered and corrected frame of the above process.
- Step S308 The current frame is marked as a "short-term reference frame" and stored in the DPB.
- FIG7 is a schematic diagram of a process of video decoding based on NN coding provided by the related art. As shown in FIG7 , the video decoding process includes at least the following steps:
- Step S401 Construct a reference frame list according to the bitstream information. Construct reference lists L0 and L1 according to the reference frame POC differences parsed from the bitstream.
- the reference frames can be marked as "short-term reference frames", "non-reference frames" or "long-term reference frames".
- Step S402 Taking the coding unit (CU) as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information parsed from the bitstream to obtain a reconstructed block.
- Step S403 After the whole frame prediction is completed, the LMCS step is performed.
- Step S404 Perform conventional filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream parsing.
- Step S405 Perform NN filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream parsing.
- Step S406 Adapt the NN and conventional filtering according to the result of bitstream parsing.
- Step S407 Perform ALF filtering operation on the NN filtered and corrected frame of the above process.
- Step S408 The current frame is marked as a "short-term reference frame" and stored in the DPB.
- the model parameters of the NN filtering module can be preset or transmitted through the bitstream.
- the existing NN filtering network is generally an offline network, and the model is trained offline through a large amount of data.
- for some pixels the image output by the NN filtering network may be better than the result of the traditional filtering scheme, while for other pixels it is worse.
- the NN filtered image and the reconstructed value of the traditional filter are generally combined for correction operations to obtain a balanced filtering effect based on the original image, and the correction-related information is written into the bitstream and transmitted to the decoding end.
- the selection and correction of the NN filtering mode and the traditional filtering mode are all based on a block.
- larger blocks are generally used, such as 64x64, 128x128, 256x256, etc.
- This approach may have better overall performance for a block, but the local performance may deteriorate.
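The trade-off described above can be illustrated with a small sketch. It assumes the block-level decision simply picks the filter output with the lower sum of squared errors against the original; the selection rule, the function names, and the sample values are hypothetical.

```python
def sse(a, b):
    """Sum of squared errors between two equally sized sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_filter_for_block(original, nn_filtered, conventional):
    """Pick one filter for the whole block and report pixels where that choice is worse."""
    use_nn = sse(nn_filtered, original) < sse(conventional, original)
    chosen, other = (nn_filtered, conventional) if use_nn else (conventional, nn_filtered)
    locally_worse = [
        i for i, o in enumerate(original)
        if abs(chosen[i] - o) > abs(other[i] - o)
    ]
    return use_nn, locally_worse


original = [10, 10, 10, 10]
nn_filtered = [10, 10, 10, 13]     # better overall, but poor at the last pixel
conventional = [12, 12, 12, 10]
use_nn, locally_worse = select_filter_for_block(original, nn_filtered, conventional)
```

Here the NN output wins for the block as a whole, yet the last pixel is reconstructed worse than it would have been with the conventional filter.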
- the corrected reconstructed frame will be used as a reference frame for subsequent frames.
- the image may have local distortion after being processed by the neural network filter, which can easily cause error transmission.
- in addition, for the inter-frame prediction process, the reference image lists of some video frames contain duplicate reference frames, which reduces diversity and leads to poor inter-frame prediction.
- currently there are no effective solutions to these problems.
- the embodiments of the present application provide a video decoding method, a video processing device, a computer-readable storage medium and a computer program product.
- the video decoding method is based on a hybrid coding framework and provides subsequent frames with reference frames processed by multiple loop filtering methods, which increases the diversity of reference frames, mitigates local detail degradation, and improves the overall video image quality.
- FIG8 is a schematic diagram of the application scenario system architecture of the video decoding method provided by an embodiment of the present application.
- a terminal 110 and a server 120 are included.
- the terminal 110 or the server 120 can perform video encoding through an encoder, or perform video decoding through a decoder.
- the terminal 110 or the server 120 can also perform video encoding through a processor running a video encoding program, or perform video decoding through a processor running a video decoding program.
- when the server 120 receives the encoded data sent by the terminal 110 through the input interface, the data can be passed directly to the processor for decoding, or stored in a database for subsequent decoding.
- when the server 120 encodes the original video frame through the processor to obtain the encoded data, the data can be sent directly to the terminal 110 through the output interface, or stored in the database for subsequent transmission.
- the video decoding method can be completed in the terminal 110 or the server 120.
- the terminal 110 can encode the input video frame using the video encoding method and then send the encoded data to the server 120, or can receive the encoded data from the server 120 and decode it to generate a decoded video frame.
- the server 120 can encode the video frame. In this case, the video encoding method is completed in the server 120. If the server 120 needs to decode the encoded data, the video decoding method is completed in the server 120.
- the server 120 can send the encoded data to the corresponding receiving terminal, which is decoded by the receiving terminal.
- the encoding end and the decoding end can be the same end or different ends, and the above-mentioned computer device, such as a terminal or a server, can be an encoding end or a decoding end.
- the terminal 110 and the server 120 are connected via a network.
- the terminal 110 of the embodiment of the present application may be a device related to image and video playback, such as a mobile phone, a tablet computer, a computer, a laptop computer, a wearable device, a vehicle-mounted device, a liquid crystal display, a cathode ray tube display, a holographic imaging display, a projector, or another terminal device, which is not limited by the embodiment of the present application.
- the server 120 may be implemented as an independent server or a server cluster consisting of multiple servers.
- Fig. 9 is a flow chart of a video processing method such as a video decoding method provided in an embodiment of the present application. As shown in Fig. 9, the video decoding method is applied to a video processing device. In the embodiment of Fig. 9, the video decoding method may include but is not limited to step S1000, step S2000 and step S3000.
- Step S1000 Acquire reference frame information of a video frame to be decoded.
- the reference frame information corresponding to the video frame to be decoded is obtained, and the reference frame information includes a picture order value POC of the reference frame.
- the reference frame information includes a POC, which indicates the frame position of the reference frame.
- reference frame information corresponding to the video frame to be decoded is obtained, and the reference frame information includes the picture order values POC of multiple reference frames.
- the reference frame information includes two POCs, and the POCs indicate the frame positions of the reference frames.
- the reference frame is used to reconstruct the video frame to be decoded.
- the reference frame information may also be information other than the picture order count that can represent the image playback order or the image playback position, which is not limited in this embodiment.
- the reference frame may be a forward frame of the current frame, or a backward frame of the current frame, or may include both the forward frame and the backward frame of the current frame.
- the reference frame may be one or more.
- Step S2000 Obtain a reference image list of a video frame to be decoded according to the reference frame information and the supplementary frame information.
- the frame position of the reference frame corresponding to the video frame to be decoded is obtained according to the reference frame information, specifically the picture order count (POC) of the reference frame. The image corresponding to the reference frame is then identified according to the supplementary frame information isInsertFlag of that reference frame, and the corresponding image is found and extracted from the image cache. In this manner, all images of the reference frames corresponding to the video frame to be decoded can be obtained, thereby forming a reference image list.
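The lookup described above can be sketched as follows. The sketch assumes the image cache can hold more than one image per POC and that isInsertFlag distinguishes between them; modelling the cache as a dictionary keyed by (POC, isInsertFlag) is an assumption, and the function name and sample entries are hypothetical.

```python
from typing import Dict, List, Tuple

PictureKey = Tuple[int, bool]  # (POC, isInsertFlag)


def build_reference_image_list(
    ref_info: List[PictureKey],
    image_cache: Dict[PictureKey, str],
) -> List[str]:
    """Fetch the image for every (POC, isInsertFlag) pair, keeping the signalled order."""
    ref_list = []
    for poc, is_insert_flag in ref_info:
        key = (poc, is_insert_flag)
        if key not in image_cache:
            raise KeyError(f"no cached image for POC {poc}, isInsertFlag={is_insert_flag}")
        ref_list.append(image_cache[key])
    return ref_list


image_cache = {
    (0, False): "POC0/regular",
    (0, True): "POC0/supplementary",
    (32, False): "POC32/regular",
}
refs = build_reference_image_list([(0, False), (0, True), (32, False)], image_cache)
```

With this keying, two entries of the list can share POC 0 and still point at different images, which is the diversity the method aims for.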
- the reference frame information corresponding to the to-be-decoded video frame is obtained by parsing the video encoding bitstream.
- the frame corresponding to POC 16 is the current frame
- the frame corresponding to POC 0 and the frame corresponding to POC 32 are all reference frames corresponding to the current frame.
- the reference image list includes a first reference image list and a second reference image list.
- the first reference image list L0 and the second reference image list L1 can be divided in different ways.
- the images whose POC values corresponding to the reference frame are less than the POC values corresponding to the video frame to be decoded are configured as the first reference image list L0, and the images whose POC values corresponding to the reference frame are greater than the POC values corresponding to the video frame to be decoded are configured as the second reference image list L1;
- the images whose POC values corresponding to the reference frame are greater than the POC values corresponding to the video frame to be decoded can also be configured as the first reference image list L0, and the images whose POC values corresponding to the reference frame are less than the POC values corresponding to the video frame to be decoded can be configured as the second reference image list L1;
- the division can also be carried out according to other specified methods, which are not limited here.
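The first division rule above (smaller POC into L0, larger POC into L1) can be sketched as below. Ordering each list by closeness to the current frame is an added assumption, and the function name is hypothetical.

```python
def split_by_poc(current_poc, reference_pocs):
    """Split reference POCs into L0 (earlier frames) and L1 (later frames)."""
    l0 = sorted((p for p in reference_pocs if p < current_poc), reverse=True)
    l1 = sorted(p for p in reference_pocs if p > current_poc)
    return l0, l1


l0_pocs, l1_pocs = split_by_poc(16, [0, 32, 8, 24])
```

Swapping the two return values gives the second division rule described in the text.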
- the frame corresponding to POC 16 is the current frame
- the frame corresponding to POC 0 and the frame corresponding to POC 32 are all reference frames corresponding to the current frame. All reference frame images are obtained according to the reference frame information POC and the supplementary frame information isInsertFlag.
- the reference frame images are classified as first decoded images or second decoded images according to the value of isInsertFlag.
- the image order value of the first decoded image is equal to one of the image order values of the reference frames
- the second decoded image can be located at different positions in the reference image list, for example at the first position, the last position, or the second-to-last position of the first reference image list or the second reference image list. It can also be placed at a specified position according to indication information in the encoded video bitstream, or it can replace the first decoded image at that image's position.
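The first, last, and second-to-last (next-to-last) placements can be sketched as below. The function name and the string position labels are hypothetical, chosen only for this illustration.

```python
def insert_second_decoded_image(ref_list, picture, position):
    """Return a new reference list with `picture` placed at the named position."""
    result = list(ref_list)
    if position == "first":
        result.insert(0, picture)
    elif position == "last":
        result.append(picture)
    elif position == "second_to_last":
        # Place the picture directly before the current final entry.
        result.insert(max(len(result) - 1, 0), picture)
    else:
        raise ValueError(f"unknown position: {position}")
    return result


example = insert_second_decoded_image(["A", "B", "C"], "X", "second_to_last")
```

Returning a new list keeps the original reference list intact, which makes it easy to compare the candidate placements.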
- Step S3000 Decode the video frame to be decoded according to the reference image list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.
- a suitable reference image is selected for video processing such as decoding processing to obtain a reconstructed image or a restored image of the video frame to be decoded.
- the reconstructed image includes at least two images, exemplarily a first reconstructed image and a decoded image.
- reconstructed images such as the first reconstructed image and the decoded image are stored in the image cache and can be used as reference images for restoring the next frame (i.e., the current frame at the next moment).
- the decoded image can also be output as the decoding result.
- supplementary frame information may be configured for the reconstructed image.
- the first reconstructed image is different from the decoded image.
- the first reconstructed image is not subjected to loop filtering, while the decoded image is subjected to loop filtering.
- the loop filtering process may include deblocking filter (DBF), sample adaptive offset (SAO), luma mapping with chroma scaling (LMCS), neural network filter (NNF), and adaptive loop filter (ALF).
- the first reconstructed image and the decoded image are processed by different loop filters.
- the first reconstructed image is processed by luma mapping with chroma scaling (LMCS)
- the decoded image is processed by LMCS, deblocking filtering (DBF), sample adaptive offset filtering (SAO), neural-network-based loop filtering (NNF), and adaptive loop filtering (ALF).
- the first reconstructed image and the decoded image may have different loop filtering processes, which is not limited here.
- the technical solution of this embodiment is not only applicable to the video frame to be decoded, but also applicable to the video/image frame to be processed, or the target video/image frame, or the current video/image frame.
- the selection and correction of the neural network loop filtering mode and the traditional filtering mode are based on a block.
- the local performance of the image after neural network filtering may deteriorate. If the decoded image after neural network loop filtering continues to be used as the reference frame of subsequent frames, it will cause local distortion.
- the image quality of traditional filtering is stable. By storing the first reconstructed image after traditional filtering in the decoded image cache as a reference frame, this embodiment increases the diversity of reference frames and improves the quality of the predicted reconstructed images of subsequent frames.
- by storing the reconstructed image without ALF filtering in the decoded image cache as the first reconstructed image, for use as a reference frame by subsequent frames, this embodiment improves the quality of their predicted reconstructed images.
- Fig. 10 is a flow chart of a video decoding method provided by an embodiment of the present application. As shown in Fig. 10, the video decoding method is applied to a video processing device. In the embodiment of Fig. 10, the video decoding method may include but is not limited to step S1000, step S2000, step S3000 and step S4000.
- Step S1000, step S2000, and step S3000 in the aforementioned embodiments are applicable to this embodiment and will not be described in detail herein.
- Step S4000 Obtain a third reconstructed image according to the luminance component of the first reconstructed image and the chrominance component of the decoded image.
- the third reconstructed image is stored in the image cache.
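Step S4000 can be sketched as below, under the assumption that an image is held as a dictionary of `Y`, `Cb`, and `Cr` sample planes. That layout and the function name are illustrative only and are not taken from the source.

```python
def build_third_reconstructed_image(first_reconstructed, decoded):
    """Combine the luma of the first reconstructed image with the chroma of the decoded image."""
    return {
        "Y": [row[:] for row in first_reconstructed["Y"]],  # luminance component
        "Cb": [row[:] for row in decoded["Cb"]],            # chrominance components
        "Cr": [row[:] for row in decoded["Cr"]],
    }


first_reconstructed = {"Y": [[10, 11], [12, 13]], "Cb": [[1]], "Cr": [[2]]}
decoded = {"Y": [[20, 21], [22, 23]], "Cb": [[5]], "Cr": [[6]]}
third = build_third_reconstructed_image(first_reconstructed, decoded)
```

The planes are copied so that later filtering of either source image does not alter the stored third reconstructed image.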
- An embodiment of the present application provides a video processing method such as a video decoding method applied to a video frame including a time layer identifier. If the time layer identifier of the video frame to be decoded is less than a preset threshold, the video processing method described in Figure 9 or Figure 10 is performed on the video frame to be decoded.
- the 32 frames are divided into 6 time layers from top to bottom: frames toward the top of the figure are in low time layers, and frames toward the bottom are in high time layers. Because a frame in a high time layer is much less likely to be used as a reference frame than a frame in a low time layer, storing the reconstructed frame after DBF brings a smaller improvement for high time layers than for low ones. To use resources sensibly, the reconstructed frame after DBF can therefore be stored only for frames whose time layer TL is less than the threshold T, which avoids unnecessary storage overhead.
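The time-layer gating described above amounts to a simple comparison against the threshold T. In this sketch the function name and the POC-to-TL assignment are hypothetical and are not read from the figure.

```python
def frames_storing_first_reconstruction(temporal_layers, threshold):
    """Return the POCs of frames whose time layer TL is below the threshold T."""
    return [poc for poc, tl in sorted(temporal_layers.items()) if tl < threshold]


# Hypothetical POC -> TL assignment, for illustration only.
temporal_layers = {0: 0, 32: 0, 16: 1, 8: 2, 24: 2, 4: 3}
selected = frames_storing_first_reconstruction(temporal_layers, threshold=2)
```

Only the selected frames would have the additional reconstructed image stored, so the extra cache usage grows with T.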
- this example describes one of the insertion positions.
- Fig. 11 is a schematic diagram of inserting reference frames at different positions provided by this example.
- the video decoding method provided by this example includes at least the following steps:
- Step S501 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded by parsing the bitstream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC.
- the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image.
- the first decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 0
- the second decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 1.
- the reference frame can be marked as a "short-term reference frame", "non-reference frame" or "long-term reference frame".
- Step S502 If there is an image with isInsertFlag equal to 1 in the decoded image cache at this time, and the corresponding POC number appears in L0 and L1, then the image is inserted at the first, last, or second-to-last position of the corresponding reference image list, or the position of the inserted image can be determined flexibly by a specific cost.
- Step S503 Taking coding units (CU) as units, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information parsed by the bit stream to obtain a reconstructed block.
- Step S504 After the whole frame prediction is completed, the LMCS step is performed.
- Step S505 Perform conventional filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of code stream analysis. At the same time, store the filtered reconstructed image as the first reconstructed image in the decoded image cache, record the image sequence value of the video frame to be decoded, and set the supplementary frame information as the second supplementary frame information.
- Step S506 Perform NN filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of the code stream analysis, in preparation for the subsequent adaptation between the traditionally filtered and NN-filtered reconstructed frames.
- Step S507 Adapt NN filtering and traditional filtering according to the result of bitstream analysis.
- Step S508 Perform an ALF filtering operation on the reconstructed frame from step S507.
- Step S509 The decoded image is marked as a "short-term reference frame", and the supplementary frame information is set as the first supplementary frame information.
- the decoded image is stored in a decoded image buffer.
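The storage pattern of steps S505 to S509 can be sketched as below. The four filter callables are placeholders supplied by the caller, and mapping the second and first supplementary frame information to isInsertFlag values 1 and 0 is an assumption taken from the other examples in this document.

```python
def run_filter_stages(poc, after_lmcs, dpb, conventional_filter, nn_filter, adapt, alf):
    """Store two images per frame in the decoded image cache `dpb` (a plain list here)."""
    first_reconstructed = conventional_filter(after_lmcs)             # S505
    dpb.append({"poc": poc, "isInsertFlag": 1, "image": first_reconstructed})
    nn_filtered = nn_filter(after_lmcs)                               # S506
    adapted = adapt(first_reconstructed, nn_filtered)                 # S507
    decoded = alf(adapted)                                            # S508
    dpb.append({"poc": poc, "isInsertFlag": 0, "image": decoded,
                "marking": "short-term reference frame"})            # S509
    return first_reconstructed, decoded


# String-tagging stand-ins make the processing chain of each stored image visible.
dpb = []
first, decoded = run_filter_stages(
    16, "lmcs", dpb,
    conventional_filter=lambda img: f"conv({img})",
    nn_filter=lambda img: f"nn({img})",
    adapt=lambda conv, nn: f"adapt({conv},{nn})",
    alf=lambda img: f"alf({img})",
)
```

After the call, the cache holds two entries with the same POC that differ in isInsertFlag and in the filters applied to them.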
- Fig. 12 is a flowchart of replacing a reference frame provided in this example. As shown in the figure, the process includes at least the following steps:
- Step S601 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded by parsing the bitstream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC.
- the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image.
- the first decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 0
- the second decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 1.
- the reference frame can be marked as a "short-term reference frame", "non-reference frame" or "long-term reference frame".
- Step S602 if there is an image with isInsertFlag equal to 1 in the decoded image cache, and the corresponding POC number appears in L0 and L1, then the image replaces the original reference image with isInsertFlag equal to 0.
- Step S603 Taking CU as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information analyzed in the bitstream to obtain a reconstructed block.
- Step S604 After the whole frame prediction is completed, the LMCS step is performed.
- Step S605 Perform conventional filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of code stream analysis. Meanwhile, store the result after conventional filtering in the decoded image buffer, record the POC number corresponding to the video frame to be decoded, and set isInsertFlag to 1.
- Step S606 Perform NN filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream analysis.
- Step S607 Adapt NN and traditional filtering according to the result of bitstream analysis.
- Step S608 performing an ALF filtering operation on the reconstructed frame of the above process.
- Step S609 the decoded image is marked as a "short-term reference frame" and stored in the decoded image cache, and isInsertFlag is set to 0.
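The replacement in step S602 can be sketched as follows. Representing cached pictures as dictionaries with `poc` and `isInsertFlag` keys is an assumption made for illustration, as is the function name.

```python
def replace_with_inserted(ref_list, dpb):
    """Swap each list entry for the cached isInsertFlag == 1 image with the same POC, if any."""
    inserted = {p["poc"]: p for p in dpb if p["isInsertFlag"] == 1}
    return [inserted.get(p["poc"], p) for p in ref_list]


dpb = [
    {"poc": 0, "isInsertFlag": 0},
    {"poc": 0, "isInsertFlag": 1},
    {"poc": 32, "isInsertFlag": 0},
]
l0 = replace_with_inserted([dpb[0]], dpb)
l1 = replace_with_inserted([dpb[2]], dpb)
```

Here the POC 0 entry of L0 is replaced, while the POC 32 entry of L1 stays unchanged because no image with isInsertFlag equal to 1 exists for it.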
- Step S701 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded by parsing the bitstream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC; the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image, the first decoded image corresponds to the image whose isInsertFlag identifier value is equal to 0, and the second decoded image corresponds to the image whose isInsertFlag identifier value is equal to 1.
- Step S702 If the current time layer TL is less than the threshold T (T ∈ [0, 5], which can be set as needed), it is determined whether there is an image with isInsertFlag equal to 1 in the decoded image cache whose POC number appears in L0 and L1; if so, the image is inserted at the end of the corresponding reference image list.
- Step S703 Taking CU as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information of bitstream analysis to obtain a reconstructed block.
- Step S704 After the whole frame prediction is completed, the LMCS step is performed.
- Step S705 Perform conventional filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of the code stream analysis. Meanwhile, store the result after conventional filtering in the decoded image buffer, record the POC number corresponding to the video frame to be decoded, and set isInsertFlag to 1.
- Step S706 Perform NN filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream analysis.
- Step S707 Adapt NN and traditional filtering according to the result of bitstream analysis.
- Step S708 performing an ALF filtering operation on the reconstructed frame of the above process.
- Step S709 the decoded image is marked as a "short-term reference frame" and stored in the decoded image cache, and isInsertFlag is set to 0.
- this example only stores the luminance component after traditional filtering into the decoded image cache, and the chroma component remains the result after NN and traditional adaptation.
- Fig. 13 is a flowchart of storing the luminance component into the decoded image buffer provided by this example. As shown in the figure, the process includes at least the following steps:
- Step S801 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded by parsing the bitstream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC.
- the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image.
- the first decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 0
- the second decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 1.
- the reference frame can be marked as a "short-term reference frame", "non-reference frame" or "long-term reference frame".
- Step S802 If there is an image with isInsertFlag equal to 1 in the decoded image cache, and the corresponding POC number appears in L0 and L1, the image is inserted at the end of the corresponding reference image list.
- Step S803 Taking CU as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information analyzed in the bitstream to obtain a reconstructed block.
- Step S804 After the whole frame prediction is completed, the LMCS step is performed.
- Step S805 Perform conventional filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream analysis. Meanwhile, store the result of the luminance component after conventional filtering in the decoded image buffer, record the POC number corresponding to the video frame to be decoded, and set isInsertFlag to 1.
- Step S806 Perform NN filtering on the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream analysis.
- Step S807 Adapt NN and traditional filtering according to the result of bitstream analysis.
- Step S808 Perform ALF filtering on the reconstructed frame of the above process.
- Step S809 the decoded image is marked as a "short-term reference frame" and stored in the decoded image cache, and isInsertFlag is set to 0.
- Fig. 14 is a flowchart of video decoding provided by this example. As shown in the figure, the process includes at least the following steps:
- Step S901 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded from the code stream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC.
- the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image.
- the first decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 0
- the second decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 1.
- the reference frame can be marked as a "short-term reference frame", "non-reference frame" or "long-term reference frame".
- Step S902 If there is an image with isInsertFlag equal to 1 in the decoded image cache, and the corresponding POC number appears in L0 and L1, the image is inserted at the end of the corresponding reference image list.
- Step S903 Taking CU as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information parsed in the bitstream to obtain a reconstructed block.
- Step S904 After the whole frame prediction is completed, the LMCS step is performed.
- Step S905 Perform DBF filtering on the reconstructed image after LMCS. Meanwhile, store the DBF-filtered result in the decoded image buffer, record the POC number corresponding to the video frame to be decoded, and set isInsertFlag to 1.
- Step S906 Perform SAO filtering on the DBF-filtered reconstructed image once it has been stored in the decoded image cache.
- Step S907 Perform ALF filtering operation.
- Step S908 The decoded image is marked as a "short-term reference frame" and stored in the decoded image cache, and isInsertFlag is set to 0.
- images corrected by different NN filters can be used as reference frames for subsequent frames, increasing the diversity of reference frames.
- Fig. 15 is a flowchart of video decoding provided by this example. As shown in the figure, the process includes at least the following steps:
- Step S1001 construct a reference image list.
- the decoding end obtains the image order value POC of the reference frame of the video frame to be decoded by parsing the bitstream, and constructs the first reference image list L0 and the second reference image list L1 from the decoded image cache according to the image order value POC.
- the first reference image list and the second reference image list constitute the reference image list.
- the reference image list includes the first decoded image and the second decoded image.
- the first decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 0
- the second decoded image corresponds to the image with the value of the isInsertFlag identifier equal to 1.
- the reference frame can be marked as a "short-term reference frame", "non-reference frame" or "long-term reference frame".
- Step S1002 If there is an image with isInsertFlag equal to 1 in the decoded image cache at this time, and the corresponding POC number appears in L0 and L1, the image is inserted at the end of the corresponding reference image list.
- Step S1003 Taking CU as a unit, prediction, transformation, quantization, inverse quantization and inverse transformation are performed according to the information of bitstream analysis to obtain a reconstructed block.
- Step S1004 After the whole frame prediction is completed, the LMCS step is performed.
- Step S1005 According to the result of code stream analysis, the corresponding block in the reconstructed image after LMCS is subjected to neural network filtering, and the first neural network filter NN1 is used at this time. At the same time, the result after NN1 filtering is stored in the decoded image cache, the POC number corresponding to the video frame to be decoded is recorded, and isInsertFlag is set to 1.
- Step S1006 According to the result of bitstream analysis, neural network filtering is performed on the corresponding block in the reconstructed image after LMCS, using the second neural network filter NN2.
- Step S1007 Adapt multiple neural network filtering results according to the results of bitstream analysis.
- Step S1008 Perform ALF filtering on the reconstructed frame of the above process
- Step S1009 the decoded image is marked as a "short-term reference frame" and stored in the decoded image cache, and isInsertFlag is set to 0.
- Fig. 16 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present application.
- the video processing device 2000 includes a memory 2100 and a processor 2200.
- the number of the memory 2100 and the processor 2200 can be one or more.
- Fig. 16 takes one memory 2100 and one processor 2200 as an example.
- the memory 2100 and the processor 2200 can be connected via a bus or other means.
- the memory 2100 is a computer-readable storage medium that can be used to store software programs, computer executable programs, and modules, such as program instructions/modules corresponding to the methods provided in any embodiment of the present application.
- the processor 2200 implements the above method by running the software programs, instructions, and modules stored in the memory 2100.
- the memory 2100 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function.
- the memory 2100 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one disk storage device, a flash memory device or other non-volatile solid-state storage device.
- the memory 2100 further includes a memory remotely arranged relative to the processor 2200, and these remote memories may be connected to the device via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and a combination thereof.
- An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the video decoding method provided in any embodiment of the present application.
- An embodiment of the present application also provides a computer program product, including a computer program or computer instructions, which are stored in a computer-readable storage medium.
- a processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium, and the processor executes the computer program or computer instructions, so that the computer device executes the video decoding method provided in any embodiment of the present application.
- the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation.
- Some physical components or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit.
- Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium).
- computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data).
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer.
- communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
- a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program, or a computer.
- applications running on a computing device and a computing device can be components.
- One or more components may reside in a process or an execution thread, and a component may be located on a computer or distributed between two or more computers.
- these components may be executed from various computer-readable media having various data structures stored thereon.
- Components may communicate by way of local or remote processes, for example according to a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, or across a network such as the Internet, which interacts with other systems by way of signals).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
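Embodiments of the present application provide a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product. The video decoding method includes: obtaining reference frame information of a video frame to be decoded; obtaining a reference image list of the video frame to be decoded according to the reference frame information and supplementary frame information; and decoding the video frame to be decoded according to the reference image list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.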
Description
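Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202211263037.4, filed on October 14, 2022, the entire contents of which are incorporated herein by reference.
Embodiments of the present application relate to the technical field of video processing, and in particular to a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product.
With the continued development of neural networks (Neural Network, NN), their nonlinear fitting capability has grown ever stronger. Neural-network-based video processing techniques are therefore widely used in video processing such as video encoding and video decoding.
In the related art, neural-network-based loop filtering is one of the effective ways to improve video coding performance, and the selection and correction of the neural network filtering mode and the traditional filtering mode are performed block by block. An image obtained in this way may perform well overall, but its local performance may deteriorate and cause local distortion, and distortion in a reference frame readily propagates errors. In addition, in the inter prediction process, the reference image lists of some video frames contain repeated reference frames and lack diversity, which leads to poor inter prediction. How to further improve video image quality during video processing is therefore a problem that urgently needs to be discussed and solved.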
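Summary of the Invention
Embodiments of the present application provide a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product, with the aim of improving video image quality.
In a first aspect, an embodiment of the present application provides a video decoding method, the method including: obtaining reference frame information of a video frame to be decoded; obtaining a reference image list of the video frame to be decoded according to the reference frame information and supplementary frame information; and decoding the video frame to be decoded according to the reference image list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.
In a second aspect, an embodiment of the present application provides a video decoding method applied to video frames that include a time layer identifier, the method including: when the time layer identifier of the video frame to be decoded is less than a preset threshold, performing on the video frame to be decoded the video decoding method according to any implementation of the first aspect.
In a third aspect, an embodiment of the present application provides a video processing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video decoding method according to any implementation of the first aspect or the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the video decoding method according to any implementation of the first aspect or the second aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product, including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the video decoding method according to any implementation of the first aspect or the second aspect.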
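Fig. 1 is a schematic flowchart of video encoding in the related art;
Fig. 2 is a schematic flowchart of video decoding in the related art;
Fig. 3 is a schematic flowchart of loop filtering in the related art;
Fig. 4 is a schematic diagram of the random access video coding configuration in the related art;
Fig. 5 is a schematic diagram of the low-delay B video coding configuration in the related art;
Fig. 6 is a schematic flowchart of NN-based video encoding in the related art;
Fig. 7 is a schematic flowchart of NN-based video decoding in the related art;
Fig. 8 is a schematic diagram of the system architecture of an application scenario of the video decoding method provided by an embodiment of the present application;
Fig. 9 is a flowchart of the video decoding method provided by an embodiment of the present application;
Fig. 10 is a flowchart of the video decoding method provided by an embodiment of the present application;
Fig. 11 is a schematic diagram of inserting reference frames at different positions provided by this example;
Fig. 12 is a flowchart of replacing a reference frame provided by this example;
Fig. 13 is a flowchart of storing the luminance component into the decoded image cache provided by this example;
Fig. 14 is a flowchart of video decoding provided by this example;
Fig. 15 is a flowchart of video decoding provided by this example;
Fig. 16 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present application.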
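To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
It should be noted that although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that of the flowchart. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
In the description of the embodiments of the present application, unless otherwise expressly limited, terms such as "set", "install", and "connect" should be understood broadly, and a person skilled in the art can reasonably determine their specific meanings in the embodiments in light of the specific content of the technical solution.
In the embodiments of the present application, words such as "further", "exemplarily", or "optionally" are used to present an example, illustration, or explanation, and should not be construed as more preferred or more advantageous than other embodiments or designs. These words are intended to present related concepts in a concrete manner.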
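Fig. 1 is a schematic flowchart of video encoding in the related art. As shown in Fig. 1, the coding framework of H.266/VVC, the new-generation video coding standard developed by the joint video project of ITU-T and ISO/IEC, includes functional modules such as intra prediction, inter prediction, transform, quantization, loop filtering, and entropy coding. The video encoding process includes at least the following steps:
Step S101: Partition the input video frame image into blocks to form coding tree units (Coding Tree Unit, CTU).
Step S102: Send the partitioned CTUs to the intra/inter prediction module for predictive coding. The intra prediction module mainly removes the spatial correlation of the image: it predicts the current pixel block from already-coded reconstructed block information to remove spatial redundancy. The inter prediction module mainly removes the temporal correlation of the image: it uses already-coded images as reference images of the current frame to obtain the motion information of each block, thereby removing temporal redundancy.
Step S103: Subtract the prediction from the original block to obtain the residual, then transform and quantize the residual to remove frequency-domain correlation and compress the data lossily. Transform coding converts the image from a spatial-domain signal to the frequency domain and concentrates the energy in the low-frequency region. The quantization module reduces the dynamic range of image coding.
Step S104: Finally, entropy-code all coding parameters and residuals into a binary stream for storage or transmission. The output of the entropy coding module is the bitstream of the compressed original video.
Step S105: Add the prediction to the inverse-quantized, inverse-transformed residual to obtain the block reconstruction, which finally forms the reconstructed image.
Step S106: Filter the reconstructed image with the loop filters and store it in the image cache as a future reference image. The loop filtering techniques in H.266/VVC include luma mapping with chroma scaling (Luma Mapping With Chroma Scaling, LMCS), deblocking filter (Deblocking Filter, DBF), sample adaptive offset (Sample Adaptive Offset, SAO), and adaptive loop filter (Adaptive Loop Filter, ALF). LMCS improves compression efficiency by redistributing codewords over the information within the dynamic range; DBF reduces blocking artifacts; SAO mitigates ringing artifacts; ALF can reduce decoding error.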
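The arithmetic of steps S103 and S105 can be illustrated with a deliberately simplified sketch. It omits the transform entirely and uses a uniform scalar quantizer with step `qstep`; both simplifications, and the function names, are assumptions made for illustration and do not describe the actual H.266/VVC quantization.

```python
def encode_block(original, prediction, qstep):
    """Residual = original - prediction, then uniform quantization (transform omitted)."""
    residual = [o - p for o, p in zip(original, prediction)]
    return [round(r / qstep) for r in residual]


def reconstruct_block(prediction, levels, qstep):
    """Reconstruction = prediction + inverse-quantized residual."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


original = [101, 105, 97, 90]
prediction = [98, 98, 98, 98]
levels = encode_block(original, prediction, qstep=4)
reconstruction = reconstruct_block(prediction, levels, qstep=4)
```

The reconstruction differs slightly from the original samples, which is the lossy part of the compression; the encoder and the decoder both work from this reconstruction so that their reference images match.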
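Fig. 2 is a schematic flowchart of video decoding in the related art. As shown in Fig. 2, the video decoding process includes at least the following steps:
Step S201: Parse the bitstream to obtain the prediction mode and derive the prediction.
Step S202: Apply inverse quantization and inverse transform to the residual parsed from the bitstream.
Step S203: Add the prediction to the inverse-quantized, inverse-transformed residual to obtain the block reconstruction, which finally forms the reconstructed image.
Step S204: Filter the reconstructed image with the loop filters and store it in the image cache as a future reference image.
Fig. 3 is a schematic flowchart of loop filtering in the related art. As shown in the figure, the reconstructed image first passes through the LMCS module, which piecewise-linearly changes the dynamic range of the amplitude distribution of the input video signal before encoding to improve coding efficiency, and restores it inversely at the decoding end. It then passes through the DBF module, which adds longer filters and a luma-adaptive filtering mode designed specifically for high-dynamic-range video. Next it passes through the SAO module, which can reduce the loss of high-frequency components without lowering coding efficiency and compensates ringing regions in the pixel domain. Finally it passes through the ALF module, which uses diamond-shaped filters for luma and chroma respectively and, for each block, selects one filter from the multiple sets of filters transmitted.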
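With the continued development of neural networks (Neural Networks, NN), their nonlinear fitting capability has become ever more powerful. Because a neural network can learn the mapping between the original domain and the reconstructed domain, NN-based video coding is a direction of future development in the field of video coding. Using an NN for loop filtering is one of the effective ways to improve coding performance. NNVC, which is based on the new-generation video coding standard H.266/VVC, currently supports two network structures; switching between the two networks and turning them on or off can be controlled through syntax elements in the sequence parameter set (SPS).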
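An input video frame is the basic unit of a video, and a video sequence may include multiple video frames. An input video frame may be captured in real time, for example through the camera of a terminal, or it may be a frame of a stored video. An input video frame may be an I frame, a P frame, or a B frame, where an I frame is an intra-predicted frame, a P frame is a forward-predicted frame, and a B frame is a bidirectionally predicted frame.
A reference frame is a video frame that is referred to when the current video frame is coded. It is obtained by reconstructing the encoded data of a video frame that can serve as a reference frame. Depending on the inter prediction type, the reference frame of the video frame to be decoded may be a forward reference frame or a bidirectional reference frame, and a video frame to be encoded may have one or more current reference frames. For example, when the video frame to be encoded is a P frame, it may have one reference frame; when it is a B frame, it may have two. The reference frames of a video frame to be encoded may be obtained from a reference relationship, and the reference relationship may differ between video coding standards.
Reference frame management is a key technology in video coding. It is mainly responsible for managing the decoded picture buffer (Decoded Picture Buffer, DPB), selecting the best reference scheme from it, and creating the reference image queue. The DPB is the cache used in video coding to hold decoded images. To remove temporal redundancy, the frame currently being coded can use decoded images in the DPB as reference frames and, through inter prediction, transmit only the inter prediction residual, which improves coding efficiency.
Further, the list that stores the forward reference frames of the current frame is called the forward reference image list L0, also called the first reference image list in the embodiments of the present application; the list that stores the backward reference frames of the current frame is called the backward prediction reference image list L1, also called the second reference image list in the embodiments of the present application. If the video frame to be decoded uses unidirectional prediction, the candidate lists comprise only L0; if it uses bidirectional prediction, the candidate lists are L0 and L1.
During video encoding, the reference frames used by each frame are usually decided according to the encoding configuration (Config). reference_pictures_L0/L1 indicate the distances between the current frame and the reference frames in L0 and L1, respectively; ref_pics_L0/L1 indicate the maximum number of reference frames that the current frame may use in L0/L1; ref_pics_active_L0 and ref_pics_active_L1 indicate the number of reference frames allowed to be used in L0/L1. The encoding end writes the POC numbers of the reference frames used by each frame into the bitstream.
At the decoding end, the bitstream is parsed to obtain the reference frames of the current frame.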
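Commonly used video coding configurations include the random access (Random Access, RA) configuration and the low-delay (low-delay B) configuration.
Fig. 4 is a schematic diagram of the random access video coding configuration in the related art. As shown in Fig. 4, each rectangle represents a frame and carries reference frame information and an encoding/decoding sequence number. The reference frame information may be the image order value (Picture Order Count, POC), which indicates the playback order of the decoded video frames; the encoding/decoding sequence number indicates the order of the video frames during encoding/decoding. For example, the P frame with POC 32 is a forward-predicted frame whose reference frame is the I frame with POC 0; the B frame with POC 16 is a bidirectionally predicted frame with two reference frames, namely the I frame with POC 0 and the P frame with POC 32; and so on.
Fig. 5 is a schematic diagram of the low-delay B video coding configuration in the related art. As shown in Fig. 5, the first coded image is an I image, and the remaining coded images are B images or P images whose reference images are all earlier in display order; the display order and the decoding order of the images are the same. Arrows indicate the reference relationships between images and point to the reference image. Each frame refers only to reconstructed frames that precede the current coded image in playback order, and the video sequence is encoded and decoded in playback order. There is no need to wait for the coding of images that come after the current image in coding order but before it in playback order, so the delay is relatively small. This is why it is called the low-delay structure, which suits scenarios with strict latency requirements such as live streaming and video calls.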
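Fig. 6 is a schematic flowchart of NN-based video encoding in the related art. As shown in Fig. 6, the video encoding process includes at least the following steps:
Step S301: Construct the reference frame lists. The encoding end constructs the reference lists L0 and L1 according to the reference frame POC differences indicated in the cfg. A reference frame may be marked as a "short-term reference frame", "non-reference frame", or "long-term reference frame".
Step S302: With the CU as the unit, perform prediction, transform and quantization, and inverse quantization and inverse transform to obtain reconstructed blocks.
Step S303: After prediction of the whole frame is complete, perform the LMCS step.
Step S304: Apply traditional filtering to the reconstructed image after LMCS.
Step S305: Apply NN filtering to the reconstructed image after LMCS.
Step S306: Adapt and correct the NN and traditional filtering according to the original image, and determine the correction-related syntax elements.
Step S307: Apply the ALF filtering operation to the NN-filter-corrected frame from the above process.
Step S308: The current frame is marked as a "short-term reference frame" and stored in the DPB.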
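Fig. 7 is a schematic flowchart of NN-based video decoding in the related art. As shown in Fig. 7, the video decoding process includes at least the following steps:
Step S401: Construct the reference frame lists from the bitstream information. The reference lists L0 and L1 are constructed according to the reference frame POC differences parsed from the bitstream. A reference frame may be marked as a "short-term reference frame", "non-reference frame", or "long-term reference frame".
Step S402: With the CU as the unit, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S403: After prediction of the whole frame is complete, perform the LMCS step.
Step S404: Apply traditional filtering to the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream parsing.
Step S405: Apply NN filtering to the corresponding blocks in the reconstructed image after LMCS according to the result of bitstream parsing.
Step S406: Adapt the NN and traditional filtering according to the result of bitstream parsing.
Step S407: Apply the ALF filtering operation to the NN-filter-corrected frame from the above process.
Step S408: The current frame is marked as a "short-term reference frame" and stored in the DPB.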
相关技术中,NN滤波模块的模型参数可以预设,也可以通过码流进行传输。现有的NN滤波网络一般为离线网络,通过大量的数据离线训练模型。在编码视频图像时,NN滤波网络后图像可能优于传统滤波方案,也存在部分像素点NN滤波网络图像差于传统滤波方案。基于此,在基于NN的视频编码中,一般会将NN滤波后图像和传统滤波的重建值联合进行修正操作,根据原始图像得到一个均衡的滤波效果,并将修正相关的信息写入码流,传送至解码端。
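In the existing NN correction process, both the selection between the NN filtering mode and the conventional filtering mode and the correction are performed per block. To reduce the information written into the bitstream, large blocks are generally used, for example 64x64, 128x128, or 256x256. For a given block, this may give good overall performance while local performance degrades. In an inter prediction structure, the corrected reconstructed frame is used as a reference frame by subsequent frames. Two problems remain. First, a picture processed by neural network filtering may contain local distortion, which easily propagates errors. Second, in inter prediction the reference picture lists of some video frames contain duplicate reference frames and lack diversity, which leads to poor inter prediction. There is currently no effective solution to these problems.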
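The trade-off between block-level and local quality can be illustrated with a small sketch. It selects, per block, whichever filtered signal has the lower sum of squared errors against the original. This is a deliberately simplified stand-in: the selection rule, the 1-D sample lists, and all names are assumptions for illustration, whereas a real encoder would use a rate-distortion cost and may blend the two results.

```python
def select_filter_per_block(original, nn_filtered, conventional, block_size):
    """Choose NN or conventional filtering for each block by lowest SSE."""
    flags, output = [], []
    for start in range(0, len(original), block_size):
        block = slice(start, start + block_size)
        sse_nn = sum((o - f) ** 2 for o, f in zip(original[block], nn_filtered[block]))
        sse_conv = sum((o - f) ** 2 for o, f in zip(original[block], conventional[block]))
        use_nn = sse_nn <= sse_conv
        flags.append(use_nn)  # one flag per block would be signalled
        output.extend(nn_filtered[block] if use_nn else conventional[block])
    return flags, output


original = [10, 10, 10, 10]
nn_filtered = [10, 10, 10, 13]    # very good except for one sample
conventional = [12, 12, 12, 12]   # uniformly mediocre

flags_large, out_large = select_filter_per_block(original, nn_filtered, conventional, 4)
flags_small, out_small = select_filter_per_block(original, nn_filtered, conventional, 2)
```

With one large block, the NN result wins overall even though its last sample is worse than the conventional one; with smaller blocks the choice can follow the local quality, at the cost of signalling more flags.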
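On this basis, the embodiments of this application provide a video decoding method, a video processing device, a computer-readable storage medium, and a computer program product. The video decoding method is based on a hybrid coding framework and provides reference frames processed by multiple in-loop filtering schemes for use by subsequent frames. This increases the diversity of reference frames, can improve detail quality, and improves overall video picture quality.
The embodiments of this application are further described below with reference to the accompanying drawings.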
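FIG. 8 is a schematic diagram of the system architecture of an application scenario of the video decoding method provided by an embodiment of this application. As shown in FIG. 8, the application environment includes a terminal 110 and a server 120. The terminal 110 or the server 120 may encode video with an encoder or decode video with a decoder. The terminal 110 or the server 120 may also encode video by running a video encoding program on a processor, or decode video by running a video decoding program on a processor. After receiving encoded data sent by the terminal 110 through an input interface, the server 120 may pass it directly to the processor for decoding or store it in a database for later decoding. After encoding original video frames with the processor to obtain encoded data, the server 120 may send the data directly to the terminal 110 through an output interface or store it in a database for later delivery.
The video decoding method may be performed in the terminal 110 or the server 120. The terminal 110 may encode input video frames with a video encoding method and send the encoded data to the server 120, or may receive encoded data from the server 120 and decode it to generate decoded video frames. The server 120 may encode video frames, in which case the video encoding method is performed at the server 120; if the server 120 needs to decode encoded data, the video decoding method is performed at the server 120. Of course, after receiving encoded data sent by the terminal 110, the server 120 may forward it to a corresponding receiving terminal, which decodes it. It can be understood that the encoding end and the decoding end may be the same end or different ends; the computer devices above, such as a terminal or a server, may act as the encoding end or as the decoding end.
The terminal 110 and the server 120 are connected through a network. The terminal 110 in the embodiments of this application may be a device related to image and video playback, for example a mobile phone, tablet computer, computer, notebook computer, wearable device, in-vehicle device, liquid crystal display, cathode ray tube display, holographic display, projector, or other terminal device; the embodiments of this application impose no limitation. The server 120 may be implemented as a standalone server or as a server cluster composed of multiple servers.
It should be noted that the video decoding method provided by the embodiments of this application is based on an NN-based video coding framework, so reference may be made to the process architecture in FIG. 6 or FIG. 7, which is not repeated here.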
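FIG. 9 is a flowchart of a video processing method, for example a video decoding method, provided by an embodiment of this application. As shown in FIG. 9, the video decoding method is applied to a video processing device. In the embodiment of FIG. 9, the video decoding method may include, but is not limited to, steps S1000, S2000, and S3000.
Step S1000: Obtain reference frame information of the video frame to be decoded.
In one embodiment, the video bitstream is parsed to obtain the reference frame information of the video frame to be decoded, and the reference frame information includes the picture order count (POC) of one reference frame. For example, if the video frame to be decoded has only one reference frame, the reference frame information includes one POC, which indicates the frame position of that reference frame. Taking FIG. 4 as an example, assume that the frame with POC 32 is the current frame and the frame with POC 0 is its reference frame. The current frame then has only one reference frame, the frame with POC 0, and the reference frame information accordingly includes POC=0.
In another embodiment, the video bitstream is parsed to obtain the reference frame information of the video frame to be decoded, and the reference frame information includes the POCs of multiple reference frames. For example, if the video frame to be decoded has two reference frames, the reference frame information includes two POCs, each indicating the frame position of one reference frame. Taking FIG. 4 as an example, assume that the frame with POC 16 is the current frame, and that the frames with POC 0 and POC 32 are both reference frames of the current frame. The current frame then has two reference frames, the frame with POC 0 and the frame with POC 32, and the reference frame information accordingly includes POC=0 and POC=32.
Reference frames are used to reconstruct the video frame to be decoded. Besides the picture order count, the reference frame information may be any other information that represents the playback order of a picture or its position in playback; this embodiment imposes no limitation.
It can be understood that a reference frame may be a forward frame of the current frame or a backward frame of the current frame, and that both forward and backward frames may be included. There may be one reference frame or several.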
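Step S2000: Obtain the reference picture list of the video frame to be decoded from the reference frame information and supplementary frame information.
In one embodiment, the frame position of each reference frame of the video frame to be decoded is obtained from the reference frame information, specifically from the picture order count (POC) of the reference frame. The picture corresponding to the reference frame is then obtained from the supplementary frame information isInsertFlag. That is, the POC of a reference frame gives its frame position, the isInsertFlag of the reference frame identifies the picture at that position, and the corresponding picture is then located in and retrieved from the picture buffer. In this way, all pictures of the reference frames of the video frame to be decoded can be obtained, and together they form the reference picture list.
In one feasible implementation, the reference frame information of the video frame to be decoded is obtained by parsing the video bitstream.
For example, assume that the frame with POC 16 is the current frame, and that the frames with POC 0 and POC 32 are both reference frames of the current frame. Parsing the video bitstream gives the reference frame information of the current frame, namely POC=0 and POC=32. The pictures at each of these frame positions are then found by isInsertFlag, for example isInsertFlag=0 and isInsertFlag=1, and different picture sub-lists can be built for different POCs, for example the first reference picture list L0={POC=0, isInsertFlag=0, isInsertFlag=1} and the second reference picture list L1={POC=32, isInsertFlag=0, isInsertFlag=1}. The reference picture list includes the first reference picture list and the second reference picture list.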
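The lookup described above can be sketched as a search of the picture buffer keyed by POC and isInsertFlag. This is a minimal sketch under stated assumptions: the `Picture` class, the `pictures_for` helper, and the ordering of pictures within one POC are hypothetical, and a plain Python list stands in for the buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    poc: int             # picture order count (frame position)
    is_insert_flag: int  # supplementary frame information, 0 or 1


def pictures_for(dpb, ref_pocs):
    """Collect every buffered picture whose POC is one of the parsed reference POCs."""
    found = []
    for poc in ref_pocs:
        matches = [pic for pic in dpb if pic.poc == poc]
        # Assumed ordering: the isInsertFlag=0 picture comes before the isInsertFlag=1 one.
        found.extend(sorted(matches, key=lambda pic: pic.is_insert_flag))
    return found


# Buffer contents for a current frame with POC 16 (POC 8 is not referenced).
dpb = [Picture(0, 1), Picture(0, 0), Picture(8, 0), Picture(32, 0), Picture(32, 1)]
list_l0 = pictures_for(dpb, [0])    # first reference picture list
list_l1 = pictures_for(dpb, [32])   # second reference picture list
```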
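The first reference picture list L0 and the second reference picture list L1 may be partitioned in different ways. For example, pictures whose reference frame POC is less than the POC of the video frame to be decoded may be placed in the first reference picture list L0, and pictures whose reference frame POC is greater than the POC of the video frame to be decoded may be placed in the second reference picture list L1. Alternatively, pictures whose reference frame POC is greater than the POC of the video frame to be decoded may be placed in L0, and pictures whose reference frame POC is less than it may be placed in L1. Other specified partitioning schemes may also be used; no limitation is imposed here.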
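The two partitioning options above can be sketched as follows. The function name and the `forward_first` switch are hypothetical, and sorting each side by distance from the current POC is an added assumption, not something the text specifies.

```python
def split_reference_pocs(current_poc, ref_pocs, forward_first=True):
    """Split reference POCs into (L0, L1) by comparing them with the current POC."""
    # Assumption: within each list, nearer pictures come first.
    past = sorted((p for p in ref_pocs if p < current_poc), key=lambda p: current_poc - p)
    future = sorted((p for p in ref_pocs if p > current_poc), key=lambda p: p - current_poc)
    return (past, future) if forward_first else (future, past)


l0, l1 = split_reference_pocs(16, [0, 32])
swapped_l0, swapped_l1 = split_reference_pocs(16, [0, 32], forward_first=False)
```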
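For example, assume that the frame with POC 16 is the current frame, and that the frames with POC 0 and POC 32 are both reference frames of the current frame. All reference frame pictures are obtained from the reference frame information POC and the supplementary frame information isInsertFlag.
All reference frame pictures are divided into different decoded images according to isInsertFlag. For example, the pictures with isInsertFlag=0 form the first decoded images. The first decoded images are assigned to different reference lists according to POC: for example, POC=0 is assigned to the first reference picture list, that is, L0={POC=0, isInsertFlag=0}, and POC=32 is assigned to the second reference picture list, that is, L1={POC=32, isInsertFlag=0}. In other words, the picture order count of a first decoded image equals one of the picture order counts of the reference frames, and the supplementary frame information of the first decoded image equals the first supplementary frame information (that is, isInsertFlag=0).
Further, the pictures with isInsertFlag=1 form the second decoded images. The second decoded images are assigned to different reference lists according to POC: for example, POC=0 is assigned to the first reference picture list and POC=32 to the second reference picture list. A second decoded image may be placed at different positions in a reference picture list, for example at the head, at the end, or at the second-to-last position of the first or second reference picture list. It may also be placed at a specified position according to indication information in the video bitstream, or it may replace the first decoded image at that image's position.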
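The placement options for a second decoded image can be sketched as below. Pictures are represented as `(poc, is_insert_flag)` tuples; the mode names and the `index` argument for a signalled position are hypothetical labels chosen for this illustration.

```python
def place_second_picture(ref_list, second, mode, index=None):
    """Return a new reference list with the second decoded image placed per `mode`."""
    result = list(ref_list)
    if mode == "first":
        result.insert(0, second)
    elif mode == "last":
        result.append(second)
    elif mode == "second_last":
        # After insertion, exactly one picture follows the inserted one.
        result.insert(max(len(result) - 1, 0), second)
    elif mode == "replace":
        # Swap out the isInsertFlag=0 picture that has the same POC.
        result = [second if pic == (second[0], 0) else pic for pic in result]
    elif mode == "signalled":
        result.insert(index, second)  # position indicated by the bitstream
    else:
        raise ValueError(f"unknown mode: {mode}")
    return result


base = [(0, 0), (8, 0)]
second = (0, 1)
```

Note that the replace mode keeps the list length unchanged, whereas every insertion mode lengthens the list by one entry.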
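Step S3000: Decode the video frame to be decoded according to the reference picture list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.
In one implementation, suitable reference pictures are selected from the reference picture list for video processing such as decoding, giving reconstructed (restored) images of the video frame to be decoded. There are at least two reconstructed images, for example the first reconstructed image and the decoded image. Both are stored in the picture buffer and can serve as reference pictures for the next frame (that is, the current frame at the next instant) and be used to restore it. The decoded image may also be output as the decoded picture.
In another implementation, to make it easier to find and distinguish the different reconstructed images, supplementary frame information may be assigned to them. For example, the first reconstructed image is given first supplementary frame information such as isInsertFlag=1, and the decoded image is given second supplementary frame information such as isInsertFlag=0.
In one implementation, the first reconstructed image and the decoded image are different. For example, the first reconstructed image is not in-loop filtered, whereas the decoded image is. In-loop filtering may include the deblocking filter (DBF), sample adaptive offset (SAO), luma mapping with chroma scaling (LMCS), the neural-network-based loop filter (Neural Networks Filter, NNF), and the adaptive loop filter (ALF).
In another implementation, the first reconstructed image and the decoded image undergo different in-loop filtering. For example, the first reconstructed image is processed by LMCS, whereas the decoded image is processed by LMCS, DBF, SAO, NNF, and ALF. The first reconstructed image and the decoded image may undergo other different combinations of in-loop filtering; no limitation is imposed here.
The technical solution of this embodiment applies not only to a video frame to be decoded but also to a video/image frame to be processed, a target video/image frame, or a current video/image frame.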
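The selection between the neural network loop filtering mode and the conventional filtering mode, and the correction, are performed per block, so the local quality of a picture may degrade after neural network filtering. If the decoded image produced with neural network loop filtering continues to be used as the reference frame of subsequent frames, local distortion results. Conventional filtering, by contrast, gives stable picture quality. By storing the conventionally filtered first reconstructed image in the decoded picture buffer as a reference frame, this embodiment increases the diversity of reference frames and improves the quality of the predicted and reconstructed pictures.
Because a conventionally filtered reconstructed image that has passed through ALF renders local detail worse than one that has not, this embodiment stores the reconstructed image that has not passed through ALF in the decoded picture buffer as the first reconstructed image, to serve as a reference frame for subsequent frames. This can improve the quality of the predicted and reconstructed pictures of subsequent frames.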
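FIG. 10 is a flowchart of a video decoding method provided by an embodiment of this application. As shown in FIG. 10, the video decoding method is applied to a video processing device. In the embodiment of FIG. 10, the video decoding method may include, but is not limited to, steps S1000, S2000, S3000, and S4000.
Steps S1000, S2000, and S3000 of the foregoing embodiment apply to this embodiment and are not repeated here.
Step S4000: Obtain a third reconstructed image from the luma component of the first reconstructed image and the chroma components of the decoded image.
In one embodiment, the third reconstructed image is stored in the picture buffer. To make the third reconstructed image easier to find and identify, it is given second supplementary frame information such as isInsertFlag=0.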
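Step S4000 amounts to combining planes from two images. The sketch below assumes a picture is a dictionary of `Y`, `Cb`, and `Cr` sample lists; that representation and the function name are hypothetical and chosen only to keep the example short.

```python
def build_third_image(first_recon, decoded):
    """Take luma from the first reconstructed image and chroma from the decoded image."""
    return {
        "Y": list(first_recon["Y"]),   # luma of the first reconstructed image
        "Cb": list(decoded["Cb"]),     # chroma of the decoded image
        "Cr": list(decoded["Cr"]),
    }


first_recon = {"Y": [100, 101], "Cb": [50], "Cr": [60]}
decoded = {"Y": [98, 99], "Cb": [52], "Cr": [61]}
third = build_third_image(first_recon, decoded)
```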
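An embodiment of this application provides a video processing method, for example a video decoding method, applied to video frames that carry a temporal layer identifier. If the temporal layer identifier of the video frame to be decoded is less than a preset threshold, the video processing method described with reference to FIG. 9 or FIG. 10 is performed on that frame. For example, as shown in FIG. 4, the 32 frames are divided from top to bottom into 6 temporal layers; frames at the top of the figure are in low temporal layers and frames at the bottom are in high temporal layers. Because a frame in a high temporal layer is far less likely to be used as a reference frame than one in a low temporal layer, storing the reconstructed frame after DBF brings less gain in high temporal layers than in low ones. With reasonable use of resources in mind, the DBF-filtered reconstructed frame can therefore be stored only for frames whose temporal layer TL is less than a threshold T, which avoids unnecessary storage overhead.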
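The temporal layer gate can be sketched as follows. The layer assignment assumes a dyadic hierarchy with a GOP size of 32, which yields the 6 layers mentioned above; that assignment, the function names, and the threshold value 3 are illustrative assumptions, since the actual layer of each frame is given by the coding configuration.

```python
def temporal_layer(poc, gop_size=32):
    """Temporal layer of a frame in an assumed dyadic hierarchy (gop_size is a power of two)."""
    offset = poc % gop_size
    if offset == 0:
        return 0
    depth = gop_size.bit_length() - 1                      # log2(gop_size)
    trailing_zeros = (offset & -offset).bit_length() - 1   # largest power of two dividing offset
    return depth - trailing_zeros


def stores_extra_reference(poc, threshold, gop_size=32):
    """Only frames in temporal layers below the threshold keep the additional picture."""
    return temporal_layer(poc, gop_size) < threshold


stored = [poc for poc in range(33) if stores_extra_reference(poc, threshold=3)]
```

With this threshold, only 5 of the 33 frames keep an additional picture in the buffer.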
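The application of the video decoding method provided by the embodiments of this application is described in detail below through six specific examples.
Example 1:
Because the order of the reference frames affects the reference frame ID numbers that are transmitted, and therefore has some effect on the compression result, this example describes one of the insertion positions.
FIG. 11 is a schematic diagram of inserting the reference frame at different positions in this example. The video decoding method of this example includes at least the following steps:
Step S501: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1.
In this example, a reference frame may be marked as a "short-term reference frame", a "non-reference frame", or a "long-term reference frame".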
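Step S502: If the decoded picture buffer now contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1, insert that picture into the corresponding reference picture list at one of several positions, such as the head, the end, or the second-to-last position; alternatively, the insertion position can be decided flexibly according to a specific cost.
Step S503: In units of coding units (CUs), perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S504: After prediction of the whole frame is complete, perform the LMCS step.
Step S505: Apply conventional filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream. At the same time, store the filtered reconstructed image in the decoded picture buffer as the first reconstructed image, record the picture order count of the video frame to be decoded, and set the supplementary frame information to the second supplementary frame information.
Step S506: Apply NN filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream, in preparation for the subsequent reconstructed frame that combines conventional and NN filtering.
Step S507: Adapt the NN filtering and conventional filtering results according to the parsed bitstream.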
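Step S508: Apply ALF filtering to the reconstructed frame from step S507.
Step S509: Mark the decoded image as a "short-term reference frame", set its supplementary frame information to the first supplementary frame information, and store the decoded image in the decoded picture buffer.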
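Example 2:
Because inserting reference frames enlarges the motion search range and increases time complexity, this example replaces reference frames instead of inserting them, which reduces time complexity to some extent.
FIG. 12 is a flowchart of replacing reference frames in this example. As shown in the figure, it includes at least the following steps:
Step S601: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1. A reference frame may be marked as a "short-term reference frame", a "non-reference frame", or a "long-term reference frame".
Step S602: If the decoded picture buffer now contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1, use that picture to replace the original reference picture whose isInsertFlag equals 0.
Step S603: In units of CUs, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S604: After prediction of the whole frame is complete, perform the LMCS step.
Step S605: Apply conventional filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream. At the same time, store the conventionally filtered result in the decoded picture buffer, record the POC number of the video frame to be decoded, and set isInsertFlag to 1.
Step S606: Apply NN filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream.
Step S607: Adapt the NN and conventional filtering results according to the parsed bitstream.
Step S608: Apply ALF filtering to the reconstructed frame from the preceding steps.
Step S609: Mark the decoded image as a "short-term reference frame", store it in the decoded picture buffer, and set isInsertFlag to 0.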
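The storage side of steps S603 to S609 can be sketched as a function that appends two entries per frame to the buffer. The filters themselves are not modelled: each entry only records the names of the stages applied to it, and the dictionary layout and function name are hypothetical.

```python
def decode_frame(poc, dpb):
    """Store both outputs of one frame in `dpb` (a list standing in for the DPB)."""
    recon = ["reconstruct", "LMCS"]                        # S603, S604

    # S605: the conventionally filtered result is kept with isInsertFlag=1.
    dpb.append({"poc": poc, "isInsertFlag": 1,
                "stages": recon + ["conventional filter"]})

    # S606-S608: NN filtering, adaptation with the conventional result, then ALF.
    final = recon + ["NN/conventional adaptation", "ALF"]

    # S609: the decoded image is a short-term reference with isInsertFlag=0.
    dpb.append({"poc": poc, "isInsertFlag": 0, "stages": final,
                "marking": "short-term reference"})
    return dpb


dpb = decode_frame(16, [])
```

A later frame that references POC 16 can then find both entries and pick between them by isInsertFlag.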
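Example 3:
Because a frame in a high temporal layer is far less likely to be used as a reference frame than one in a low temporal layer, storing the reconstructed frame after DBF brings less gain in high temporal layers than in low ones. To save storage overhead, this example is used only in low temporal layers.
Step S701: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1.
Step S702: If the current temporal layer TL is less than the threshold T (T∈[0,5], configurable as needed), determine whether the decoded picture buffer contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1; if so, insert that picture at the back of the corresponding reference picture list.
Step S703: In units of CUs, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S704: After prediction of the whole frame is complete, perform the LMCS step.
Step S705: Apply conventional filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream. At the same time, store the conventionally filtered result in the decoded picture buffer, record the POC number of the video frame to be decoded, and set isInsertFlag to 1.
Step S706: Apply NN filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream.
Step S707: Adapt the NN and conventional filtering results according to the parsed bitstream.
Step S708: Apply ALF filtering to the reconstructed frame from the preceding steps.
Step S709: Mark the decoded image as a "short-term reference frame", store it in the decoded picture buffer, and set isInsertFlag to 0.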
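Example 4:
Because conventional filtering performs far worse on chroma than NN filtering, conventionally filtered chroma is not suitable for use as a reference. This example therefore stores only the conventionally filtered luma component in the decoded picture buffer, while the chroma components keep the result of adapting NN and conventional filtering.
FIG. 13 is a flowchart of storing the luma component in the decoded picture buffer in this example. As shown in the figure, it includes at least the following steps:
Step S801: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1. A reference frame may be marked as a "short-term reference frame", a "non-reference frame", or a "long-term reference frame".
Step S802: If the decoded picture buffer now contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1, insert that picture at the back of the corresponding reference picture list.
Step S803: In units of CUs, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S804: After prediction of the whole frame is complete, perform the LMCS step.
Step S805: Apply conventional filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream. At the same time, store the luma component of the conventionally filtered result in the decoded picture buffer, record the POC number of the video frame to be decoded, and set isInsertFlag to 1.
Step S806: Apply NN filtering to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream.
Step S807: Adapt the NN and conventional filtering results according to the parsed bitstream.
Step S808: Apply ALF filtering to the reconstructed frame from the preceding steps. Store the chroma components of the ALF output in the decoded picture buffer entry created in step S805 for the POC number of the video frame to be decoded, and set isInsertFlag to 1.
Step S809: Mark the decoded image as a "short-term reference frame", store it in the decoded picture buffer, and set isInsertFlag to 0.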
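Example 5:
FIG. 14 is a flowchart of video decoding in this example. As shown in the figure, it includes at least the following steps:
Step S901: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1. A reference frame may be marked as a "short-term reference frame", a "non-reference frame", or a "long-term reference frame".
Step S902: If the decoded picture buffer now contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1, insert that picture at the back of the corresponding reference picture list.
Step S903: In units of CUs, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S904: After prediction of the whole frame is complete, perform the LMCS step.
Step S905: Apply DBF filtering to the reconstructed image after LMCS. At the same time, store the conventionally filtered result in the decoded picture buffer, record the POC number of the video frame to be decoded, and set isInsertFlag to 1.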
Step S906: Apply SAO filtering to the DBF-filtered reconstructed image.
Step S907: Apply ALF filtering.
Step S908: Mark the decoded image as a "short-term reference frame", store it in the decoded picture buffer, and set isInsertFlag to 0.
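Example 6:
When multiple neural network filters are used in video coding, images corrected by different NN filters can serve as reference frames for subsequent frames, which increases the diversity of reference frames.
FIG. 15 is a flowchart of video decoding in this example. As shown in the figure, it includes at least the following steps:
Step S1001: Build the reference picture list.
It should be noted that the decoder parses the bitstream to obtain the picture order counts (POCs) of the reference frames of the video frame to be decoded and, based on these POCs, builds the first reference picture list L0 and the second reference picture list L1 from the decoded picture buffer; together they constitute the reference picture list. The reference picture list contains first decoded images and second decoded images: a first decoded image is a picture whose isInsertFlag identifier equals 0, and a second decoded image is a picture whose isInsertFlag identifier equals 1. A reference frame may be marked as a "short-term reference frame", a "non-reference frame", or a "long-term reference frame".
Step S1002: If the decoded picture buffer now contains a picture whose isInsertFlag equals 1 and whose POC number appears in L0 or L1, insert that picture at the back of the corresponding reference picture list.
Step S1003: In units of CUs, perform prediction, transform and quantization, and inverse quantization and inverse transform according to the information parsed from the bitstream to obtain reconstructed blocks.
Step S1004: After prediction of the whole frame is complete, perform the LMCS step.
Step S1005: Apply neural network filtering with the first neural network filter NN1 to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream. At the same time, store the NN1-filtered result in the decoded picture buffer, record the POC number of the video frame to be decoded, and set isInsertFlag to 1.
Step S1006: Apply neural network filtering with the second neural network filter NN2 to the corresponding blocks of the reconstructed image after LMCS, according to the parsed bitstream.
Step S1007: Adapt the multiple neural network filtering results according to the parsed bitstream.
Step S1008: Apply ALF filtering to the reconstructed frame from the preceding steps.
Step S1009: Mark the decoded image as a "short-term reference frame", store it in the decoded picture buffer, and set isInsertFlag to 0.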
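FIG. 16 is a schematic structural diagram of a video processing device provided by an embodiment of this application. As shown in FIG. 16, the video processing device 2000 includes a memory 2100 and a processor 2200. There may be one or more memories 2100 and processors 2200; FIG. 16 takes one memory 2100 and one processor 2200 as an example. The memory 2100 and the processor 2200 may be connected by a bus or in another manner.
As a computer-readable storage medium, the memory 2100 can store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method provided by any embodiment of this application. The processor 2200 implements the above method by running the software programs, instructions, and modules stored in the memory 2100.
The memory 2100 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function. In addition, the memory 2100 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 2100 further includes memory located remotely from the processor 2200, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
An embodiment of this application further provides a computer-readable storage medium storing computer-executable instructions for performing the video decoding method provided by any embodiment of this application.
An embodiment of this application further provides a computer program product including a computer program or computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program or computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the video decoding method provided by any embodiment of this application.
The system architectures and application scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit them. A person skilled in the art will appreciate that, as system architectures evolve and new application scenarios emerge, the technical solutions provided by the embodiments of this application apply equally to similar technical problems.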
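A person of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or suitable combinations thereof.
In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed jointly by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. Furthermore, as is well known to a person of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process or thread of execution, and a component may be located on one computer or distributed between two or more computers. In addition, these components may execute from various computer-readable media that have various data structures stored on them. Components may communicate through local or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, or across a network such as the Internet that interacts with other systems by way of the signal).
Some embodiments of this application have been described above with reference to the accompanying drawings, which does not limit the scope of the claims of this application. Any modification, equivalent replacement, or improvement made by a person skilled in the art without departing from the scope and essence of this application shall fall within the scope of the claims of this application.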
Claims (15)
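- A video decoding method, comprising: obtaining reference frame information of a video frame to be decoded; obtaining a reference picture list of the video frame to be decoded according to the reference frame information and supplementary frame information; and decoding the video frame to be decoded according to the reference picture list to obtain a first reconstructed image and a decoded image of the video frame to be decoded.
- The method according to claim 1, further comprising: storing the first reconstructed image and the decoded image of the video frame to be decoded in a decoded picture buffer, and configuring supplementary frame information for the first reconstructed image and the decoded image of the video frame to be decoded.
- The method according to claim 1 or 2, wherein: the first reconstructed image and the decoded image are different; and the first reconstructed image is not in-loop filtered, or the first reconstructed image and the decoded image undergo different in-loop filtering.
- The method according to claim 3, wherein the in-loop filtering includes at least one of the following: deblocking filter (DBF); sample adaptive offset (SAO); luma mapping with chroma scaling (LMCS); neural-network-based loop filter (NNF).
- The method according to any one of claims 1, 2, and 4, wherein obtaining the reference frame information of the video frame to be decoded comprises: parsing a video bitstream to obtain the reference frame information corresponding to the video frame to be decoded, wherein the reference frame information includes the picture order count of at least one reference frame.
- The method according to claim 5, wherein obtaining the reference picture list of the video frame to be decoded according to the reference frame information and the supplementary frame information comprises: obtaining a first reference picture list and a second reference picture list according to the picture order count of the reference frame.
- The method according to claim 6, wherein: the first reference picture list includes a first decoded image; the second reference picture list includes a first decoded image; and the picture order count of the first decoded image equals one of the picture order counts of the reference frames, and the supplementary frame information of the first decoded image equals first supplementary frame information.
- The method according to claim 7, wherein: the first reference picture list further includes a second decoded image; and/or the second reference picture list further includes a second decoded image; and the picture order count of the second decoded image equals the picture order count of the first decoded image, and the supplementary frame information of the second decoded image equals second supplementary frame information.
- The method according to claim 8, wherein the second decoded image is configured as at least one of the following: located at the head of the first reference picture list or the second reference picture list; located at the end of the first reference picture list or the second reference picture list; located at the second-to-last position of the first reference picture list or the second reference picture list; the second decoded image replaces the first decoded image; placed at a specified position according to indication information in the video bitstream.
- The method according to claim 3, wherein storing the reconstructed image group in the picture buffer and configuring corresponding reference sub-frame information for the reconstructed images comprises: storing the first reconstructed image and the decoded image in the decoded picture buffer, configuring first supplementary frame information for the first reconstructed image, and configuring second supplementary frame information for the decoded image.
- The method according to claim 2 or 10, further comprising: obtaining a third reconstructed image according to the luma component of the first reconstructed image and the chroma components of the decoded image; and storing the third reconstructed image in the picture buffer and configuring second supplementary frame information for the third reconstructed image.
- A video decoding method applied to video frames that include a temporal layer identifier, the method comprising: when the temporal layer identifier of a video frame to be decoded is less than a preset threshold, performing the video decoding method according to any one of claims 1 to 11 on the video frame to be decoded.
- A video processing device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the video decoding method according to any one of claims 1 to 12.
- A computer-readable storage medium storing computer-executable instructions for performing the video decoding method according to any one of claims 1 to 12.
- A computer program product, comprising a computer program or computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer program or the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the video decoding method according to any one of claims 1 to 12.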
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211263037.4A CN117939160A (zh) | 2022-10-14 | 2022-10-14 | Video decoding method, video processing device, medium, and product |
CN202211263037.4 | 2022-10-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024078148A1 (zh) | 2024-04-18 |
Family
ID=90668697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/114456 WO2024078148A1 (zh) | 2022-10-14 | 2023-08-23 | Video decoding method, video processing device, medium, and product |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117939160A (zh) |
WO (1) | WO2024078148A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118338003A (zh) * | 2024-06-12 | 2024-07-12 | 北京欣博电子科技有限公司 | Video decoding method and apparatus, computer device, readable storage medium, and program product |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050157794A1 (en) * | 2004-01-16 | 2005-07-21 | Samsung Electronics Co., Ltd. | Scalable video encoding method and apparatus supporting closed-loop optimization |
US20140286392A1 (en) * | 2011-11-09 | 2014-09-25 | Sk Telecom Co., Ltd. | Method and apparatus for encoding/decoding image by using adaptive loop filter on frequency domain using conversion |
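KR20160146591A (ko) * | 2015-06-11 | 2016-12-21 | 인텔렉추얼디스커버리 주식회사 | Encoding/decoding method and apparatus relating to adaptive deblocking filtering |
CN108769682A (zh) * | 2018-06-20 | 2018-11-06 | 腾讯科技(深圳)有限公司 | Video encoding and decoding methods and apparatus, computer device, and storage medium |
CN110199521A (zh) * | 2016-12-23 | 2019-09-03 | 华为技术有限公司 | Low-complexity mixed-domain collaborative in-loop filter for lossy video coding |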
US20220295116A1 (en) * | 2019-09-20 | 2022-09-15 | Intel Corporation | Convolutional neural network loop filter based on classifier |
- 2022-10-14: CN application CN202211263037.4A, published as CN117939160A (zh), status: active, pending
- 2023-08-23: WO application PCT/CN2023/114456, published as WO2024078148A1 (zh), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN117939160A (zh) | 2024-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23876353 Country of ref document: EP Kind code of ref document: A1 |