WO2023039859A1 - Video encoding and decoding method, device, system, and storage medium - Google Patents
- Publication number
- WO2023039859A1 (application PCT/CN2021/119164)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- current block
- characteristic information
- intra
- prediction mode
- flag
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
Definitions
- the present application relates to the technical field of video coding and decoding, and in particular to a video coding and decoding method, device, system, and storage medium.
- Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, or video players, among others.
- video devices implement video compression technology to enable more effective transmission or storage of video data.
- Prediction methods include inter-frame prediction and intra-frame prediction, where intra-frame prediction predicts the current block based on the decoded adjacent blocks in the same frame of image.
- the current intra prediction modes all predict the current block based on the reconstructed values around it, and when the correlation between the original values of the current block and the reconstructed values around it is weak, there is a problem of inaccurate prediction.
- the embodiments of the present application provide a video encoding and decoding method, device, system, and storage medium, and propose an intra prediction mode based on an autoencoder; when the correlation between the original values of the current block and the reconstructed values around it is weak, accurate prediction of the current block can still be achieved.
- in a first aspect, the present application provides a video encoding method, including: determining the intra prediction mode of the current block from among N first intra prediction modes, where N is a positive integer and the N first intra prediction modes include the autoencoder-based intra prediction mode; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, obtaining the autoencoder corresponding to the current block, where the autoencoder includes an encoding network and a decoding network.
- in a second aspect, the embodiment of the present application provides a video decoding method, including: decoding the code stream to determine the intra prediction mode of the current block; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, decoding the code stream to obtain the feature information of the current block.
- the present application provides a video encoder, configured to execute the method in the above first aspect or various implementations thereof.
- the encoder includes a functional unit configured to execute the method in the above first aspect or its implementations.
- the present application provides a video decoder, configured to execute the method in the above second aspect or various implementations thereof.
- the decoder includes a functional unit configured to execute the method in the above second aspect or its various implementations.
- a video encoder including a processor and a memory.
- the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above first aspect or its various implementations.
- a sixth aspect provides a video decoder, including a processor and a memory.
- the memory is used to store a computer program
- the processor is used to call and run the computer program stored in the memory, so as to execute the method in the above second aspect or its various implementations.
- a video codec system including a video encoder and a video decoder.
- the video encoder is configured to execute the method in the above first aspect or its various implementations
- the video decoder is configured to execute the method in the above second aspect or its various implementations.
- the chip includes a processor, configured to call and run a computer program from a memory, so that a device installed with the chip executes the method in any one of the above first to second aspects or the implementations thereof.
- a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
- a computer program product including computer program instructions, the computer program instructions cause a computer to execute any one of the above first to second aspects or the method in each implementation manner.
- a computer program which, when running on a computer, causes the computer to execute any one of the above-mentioned first to second aspects or the method in each implementation manner thereof.
- a code stream is provided, and the code stream is generated based on the method in the first aspect above.
- it can be seen that the decoding end determines the intra prediction mode of the current block by decoding the code stream; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, it decodes the code stream to obtain the feature information of the current block, obtains the pixel values of the reconstructed pixels around the current block, and inputs the feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the corresponding autoencoder to obtain the prediction block of the current block.
- this application adds an intra-frame prediction mode based on an autoencoder to provide more options for intra-frame prediction.
- that is, the prediction block of the current block is determined according to the feature information of the current block and the pixel values of the reconstructed pixels around the current block. When the correlation between the original values of the current block and the reconstructed values around the current block is weak, accurate prediction of the current block can still be achieved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the feature information of the current block, thereby improving the accuracy of intra prediction.
- FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application
- Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application
- Fig. 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
- FIG. 4 is a schematic diagram of reference pixels involved in the embodiment of the present application.
- FIG. 5 is a schematic diagram of 35 intra prediction modes of HEVC
- FIG. 6 is a schematic diagram of 67 intra prediction modes of VVC
- FIG. 7 is a schematic diagram of a MIP intra prediction mode
- FIG. 8 is a schematic diagram of a network structure of an autoencoder involved in an embodiment of the present application.
- FIG. 9A is a schematic diagram of a curve of an activation function involved in the embodiment of the present application.
- FIG. 9B is another schematic diagram of the activation function involved in the embodiment of the present application.
- FIG. 10 is a schematic flowchart of a video decoding method provided in an embodiment of the present application.
- FIG. 11 is a schematic diagram of a prediction involved in the embodiment of the present application.
- FIG. 12 is another schematic flowchart of a video decoding method provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of a prediction involved in the embodiment of the present application.
- FIG. 14 is another schematic flowchart of a video decoding method provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of a prediction involved in the embodiment of the present application.
- FIG. 16 is another schematic flowchart of a video decoding method provided in an embodiment of the present application.
- FIG. 17 is a schematic diagram of a prediction involved in the embodiment of the present application.
- FIG. 18 is a schematic flowchart of a video encoding method provided in an embodiment of the present application.
- Fig. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
- Fig. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
- Fig. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
- Fig. 22 is a schematic block diagram of a video codec system provided by an embodiment of the present application.
- the application can be applied to the field of image codec, video codec, hardware video codec, dedicated circuit video codec, real-time video codec, etc.
- the solution of the present application can be combined with video coding standards, for example the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard.
- the solutions of the present application may also operate in conjunction with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its scalable video codec (SVC) and multi-view video codec (MVC) extensions.
- FIG. 1 is a schematic block diagram of a video encoding and decoding system involved in an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video codec system in the embodiment of the present application includes but is not limited to what is shown in FIG. 1 .
- the video codec system 100 includes an encoding device 110 and a decoding device 120 .
- the encoding device is used to encode (can be understood as compression) the video data to generate a code stream, and transmit the code stream to the decoding device.
- the decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
- the encoding device 110 in the embodiment of the present application can be understood as a device having a video encoding function, and the decoding device 120 can be understood as a device having a video decoding function; that is, the embodiment of the present application covers a wide range of devices for the encoding device 110 and the decoding device 120, including smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, and the like.
- the encoding device 110 may transmit the encoded video data (such as code stream) to the decoding device 120 via the channel 130 .
- Channel 130 may include one or more media and/or devices capable of transmitting encoded video data from encoding device 110 to decoding device 120 .
- channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded video data directly to decoding device 120 in real-time.
- encoding device 110 may modulate the encoded video data according to a communication standard and transmit the modulated video data to decoding device 120 .
- the communication medium includes a wireless communication medium, such as a radio frequency spectrum.
- the communication medium may also include a wired communication medium, such as one or more physical transmission lines.
- the channel 130 includes a storage medium that can store video data encoded by the encoding device 110 .
- the storage medium includes a variety of local access data storage media, such as optical discs, DVDs, flash memory, and the like.
- the decoding device 120 may acquire encoded video data from the storage medium.
- channel 130 may include a storage server that may store video data encoded by encoding device 110 .
- the decoding device 120 may download the stored encoded video data from the storage server.
- the storage server may store the encoded video data and may transmit it to the decoding device 120; examples include a web server (e.g., for a website), a file transfer protocol (FTP) server, and the like.
- the encoding device 110 includes a video encoder 112 and an output interface 113 .
- the output interface 113 may include a modulator/demodulator (modem) and/or a transmitter.
- the encoding device 110 may include a video source 111 in addition to the video encoder 112 and the output interface 113.
- the video source 111 may include at least one of a video capture device (for example, a video camera), a video archive, a video input interface, or a computer graphics system, where the video input interface is used to receive video data from a video content provider and the computer graphics system is used to generate video data.
- the video encoder 112 encodes the video data from the video source 111 to generate a code stream.
- Video data may include one or more pictures or a sequence of pictures.
- the code stream contains the encoding information of an image or image sequence in the form of a bit stream.
- Encoding information may include encoded image data and associated data.
- the associated data may include a sequence parameter set (SPS for short), a picture parameter set (PPS for short) and other syntax structures.
- An SPS may contain parameters that apply to one or more sequences.
- a PPS may contain parameters applied to one or more images.
- the syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
- the video encoder 112 directly transmits encoded video data to the decoding device 120 via the output interface 113 .
- the encoded video data can also be stored on a storage medium or a storage server for subsequent reading by the decoding device 120 .
- the decoding device 120 includes an input interface 121 and a video decoder 122 .
- the decoding device 120 may include a display device 123 in addition to the input interface 121 and the video decoder 122 .
- the input interface 121 includes a receiver and/or a modem.
- the input interface 121 can receive encoded video data through the channel 130 .
- the video decoder 122 is used to decode the encoded video data to obtain decoded video data, and transmit the decoded video data to the display device 123 .
- the display device 123 displays the decoded video data.
- the display device 123 may be integrated with the decoding device 120 or external to the decoding device 120 .
- the display device 123 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1 .
- the technology of the present application may also be applied to video encoding only or video decoding only.
- Fig. 2 is a schematic block diagram of a video encoder involved in an embodiment of the present application. It should be understood that the video encoder 200 can be used to perform lossy compression on images, and can also be used to perform lossless compression on images.
- the lossless compression may be visually lossless compression or mathematically lossless compression.
- the video encoder 200 can be applied to image data in luminance-chrominance (YCbCr, YUV) format.
- the YUV ratio can be 4:2:0, 4:2:2 or 4:4:4, where Y denotes luminance (Luma), Cb (U) denotes blue chroma, Cr (V) denotes red chroma, and U and V together are denoted chroma (Chroma), describing color and saturation.
- 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr), and 4:4:4 means full pixel display (YYYYCbCrCbCrCbCrCbCr).
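- As a concrete illustration of these sampling ratios, a minimal Python sketch of the per-plane sample counts they imply (the function name and interface are illustrative, not from the original description):

```python
def plane_sizes(width, height, fmt):
    """Return (Y, Cb, Cr) sample counts for a width x height frame
    under the given chroma subsampling format."""
    y = width * height
    if fmt == "4:2:0":    # chroma halved horizontally and vertically
        c = (width // 2) * (height // 2)
    elif fmt == "4:2:2":  # chroma halved horizontally only
        c = (width // 2) * height
    elif fmt == "4:4:4":  # full chroma resolution
        c = width * height
    else:
        raise ValueError(f"unknown format: {fmt}")
    return y, c, c

# e.g. plane_sizes(1920, 1080, "4:2:0") -> (2073600, 518400, 518400)
```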
- the video encoder 200 reads video data, and for each frame of image in the video data, divides the frame into several coding tree units (CTUs).
- a CTU may also be called a "tree block", a "largest coding unit" (LCU for short) or a "coding tree block" (CTB for short).
- Each CTU may be associated with a pixel block of equal size within the image. Each pixel may correspond to one luminance (luma) sample and two chrominance (chrominance or chroma) samples.
- each CTU may be associated with one block of luma samples and two blocks of chroma samples.
- a CTU size is, for example, 128×128, 64×64, 32×32, and so on.
- a CTU can be further divided into several coding units (Coding Unit, CU) for coding, and the CU can be a rectangular block or a square block.
- the CU can be further divided into a prediction unit (PU for short) and a transform unit (TU for short), so that coding, prediction, and transformation are separated, and processing is more flexible.
- a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
- the video encoder and video decoder can support various PU sizes. Assuming that the size of a specific CU is 2N×2N, video encoders and video decoders may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N or similar sizes for inter prediction. The video encoder and video decoder may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
- the video encoder 200 may include: a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, a loop filtering unit 260, a decoded picture buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 may include more, fewer or different functional components.
- the current block may be called a current coding unit (CU) or a current prediction unit (PU).
- a predicted block may also be called a predicted image block or an image predicted block, and a reconstructed image block may also be called a reconstructed block or an image reconstructed image block.
- the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels in a video frame, intra prediction is used in video coding and decoding technology to eliminate the spatial redundancy between adjacent pixels. Because there is a strong similarity between adjacent frames in a video, inter prediction is used in video coding and decoding technology to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
- the inter-frame prediction unit 211 can be used for inter-frame prediction.
- the inter-frame prediction can refer to image information of different frames.
- the inter-frame prediction uses motion information to find a reference block from the reference frame, and generates a prediction block according to the reference block to eliminate temporal redundancy;
- Frames used for inter-frame prediction may be P frames and/or B frames, P frames refer to forward predictive frames, and B frames refer to bidirectional predictive frames.
- the motion information includes the reference frame list where the reference frame is located, the reference frame index, and the motion vector.
- the motion vector can have integer-pixel or sub-pixel precision. If the motion vector has sub-pixel precision, interpolation filtering in the reference frame is needed to generate the required sub-pixel block.
- the block of whole pixels or sub-pixels found in the reference frame according to the motion vector is called the reference block.
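- To make the sub-pixel case concrete, a minimal sketch of quarter-pel motion compensation using bilinear interpolation (real codecs use longer separable interpolation filters; the function and its interface are illustrative):

```python
import numpy as np

def motion_compensate(ref, x0, y0, w, h, mv_x, mv_y):
    """Fetch the w x h reference block pointed to by a quarter-pel motion
    vector (mv_x, mv_y) for the block at (x0, y0). Bilinear interpolation
    stands in for the standard interpolation filters."""
    ix, iy = x0 + mv_x // 4, y0 + mv_y // 4          # integer-pel offset
    fx, fy = (mv_x % 4) / 4.0, (mv_y % 4) / 4.0      # fractional offset
    patch = ref[iy:iy + h + 1, ix:ix + w + 1].astype(np.float64)
    top = patch[:h, :w] * (1 - fx) + patch[:h, 1:] * fx
    bot = patch[1:, :w] * (1 - fx) + patch[1:, 1:] * fx
    return top * (1 - fy) + bot * fy                  # the reference block
```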
- Some technologies directly use the reference block as the prediction block, while others further process the reference block to generate the prediction block. Further processing the reference block to generate a prediction block can also be understood as taking the reference block as the prediction block and then processing it to generate a new prediction block.
- the intra-frame estimation unit 212 only refers to the information of the same frame of image to predict the pixel information in the current code image block for eliminating spatial redundancy.
- a frame used for intra prediction may be an I frame.
- as shown in FIG. 4, the white 4×4 block is the current block, and the gray pixels in the row above and the column to the left of the current block are the reference pixels of the current block; intra prediction uses these reference pixels to predict the current block.
- these reference pixels may all be available, that is, all of them have already been encoded and decoded. Some of them may also be unavailable; for example, if the current block is at the leftmost edge of the whole frame, the reference pixels to the left of the current block are unavailable.
- likewise, when the lower-left part of the current block has not been encoded and decoded, the reference pixels at the lower left are also unavailable.
- for unavailable positions, the available reference pixels, some default value, or some other method can be used for filling, or no filling is performed.
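- The availability rules above can be sketched as follows (a simplified illustration; the default-value fill mirrors the MIP behaviour described below, and the helper name is an assumption):

```python
import numpy as np

def gather_reference_pixels(recon, x0, y0, w, h, bit_depth=10):
    """Collect the row above and the column left of the w x h block at
    (x0, y0); unavailable positions are filled with a default value.
    Real codecs may instead substitute the nearest available reference."""
    default = 1 << (bit_depth - 1)        # e.g. 512 for 10-bit video
    top = np.full(w, default, dtype=np.int32)
    left = np.full(h, default, dtype=np.int32)
    if y0 > 0:                            # row above has been reconstructed
        top[:] = recon[y0 - 1, x0:x0 + w]
    if x0 > 0:                            # left column has been reconstructed
        left[:] = recon[y0:y0 + h, x0 - 1]
    return top, left
```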
- H.264/AVC has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
- the intra prediction modes used by HEVC include planar mode (Planar), DC and 33 angle modes, a total of 35 prediction modes.
- the intra-frame modes used by VVC include Planar, DC and 65 angle modes, with a total of 67 prediction modes.
- in addition, for the luma component there is a training-based matrix intra prediction (Matrix based intra prediction, MIP) mode, and for the chroma component there is the CCLM prediction mode.
- for a block of width W and height H, MIP selects the W reconstructed pixels in the row above the block and the H reconstructed pixels in the column to its left as input. If the pixels at these positions have not been reconstructed, the unreconstructed positions are set to a default value; for example, for 10-bit pixels the default filling value is 512.
- MIP generates predicted values mainly based on three steps, which are reference pixel averaging, matrix-vector multiplication, and linear interpolation upsampling.
- MIP works on blocks with a size of 4x4 to 64x64.
- the MIP mode selects an appropriate prediction matrix based on the lengths of the rectangle's sides: for a rectangle whose short side is 4, there are 16 sets of matrix parameters to choose from; for a rectangle whose short side is 8, there are 8 sets of matrix parameters to choose from; for other rectangles, there are 6 sets of matrix parameters to choose from.
- MIP performs prediction with each selectable matrix, and the index of the matrix with the lowest cost is encoded into the code stream, so that the decoder can read the corresponding matrix parameters and use them for prediction.
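- The three MIP steps can be sketched as follows (an illustrative reconstruction: the matrix A and offset b stand in for one of the standardized, trained parameter sets, which are not reproduced here; reference lengths are assumed to be multiples of 4):

```python
import numpy as np

def mip_predict(top, left, A, b, out_w, out_h):
    """Illustrative MIP-style prediction: (1) average the reference pixels,
    (2) multiply by a trained matrix, (3) upsample by linear interpolation."""
    # 1) reference averaging: reduce each boundary to 4 averaged samples
    avg_top = top.reshape(4, -1).mean(axis=1)
    avg_left = left.reshape(4, -1).mean(axis=1)
    p = np.concatenate([avg_top, avg_left])       # boundary vector, length 8
    # 2) matrix-vector multiplication gives a downsampled 4x4 prediction
    red = (A @ p + b).reshape(4, 4)               # A: (16, 8), b: (16,)
    # 3) linear interpolation upsamples to the final out_h x out_w block
    ys, xs = np.linspace(0, 3, out_h), np.linspace(0, 3, out_w)
    pred = np.empty((out_h, out_w))
    for i, y in enumerate(ys):
        y0 = min(int(y), 2); fy = y - y0
        for j, x in enumerate(xs):
            x0 = min(int(x), 2); fx = x - x0
            pred[i, j] = (red[y0, x0] * (1 - fy) * (1 - fx)
                          + red[y0, x0 + 1] * (1 - fy) * fx
                          + red[y0 + 1, x0] * fy * (1 - fx)
                          + red[y0 + 1, x0 + 1] * fy * fx)
    return pred
```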
- with these additional modes, intra prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
- the residual unit 220 may generate a residual block of the CU based on the pixel block of the CU and the prediction blocks of the PUs of the CU. For example, the residual unit 220 may generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the pixel block of the CU and the corresponding sample in the prediction block of a PU of the CU.
- Transform/quantization unit 230 may quantize the transform coefficients. Transform/quantization unit 230 may quantize transform coefficients associated with TUs of a CU based on quantization parameter (QP) values associated with the CU. Video encoder 200 may adjust the degree of quantization applied to transform coefficients associated with a CU by adjusting the QP value associated with the CU.
- Inverse transform/quantization unit 240 may apply inverse quantization and inverse transform to the quantized transform coefficients, respectively, to reconstruct a residual block from the quantized transform coefficients.
- the reconstruction unit 250 may add samples of the reconstructed residual block to corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate a reconstructed image block associated with the TU. By reconstructing the sample blocks of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel blocks of the CU.
- Loop filtering unit 260 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
- the loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used for deblocking and the SAO/ALF unit is used to remove ringing effects.
- the decoded image buffer 270 may store reconstructed pixel blocks.
- Inter prediction unit 211 may use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures.
- intra estimation unit 212 may use the reconstructed pixel blocks in decoded picture cache 270 to perform intra prediction on other PUs in the same picture as the CU.
- Entropy encoding unit 280 may receive the quantized transform coefficients from transform/quantization unit 230 . Entropy encoding unit 280 may perform one or more entropy encoding operations on the quantized transform coefficients to generate entropy encoded data.
- Fig. 3 is a schematic block diagram of a video decoder involved in an embodiment of the present application.
- the video decoder 300 includes: an entropy decoding unit 310 , a prediction unit 320 , an inverse quantization/transformation unit 330 , a reconstruction unit 340 , a loop filter unit 350 and a decoded image buffer 360 . It should be noted that the video decoder 300 may include more, less or different functional components.
- the video decoder 300 can receive code streams.
- the entropy decoding unit 310 may parse the codestream to extract syntax elements from the codestream. As part of parsing the codestream, the entropy decoding unit 310 may parse the entropy-encoded syntax elements in the codestream.
- the prediction unit 320 , the inverse quantization/transformation unit 330 , the reconstruction unit 340 and the loop filter unit 350 can decode video data according to the syntax elements extracted from the code stream, that is, generate decoded video data.
- the prediction unit 320 includes an intra estimation unit 321 and an inter prediction unit 322 .
- Intra estimation unit 321 may perform intra prediction to generate a predictive block for a PU. Intra estimation unit 321 may use an intra prediction mode to generate a prediction block for a PU based on pixel blocks of spatially neighboring PUs. Intra estimation unit 321 may also determine the intra prediction mode of the PU from one or more syntax elements parsed from the codestream.
- the inter prediction unit 322 may construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 310 may parse the motion information for the PU. Inter prediction unit 322 may determine one or more reference blocks for the PU according to the motion information of the PU. Inter prediction unit 322 may generate a predictive block for the PU from one or more reference blocks for the PU.
- Inverse quantization/transform unit 330 may inverse quantize (ie, dequantize) transform coefficients associated with a TU. Inverse quantization/transform unit 330 may use QP values associated with CUs of the TU to determine the degree of quantization.
- inverse quantization/transform unit 330 may apply one or more inverse transforms to the inverse quantized transform coefficients in order to generate a residual block associated with the TU.
- Reconstruction unit 340 uses the residual blocks associated with the TUs of the CU and the prediction blocks of the PUs of the CU to reconstruct the pixel blocks of the CU. For example, the reconstruction unit 340 may add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU to obtain the reconstructed image block.
- Loop filtering unit 350 may perform deblocking filtering operations to reduce blocking artifacts of pixel blocks associated with a CU.
- Video decoder 300 may store the reconstructed picture of the CU in decoded picture cache 360 .
- the video decoder 300 may use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit the reconstructed picture to a display device for presentation.
- the basic process of video encoding and decoding is as follows: at the encoding end, a frame of image is divided into blocks, and for the current block, the prediction unit 210 uses intra-frame prediction or inter-frame prediction to generate a prediction block of the current block.
- the residual unit 220 may calculate a residual block based on the predicted block and the original block of the current block, that is, a difference between the predicted block and the original block of the current block, and the residual block may also be referred to as residual information.
- the residual block can be transformed and quantized by the transformation/quantization unit 230 to remove information that is not sensitive to human eyes, so as to eliminate visual redundancy.
- the residual block before being transformed and quantized by the transform/quantization unit 230 may be called a time domain residual block, and the time domain residual block after being transformed and quantized by the transform/quantization unit 230 may be called a frequency residual block or a frequency-domain residual block.
- the entropy coding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, and may perform entropy coding on the quantized transform coefficients to output a code stream.
- the entropy coding unit 280 can eliminate character redundancy according to the target context model and the probability information of the binary code stream.
- the entropy decoding unit 310 can analyze the code stream to obtain the prediction information of the current block, the quantization coefficient matrix, etc., and the prediction unit 320 uses intra prediction or inter prediction for the current block based on the prediction information to generate a prediction block of the current block.
- the inverse quantization/transformation unit 330 uses the quantization coefficient matrix obtained from the code stream to perform inverse quantization and inverse transformation on the quantization coefficient matrix to obtain a residual block.
- the reconstruction unit 340 adds the predicted block and the residual block to obtain a reconstructed block.
- the reconstructed blocks form a reconstructed image, and the loop filtering unit 350 performs loop filtering on the reconstructed image based on the image or based on the block to obtain a decoded image.
- the encoding end also needs similar operations to the decoding end to obtain the decoded image.
- the decoded image may also be referred to as a reconstructed image, and the reconstructed image may serve as a reference frame for inter prediction of subsequent frames.
- the block division information determined by the encoder as well as mode information or parameter information such as prediction, transformation, quantization, entropy coding, and loop filtering, etc., are carried in the code stream when necessary.
- the decoding end parses the code stream and derives, from the available information, the same block division information as the encoding end, as well as the same prediction, transformation, quantization, entropy coding, loop filtering and other mode or parameter information, so as to ensure that the decoded image obtained by the encoding end is the same as the decoded image obtained by the decoding end.
- the above is the basic process of the video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of this framework or process may be optimized. The present application is applicable to the basic process of the video codec under the block-based hybrid coding framework, but is not limited to this framework and process.
- as mentioned above, the current intra prediction modes all predict the current block based on the reconstructed values around it; when the correlation between the original values of the current block and the reconstructed values around it is weak, the prediction is inaccurate.
- in order to solve this technical problem, the embodiment of the present application provides an intra prediction mode based on an autoencoder.
- when the correlation between the original values of the current block and the reconstructed values around it is weak, this intra prediction mode can still achieve accurate prediction of the current block.
- FIG. 8 is a schematic diagram of a network structure of an autoencoder involved in an embodiment of the present application. As shown in FIG. 8 , the autoencoder includes an encoding network and a decoding network.
- the encoding network includes 4 fully connected layers (FCL for short) and 4 activation functions. Each fully connected layer is followed by an activation function, which applies a nonlinear transformation to the feature information output by the fully connected layer before it is input into the next fully connected layer.
- the output of the last activation function in the encoding network is the output of the encoding network.
- each fully connected layer includes 128 nodes; the first 3 activation functions are leaky ReLU activation functions, and the last activation function is the sigmoid activation function.
- it should be noted that FIG. 8 is only an example of an encoding network. The encoding network of this application includes but is not limited to what is shown in FIG. 8; for example, it may include more or fewer fully connected layers and activation functions than in FIG. 8, each fully connected layer is not limited to 128 nodes, and the number of nodes in each fully connected layer may be the same or different.
- optionally, the fully connected layers in FIG. 8 can be replaced with convolutional layers, and the leaky ReLU activation function can be replaced with other activation functions, such as ReLU, ELU and so on.
- the decoding network includes 4 fully connected layers and 3 activation functions. Each of the first 3 fully connected layers is followed by an activation function, which applies a nonlinear transformation to the feature information output by the fully connected layer before it is input into the next fully connected layer. The output of the last fully connected layer in the decoding network is the output of the decoding network.
- the network structure of the decoding network is symmetrical to that of the encoding network.
- Each fully connected layer in the decoding network includes 128 nodes, and the 3 activation functions are leaky relu activation functions. It should be noted that FIG. 8 is only an example of a decoding network.
- the decoding network of this application includes but is not limited to what is shown in FIG. 8; for example, it may include more or fewer fully connected layers and activation functions than in FIG. 8, each fully connected layer is not limited to 128 nodes, and the number of nodes in each fully connected layer may be the same or different.
- optionally, the fully connected layers in FIG. 8 can be replaced with convolutional layers, and the leaky ReLU activation function can be replaced with other activation functions, such as ReLU, ELU and so on.
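- A minimal PyTorch sketch of the FIG. 8 structure described above (the block size, reference-pixel count and feature length are assumptions; FIG. 8 itself fixes only the 128-node fully connected layers and the activation functions):

```python
import torch
import torch.nn as nn

class IntraAutoencoder(nn.Module):
    """Encoding network: 4 fully connected layers, leaky ReLU after the
    first 3, sigmoid after the last (output constrained to [0, 1]).
    Decoding network: 4 fully connected layers with leaky ReLU after the
    first 3; its input is the feature information concatenated with the
    neighbouring reconstructed pixel values."""
    def __init__(self, block_pixels=16, ref_pixels=8, feat_len=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(block_pixels, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, feat_len), nn.Sigmoid(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(feat_len + ref_pixels, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, block_pixels),   # the prediction block, flattened
        )

    def forward(self, block, neighbours):
        si = self.encoder(block)            # side information (SI)
        pred = self.decoder(torch.cat([si, neighbours], dim=-1))
        return pred, si
```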
- during intra prediction, the original pixel values of the current block are input into the encoding network, and the last activation function of the encoding network outputs the feature information of the current block, denoted as side information (SI); the feature information is a low-dimensional feature map. Then the feature information of the current block output by the last activation function of the encoding network and the pixel values of the reconstructed pixels around the current block are input into the decoding network, and after processing by the decoding network, the prediction block of the current block is output.
- that is to say, when the encoding end determines the prediction block of the current block, it also needs to input the feature information of the current block output by the encoding network and the pixel values of the reconstructed pixels around the current block into the decoding network, and the decoding network outputs the prediction block of the current block.
- in order for the decoding end to accurately determine the prediction block of the current block, the encoding end needs to carry the feature information of the current block output by the above encoding network in the code stream and send it to the decoding end.
- the decoder parses the feature information of the current block from the code stream, inputs the feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder to obtain the prediction block of the current block, and then obtains the reconstructed block of the current block according to the prediction block of the current block and the residual block of the current block parsed from the code stream.
- the characteristic information output by the encoding network needs to be written into the code stream.
- specifically, the feature information output by the encoding network needs to be rounded, and the rounded feature information is written into the code stream.
- to make this rounding well behaved, the values of the feature information output by the encoding network need to be continuously distributed within a limited range.
- the range of element values in the feature information output by the activation function of the last layer of the encoding network is [a, b], where a and b are integers.
- Example 1: a is 0 and b is 1, that is, the value range of the element values in the feature information output by the last-layer activation function of the encoding network is [0, 1], so each element of the feature information output by the last-layer activation function, after rounding, is equal to 0 or 1 and can be represented by 1 bit; for example, bit 0 represents the value 0 and bit 1 represents the value 1.
- in one example, the activation function of the last layer is the sigmoid function shown in formula (1):
- S(x) = 1/(1 + e^(-x))  (1)
- where x is the input of the activation function of the last layer, that is, the feature value output by the layer before the last-layer activation function in the encoding network, and S(x) is the feature information output by the last-layer activation function.
- in this way, through the above formula (1), the value of each element in the feature information output by the encoding network can be limited to between 0 and 1, and in subsequent rounding each element is rounded to 0 or 1, which is convenient for encoding the feature information into the code stream.
- it should be noted that the expression of the activation function of the last layer includes but is not limited to the above formula (1); it can also be any other expression that constrains the output of the encoding network to be between [0, 1].
- Example 2: a is -1 and b is 1, that is, the value range of the element values in the feature information output by the last-layer activation function of the encoding network is [-1, 1], so each element of the output of the last-layer activation function, after rounding, is equal to -1, 0 or 1.
- in one example, the activation function of the last layer is a function as shown in formula (2) that constrains its output to [-1, 1]; one such function is the hyperbolic tangent, S(x) = (e^x - e^(-x))/(e^x + e^(-x)), where x is the input of the activation function of the last layer, that is, the feature value output by the layer before the last-layer activation function in the encoding network, and S(x) is the feature information output by the last-layer activation function.
- in this way, through the above formula (2), the value of each element in the feature information output by the encoding network can be limited to between -1 and 1, and when rounding, the rounding result is -1, 0 or 1, which is convenient for encoding the feature information into the code stream.
- it should be noted that the expression of the activation function of the last layer includes but is not limited to the above formula (2); it can also be any other expression that constrains the output of the encoding network to be between [-1, 1].
- optionally, the feature information output by the activation function of the last layer may also be enlarged or reduced.
- for example, by multiplying the result output by the above formula (1) by 2, the limited range of the feature information output by the encoding network can be changed from 0 to 1 into 0 to 2, and the feature information within 0 to 2 can then be rounded to an integer; the rounding results are 0, 1 and 2.
- for another example, the result output by the above formula (1) can be multiplied by 2 and then reduced by 1, so that the limited range of the feature information output by the encoding network is changed from 0 to 1 into -1 to 1; the rounding results are -1, 0 and 1.
- as can be seen from the above, the autoencoder of the embodiment of the present application limits the element values in the feature information output by the last-layer activation function of the encoding network to a certain range, for example the range [0, 1] or [-1, 1], which makes it convenient to round and encode the feature information output by the last-layer activation function.
- in addition, the embodiment of the present application proposes two different expressions for the activation function of the last layer, which have a simple operation process and a small amount of calculation, ensure that the feature information output by the last-layer activation function is constrained within a certain range, and improve the encoding efficiency of the encoding network.
- the encoding network and decoding network of the autoencoder in the embodiment of the present application are optimized and trained synchronously.
- the specific process is: input the training coding block into the encoding network, and the encoding network outputs the feature information of the training coding block; after the feature information is rounded, the rounded feature information and the pixel values of the reconstructed pixels around the training coding block are input into the decoding network, and the decoding network outputs the prediction block of the training coding block.
- a loss is then calculated according to the prediction block of the training coding block, the weights of each layer in the autoencoder are updated backward according to the loss, and the training of the autoencoder is completed.
- the update process of the weights of each layer is realized by differentiating the output of each layer.
- however, the feature information output by the encoding network needs to be rounded, and the rounding process is not differentiable.
- to solve this technical problem, the embodiment of the present application adopts the following two methods:
- the first method is to add uniformly distributed noise to the feature information output by the last-layer activation function of the encoding network during training.
- the value range of the noise can be -0.5 to 0.5, thereby simulating the rounding process.
- for example, if noise is added to feature information whose range is 0 to 1, the value range of the resulting feature information is -0.5 to 1.5, and this is used as the input of the decoding network.
- the second method is that, in the forward calculation, the rounded discrete value is directly used as the input of the decoding network; that is, during the forward calculation the feature information output by the last-layer activation function is directly rounded and then input into the decoding network, for example as shown in the following formula (3):
- B(x) = round(S(x))  (3)
- where B(x) is the feature information after rounding, S(x) is the feature information output by the last-layer activation function, and round(·) denotes rounding to the nearest integer.
- in the backward calculation, the rounding is treated as a pass-through when taking derivatives, i.e. B'(x) = S'(x), where B'(x) represents the value obtained by differentiating the rounded feature information with respect to the feature information S(x) output by the last-layer activation function.
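- Both workarounds can be expressed in a few lines; a sketch (the function name and the training interface are illustrative):

```python
import torch

def quantize_si(si, training, method="noise"):
    """Handle the non-differentiable rounding of the side information.
    At inference time the features are simply rounded."""
    if not training:
        return torch.round(si)
    if method == "noise":
        # method 1: add uniform noise in [-0.5, 0.5] to simulate rounding
        return si + (torch.rand_like(si) - 0.5)
    # method 2 (straight-through): round in the forward pass, but let the
    # gradient flow through as if the rounding were the identity mapping
    return si + (torch.round(si) - si).detach()
```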
- in the embodiment of the present application, the above two methods are used to improve the training accuracy of the autoencoder, so that when the autoencoder is subsequently used for intra prediction, the accuracy of intra prediction can be guaranteed.
- the present application also trains separate autoencoders for prediction blocks of different sizes; for example, autoencoders are trained for 32×32, 32×16, 32×8, 32×4, 16×16, 16×8, 16×4, 8×8, 8×4, 4×4 and other blocks, so as to obtain the autoencoder corresponding to each of the above blocks.
- the above-mentioned blocks of various shapes may be luminance blocks including luminance components, and then train an autoencoder for predicting prediction values of luminance blocks of different shapes.
- the above-mentioned blocks of each shape may be chroma blocks including chroma components, and then train an autoencoder for predicting prediction values of chroma blocks of different shapes.
- the above-mentioned blocks of each shape include luma components and chroma components, and then train an autoencoder for predicting luma prediction values and chroma prediction values of blocks of different shapes.
- the DIV2K data set can be used as a training set to train autoencoders corresponding to blocks of different shapes.
- in some embodiments, the feature information output by the encoding network is an N×M feature vector; for example, the feature vector is a 1×2 vector, such as (0,1); for another example, the feature vector is a 1×3 vector, such as (0,1,1).
- in some embodiments, only the autoencoder corresponding to an a×b block is trained, and when a b×a block is intra predicted, the autoencoder corresponding to the a×b block can be used.
- specifically, the b×a block is transposed into an a×b block, and the transposed a×b block is input into the autoencoder to obtain the a×b prediction block output by the autoencoder.
- the a×b prediction block is then transposed into a b×a prediction block, which is used as the prediction block of the b×a block.
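- A sketch of this transpose trick (the `models` mapping and the prediction interface are illustrative):

```python
def predict_with_transpose(x, models):
    """Predict an h x w block; if only the transposed shape was trained,
    transpose the input, predict, and transpose the prediction back.
    `x` is a 2-D NumPy array; `models` maps (h, w) to a prediction
    function for that block shape."""
    h, w = x.shape
    if (h, w) in models:
        return models[(h, w)](x)
    return models[(w, h)](x.T).T   # reuse the a x b model for a b x a block
```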
- as can be seen from the above, the embodiment of the present application can train autoencoders corresponding to blocks of different shapes.
- in this way, when intra predicting blocks of different shapes, the autoencoder corresponding to each shape can be selected, thereby ensuring the accuracy with which blocks of different shapes are intra predicted.
- the network structure and training process of the autoencoder are introduced above. Based on this, the video decoding method provided by the embodiment of the present application is introduced below with reference to FIG. 10 and taking the decoding end as an example.
- FIG. 10 is a schematic flowchart of a video decoding method provided by an embodiment of the present application; the embodiment of the present application is applied to the video decoder shown in FIG. 1 and FIG. 3.
- the method of the embodiment of the present application includes:
- S401 Decode a code stream, and determine an intra prediction mode of a current block.
- the current block is also referred to as a current decoding block, a current decoding unit, a decoding block, a block to be decoded, a current block to be decoded, and the like.
- the implementation of determining the intra prediction mode of the current block in S401 above includes but is not limited to the following:
- Method 1: the decoding end decodes the code stream to obtain a first flag, where the first flag is used to indicate whether the current sequence is allowed to use the autoencoder-based intra prediction mode;
- S401-A2 Determine the intra prediction mode of the current block according to the first flag.
- if the value of the first flag is not equal to the first value, it means that the current sequence is not allowed to use the autoencoder-based intra prediction mode, and the intra prediction mode of the current block is not the autoencoder-based intra prediction mode.
- if the value of the first flag is equal to the first value (such as 1), it means that the current sequence allows the use of the autoencoder-based intra prediction mode.
- in a possible implementation, if the value of the first flag is equal to the first value, the intra prediction mode of the current block is the autoencoder-based intra prediction mode. That is to say, in this implementation, if the value of the first flag is equal to the first value, the current sequence allows the use of the autoencoder-based intra prediction mode, and the intra prediction modes of the decoded blocks in the current sequence are all the autoencoder-based intra prediction mode.
- in another possible implementation, the code stream includes both the first flag and a second flag.
- in this implementation, if the first flag indicates that the current sequence allows the use of the autoencoder-based intra prediction mode, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the current block uses the autoencoder-based intra prediction mode, and the intra prediction mode of the current block is determined according to the second flag.
- for example, if the value of the second flag is equal to 1, it means that the intra prediction mode of the current block is the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction mode of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- that is, in Method 1, the decoder may determine the intra prediction mode of the current block according to the first flag decoded from the code stream alone, or according to the first flag and the second flag together.
- S401-B2. Determine the intra prediction mode of the current block according to the second flag.
- in Method 2, the code stream includes a second flag that directly indicates whether the current block uses the autoencoder-based intra prediction mode.
- the decoding end decodes the code stream to obtain the second flag and determines the intra prediction mode of the current block directly according to the second flag; for example, when the value of the second flag is equal to 1, it means that the intra prediction mode of the current block is the autoencoder-based intra prediction mode, and when the value of the second flag is not equal to 1, it means that the intra prediction mode of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- in Method 2, the first flag is not written into the code stream; the second flag is written directly to indicate whether the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
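- The decision logic of the two methods can be sketched as follows (the per-block flag name and the bit-reading interface are hypothetical, not standardized syntax):

```python
def parse_intra_mode(bits, sps_ae_enabled_flag):
    """`bits` is an iterator over already entropy-decoded flag bits.
    Method 1: a sequence-level first flag gates a block-level second flag."""
    if not sps_ae_enabled_flag:       # first flag: mode not allowed at all
        return "conventional"
    cu_ae_flag = next(bits)           # second flag (hypothetical name)
    return "autoencoder" if cu_ae_flag else "conventional"

# Method 2 would skip the sequence-level gate and read the block-level
# flag (or an extended intra-mode index) directly.
```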
- the above-mentioned second flag may be a newly added flag in the code stream.
- the above-mentioned second flag is an existing indication flag used to indicate the intra prediction mode in the code stream.
- this embodiment extends the values of the existing intra prediction mode indicator and adds a value indicating the autoencoder-based intra prediction mode.
- for example, when the extended value of the intra prediction mode indicator is equal to 1, it means that the intra prediction mode of the current block is the autoencoder-based intra prediction mode.
- This method does not need to additionally add a field representing the second flag in the code stream, thereby saving codewords and improving decoding efficiency.
- the embodiment of the present application does not limit the specific writing positions of the first flag and the second flag in the code stream.
- the first flag is included in the sequence-level parameter syntax element.
- the first flag is added to the sequence-level parameter syntax, and the changes in the sequence-level parameter syntax (Sequence parameter set RBSP syntax) are shown in Table 1:
- sps_ae_enabled_flag represents the first flag.
- the above-mentioned second flag is included in the coding unit syntax element.
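- Purely as an illustration of the flag logic described above, the following Python sketch shows how a decoder might branch on the two flags. The read_bit helper and the mode constants are hypothetical stand-ins for an entropy decoder and the codec's mode set; only sps_ae_enabled_flag and intra_ae_flag come from the description above.

```python
# Minimal sketch of the decoder-side flag logic, assuming a hypothetical
# read_bit() helper that returns the next flag bit from the entropy decoder.

AE_INTRA = "autoencoder_intra"   # the autoencoder-based intra prediction mode
OTHER_INTRA = "other_intra"      # any other intra prediction mode

def parse_intra_mode(read_bit, stream_has_first_flag):
    """Determine the intra prediction mode of the current block from the flags."""
    if stream_has_first_flag:
        sps_ae_enabled_flag = read_bit()   # first flag, sequence level
        if sps_ae_enabled_flag == 0:
            return OTHER_INTRA             # AE mode disallowed for the whole sequence
    intra_ae_flag = read_bit()             # second flag, coding-unit level
    return AE_INTRA if intra_ae_flag == 1 else OTHER_INTRA
```

- With a bit source yielding sps_ae_enabled_flag = 1 and then intra_ae_flag = 1, the sketch returns the autoencoder-based mode, matching the first-value semantics described above.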
- After the code stream is decoded and the intra prediction mode of the current block is obtained, the following step S402 is performed.
- S402: If the intra prediction mode of the current block is the autoencoder-based intra prediction mode, decode the code stream to obtain the feature information of the current block.
- At the encoding end, when the encoder uses the autoencoder to perform intra prediction on the current block, the first feature information is rounded to obtain the feature information of the current block, and the feature information of the current block is encoded into the code stream, so that the decoder can determine the prediction block of the current block according to the feature information of the current block.
- The decoder executes the above S401, and if it determines that the intra prediction mode of the current block is the autoencoder-based intra prediction mode, it continues to decode the code stream to obtain the feature information of the current block.
- the embodiment of the present application does not limit the specific position of the feature information of the current block in the code stream.
- For example, the feature information of the current block is located in the coding unit syntax element, and the decoding end decodes the coding unit syntax element to obtain the feature information of the current block.
- the rounded feature information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder, and the decoding network outputs the predicted block of the current block.
- Before predicting the prediction block of the current block based on the autoencoder-based intra prediction mode, the decoder also needs to obtain the pixel values of the reconstructed pixels around the current block.
- In some embodiments, the reconstructed pixels around the current block include pixels in n rows above the current block and/or pixels in m columns to the left of the current block, where n and m are both positive integers, and n and m may or may not be equal.
- Each row of pixels in the above n rows of pixels may be continuous or discontinuous, and each column of pixels in the m columns of pixels may be continuous or discontinuous.
- The above n rows of pixels may or may not be adjacent, and the m columns of pixels may or may not be adjacent.
- Considering that blocks of different shapes may correspond to different autoencoders, the decoder may select the autoencoder corresponding to the current block from multiple autoencoders according to the size of the current block. Next, as shown in Figure 11, the obtained feature information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the current block, and the prediction block of the current block output by the decoding network is obtained.
- After obtaining the prediction block of the current block according to the above method, the decoder decodes the code stream to obtain the residual block of the current block, and adds the prediction block to the residual block to obtain the reconstructed block of the current block.
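- For illustration, the decoder-side flow just described (select the autoencoder by block size, run its decoding network, then add the residual) can be sketched as follows. Here autoencoders, feature_info, neighbors, and residual are hypothetical placeholders, not actual bitstream syntax; the decoding networks are assumed to return array-like blocks that support elementwise addition.

```python
def decode_ae_intra_block(width, height, feature_info, neighbors, residual,
                          autoencoders):
    """Sketch: pick the autoencoder decoding network matching the block shape,
    predict the block from the parsed feature info plus the surrounding
    reconstructed pixels, then add the residual decoded from the code stream."""
    decoding_network = autoencoders[(width, height)]   # one AE per block shape
    prediction = decoding_network(feature_info, neighbors)
    return prediction + residual                       # reconstructed block
```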
- To sum up, in the embodiment of the present application, the intra prediction mode of the current block is determined by decoding the code stream; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the feature information of the current block; the pixel values of the reconstructed pixels around the current block are obtained; and the feature information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network. That is, the embodiment of the present application adds an autoencoder-based intra prediction mode to provide more options for intra prediction.
- If it is determined that the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the prediction block of the current block is determined according to the feature information of the current block and the pixel values of the reconstructed pixels around the current block.
- Since the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the feature information of the current block, accurate prediction of the current block can be realized and the accuracy of intra prediction can be improved.
- In some embodiments, the current block includes a luma component and/or a chroma component, and the above-mentioned second flag is used to indicate whether the luma component and/or the chroma component of the current block uses the autoencoder-based intra prediction mode.
- FIG. 12 is another schematic flowchart of the video decoding method provided by the embodiment of the present application.
- The following takes the case where the second flag is used to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode as an example. As shown in Figure 12, the method includes:
- S501: Decode the code stream, and determine the intra prediction mode of the luma component of the current block.
- The implementations of determining the intra prediction mode of the luma component of the current block in S501 above include but are not limited to the following:
- In one way, S501 includes the following S501-A1 and S501-A2:
- S501-A1: The decoding end decodes the code stream to obtain the first flag, where the first flag is used to indicate whether the current sequence is allowed to use the autoencoder-based intra prediction mode;
- S501-A2: Determine the intra prediction mode of the luma component of the current block according to the first flag.
- If the value of the first flag is equal to the second value (for example, 0), it can be determined that the intra prediction mode of the luma component of the current block is not the autoencoder-based intra prediction mode.
- If the value of the first flag is equal to the first value (for example, 1), it means that the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, the implementations of determining the intra prediction mode of the luma component of the current block according to the first flag include but are not limited to the following:
- In one implementation, if the current sequence allows the use of the autoencoder-based intra prediction mode, it is determined that the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode.
- In another implementation, the code stream includes the first flag and the second flag. If the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode, and the intra prediction mode of the luma component of the current block is determined according to the second flag.
- For example, if the value of the second flag is equal to 1, it means that the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction mode of the luma component of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- In this implementation, the decoder can determine the intra prediction mode of the luma component of the current block according to the first flag decoded from the code stream, or determine the intra prediction mode of the luma component of the current block according to the first flag and the second flag.
- In another way, the code stream includes the second flag that directly indicates whether the luma component of the current block uses the autoencoder-based intra prediction mode, and does not include the above-mentioned first flag. The decoding end decodes the code stream to obtain the second flag, and determines the intra prediction mode of the luma component of the current block directly according to the second flag. For example, when the value of the second flag is equal to 1, it means that the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction mode of the luma component of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- In this implementation, the first flag is not written into the code stream; instead, the second flag is written directly to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
- the embodiment of the present application does not limit the specific writing positions of the first flag and the second flag in the code stream.
- the first flag is included in the sequence-level parameter syntax element.
- the above-mentioned second flag is included in the coding unit syntax element.
- S502: If the luma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the luma feature information of the current block.
- In some embodiments, the luma feature information of the current block is carried in the syntax element corresponding to the luma component of the current block. Based on this, the decoding end decodes the syntax element corresponding to the luma component of the current block, and obtains the luma feature information of the current block from that syntax element.
- In one example, the luma feature information of the current block is carried in the coding unit syntax element. Optionally, the coding unit syntax element further includes the second flag intra_ae_flag.
- The decoding end decodes the coding unit syntax (Coding unit syntax) from the code stream to obtain the coding unit syntax element, and reads the second flag intra_ae_flag from it. The intra_ae_flag is a coding-unit-level control flag used to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode. If intra_ae_flag is 1, it indicates that the luma component of the current block uses the autoencoder-based intra prediction mode, and the luma feature information sideinfo[] of the current block is further read.
- Exemplarily, si_size indicates how many feature information elements need to be coded, and sideinfo[] represents the luma feature information.
- If sideinfo[] is decoded using a 1-bit fixed-length code (u(1)), this is for the situation where the value of sideinfo[] is 0 or 1; sideinfo[] can also be decoded based on a context model.
- In some embodiments, multi-bit codewords can be used for decoding. In this case, the relevant syntax table of the coding unit syntax (Coding unit syntax) is changed as shown in Table 3:
- Here abssideinfo[] is the absolute value of sideinfo[], and when it is not 0, its sign is further decoded. Context models can also be used, and codewords with more bits may be used to represent the luma feature information sideinfo[] of the current block.
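- A sketch of the two parsing options above (one u(1) bit per element, or an absolute value plus a sign as in Table 3). The read_bit/read_bits helpers stand in for the entropy decoder, and the codeword width for abssideinfo[] is an assumption made only for this example:

```python
def parse_sideinfo(read_bit, read_bits, si_size, use_u1=True, abs_bits=4):
    """Parse si_size feature elements. With use_u1, each sideinfo[i] is a
    single u(1) bit (values 0 or 1); otherwise abssideinfo[i] is read as a
    multi-bit codeword and, when it is nonzero, a sign bit follows."""
    sideinfo = []
    for _ in range(si_size):
        if use_u1:
            sideinfo.append(read_bit())            # u(1) fixed-length code
        else:
            abs_val = read_bits(abs_bits)          # abssideinfo[i]
            if abs_val != 0 and read_bit() == 1:   # sign decoded only if nonzero
                abs_val = -abs_val
            sideinfo.append(abs_val)
    return sideinfo
```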
- In some embodiments, the reconstructed pixels around the current block include pixels in n rows above the current block and/or pixels in m columns to the left of the current block, where n and m are both positive integers, and n and m may or may not be equal.
- Each row of pixels in the above n rows of pixels may be continuous or discontinuous, and each column of pixels in the m columns of pixels may be continuous or discontinuous.
- The above n rows of pixels may or may not be adjacent, and the m columns of pixels may or may not be adjacent.
- Considering that the autoencoders corresponding to blocks of different shapes may be different, and the autoencoders corresponding to the chroma component and the luma component may also be different, the decoder may select the autoencoder corresponding to the luma component of the current block from multiple autoencoders according to the size of the luma component of the current block.
- Next, the luma feature information of the current block obtained above and the luma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the luma component of the current block, and the luma prediction block of the current block output by the decoding network is obtained.
- To sum up, in the embodiment of the present application, the intra prediction mode of the luma component of the current block is determined by decoding the code stream; if the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the luma feature information of the current block; the luma values of the reconstructed pixels around the current block are obtained; and the luma feature information of the current block and the luma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the luma component of the current block, so that the luma prediction block of the current block output by the decoding network is obtained.
- That is, the embodiment of the present application adds an autoencoder-based intra prediction mode for luma intra prediction, enriching the intra prediction modes of the luma component. If it is determined that the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode, the luma prediction block of the current block is determined according to the luma feature information of the current block and the luma values of the reconstructed pixels around the current block. Even in the case where the original value of the luma component of the current block has little correlation with the reconstructed values around the current block, since the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the luma feature information of the current block, accurate prediction of the luma component of the current block can be realized, and the accuracy of intra prediction of the luma component can be improved.
- FIG. 14 is another schematic flowchart of the video decoding method provided by the embodiment of the present application.
- The following takes the case where the second flag is used to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode as an example. As shown in Figure 14, the method includes:
- S601: Decode the code stream, and determine the intra prediction mode of the chroma component of the current block.
- The implementations of determining the intra prediction mode of the chroma component of the current block in S601 above include but are not limited to the following:
- In one way, S601 includes the following S601-A1 and S601-A2:
- S601-A1: The decoding end decodes the code stream to obtain the first flag, where the first flag is used to indicate whether the current sequence is allowed to use the autoencoder-based intra prediction mode;
- S601-A2: Determine the intra prediction mode of the chroma component of the current block according to the first flag.
- If the value of the first flag is equal to the second value (for example, 0), it can be determined that the intra prediction mode of the chroma component of the current block is not the autoencoder-based intra prediction mode.
- If the value of the first flag is equal to the first value (for example, 1), it means that the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, the implementations of determining the intra prediction mode of the chroma component of the current block include but are not limited to the following:
- In one implementation, if the current sequence allows the use of the autoencoder-based intra prediction mode, the intra prediction mode of the chroma component of the current block is determined as the autoencoder-based intra prediction mode.
- In another implementation, the code stream includes the first flag and the second flag. If the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode, and the intra prediction mode of the chroma component of the current block is determined according to the second flag.
- For example, if the value of the second flag is equal to 1, it means that the intra prediction mode of the chroma component of the current block is the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction mode of the chroma component of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- In this implementation, the decoder can determine the intra prediction mode of the chroma component of the current block according to the first flag decoded from the code stream, or determine the intra prediction mode of the chroma component of the current block according to the first flag and the second flag.
- In another way, S601 includes the following S601-B1 and S601-B2:
- In this implementation, the code stream includes the second flag that directly indicates whether the chroma component of the current block uses the autoencoder-based intra prediction mode, and does not include the above-mentioned first flag. The decoding end decodes the code stream to obtain the second flag, and determines the intra prediction mode of the chroma component of the current block directly according to the second flag. For example, when the value of the second flag is equal to 1, it means that the intra prediction mode of the chroma component of the current block is the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction mode of the chroma component of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
- In this implementation, the first flag is not written into the code stream; instead, the second flag is written directly to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
- the embodiment of the present application does not limit the specific writing positions of the first flag and the second flag in the code stream.
- the first flag is included in the sequence-level parameter syntax element.
- the above-mentioned second flag is included in the coding unit syntax element.
- S602: If the chroma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the chroma feature information of the current block.
- In some embodiments, the chroma feature information of the current block is carried in the syntax element corresponding to the chroma component of the current block. Based on this, the decoding end decodes the syntax element corresponding to the chroma component of the current block, and obtains the chroma feature information of the current block from that syntax element.
- In one example, the chroma feature information of the current block is carried in the coding unit syntax element. Optionally, the coding unit syntax element further includes the second flag intra_ae_flag.
- The decoding end decodes the coding unit syntax (Coding unit syntax) from the code stream to obtain the coding unit syntax element, and reads the second flag intra_ae_flag from it. The intra_ae_flag is a coding-unit-level control flag used to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode. If intra_ae_flag is 1, it means that the chroma component of the current block uses the autoencoder-based intra prediction mode, and the chroma feature information sideinfo[] of the current block is further read.
- Exemplarily, si_size indicates how many feature information elements need to be coded, sideinfo_cb[] represents the cb chroma feature information, and sideinfo_cr[] represents the cr chroma feature information.
- If sideinfo_cb[] and sideinfo_cr[] are decoded using a 1-bit fixed-length code (u(1)), this is for the situation where the values of sideinfo_cb[] and sideinfo_cr[] are 0 or 1.
- In some embodiments, multi-bit codewords may be used for decoding, for example, with reference to Table 3 above.
- In some embodiments, the reconstructed pixels around the current block include pixels in n rows above the current block and/or pixels in m columns to the left of the current block, where n and m are both positive integers, and n and m may or may not be equal.
- Each row of pixels in the above n rows of pixels may be continuous or discontinuous, and each column of pixels in the m columns of pixels may be continuous or discontinuous.
- The above n rows of pixels may or may not be adjacent, and the m columns of pixels may or may not be adjacent.
- Considering that the autoencoders corresponding to blocks of different shapes may be different, and the autoencoders corresponding to the chroma component and the luma component may also be different, the decoding end can select the autoencoder corresponding to the chroma component of the current block from multiple autoencoders according to the size of the chroma component of the current block. Next, as shown in Figure 15, the chroma feature information of the current block obtained above and the chroma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the chroma component of the current block, and the chroma prediction block of the current block output by the decoding network is obtained.
- To sum up, in the embodiment of the present application, the intra prediction mode of the chroma component of the current block is determined by decoding the code stream; if the intra prediction mode of the chroma component of the current block is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the chroma feature information of the current block; the chroma values of the reconstructed pixels around the current block are obtained; and the chroma feature information of the current block and the chroma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the chroma component of the current block, so that the chroma prediction block of the current block output by the decoding network is obtained.
- That is, the embodiment of the present application adds an autoencoder-based intra prediction mode for chroma intra prediction, enriching the intra prediction modes of the chroma component. If it is determined that the intra prediction mode of the chroma component of the current block is the autoencoder-based intra prediction mode, the chroma prediction block of the current block is determined according to the chroma feature information of the current block and the chroma values of the reconstructed pixels around the current block. Even in the case where the original value of the chroma component of the current block has little correlation with the reconstructed values around the current block, since the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the chroma feature information of the current block, accurate prediction of the chroma component of the current block can be realized, and the accuracy of intra prediction of the chroma component can be improved.
- Fig. 16 is another schematic flowchart of the video decoding method provided by the embodiment of the present application.
- The following takes the case where the second flag is used to indicate whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode as an example. As shown in Figure 16, the method includes:
- S701: Decode the code stream, and determine the intra prediction modes of the luma component and the chroma component of the current block.
- The implementations of determining the intra prediction modes of the luma component and the chroma component of the current block in S701 above include but are not limited to the following:
- In one way, S701 includes the following S701-A1 and S701-A2:
- S701-A1: The decoding end decodes the code stream to obtain the first flag, where the first flag is used to indicate whether the current sequence is allowed to use the autoencoder-based intra prediction mode;
- S701-A2: Determine the intra prediction modes of the luma component and the chroma component of the current block according to the first flag.
- If the value of the first flag is equal to the second value (for example, 0), it can be determined that neither the intra prediction mode of the luma component nor that of the chroma component of the current block is the autoencoder-based intra prediction mode.
- If the value of the first flag is equal to the first value (for example, 1), it means that the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, the implementations of determining the intra prediction modes of the luma component and the chroma component of the current block include but are not limited to the following:
- In one implementation, if the current sequence allows the use of the autoencoder-based intra prediction mode, it is determined that the intra prediction modes of the luma component and the chroma component of the current block are both the autoencoder-based intra prediction mode.
- In another implementation, the code stream includes the first flag and the second flag. If the value of the first flag is equal to the first value, the decoding end decodes the code stream to obtain the second flag, where the second flag is used to indicate whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, and the intra prediction modes of the luma component and the chroma component of the current block are determined according to the second flag.
- For example, if the value of the second flag is equal to 1, it means that the intra prediction modes of the luma component and the chroma component of the current block are both the autoencoder-based intra prediction mode; if the value of the second flag is not equal to 1, it means that the intra prediction modes of the luma component and the chroma component of the current block are intra prediction modes other than the autoencoder-based intra prediction mode.
- In this implementation, the decoder can determine the intra prediction modes of the luma component and the chroma component of the current block according to the first flag decoded from the code stream, or determine the intra prediction modes of the luma component and the chroma component of the current block according to the first flag and the second flag.
- In another way, S701 includes the following S701-B1 and S701-B2:
- In this implementation, the code stream includes the second flag that directly indicates whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, and does not include the above-mentioned first flag. The decoding end decodes the code stream to obtain the second flag, and determines directly according to the second flag whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode.
- For example, when the value of the second flag is equal to 1, it means that the intra prediction modes of the luma component and the chroma component of the current block are both the autoencoder-based intra prediction mode; when the value of the second flag is not equal to 1, it means that the intra prediction modes of the luma component and the chroma component of the current block are intra prediction modes other than the autoencoder-based intra prediction mode.
- In this implementation, the first flag is not written into the code stream; instead, the second flag is written directly to indicate whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
- the embodiment of the present application does not limit the specific writing positions of the first flag and the second flag in the code stream.
- the first flag is included in the sequence-level parameter syntax element.
- the above-mentioned second flag is included in the coding unit syntax element.
- S702: If both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, decode the code stream to obtain the luma feature information and the chroma feature information of the current block.
- The embodiment of the present application does not limit the specific writing positions of the luma feature information and the chroma feature information of the current block, which may be carried at any position in the code stream.
- In some embodiments, the luma feature information and the chroma feature information of the current block are carried in the syntax elements corresponding to the luma component and the chroma component of the current block. Based on this, the decoding end decodes the syntax elements corresponding to the luma component and the chroma component of the current block, and obtains the luma feature information and the chroma feature information of the current block from those syntax elements.
- In one example, the luma feature information and the chroma feature information of the current block are carried in the coding unit syntax element. Optionally, the coding unit syntax element further includes the second flag intra_ae_flag.
- The decoding end decodes the coding unit syntax (Coding unit syntax) from the code stream to obtain the coding unit syntax element, and reads the second flag intra_ae_flag from it. The intra_ae_flag is a coding-unit-level control flag used to indicate whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode. If intra_ae_flag is 1, it means that the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, and the luma feature information and the chroma feature information of the current block are further read.
- In some embodiments, the reconstructed pixels around the current block include pixels in n rows above the current block and/or pixels in m columns to the left of the current block, where n and m are both positive integers, and n and m may or may not be equal.
- Each row of pixels in the above n rows of pixels may be continuous or discontinuous, and each column of pixels in the m columns of pixels may be continuous or discontinuous.
- The above n rows of pixels may or may not be adjacent, and the m columns of pixels may or may not be adjacent.
- In this embodiment, the pixel values of the reconstructed pixels around the current block include chroma values and luma values.
- Considering that the autoencoders corresponding to blocks of different shapes may be different, and the autoencoders corresponding to the luma component and the chroma component may also be different, the decoding end may select, from multiple autoencoders according to the size of the current block, the autoencoder corresponding to the luma component of the current block and the autoencoder corresponding to the chroma component of the current block.
- Next, as shown in Figure 17, the chroma feature information of the current block obtained above and the chroma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the chroma component of the current block to obtain the chroma prediction block of the current block output by that decoding network; likewise, the luma feature information of the current block and the luma values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the luma component of the current block to obtain the luma prediction block of the current block.
- If the autoencoder corresponding to the chroma component of the current block is the same as that corresponding to the luma component, the chroma feature information and the luma feature information of the current block, together with the pixel values of the reconstructed pixels around the current block, can be input into the decoding network of that autoencoder, and the chroma prediction block and the luma prediction block of the current block output by the decoding network are obtained.
- To sum up, in the embodiment of the present application, the intra prediction modes of the luma component and the chroma component of the current block are determined by decoding the code stream; if the intra prediction modes of the luma component and the chroma component of the current block are both the autoencoder-based intra prediction mode, the code stream is decoded to obtain the luma feature information and the chroma feature information of the current block; the luma feature information and the chroma feature information, together with the pixel values of the reconstructed pixels around the current block, are input into the decoding network of the autoencoder, and the luma prediction block and the chroma prediction block of the current block output by the decoding network are obtained. That is, in the embodiment of the present application, the luma component and the chroma component of the current block can be indicated and predicted at the same time, which improves the prediction efficiency of the current block.
- FIG. 18 is a schematic flowchart of a video encoding method provided by an embodiment of the present application, and the embodiment of the present application is applied to the video encoder shown in FIG. 1 and FIG. 2.
- the method of the embodiment of the present application includes:
- S801: Determine the intra prediction mode of the current block from preset N first intra prediction modes, where N is a positive integer and the N first intra prediction modes include the autoencoder-based intra prediction mode.
- the video encoder receives a video stream, which is composed of a series of image frames, performs video encoding for each frame of image in the video stream, and divides the image frames into blocks to obtain the current block.
- the current block is also referred to as a current coding block, a current image block, a coding block, a current coding unit, a current block to be coded, a current image block to be coded, and the like.
- the block divided by the traditional method includes not only the chrominance component of the current block position, but also the luminance component of the current block position.
- the separation tree technology can divide separate component blocks, such as a separate luma block and a separate chrominance block, where the luma block can be understood as only containing the luma component of the current block position, and the chrominance block can be understood as containing only the current block The chroma component of the position. In this way, the luma component and the chrominance component at the same position can belong to different blocks, and the division can have greater flexibility. If the separation tree is used in CU partitioning, some CUs contain both luma and chroma components, some CUs only contain luma components, and some CUs only contain chroma components.
- the current block in the embodiment of the present application only includes chroma components, which may be understood as a chroma block.
- the current block in this embodiment of the present application only includes a luma component, which may be understood as a luma block.
- the current block includes both luma and chroma components.
- When the video encoder performs intra prediction on the current block, it will try at least one of the N first intra prediction modes, such as the autoencoder-based intra prediction mode, the DM mode, the DC mode (Intra_Chroma_DC), the horizontal mode (Intra_Chroma_Horizontal), the vertical mode (Intra_Chroma_Vertical), the bilinear (Bilinear) mode, the PCM mode, and the cross-component prediction modes (TSCPM, PMC, and CCLM in VVC), etc.
- The ways in which the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes include but are not limited to the following:
- In a first way, the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes according to the characteristics of the current block. For example, if the correlation between the pixel values of the current block and the pixel values of the surrounding reconstructed pixels is small, the autoencoder-based intra prediction mode may be determined as the intra prediction mode of the current block.
- In a second way, the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes in the following manner S8011.
- the rate-distortion cost corresponding to each first intra-frame prediction mode may be calculated by using an existing method for calculating the rate-distortion cost.
- The implementation methods of S8011 include but are not limited to the following Examples 1 and 2:
- Example 1: The above S8011 includes the following steps S8011-A1 to S8011-A3:
- S8011-A2: Determine the first rate-distortion cost of the first intra prediction mode according to the distortion between the predicted value and the original value of the current block and the number of bits consumed when encoding the flag bit of the first intra prediction mode;
- S8011-A3: Determine the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion cost.
- Specifically, for each of the N first intra prediction modes, the first intra prediction mode is used to predict the current block to obtain a predicted value of the current block, and this predicted value is the predicted value corresponding to the first intra prediction mode.
- Next, the distortion D1 between the predicted value corresponding to the first intra prediction mode and the original value of the current block is calculated, and the number of bits R2 consumed when encoding the flag of the first intra prediction mode is counted; the first rate-distortion cost J1 of the first intra prediction mode is determined according to D1 and R2.
- According to this method, the first rate-distortion cost J1 corresponding to each of the N first intra prediction modes may be determined, and the intra prediction mode of the current block is determined from the N first intra prediction modes according to the first rate-distortion costs.
- In this way, the first rate-distortion cost is determined by the distortion between the predicted value and the original value and by the number of bits consumed by encoding the flag bit. Compared with determining the cost from the distortion between the reconstructed value and the original value and from the number of encoded bits of the entire intra prediction process, this avoids calculating the reconstructed value and counting the bits of the entire encoding process, which greatly reduces the amount of calculation and improves the calculation speed of the first rate-distortion cost. When the intra prediction mode of the current block is selected based on the first rate-distortion cost, the selection speed of the intra prediction mode can be effectively increased.
- In S8011-A3, the implementation methods of determining the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion cost include but are not limited to the following:
- In a first way, the first intra prediction mode with the smallest first rate-distortion cost among the N first intra prediction modes is determined as the intra prediction mode of the current block. This determination method has the advantages of a simple process, a small amount of calculation, and a fast determination speed.
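- The first rate-distortion cost can be sketched as follows. The description above only states that J1 is determined from the prediction distortion D1 and the flag bits R2; the Lagrangian form J1 = D1 + λ·R2 and the use of SAD as the distortion measure are assumptions made for this example:

```python
def first_rd_cost(pred, orig, flag_bits, lam):
    """J1 = D1 + lam * R2 (assumed form): SAD between the predicted and
    original values plus the weighted cost of the mode's flag bits."""
    d1 = sum(abs(p - o) for p, o in zip(pred, orig))
    return d1 + lam * flag_bits

def pick_by_first_cost(candidates, orig, lam):
    """candidates: (mode_name, predicted_values, flag_bits) triples.
    Return the mode with the smallest first rate-distortion cost."""
    return min(candidates,
               key=lambda c: first_rd_cost(c[1], orig, c[2], lam))[0]
```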
- In a second way, the video encoder determines the intra prediction mode of the current block through the following steps S8011-A31 to S8011-A34. That is, M second intra prediction modes are first roughly selected from the N first intra prediction modes, and then the intra prediction mode of the current block is finely selected from the M second intra prediction modes.
- S8011-A32: Determine the reconstructed value corresponding to the second intra prediction mode when the current block is encoded using the second intra prediction mode. Specifically, for each of the M second intra prediction modes, the predicted value corresponding to the second intra prediction mode and the residual value are added to obtain the reconstructed value of the current block, and this reconstructed value is recorded as the reconstructed value corresponding to the second intra prediction mode.
- In this way, the second rate-distortion cost J2 corresponding to each of the M second intra prediction modes may be determined, and the second intra prediction mode with the smallest second rate-distortion cost J2 is determined as the intra prediction mode of the current block.
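- A compact sketch of this coarse-then-fine selection, assuming each candidate mode already carries its J1 and that a hypothetical trial_encode(mode) helper returns the reconstruction and the total bits of a full trial encode; the Lagrangian form of J2 is likewise an assumption:

```python
def coarse_then_fine(modes, orig, lam, trial_encode, m):
    """Keep the m modes with the smallest first cost J1 (coarse screening),
    then pick the mode with the smallest reconstruction-based cost J2
    (fine screening)."""
    second_modes = sorted(modes, key=lambda md: md["j1"])[:m]   # M second modes

    def second_cost(md):
        recon, bits = trial_encode(md)                     # full trial encode
        d2 = sum(abs(r - o) for r, o in zip(recon, orig))  # reconstruction distortion
        return d2 + lam * bits                             # J2 (assumed form)

    return min(second_modes, key=second_cost)
```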
- In Example 1 above, coarse screening and fine screening are performed on all of the N first intra prediction modes together to determine the intra prediction mode of the current block. In some embodiments, the above S8011 may also be implemented according to the method in Example 2 below.
- Example 2: In this Example 2, coarse screening is first performed on the first intra prediction modes other than the autoencoder-based intra prediction mode, and then the autoencoder-based intra prediction mode and the coarsely screened first intra prediction modes are fine-screened together, so as to increase the probability of using the autoencoder-based intra prediction mode. That is, the above S8011 includes the following steps S8011-B1 to S8011-B3:
- S8011-B1: Determine the first rate-distortion cost corresponding to each third intra prediction mode, where the third intra prediction modes are the first intra prediction modes among the N first intra prediction modes other than the autoencoder-based intra prediction mode.
- That is, the first intra prediction modes among the N first intra prediction modes other than the autoencoder-based intra prediction mode are recorded as third intra prediction modes, giving a total of N-1 third intra prediction modes.
- Q third intra-frame prediction modes with the smallest first rate-distortion cost are selected from the N-1 third intra-frame prediction modes.
- In this example, the prediction process of the encoding network and the rounding process are skipped; instead, the rounded values are enumerated according to the preset rounding range of the first feature information, and P predicted values corresponding to the autoencoder-based intra prediction mode are determined according to the P possible values of the first feature information.
- The preset rounding range of the first feature information may be {0, 1}, or {-1, 0, 1}, and so on.
- Specifically, determining the P predicted values corresponding to the autoencoder-based intra prediction mode can be achieved as follows: according to the preset rounding range of the first feature information, enumerate the P possible values of the first feature information output by the encoding network; input the feature information under each of the P values, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the predicted value under each of the P values output by the decoding network; and determine the predicted values under the P values as the P predicted values corresponding to the autoencoder-based intra prediction mode.
- For example, if the preset rounding range of the first feature information is {0, 1} and the first feature information is a 1×2 feature vector, then P = 2^2 = 4. In general, if the first feature information is a 1×n feature vector, then P = 2^n, where n is a positive integer greater than or equal to 1.
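- The enumeration of the P possible rounded feature vectors can be written directly; this sketch only assumes that the rounding range is a small finite set:

```python
from itertools import product

def candidate_feature_vectors(n, rounding_range=(0, 1)):
    """All P = len(rounding_range)**n possible values of a 1-by-n rounded
    feature vector; with range {0, 1}, P = 2**n."""
    return list(product(rounding_range, repeat=n))

# n = 2 with range {0, 1} gives the P = 4 candidates:
# (0, 0), (0, 1), (1, 0), (1, 1)
```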
- the above-mentioned R is smaller than P.
- the above-mentioned S8011-B3 selects R predicted values from the P predicted values, including the following methods:
- Method 1: Randomly select R predicted values from the P predicted values.
- Method 2: Select, from the P predicted values, the R predicted values closest to the original value of the current block.
- Method 3: According to the distortion between the P predicted values and the original value of the current block, determine the fourth rate-distortion cost corresponding to each of the P predicted values, and select, from the P predicted values, the R predicted values with the smallest fourth rate-distortion cost.
- the fourth rate-distortion cost corresponding to each of the P predicted values may be equal to the distortion D1 between the predicted value and the original value of the current block.
- Optionally, the fourth rate-distortion cost corresponding to a predicted value is equal to the sum of the distortion D1 between the predicted value and the original value of the current block and the number of bits R1 consumed for encoding the flag bit of the autoencoder-based intra prediction mode.
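- The screening of Methods 2 and 3 can be sketched together; with flag_bits = 0 the fourth cost reduces to plain distortion, i.e. "closest to the original", and SAD is assumed as the distortion measure:

```python
def screen_predictions(preds, orig, r, flag_bits=0):
    """Keep the r predicted values with the smallest fourth cost
    J4 = D1 (+ the AE-mode flag bits, per the optional variant above)."""
    def fourth_cost(pred):
        d1 = sum(abs(p - o) for p, o in zip(pred, orig))
        return d1 + flag_bits
    return sorted(preds, key=fourth_cost)[:r]
```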
- After the R predicted values are roughly screened out from the P predicted values, the following S8011-B4 is performed, in which the intra prediction mode of the current block is determined in ways including the following:
- Method 1: The Q predicted values corresponding to the above Q third intra prediction modes and the R predicted values corresponding to the autoencoder-based intra prediction mode are compared with the original value of the current block, and the intra prediction mode corresponding to the predicted value closest to the original value is determined as the intra prediction mode of the current block.
- Method 2: Select the intra prediction mode of the current block through fine screening; that is, the above S8011-B4 includes the following steps S8011-B41 to S8011-B43:
- The residual value corresponding to each of the Q predicted values is determined, and the residual value is added to the predicted value to obtain the reconstructed value corresponding to the predicted value, thereby obtaining Q reconstructed values.
- Similarly, the residual value corresponding to each of the R predicted values is determined, and the residual value is added to the predicted value to obtain the reconstructed value corresponding to the predicted value, thereby obtaining R reconstructed values.
- In this way, Q+R reconstructed values can be obtained. For each of the Q+R reconstructed values, the distortion D3 between the reconstructed value and the original value of the current block is calculated, and the number of bits R3 consumed when encoding the current block using the first intra prediction mode corresponding to the reconstructed value is counted; the sum of D3 and R3 is determined as the third rate-distortion cost corresponding to the reconstructed value.
- Finally, the first intra prediction mode with the smallest third rate-distortion cost is determined as the intra prediction mode of the current block.
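- The fine screening over the Q+R reconstructions can be sketched as follows; as stated above, the third cost is simply the sum of D3 and R3, with SAD assumed for D3:

```python
def fine_screen(candidates, orig):
    """candidates: (mode_name, reconstruction, bits_r3) triples covering the
    Q + R reconstructed values. Return the mode with the smallest
    J3 = D3 + R3."""
    def third_cost(candidate):
        _, recon, bits_r3 = candidate
        d3 = sum(abs(r - o) for r, o in zip(recon, orig))
        return d3 + bits_r3
    return min(candidates, key=third_cost)[0]
```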
- In this Example 2, the R predicted values of the autoencoder-based intra prediction mode are added into the fine screening process of the intra prediction mode, which increases the selection probability of the autoencoder-based intra prediction mode.
- After the intra prediction mode of the current block is determined, the following step S802 is performed.
- S802: If the intra prediction mode of the current block is the autoencoder-based intra prediction mode, obtain the autoencoder corresponding to the current block, where the autoencoder includes an encoding network and a decoding network.
- the video encoder selects the autoencoder corresponding to the current block from different autoencoders according to the size of the current block.
- The original value of the current block (that is, the original pixel values) is input into the encoding network of the autoencoder, and the first feature information of the current block output by the encoding network is obtained.
- the first feature information and the pixel values of the reconstructed pixels around the current block are input into the decoding network to obtain the predicted block of the current block output by the decoding network.
- the current block includes luma components and/or chrominance components.
- the above S803 includes the following methods:
- Method 1: If it is determined that the luma component of the current block uses the autoencoder-based intra prediction mode, the original luma value of the current block is input into the encoding network to obtain the first luma feature information of the current block.
- Method 2: If it is determined that the chroma component of the current block uses the autoencoder-based intra prediction mode, the original chroma value of the current block is input into the encoding network to obtain the first chroma feature information of the current block.
- Method 3: If it is determined that both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, the original luma value and the original chroma value of the current block are input into the encoding network to obtain the first luma feature information and the first chroma feature information of the current block.
- the original luma value and original chrominance value of the current block can be input into the encoding network at the same time, or can be input into the encoding network one by one.
- the aforementioned S804 includes the following methods:
- Method 1: If the first feature information of the current block includes the first luma feature information, the first luma feature information and the luma values of the reconstructed pixels around the current block are input into the decoding network to obtain the luma prediction block of the current block.
- Method 2: If the first feature information of the current block includes the first chroma feature information, the first chroma feature information and the chroma values of the reconstructed pixels around the current block are input into the decoding network to obtain the chroma prediction block of the current block.
- Method 3: If the first feature information of the current block includes the first luma feature information and the first chroma feature information, the first luma feature information and the first chroma feature information, together with the pixel values of the reconstructed pixels around the current block, are input into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
- S803 includes S803-A1 and S803-A2:
- the above S803-A1 includes the following methods:
- Method 1: If the first feature information of the current block includes the first luma feature information, the first luma feature information is rounded to obtain the second luma feature information of the current block.
- Method 2: If the first feature information of the current block includes the first chroma feature information, the first chroma feature information is rounded to obtain the second chroma feature information of the current block.
- Method 3: If the first feature information of the current block includes the first luma feature information and the first chroma feature information, the first luma feature information and the first chroma feature information are respectively rounded to obtain the second luma feature information and the second chroma feature information of the current block.
- the aforementioned S803-A2 includes the following methods:
- Method 1: If the second feature information of the current block includes the second luma feature information, the second luma feature information and the luma values of the reconstructed pixels around the current block are input into the decoding network to obtain the luma prediction block of the current block.
- Method 2: If the second feature information of the current block includes the second chroma feature information, the second chroma feature information and the chroma values of the reconstructed pixels around the current block are input into the decoding network to obtain the chroma prediction block of the current block.
- Method 3: If the second feature information of the current block includes the second luma feature information and the second chroma feature information, the second luma feature information and the second chroma feature information, together with the pixel values of the reconstructed pixels around the current block, are input into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
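- Putting S803-A1 and S803-A2 together, the encoder-side flow for one component can be sketched as follows; encoding_network and decoding_network are placeholders for the trained autoencoder, and round-to-nearest is an assumed rounding rule:

```python
def ae_intra_encode(orig_block, neighbors, encoding_network, decoding_network):
    """Encoder-side sketch: map the original block to first feature
    information, round it to second feature information (S803-A1), and run
    the decoding network on it plus the surrounding reconstructed pixels to
    get the prediction block (S803-A2)."""
    first_features = encoding_network(orig_block)
    second_features = [round(f) for f in first_features]   # rounding, S803-A1
    prediction = decoding_network(second_features, neighbors)
    residual = [o - p for o, p in zip(orig_block, prediction)]
    return second_features, prediction, residual   # features go into the code stream
```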
- the video encoding device writes the second characteristic information of the current block into the code stream.
- The second feature information of the current block includes the second luma feature information, the second chroma feature information, or both the second luma feature information and the second chroma feature information.
- the video coding device writes a first flag in the code stream, where the first flag is used to indicate whether the current sequence allows the use of an autoencoder-based intra prediction mode.
- In some embodiments, if the value of the first flag is the first value, the video encoding device also writes a second flag into the code stream, where the second flag is used to indicate whether the current block uses the autoencoder-based intra prediction mode, and the first value is used to indicate that the current sequence allows the use of the autoencoder-based intra prediction mode.
- In some embodiments, the video encoding device directly writes the second flag into the code stream without writing the first flag, thereby saving codewords.
- Optionally, the first flag is included in the sequence-level parameter syntax element.
- the second flag is included in the coding unit syntax element.
- the second flag is used to indicate whether the luminance component and/or chrominance component of the current block uses an intra prediction mode based on an autoencoder.
- To sum up, in the video encoding method provided by the embodiment of the present application, the intra prediction mode of the current block is determined from the preset N first intra prediction modes, where N is a positive integer and the N first intra prediction modes include the autoencoder-based intra prediction mode; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the autoencoder corresponding to the current block is obtained, where the autoencoder includes an encoding network and a decoding network; the original value of the current block is input into the encoding network to obtain the first feature information of the current block output by the encoding network; and the first feature information of the current block, together with the pixel values of the reconstructed pixels around the current block, is input into the decoding network to obtain the prediction block of the current block output by the decoding network.
- That is, the present application adds an autoencoder-based intra prediction mode and enriches the intra prediction modes. If it is determined that the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the prediction block of the current block is determined according to the original value of the current block and the pixel values of the reconstructed pixels around the current block. Even in the case where the original value of the current block has little correlation with the reconstructed values around the current block, since the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the feature information of the current block, accurate prediction of the current block can be realized, and the accuracy of intra prediction can be improved.
- It should be understood that the sequence numbers of the above-mentioned processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
- the term "and/or" is only an association relationship describing associated objects, indicating that there may be three relationships. Specifically, A and/or B may mean: A exists alone, A and B exist simultaneously, and B exists alone.
- the character "/" in this article generally indicates that the contextual objects are an "or" relationship.
- Fig. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application.
- the video decoder 10 includes:
- a mode determination unit 11 configured to decode the code stream and determine the intra prediction mode of the current block;
- a feature determination unit 12 configured to decode the code stream to obtain the characteristic information of the current block if the intra prediction mode of the current block is an autoencoder-based intra prediction mode;
- an acquisition unit 13 configured to acquire pixel values of reconstructed pixels around the current block;
- a prediction unit 14 configured to input the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block, to obtain the prediction block of the current block output by the decoding network.
- the mode determination unit 11 is specifically configured to decode the code stream to obtain a first flag, where the first flag is used to indicate whether the current sequence allows the use of the autoencoder-based intra prediction mode, and to determine the intra prediction mode of the current block according to the first flag.
- the mode determination unit 11 is specifically configured to decode the code stream to obtain a second flag if the value of the first flag is a first value, where the second flag is used to indicate whether the current block uses the autoencoder-based intra prediction mode and the first value is used to indicate that the current sequence allows the use of the autoencoder-based intra prediction mode, and to determine the intra prediction mode of the current block according to the second flag.
- the mode determination unit 11 is specifically configured to decode the code stream to obtain a second flag, where the second flag is used to indicate whether the current block uses the autoencoder-based intra prediction mode, and to determine the intra prediction mode of the current block according to the second flag.
- the first flag is included in a sequence-level parameter syntax element.
- the second flag is included in a coding unit syntax element.
- the second flag is used to indicate whether the luma component and/or chrominance component of the current block uses an intra prediction mode based on an autoencoder.
- the mode determination unit 11 is specifically configured to: if the second flag is used to indicate whether both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, determine according to the second flag whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode;
- if the second flag is used to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode, determine according to the second flag whether the luma component of the current block uses the autoencoder-based intra prediction mode;
- if the second flag is used to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode, determine according to the second flag whether the chroma component of the current block uses the autoencoder-based intra prediction mode.
- the feature determination unit 12 is specifically configured to decode the code stream to obtain the luma characteristic information and the chroma characteristic information of the current block if both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode;
- if the luma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the luma characteristic information of the current block;
- if the chroma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the chroma characteristic information of the current block.
- the feature determination unit 12 is specifically configured to decode the syntax element corresponding to the luma component of the current block, and to obtain the luma characteristic information of the current block from the syntax element corresponding to the luma component of the current block.
- the feature determination unit 12 is specifically configured to decode the syntax element corresponding to the chroma component of the current block, and to obtain the chroma characteristic information of the current block from the syntax element corresponding to the chroma component of the current block.
- the prediction unit 14 is specifically configured to: if the characteristic information of the current block includes the luma characteristic information and the chroma characteristic information of the current block, input the luma characteristic information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block, and input the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block;
- if the characteristic information of the current block includes the luma characteristic information of the current block, input the luma characteristic information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block;
- if the characteristic information of the current block includes the chroma characteristic information of the current block, input the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
- the element values in the characteristic information of the current block are integers.
- the characteristic information of the current block is obtained by rounding the characteristic information output by the activation function of the last layer of the encoding network of the autoencoder.
- the range of element values in the characteristic information output by the activation function of the last layer of the encoding network is [a, b], and the a and b are integers.
- the a is 0, and the b is 1.
- the expression of the activation function of the last layer of the encoding network is given by formula (1) below;
- x is the input of the activation function of the last layer;
- S(x) is the characteristic information output by the activation function of the last layer.
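The expression itself did not survive extraction in this text. Given that the description elsewhere identifies this last-layer activation as a sigmoid whose output lies in [0, 1], a plausible reconstruction, offered as an assumption rather than as the verbatim formula of the source, is:

$$S(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$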
- the a is -1, and the b is 1.
- the expression of the activation function of the last layer of the encoding network is given by formula (2) below;
- x is the input of the activation function of the last layer;
- S(x) is the characteristic information output by the activation function of the last layer;
- n is a positive integer.
- the n is 10.
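This expression is likewise missing from the extraction. An activation consistent with the stated properties — a range of [-1, 1], a positive-integer steepness parameter n, and a sigmoid-like curve as in FIG. 9B — would be the following scaled sigmoid, offered purely as a reconstruction under those assumptions:

$$S(x) = \frac{2}{1 + e^{-nx}} - 1 = \tanh\!\left(\frac{nx}{2}\right) \qquad (2)$$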
- during training, the autoencoder adds noise to the original characteristic information output by the encoding network and then inputs the noisy characteristic information into the decoding network.
- during forward propagation, the original characteristic information output by the encoding network is rounded and then input into the decoding network; during backpropagation, differentiation is performed on the original (pre-rounding) characteristic information output by the encoding network to update the weight parameters in the encoding network.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
- the video decoder 10 shown in FIG. 19 can execute the decoding method of the embodiments of the present application, and the aforementioned and other operations and/or functions of each unit in the video decoder 10 are respectively intended to implement the corresponding processes of the above decoding method; for brevity, they are not repeated here.
- Fig. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application.
- the video encoder 20 may include:
- a mode determination unit 21 configured to determine the intra prediction mode of the current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode;
- an acquisition unit 22 configured to acquire the autoencoder corresponding to the current block if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the autoencoder including an encoding network and a decoding network;
- a characteristic determination unit 23 configured to input the original value of the current block into the encoding network to obtain the first characteristic information of the current block output by the encoding network;
- a prediction unit 24 configured to input the first characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
- the prediction unit 24 is specifically configured to round the first characteristic information of the current block to obtain the second characteristic information of the current block, and to input the second characteristic information and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
- the prediction unit 24 is further configured to write the second characteristic information of the current block into a code stream.
- the prediction unit 24 is further configured to write a first flag in the code stream, and the first flag is used to indicate whether the current sequence is allowed to use an intra prediction mode based on an autoencoder.
- the prediction unit 24 is further configured to write a second flag in the code stream if the value of the first flag is a first value, and the second flag is used to indicate the Whether the current block uses an autoencoder-based intra-frame prediction mode, and the first value is used to indicate that the current sequence allows using an autoencoder-based intra-frame prediction mode.
- the prediction unit 24 is further configured to write a second flag in the code stream, and the second flag is used to indicate whether the current block uses an intra prediction mode based on an autoencoder .
- the first flag is included in a sequence-level parameter syntax element.
- the second flag is included in a coding unit syntax element.
- the second flag is used to indicate whether the luma component and/or chrominance component of the current block uses an intra prediction mode based on an autoencoder.
- the mode determining unit 21 is specifically configured to determine the intra prediction mode of the current block from the N first intra prediction modes according to the rate-distortion cost.
- the mode determination unit 21 is specifically configured to determine the prediction value corresponding to a first intra prediction mode when that mode is used to encode the current block; determine the first rate-distortion cost of the first intra prediction mode according to the distortion between the prediction value and the original value of the current block and the number of bits consumed in encoding the flag bit of the first intra prediction mode; and determine the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion cost.
- the mode determination unit 21 is specifically configured to select M second intra prediction modes from the N first intra prediction modes according to the first rate-distortion cost, where M is a positive integer smaller than N; determine the reconstruction value corresponding to a second intra prediction mode when that mode is used to encode the current block; determine the second rate-distortion cost of the second intra prediction mode according to the distortion between the reconstruction value and the original value of the current block and the number of bits consumed in encoding the current block using the second intra prediction mode; and determine, among the M second intra prediction modes, the second intra prediction mode with the smallest second rate-distortion cost as the intra prediction mode of the current block.
- the mode determination unit 21 is specifically configured to: determine the first rate-distortion cost of a third intra prediction mode according to the distortion between the prediction value corresponding to the third intra prediction mode and the original value of the current block, and the number of bits consumed in encoding the flag bit of the third intra prediction mode, where a third intra prediction mode is any of the N first intra prediction modes other than the autoencoder-based intra prediction mode;
- select Q third intra prediction modes from the N-1 third intra prediction modes according to the first rate-distortion cost, where Q is a positive integer smaller than N-1;
- determine P prediction values corresponding to the autoencoder-based intra prediction mode according to the preset rounding range of the first characteristic information, and select R prediction values from the P prediction values, where P and R are both positive integers and R is less than or equal to P;
- the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values are then carried into the fine screening described below.
- the mode determination unit 21 is specifically configured to determine the Q reconstruction values corresponding to the Q prediction values and the R reconstruction values corresponding to the R prediction values; determine a third rate-distortion cost according to the distortion between each of the Q+R reconstruction values and the original value of the current block, and the number of bits consumed in encoding the current block using the first intra prediction mode corresponding to that reconstruction value; and determine, among the N first intra prediction modes, the first intra prediction mode with the smallest third rate-distortion cost as the intra prediction mode of the current block.
- the mode determination unit 21 is specifically configured to predict P kinds of values of the first characteristic information output by the encoding network according to the preset rounding range of the first characteristic information; input the characteristic information under each of the P kinds of values, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the prediction values under the P kinds of values output by the decoding network; and determine the prediction values under the P kinds of values as the P prediction values corresponding to the autoencoder-based intra prediction mode.
- the mode determination unit 21 is specifically configured to determine the fourth rate-distortion cost corresponding to each of the P prediction values according to the distortion between that prediction value and the original value of the current block, and to select, from the P prediction values, the R prediction values with the smallest fourth rate-distortion cost.
- the feature determination unit 23 is specifically configured to: if it is determined that the luma component of the current block uses the autoencoder-based intra prediction mode, input the original luma value of the current block into the encoding network to obtain the first luma characteristic information of the current block;
- if the chroma component of the current block uses the autoencoder-based intra prediction mode, input the original chroma value of the current block into the encoding network to obtain the first chroma characteristic information of the current block;
- if both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, input the original luma value and the original chroma value of the current block into the encoding network to obtain the first luma characteristic information and the first chroma characteristic information of the current block.
- the feature determination unit 23 is specifically configured to: if the first characteristic information of the current block includes the first luma characteristic information, input the first luma characteristic information and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block;
- if the first characteristic information of the current block includes the first chroma characteristic information, input the first chroma characteristic information and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block;
- if the first characteristic information of the current block includes the first luma characteristic information and the first chroma characteristic information, input the first luma characteristic information, the first chroma characteristic information, and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
- the characteristic determination unit 23 is specifically configured to: if the first characteristic information of the current block includes the first luma characteristic information, round the first luma characteristic information to obtain the second luma characteristic information of the current block;
- if the first characteristic information of the current block includes the first chroma characteristic information, round the first chroma characteristic information to obtain the second chroma characteristic information of the current block;
- if the first characteristic information of the current block includes the first luma characteristic information and the first chroma characteristic information, separately round the first luma characteristic information and the first chroma characteristic information to obtain the second luma characteristic information and the second chroma characteristic information of the current block.
- the prediction unit 24 is specifically configured to: write the second luma characteristic information into the code stream if the second characteristic information of the current block includes the second luma characteristic information;
- if the second characteristic information of the current block includes the second chroma characteristic information, write the second chroma characteristic information into the code stream;
- if the second characteristic information of the current block includes the second luma characteristic information and the second chroma characteristic information, write the second luma characteristic information and the second chroma characteristic information into the code stream.
- the range of element values in the first characteristic information output by the activation function of the last layer of the encoding network is [a, b], and a and b are integers.
- the a is 0, and the b is 1.
- the expression of the activation function of the last layer of the encoding network is as in formula (1) above;
- x is the input of the activation function of the last layer;
- S(x) is the first characteristic information output by the activation function of the last layer.
- the a is -1, and the b is 1.
- the expression of the activation function of the last layer of the encoding network is as in formula (2) above;
- x is the input of the activation function of the last layer;
- S(x) is the first characteristic information output by the activation function of the last layer;
- n is a positive integer.
- the n is 10.
- during training, the autoencoder adds noise to the first characteristic information output by the encoding network and then inputs it into the decoding network.
- the autoencoder rounds the first characteristic information output by the encoding network during forward propagation and then inputs it into the decoding network; during backpropagation, differentiation is performed on the (pre-rounding) first characteristic information output by the encoding network to update the weight parameters in the encoding network.
- the device embodiment and the method embodiment may correspond to each other, and similar descriptions may refer to the method embodiment. To avoid repetition, details are not repeated here.
- the video encoder 20 shown in FIG. 20 may correspond to the corresponding subject that performs the encoding method of the embodiments of the present application, and the aforementioned and other operations and/or functions of each unit in the video encoder 20 are respectively intended to implement the corresponding processes of the encoding method; for brevity, they are not repeated here.
- the functional unit may be implemented in the form of hardware, may also be implemented by instructions in the form of software, and may also be implemented by a combination of hardware and software units.
- each step of the method embodiments in the embodiments of the present application can be completed by an integrated logic circuit of hardware in the processor and/or by instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor.
- the software unit may be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, and registers.
- the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
- Fig. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
- the electronic device 30 may be the video encoder or video decoder described in the embodiment of the present application, and the electronic device 30 may include:
- a memory 33 and a processor 32, where the memory 33 is used to store a computer program 34 and to transmit the program code of the computer program 34 to the processor 32.
- the processor 32 can call and run the computer program 34 from the memory 33 to implement the method in the embodiment of the present application.
- the processor 32 can be used to execute the steps in the above-mentioned method 200 according to the instructions in the computer program 34 .
- the processor 32 may include, but is not limited to:
- a digital signal processor (DSP);
- an application-specific integrated circuit (ASIC);
- a field-programmable gate array (FPGA).
- the memory 33 includes but is not limited to:
- the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
- the volatile memory can be random access memory (Random Access Memory, RAM), which acts as an external cache. Many forms of RAM are available, for example:
- static random access memory (Static RAM, SRAM)
- dynamic random access memory (Dynamic RAM, DRAM)
- synchronous dynamic random access memory (Synchronous DRAM, SDRAM)
- double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM)
- enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM)
- synchlink dynamic random access memory (Synchlink DRAM, SLDRAM)
- direct rambus random access memory (Direct Rambus RAM, DR RAM)
- the computer program 34 can be divided into one or more units, and the one or more units are stored in the memory 33 and executed by the processor 32 to complete the present application.
- the one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30 .
- the electronic device 30 may also include:
- a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
- the processor 32 can control the transceiver 33 to communicate with other devices, specifically, can send information or data to other devices, or receive information or data sent by other devices.
- Transceiver 33 may include a transmitter and a receiver.
- the transceiver 33 may further include antennas, and the number of antennas may be one or more.
- the various components of the electronic device 30 are connected through a bus system, where the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus.
- Fig. 22 is a schematic block diagram of a video codec system provided by an embodiment of the present application.
- the video codec system 40 may include: a video encoder 41 and a video decoder 42, wherein the video encoder 41 is used to execute the video encoding method involved in the embodiment of the present application, and the video decoder 42 is used to execute The video decoding method involved in the embodiment of the present application.
- the present application also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the computer can execute the methods of the above method embodiments.
- the embodiments of the present application further provide a computer program product including instructions, and when the instructions are executed by a computer, the computer executes the methods of the foregoing method embodiments.
- the present application also provides a code stream, which is generated according to the above encoding method.
- the code stream includes the above-mentioned first flag, or includes the first flag and the second flag.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transferred from a website, computer, server, or data center by wire (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) to another website site, computer, server or data center.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
- the available medium may be a magnetic medium (such as a floppy disk, a hard disk, or a magnetic tape), an optical medium (such as a digital video disc (digital video disc, DVD)), or a semiconductor medium (such as a solid state disk (solid state disk, SSD)), etc.
- the disclosed systems, devices and methods may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division; in actual implementation there may be other division methods, for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
- a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
Abstract
The present application provides a video encoding/decoding method, device, system, and storage medium. If the intra prediction mode of the current block is an autoencoder-based intra prediction mode, the code stream is decoded to obtain the characteristic information of the current block (S402); the pixel values of the reconstructed pixels around the current block are acquired (S403); and the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network (S404). That is, the present application adds an autoencoder-based intra prediction mode, providing more choices for intra prediction. If the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the prediction block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block. Even when the original value of the current block has little correlation with the reconstructed values around the current block, accurate prediction of the current block can be achieved and the accuracy of intra prediction improved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the characteristic information of the current block.
Description
The present application relates to the technical field of video encoding and decoding, and in particular to a video encoding/decoding method, device, system, and storage medium.
Digital video technology can be incorporated into a variety of video devices, such as digital televisions, smartphones, computers, e-readers, and video players. With the development of video technology, video data contains a large amount of data; to facilitate its transmission, video devices apply video compression technology so that video data can be transmitted or stored more efficiently.
Currently, redundant information in video data is reduced or eliminated through spatial prediction or temporal prediction to achieve compression. Prediction methods include inter prediction and intra prediction, where intra prediction predicts the current block based on neighboring blocks already decoded within the same picture.
Existing intra prediction modes all predict the current block based on the reconstructed values around it; when the original value of the current block has little correlation with the surrounding reconstructed values, the prediction is inaccurate.
Summary of the Invention
Embodiments of the present application provide a video encoding/decoding method, device, system, and storage medium, and propose an autoencoder-based intra prediction mode that enables accurate prediction of the current block even when the original value of the current block has little correlation with the reconstructed values around it.
In a first aspect, the present application provides a video encoding method, including:
determining the intra prediction mode of the current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode;
if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, obtaining the autoencoder corresponding to the current block, the autoencoder including an encoding network and a decoding network;
inputting the original value of the current block into the encoding network to obtain the first characteristic information of the current block output by the encoding network;
inputting the first characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
In a second aspect, an embodiment of the present application provides a video decoding method, including:
decoding a code stream and determining the intra prediction mode of the current block;
if the intra prediction mode of the current block is an autoencoder-based intra prediction mode, decoding the code stream to obtain the characteristic information of the current block;
acquiring the pixel values of the reconstructed pixels around the current block;
inputting the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block, to obtain the prediction block of the current block output by the decoding network.
In a third aspect, the present application provides a video encoder for performing the method in the above first aspect or any of its implementations. Specifically, the encoder includes functional units for performing the method in the above first aspect or any of its implementations.
In a fourth aspect, the present application provides a video decoder for performing the method in the above second aspect or any of its implementations. Specifically, the decoder includes functional units for performing the method in the above second aspect or any of its implementations.
In a fifth aspect, a video encoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to perform the method in the above first aspect or any of its implementations.
In a sixth aspect, a video decoder is provided, including a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to perform the method in the above second aspect or any of its implementations.
In a seventh aspect, a video encoding/decoding system is provided, including a video encoder and a video decoder. The video encoder is used to perform the method in the above first aspect or any of its implementations, and the video decoder is used to perform the method in the above second aspect or any of its implementations.
In an eighth aspect, a chip is provided for implementing the method in any one of the above first to second aspects or any of their implementations. Specifically, the chip includes a processor for calling and running a computer program from a memory, so that a device on which the chip is installed performs the method in any one of the above first to second aspects or any of their implementations.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, where the computer program causes a computer to perform the method in any one of the above first to second aspects or any of their implementations.
In a tenth aspect, a computer program product is provided, including computer program instructions that cause a computer to perform the method in any one of the above first to second aspects or any of their implementations.
In an eleventh aspect, a computer program is provided which, when run on a computer, causes the computer to perform the method in any one of the above first to second aspects or any of their implementations.
In a twelfth aspect, a code stream is provided, where the code stream is generated based on the method of the above first aspect.
Based on the above technical solutions, in the intra prediction process of video encoding/decoding, the decoding end decodes the code stream to determine the intra prediction mode of the current block; if the intra prediction mode of the current block is an autoencoder-based intra prediction mode, it decodes the code stream to obtain the characteristic information of the current block, acquires the pixel values of the reconstructed pixels around the current block, and inputs the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network. That is, the present application adds an autoencoder-based intra prediction mode, providing more choices for intra prediction. Even when the original value of the current block has little correlation with the reconstructed values around it, accurate prediction of the current block can be achieved and the accuracy of intra prediction improved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the characteristic information of the current block.
FIG. 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application;
FIG. 4 is a schematic diagram of reference pixels according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the 35 intra prediction modes of HEVC;
FIG. 6 is a schematic diagram of the 67 intra prediction modes of VVC;
FIG. 7 is a schematic diagram of the MIP intra prediction mode;
FIG. 8 is a schematic diagram of a network structure of an autoencoder according to an embodiment of the present application;
FIG. 9A is a schematic curve of an activation function according to an embodiment of the present application;
FIG. 9B is another schematic curve of an activation function according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of a video decoding method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of prediction according to an embodiment of the present application;
FIG. 12 is another schematic flowchart of a video decoding method provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of prediction according to an embodiment of the present application;
FIG. 14 is another schematic flowchart of a video decoding method provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of prediction according to an embodiment of the present application;
FIG. 16 is another schematic flowchart of a video decoding method provided by an embodiment of the present application;
FIG. 17 is a schematic diagram of prediction according to an embodiment of the present application;
FIG. 18 is a schematic flowchart of a video encoding method provided by an embodiment of the present application;
FIG. 19 is a schematic block diagram of a video decoder provided by an embodiment of the present application;
FIG. 20 is a schematic block diagram of a video encoder provided by an embodiment of the present application;
FIG. 21 is a schematic block diagram of an electronic device provided by an embodiment of the present application;
FIG. 22 is a schematic block diagram of a video encoding/decoding system provided by an embodiment of the present application.
The present application can be applied to the fields of image encoding/decoding, video encoding/decoding, hardware video encoding/decoding, dedicated-circuit video encoding/decoding, real-time video encoding/decoding, and the like. For example, the solutions of the present application can be combined with the audio video coding standard (AVS), the H.264/advanced video coding (AVC) standard, the H.265/high efficiency video coding (HEVC) standard, and the H.266/versatile video coding (VVC) standard. Alternatively, the solutions of the present application can operate in combination with other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including the scalable video coding (SVC) and multi-view video coding (MVC) extensions. It should be understood that the techniques of the present application are not limited to any particular codec standard or technology.
For ease of understanding, the video encoding/decoding system according to the embodiments of the present application is first introduced with reference to FIG. 1.
FIG. 1 is a schematic block diagram of a video encoding/decoding system according to an embodiment of the present application. It should be noted that FIG. 1 is only an example, and the video encoding/decoding system of the embodiments of the present application includes but is not limited to what is shown in FIG. 1. As shown in FIG. 1, the video encoding/decoding system 100 includes an encoding device 110 and a decoding device 120. The encoding device is used to encode (which can be understood as compress) video data to generate a code stream and transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded video data.
The encoding device 110 of the embodiments of the present application can be understood as a device having a video encoding function, and the decoding device 120 as a device having a video decoding function; that is, the embodiments of the present application contemplate a broad range of devices for the encoding device 110 and the decoding device 120, including, for example, smartphones, desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display devices, digital media players, video game consoles, and in-vehicle computers.
In some embodiments, the encoding device 110 can transmit the encoded video data (such as a code stream) to the decoding device 120 via a channel 130. The channel 130 can include one or more media and/or devices capable of transmitting the encoded video data from the encoding device 110 to the decoding device 120.
In one example, the channel 130 includes one or more communication media that enable the encoding device 110 to transmit the encoded video data directly to the decoding device 120 in real time. In this example, the encoding device 110 can modulate the encoded video data according to a communication standard and transmit the modulated video data to the decoding device 120. The communication media include wireless communication media, such as the radio frequency spectrum; optionally, the communication media can also include wired communication media, such as one or more physical transmission lines.
In another example, the channel 130 includes a storage medium that can store the video data encoded by the encoding device 110. The storage media include a variety of locally accessible data storage media, such as optical discs, DVDs, and flash memory. In this example, the decoding device 120 can acquire the encoded video data from the storage medium.
In another example, the channel 130 can include a storage server that can store the video data encoded by the encoding device 110. In this example, the decoding device 120 can download the stored encoded video data from the storage server. Optionally, the storage server can store the encoded video data and transmit it to the decoding device 120; it can be, for example, a web server (e.g., for a website) or a file transfer protocol (FTP) server.
In some embodiments, the encoding device 110 includes a video encoder 112 and an output interface 113, where the output interface 113 can include a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, in addition to the video encoder 112 and the output interface 113, the encoding device 110 can also include a video source 111.
The video source 111 can include at least one of a video capture device (e.g., a video camera), a video archive, a video input interface, or a computer graphics system, where the video input interface is used to receive video data from a video content provider and the computer graphics system is used to generate video data.
The video encoder 112 encodes the video data from the video source 111 to generate a code stream. The video data can include one or more pictures or a sequence of pictures. The code stream contains the encoding information of a picture or sequence of pictures in the form of a bitstream. The encoding information can include encoded picture data and associated data. The associated data can include a sequence parameter set (SPS), a picture parameter set (PPS), and other syntax structures. An SPS can contain parameters applied to one or more sequences, and a PPS can contain parameters applied to one or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the code stream.
The video encoder 112 transmits the encoded video data directly to the decoding device 120 via the output interface 113. The encoded video data can also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.
In some embodiments, the decoding device 120 includes an input interface 121 and a video decoder 122.
In some embodiments, in addition to the input interface 121 and the video decoder 122, the decoding device 120 can also include a display device 123.
The input interface 121 includes a receiver and/or a modem, and can receive the encoded video data through the channel 130.
The video decoder 122 is used to decode the encoded video data to obtain decoded video data, and to transmit the decoded video data to the display device 123.
The display device 123 displays the decoded video data. The display device 123 can be integrated with the decoding device 120 or external to it, and can include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
In addition, FIG. 1 is only an example, and the technical solutions of the embodiments of the present application are not limited to FIG. 1; for example, the techniques of the present application can also be applied to one-sided video encoding or one-sided video decoding.
The video encoding framework according to the embodiments of the present application is introduced below.
FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application. It should be understood that the video encoder 200 can be used for lossy compression of pictures as well as for lossless compression of pictures; the lossless compression can be visually lossless compression or mathematically lossless compression.
The video encoder 200 can be applied to picture data in luma-chroma (YCbCr, YUV) format. For example, the YUV ratio can be 4:2:0, 4:2:2, or 4:4:4, where Y denotes luma, Cb (U) denotes blue chroma, Cr (V) denotes red chroma, and U and V together denote chroma, which describes color and saturation. In terms of color format, 4:2:0 means that every 4 pixels have 4 luma components and 2 chroma components (YYYYCbCr), 4:2:2 means that every 4 pixels have 4 luma components and 4 chroma components (YYYYCbCrCbCr), and 4:4:4 denotes full-pixel display (YYYYCbCrCbCrCbCrCbCr).
For example, the video encoder 200 reads video data and, for each picture of the video data, divides the picture into several coding tree units (CTUs); in some examples, a CTB may be called a "tree block", "largest coding unit" (LCU), or "coding tree block" (CTB). Each CTU can be associated with a pixel block of equal size within the picture. Each pixel can correspond to one luminance (luma) sample and two chrominance (chroma) samples, so each CTU can be associated with one luma sample block and two chroma sample blocks. The size of a CTU is, for example, 128×128, 64×64, or 32×32. A CTU can be further divided into several coding units (CUs) for encoding, and a CU can be a rectangular or square block. A CU can be further divided into prediction units (PUs) and transform units (TUs), which decouples encoding, prediction, and transform and makes processing more flexible. In one example, a CTU is divided into CUs in a quadtree manner, and a CU is divided into TUs and PUs in a quadtree manner.
The video encoder and video decoder can support various PU sizes. Assuming that the size of a particular CU is 2N×2N, the video encoder and video decoder can support a PU size of 2N×2N or N×N for intra prediction, and symmetric PUs of 2N×2N, 2N×N, N×2N, N×N, or similar sizes for inter prediction; they can also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
In some embodiments, as shown in FIG. 2, the video encoder 200 can include a prediction unit 210, a residual unit 220, a transform/quantization unit 230, an inverse transform/quantization unit 240, a reconstruction unit 250, an in-loop filtering unit 260, a decoded picture buffer 270, and an entropy encoding unit 280. It should be noted that the video encoder 200 can contain more, fewer, or different functional components.
Optionally, in the present application, the current block can be called the current coding unit (CU) or current prediction unit (PU), etc. A prediction block can also be called a predicted image block or image prediction block, and a reconstructed image block can also be called a reconstruction block or reconstructed image block.
In some embodiments, the prediction unit 210 includes an inter prediction unit 211 and an intra estimation unit 212. Because there is a strong correlation between adjacent pixels within one frame of a video, intra prediction is used in video coding technology to eliminate the spatial redundancy between adjacent pixels. Because there is strong similarity between adjacent frames of a video, inter prediction is used to eliminate the temporal redundancy between adjacent frames, thereby improving coding efficiency.
The inter prediction unit 211 can be used for inter prediction. Inter prediction can refer to picture information of different frames; it uses motion information to find a reference block in a reference frame and generates a prediction block from the reference block, in order to eliminate temporal redundancy. The frames used for inter prediction can be P frames and/or B frames, where P frames are forward-predicted frames and B frames are bidirectionally predicted frames. The motion information includes the reference frame list containing the reference frame, a reference frame index, and a motion vector. The motion vector can have integer-pixel or sub-pixel precision; if it has sub-pixel precision, interpolation filtering in the reference frame is needed to produce the required sub-pixel block. The integer-pixel or sub-pixel block found in the reference frame according to the motion vector is called the reference block. Some techniques use the reference block directly as the prediction block, while others further process the reference block to generate the prediction block, which can also be understood as taking the reference block as a prediction block and then processing it to generate a new prediction block.
The intra estimation unit 212 only refers to information of the same picture to predict the pixel information within the current block, in order to eliminate spatial redundancy. The frames used for intra prediction can be I frames. As shown in FIG. 4, the white 4×4 block is the current block, and the gray pixels in the column to the left of the current block and the row above it are the reference pixels of the current block; intra prediction uses these reference pixels to predict the current block. These reference pixels may all be available, i.e., all already encoded/decoded, or some may be unavailable: for example, if the current block is at the far left of the frame, the reference pixels to its left are unavailable, or, when the lower-left part of the current block has not yet been encoded/decoded, the lower-left reference pixels are unavailable. When reference pixels are unavailable, available reference pixels, certain values, or certain methods can be used for filling, or no filling is performed.
Intra prediction has multiple prediction modes. Taking the international digital video coding standard H series as an example, the H.264/AVC standard has 8 angular prediction modes and 1 non-angular prediction mode, and H.265/HEVC extends this to 33 angular prediction modes and 2 non-angular prediction modes.
As shown in FIG. 5, the intra prediction modes used by HEVC are Planar, DC, and 33 angular modes, for a total of 35 prediction modes.
As shown in FIG. 6, the intra modes used by VVC are Planar, DC, and 65 angular modes, for a total of 67 prediction modes. For the luma component there is the training-based matrix intra prediction (MIP) mode, and for the chroma components there is the CCLM prediction mode.
As shown in FIG. 7, in the MIP technique, for a rectangular prediction block of width W and height H, MIP takes the W reconstructed pixels in the row above the block and the H reconstructed pixels in the column to its left as input. If the pixels at these positions have not yet been reconstructed, the pixels at the unreconstructed positions are set to a default value; for example, for 10-bit pixels, the default filling value is 512. MIP generates the prediction mainly in three steps: averaging the reference pixels, matrix-vector multiplication, and linear interpolation upsampling.
MIP operates on blocks from 4x4 to 64x64 in size. For a rectangular prediction block, the MIP mode selects a suitable prediction matrix according to the side lengths of the rectangle: for rectangles whose short side is 4, there are 16 sets of matrix parameters to choose from; for rectangles whose short side is 8, there are 8 sets; for other rectangles, there are 6 sets. MIP performs prediction with the candidate matrices, and the index of the matrix with the smallest cost is written into the code stream so that the decoding end can read that matrix's parameters for prediction.
It should be noted that as the number of angular modes increases, intra prediction becomes more accurate and better meets the demands of the development of high-definition and ultra-high-definition digital video.
The residual unit 220 can generate a residual block of a CU based on the pixel block of the CU and the prediction blocks of the CU's PUs. For example, the residual unit 220 can generate the residual block of the CU such that each sample in the residual block has a value equal to the difference between a sample in the CU's pixel block and the corresponding sample in the prediction block of the CU's PU.
The transform/quantization unit 230 can quantize the transform coefficients. It can quantize the transform coefficients associated with the TUs of a CU based on the quantization parameter (QP) value associated with the CU. The video encoder 200 can adjust the degree of quantization applied to the transform coefficients associated with a CU by adjusting the QP value associated with the CU.
The inverse transform/quantization unit 240 can apply inverse quantization and inverse transform respectively to the quantized transform coefficients to reconstruct the residual block from them.
The reconstruction unit 250 can add the samples of the reconstructed residual block to the corresponding samples of one or more prediction blocks generated by the prediction unit 210 to generate the reconstructed image block associated with a TU. By reconstructing the sample block of each TU of the CU in this way, the video encoder 200 can reconstruct the pixel block of the CU.
The in-loop filtering unit 260 can perform deblocking filtering to reduce the blocking artifacts of the pixel block associated with the CU.
In some embodiments, the in-loop filtering unit 260 includes a deblocking filtering unit and a sample adaptive offset/adaptive loop filtering (SAO/ALF) unit, where the deblocking filtering unit is used to remove blocking artifacts and the SAO/ALF unit is used to remove ringing artifacts.
The decoded picture buffer 270 can store the reconstructed pixel blocks. The inter prediction unit 211 can use reference pictures containing reconstructed pixel blocks to perform inter prediction on PUs of other pictures, and the intra estimation unit 212 can use the reconstructed pixel blocks in the decoded picture buffer 270 to perform intra prediction on other PUs in the same picture as the CU.
The entropy encoding unit 280 can receive the quantized transform coefficients from the transform/quantization unit 230 and perform one or more entropy encoding operations on them to generate entropy-encoded data.
FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application.
As shown in FIG. 3, the video decoder 300 includes an entropy decoding unit 310, a prediction unit 320, an inverse quantization/transform unit 330, a reconstruction unit 340, an in-loop filtering unit 350, and a decoded picture buffer 360. It should be noted that the video decoder 300 can contain more, fewer, or different functional components.
The video decoder 300 can receive a code stream. The entropy decoding unit 310 can parse the code stream to extract syntax elements from it, including entropy-encoded syntax elements. The prediction unit 320, the inverse quantization/transform unit 330, the reconstruction unit 340, and the in-loop filtering unit 350 can decode the video data according to the syntax elements extracted from the code stream, i.e., generate decoded video data.
In some embodiments, the prediction unit 320 includes an intra estimation unit 321 and an inter prediction unit 322.
The intra estimation unit 321 can perform intra prediction to generate the prediction block of a PU. It can use an intra prediction mode to generate the prediction block of the PU based on the pixel blocks of spatially neighboring PUs, and can determine the intra prediction mode of the PU from one or more syntax elements parsed from the code stream.
The inter prediction unit 322 can construct a first reference picture list (list 0) and a second reference picture list (list 1) according to the syntax elements parsed from the code stream. In addition, if a PU is encoded using inter prediction, the entropy decoding unit 310 can parse the motion information of the PU. The inter prediction unit 322 can determine one or more reference blocks of the PU according to its motion information and generate the prediction block of the PU from the reference block(s).
The inverse quantization/transform unit 330 can inverse-quantize (i.e., dequantize) the transform coefficients associated with a TU, and can use the QP value associated with the TU's CU to determine the degree of quantization.
After inverse-quantizing the transform coefficients, the inverse quantization/transform unit 330 can apply one or more inverse transforms to the inverse-quantized transform coefficients in order to generate the residual block associated with the TU.
The reconstruction unit 340 uses the residual blocks associated with the TUs of a CU and the prediction blocks of the CU's PUs to reconstruct the pixel block of the CU. For example, the reconstruction unit 340 can add the samples of the residual block to the corresponding samples of the prediction block to reconstruct the pixel block of the CU and obtain the reconstructed image block.
The in-loop filtering unit 350 can perform deblocking filtering to reduce the blocking artifacts of the pixel block associated with the CU.
The video decoder 300 can store the reconstructed picture of the CU in the decoded picture buffer 360, and can use the reconstructed picture in the decoded picture buffer 360 as a reference picture for subsequent prediction, or transmit it to a display device for presentation.
The basic flow of video encoding/decoding is as follows. At the encoding end, a picture is divided into blocks, and for the current block, the prediction unit 210 uses intra prediction or inter prediction to generate the prediction block of the current block. The residual unit 220 can compute the residual block based on the prediction block and the original block of the current block, i.e., their difference; the residual block is also called residual information. Through the transform and quantization processes of the transform/quantization unit 230, information to which the human eye is insensitive can be removed from the residual block, eliminating visual redundancy. Optionally, the residual block before transform and quantization by the transform/quantization unit 230 can be called the time-domain residual block, and the time-domain residual block after transform and quantization can be called the frequency residual block or frequency-domain residual block. The entropy encoding unit 280 receives the quantized transform coefficients output by the transform/quantization unit 230, can entropy-encode them, and outputs the code stream. For example, the entropy encoding unit 280 can eliminate character redundancy according to a target context model and the probability information of the binary code stream.
At the decoding end, the entropy decoding unit 310 can parse the code stream to obtain the prediction information of the current block, the quantized coefficient matrix, and so on, and the prediction unit 320 uses intra prediction or inter prediction on the current block based on the prediction information to generate the prediction block of the current block. The inverse quantization/transform unit 330 inverse-quantizes and inverse-transforms the quantized coefficient matrix obtained from the code stream to obtain the residual block. The reconstruction unit 340 adds the prediction block and the residual block to obtain the reconstruction block. The reconstruction blocks form the reconstructed picture, and the in-loop filtering unit 350 performs in-loop filtering on the reconstructed picture on a picture or block basis to obtain the decoded picture. The encoding end likewise needs operations similar to the decoding end to obtain the decoded picture, which can also be called the reconstructed picture; the reconstructed picture can serve as a reference frame for inter prediction of subsequent frames.
It should be noted that the block partition information determined at the encoding end, as well as the mode information or parameter information for prediction, transform, quantization, entropy coding, in-loop filtering, and so on, are carried in the code stream when necessary. The decoding end parses the code stream and analyzes existing information to determine the same block partition information and the same mode/parameter information for prediction, transform, quantization, entropy coding, and in-loop filtering as the encoding end, thereby ensuring that the decoded picture obtained at the encoding end is identical to the decoded picture obtained at the decoding end.
The above is the basic flow of a video codec under the block-based hybrid coding framework. With the development of technology, some modules or steps of this framework or flow may be optimized. The present application applies to the basic flow of a video codec under the block-based hybrid coding framework, but is not limited to this framework and flow.
Existing intra prediction modes all predict the current block based on the reconstructed values around it; when the original value of the current block has little correlation with the surrounding reconstructed values, the prediction is inaccurate.
To solve the above technical problem, embodiments of the present application provide an autoencoder-based intra prediction mode, which enables accurate prediction of the current block when the original value of the current block has little correlation with the surrounding reconstructed values.
FIG. 8 is a schematic diagram of a network structure of an autoencoder according to an embodiment of the present application. As shown in FIG. 8, the autoencoder includes an encoding network and a decoding network.
As shown in FIG. 8, the encoding network includes 4 fully connected layers (FCLs) and 4 activation functions, where each fully connected layer is followed by an activation function that applies a non-linear transform to the characteristic information output by that layer before feeding it into the next fully connected layer. The output of the last activation function of the encoding network is the output of the encoding network. Each fully connected layer includes 128 nodes; the first 3 activation functions are leaky ReLU activation functions, and the last one is a sigmoid activation function. It should be noted that FIG. 8 is only one example of the encoding network; the encoding network of the present application includes but is not limited to that shown in FIG. 8 and may, for example, include more or fewer fully connected layers and activation functions. Each fully connected layer is not limited to 128 nodes, and the numbers of nodes in the layers may be the same or different. Optionally, the fully connected layers in FIG. 8 can be replaced with convolutional layers, and the leaky ReLU activation functions can be replaced with other activation functions, such as ReLU or ELU.
As shown in FIG. 8, the decoding network includes 4 fully connected layers and 3 activation functions, where each of the first 3 fully connected layers is followed by an activation function that applies a non-linear transform to the characteristic information output by that layer before feeding it into the next fully connected layer. The output of the last fully connected layer of the decoding network is the output of the decoding network. The structure of the decoding network is symmetric to that of the encoding network: each fully connected layer includes 128 nodes, and the 3 activation functions are leaky ReLU activation functions. Again, FIG. 8 is only one example of the decoding network; more or fewer layers, different node counts, convolutional layers in place of fully connected layers, and other activation functions such as ReLU or ELU are all possible.
With continued reference to FIG. 8, in actual prediction, the original pixel values of the current block are input into the encoding network, and the last activation function of the encoding network outputs the characteristic information of the current block, denoted Side Information (SI); this characteristic information is a low-dimensional feature map. Then the characteristic information output by the last activation function of the encoding network, together with the pixel values of the reconstructed pixels around the current block, is input into the decoding network, which after processing outputs the prediction block of the current block.
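As a concrete illustration of the FIG. 8 structure just described, the following PyTorch-style sketch is offered. It is an assumption-laden approximation rather than the patent's exact network: the inputs are assumed to be flattened pixel vectors, and the final encoder layer is narrowed to the side-information dimension si_size so the output is low-dimensional, which deviates from a strict reading of "each fully connected layer includes 128 nodes".

```python
import torch
import torch.nn as nn

class IntraAutoencoder(nn.Module):
    def __init__(self, block_pixels: int, ref_pixels: int, si_size: int = 2):
        super().__init__()
        # Encoding network: 4 FC layers, first 3 followed by leaky ReLU,
        # the last by sigmoid so each SI element lies in [0, 1].
        self.encoder = nn.Sequential(
            nn.Linear(block_pixels, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, si_size), nn.Sigmoid(),
        )
        # Decoding network: symmetric 4 FC layers, first 3 followed by
        # leaky ReLU; input is the SI concatenated with the surrounding
        # reconstructed pixel values, output is the prediction block.
        self.decoder = nn.Sequential(
            nn.Linear(si_size + ref_pixels, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, 128), nn.LeakyReLU(),
            nn.Linear(128, block_pixels),
        )

    def forward(self, original_block, reconstructed_ref):
        side_info = self.encoder(original_block)
        return self.decoder(torch.cat([side_info, reconstructed_ref], dim=-1))
```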
As can be seen from FIG. 8, when predicting the prediction block of the current block, the autoencoder-based intra prediction mode considers not only the pixel values of the reconstructed pixels around the current block but also the original pixel values of the current block, so that accurate prediction can be achieved even when the original value of the current block has little correlation with the surrounding reconstructed values, thereby improving the accuracy of intra prediction.
As shown in FIG. 8, when determining the prediction block of the current block, the encoding end needs to input the characteristic information of the current block output by the encoding network, together with the pixel values of the reconstructed pixels around the current block, into the decoding network, which outputs the prediction block of the current block. For the decoding end to determine the prediction block of the current block accurately, the encoding end needs to carry the characteristic information of the current block output by the encoding network in the code stream and send it to the decoding end. The decoding end then parses the characteristic information of the current block from the code stream, inputs the characteristic information and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder to obtain the prediction block of the current block, and then obtains the reconstruction block of the current block from the prediction block and the residual block parsed from the code stream.
That is, the characteristic information output by the encoding network needs to be written into the code stream; to reduce the codewords it occupies, the characteristic information output by the encoding network is rounded, and the rounded characteristic information is written into the code stream.
To reduce the codewords occupied by the rounded characteristic information, the values of the characteristic information output by the encoding network need to be continuously distributed within a limited range. For example, the range of the element values in the characteristic information output by the last-layer activation function of the encoding network is [a, b], where a and b are integers.
In example 1, a is 0 and b is 1, i.e., the element values of the characteristic information output by the last-layer activation function of the encoding network lie in [0, 1]; when this characteristic information is rounded, the result equals 0 or 1 and can be represented with 1 bit, e.g., bit 0 represents the value 0 and bit 1 represents the value 1.
Optionally, the expression of the last-layer activation function of the encoding network is as shown in formula (1) (the sigmoid reconstructed above),
where x is the input of the last-layer activation function, i.e., the characteristic value output by the layer preceding the last-layer activation function in the encoding network, and S(x) is the characteristic information output by the last-layer activation function.
When the last-layer activation function of the encoding network adopts the above formula (1), as shown in FIG. 9A, each element value of the characteristic information output by the encoding network can be restricted to between 0 and 1, and subsequent rounding yields 0 or 1, which is convenient for encoding the characteristic information into the code stream.
It should be noted that in example 1, the expression of the last-layer activation function includes but is not limited to the above formula (1); it can be any other expression that constrains the output of the encoding network to [0, 1].
In example 2, a is -1 and b is 1, i.e., the element values of the characteristic information output by the last-layer activation function of the encoding network lie in [-1, 1]; when this characteristic information is rounded, the result is -1, 0, or 1. For the rounded values, one binary symbol can first indicate whether the value is 0, and if it is not 0, another binary symbol indicates the sign; e.g., writing or reading 0 represents 0, writing or reading 10 represents -1, and writing 11 represents 1, as in the sketch below.
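A minimal sketch of this {-1, 0, 1} binarization follows; it assumes fixed-length (bypass) bins rather than context-coded bins, and the function names are illustrative:

```python
def encode_si_element(v: int) -> str:
    # One bin says whether the value is zero; a second bin gives the sign.
    if v == 0:
        return "0"
    return "11" if v == 1 else "10"

def decode_si_element(bits: str) -> tuple[int, str]:
    # Returns the decoded value and the remaining bits.
    if bits[0] == "0":
        return 0, bits[1:]
    return (1, bits[2:]) if bits[1] == "1" else (-1, bits[2:])

# Round trip: [-1, 0, 1] -> "10" + "0" + "11" == "10011"
assert "".join(encode_si_element(v) for v in (-1, 0, 1)) == "10011"
```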
Optionally, the expression of the last-layer activation function of the encoding network is as shown in formula (2) (reconstructed above),
where x is the input of the last-layer activation function, i.e., the characteristic value output by the layer preceding the last-layer activation function in the encoding network, S(x) is the characteristic information output by the last-layer activation function, and n is a positive integer, optionally n = 10.
When the last-layer activation function of the encoding network adopts the above formula (2), as shown in FIG. 9B, each element value of the characteristic information output by the encoding network can be restricted to between -1 and 1, and rounding yields -1, 0, or 1, which is convenient for encoding the characteristic information into the code stream.
It should be noted that in example 2, the expression of the last-layer activation function includes but is not limited to the above formula (2); it can be any other expression that constrains the output of the encoding network to [-1, 1].
In some embodiments, so that the output of the last-layer activation function of the encoding network suits different encoding methods, the characteristic information output by the last-layer activation function can also be scaled up or down. For example, multiplying the output of the above formula (1) by 2 changes the restricted range of the characteristic information output by the encoding network from 0~1 to 0~2, and rounding within 0~2 yields 0, 1, or 2. Optionally, multiplying the output of formula (1) by 2 and then subtracting 1 changes the restricted range from 0~1 to -1~1, and rounding yields -1, 0, or 1.
The autoencoder of the embodiments of the present application restricts the element values of the characteristic information output by the last-layer activation function of the encoding network to a certain range, e.g., [0, 1] or [-1, 1], which facilitates encoding the rounded characteristic information. In addition, the embodiments of the present application propose two different expressions for the last-layer activation function whose computation is simple and inexpensive, improving the coding efficiency of the encoding network while ensuring that the characteristic information output by the last-layer activation function is constrained within a certain range.
The training process of the autoencoder is introduced below.
The encoding network and the decoding network of the autoencoder of the embodiments of the present application are optimized and trained synchronously. Specifically, a training coding block is input into the encoding network, the encoding network outputs the characteristic information of the training coding block, the characteristic information is rounded and then input into the decoding network together with the pixel values of the reconstructed pixels around the training coding block, and the decoding network outputs the prediction block of the training coding block. A loss is computed from the prediction block output by the decoding network and the ground-truth prediction block of the training coding block, and the weights of each layer of the autoencoder are updated backwards according to the loss, completing the training. The weight update of each layer is realized by differentiating the output of each layer; however, as described above, the characteristic information output by the encoding network needs to be rounded, and the rounding operation is not differentiable. Embodiments of the present application solve this technical problem in the following two ways:
Way 1: add uniformly distributed noise, with a value range of -0.5 to 0.5, to the characteristic information output by the last-layer activation function of the encoding network, thereby simulating the rounding process. Taking the last-layer activation function of formula (1) as an example, noise is added to the characteristic information output by that activation function, whose range is 0~1; the noisy characteristic information, whose range is -0.5~1.5, serves as the input of the decoding network.
Way 2: during training, the forward computation takes the rounded discrete values directly as the input of the decoding network, i.e., the characteristic information output by the last-layer activation function is rounded and then input into the decoding network, as shown in formula (3):
B(x) = ⌊S(x)⌉  (3)
where B(x) is the rounded characteristic information, S(x) is the characteristic information output by the last-layer activation function, and ⌊·⌉ denotes rounding.
When computing derivatives in the backward pass, differentiation is performed on the continuous values before rounding; as shown in formula (4), the characteristic information output by the last-layer activation function is used directly for backward derivative propagation:
B′(x) = S′(x)  (4)
where B′(x) denotes the value obtained by differentiating the characteristic information S(x) output by the last-layer activation function.
In the training process of the autoencoder, the embodiments of the present application improve the training accuracy of the autoencoder through the above two approaches, so that when the autoencoder is used for intra prediction, the accuracy of intra prediction can be guaranteed.
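A minimal PyTorch sketch of the two training workarounds just described is given below: additive uniform noise (way 1), and a straight-through estimator that realizes B(x) = ⌊S(x)⌉ in the forward pass with B′(x) = S′(x) in the backward pass (way 2). Both are sketches under the stated assumptions, not the patent's exact implementation.

```python
import torch

def noisy_side_info(s: torch.Tensor) -> torch.Tensor:
    # Way 1: simulate rounding with uniform noise in [-0.5, 0.5].
    return s + (torch.rand_like(s) - 0.5)

def rounded_side_info(s: torch.Tensor) -> torch.Tensor:
    # Way 2: the forward pass uses the rounded discrete values, while the
    # detach trick makes gradients flow through s unchanged (identity).
    return s + (torch.round(s) - s).detach()
```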
In some embodiments, the present application also trains multiple autoencoders for prediction blocks of different sizes, e.g., blocks of shapes 32×32, 32×16, 32×8, 32×4, 16×16, 16×8, 16×4, 8×8, 8×4, and 4×4, obtaining the autoencoder corresponding to each of these blocks.
Optionally, the blocks of the above shapes can be luma blocks containing the luma component, so that autoencoders for predicting the prediction values of luma blocks of different shapes are trained.
Optionally, the blocks of the above shapes can be chroma blocks containing the chroma components, so that autoencoders for predicting the prediction values of chroma blocks of different shapes are trained.
Optionally, the blocks of the above shapes contain both luma and chroma components, so that autoencoders for predicting the luma prediction values and chroma prediction values of blocks of different shapes are trained.
The present application does not limit the training data used to train the autoencoders; optionally, the DIV2K dataset can be used as the training set to train the autoencoders corresponding to blocks of different shapes.
The present application does not limit the dimension of the characteristic information output by the encoding network of the autoencoder. Optionally, the characteristic information output by the encoding network is an N×M feature vector, e.g., a 1×2 vector such as (0, 1), or a 1×3 vector such as (0, 1, 1).
In some embodiments, for blocks of the two shapes a×b and b×a, only the autoencoder corresponding to a×b blocks may be trained, and b×a blocks can use the autoencoder corresponding to a×b blocks during intra prediction. Specifically, the b×a block is transposed into an a×b block, the transposed a×b block is input into the autoencoder to obtain the a×b prediction block output by the autoencoder, and the a×b prediction block is transposed into a b×a prediction block, which serves as the prediction block of the b×a block; see the sketch below.
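A small NumPy sketch of this transpose trick follows; predict_axb is a hypothetical stand-in for the whole a×b prediction pipeline, and the corresponding rearrangement of the surrounding reference pixels is omitted for brevity:

```python
import numpy as np

def predict_bxa(block_bxa: np.ndarray, predict_axb) -> np.ndarray:
    pred_axb = predict_axb(block_bxa.T)  # transpose the b×a block to a×b
    return pred_axb.T                    # transpose the a×b prediction back
```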
In summary, the embodiments of the present application can train autoencoders corresponding to blocks of different shapes; when performing intra prediction on a block of a given shape, the autoencoder corresponding to that shape can be selected, thereby ensuring the accuracy of intra prediction for blocks of different shapes.
The network structure and training process of the autoencoder have been introduced above. On this basis, the video decoding method provided by the embodiments of the present application is introduced below with reference to FIG. 10, taking the decoding end as an example.
FIG. 10 is a schematic flowchart of a video decoding method provided by an embodiment of the present application, applied to the video decoder shown in FIG. 1 and FIG. 3. As shown in FIG. 10, the method of this embodiment includes:
S401: decode the code stream and determine the intra prediction mode of the current block.
In some embodiments, the current block is also called the current decoding block, current decoding unit, decoding block, block to be decoded, current block to be decoded, etc.
Implementations of determining the intra prediction mode of the current block in S401 include but are not limited to the following:
Way 1: if the code stream includes the first flag, S401 includes the following S401-A1 and S401-A2:
S401-A1: the decoding end decodes the code stream to obtain the first flag, where the first flag is used to indicate whether the current sequence allows the use of the autoencoder-based intra prediction mode;
S401-A2: determine the intra prediction mode of the current block according to the first flag.
For example, if the value of the first flag equals a second value (e.g., 0), the current sequence does not allow the use of the autoencoder-based intra prediction mode, and it can then be determined that the intra prediction mode of the current block is not the autoencoder-based intra prediction mode.
For another example, if the value of the first flag equals a first value (e.g., 1), the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, implementations of determining the intra prediction mode of the current block according to the first flag in S401-A2 include but are not limited to the following:
First implementation: if the value of the first flag equals the first value, determine that the intra prediction mode of the current block is the autoencoder-based intra prediction mode. That is, in this implementation, if the value of the first flag equals the first value, the current sequence allows the use of the autoencoder-based intra prediction mode and the intra prediction modes of all decoding blocks in the current sequence are the autoencoder-based intra prediction mode.
Second implementation: the code stream includes the first flag and the second flag. In this case, if the value of the first flag equals the first value, the decoding end decodes the code stream to obtain the second flag, which indicates whether the current block uses the autoencoder-based intra prediction mode, and determines the intra prediction mode of the current block according to the second flag. For example, when the value of the second flag is a certain value (e.g., 1), the intra prediction mode of the current block is the autoencoder-based intra prediction mode; if the value of the second flag does not equal that value, the intra prediction mode of the current block is an intra prediction mode other than the autoencoder-based intra prediction mode.
In way 1, the decoding end can determine the intra prediction mode of the current block according to the first flag decoded from the code stream, or according to the first flag and the second flag.
Way 2: if the code stream includes the second flag but not the first flag, S401 includes the following S401-B1 and S401-B2:
S401-B1: decode the code stream to obtain the second flag;
S401-B2: determine the intra prediction mode of the current block according to the second flag.
In way 2, when the code stream includes the second flag, which directly indicates whether the current block uses the autoencoder-based intra prediction mode, but does not include the first flag, the decoding end decodes the code stream to obtain the second flag and determines the intra prediction mode of the current block directly according to it; for example, when the value of the second flag is a certain value (e.g., 1), the intra prediction mode of the current block is the autoencoder-based intra prediction mode, and otherwise it is an intra prediction mode other than the autoencoder-based intra prediction mode.
In way 2, the first flag is not written into the code stream; instead, the second flag is written directly to indicate whether the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
Optionally, the second flag can be a newly added flag in the code stream.
Optionally, the second flag is an existing indication flag in the code stream for indicating the intra prediction mode; this embodiment extends the values of that indication flag by adding a value indicating the autoencoder-based intra prediction mode. For example, the values of the existing intra-prediction-mode indication flag are extended so that when its value equals a certain value, the intra prediction mode of the current block is the autoencoder-based intra prediction mode. This approach does not require an additional field in the code stream for the second flag, thereby saving codewords and improving decoding efficiency.
The embodiments of the present application do not limit the specific positions where the first flag and the second flag are written in the code stream.
Optionally, the first flag is included in a sequence-level parameter syntax element.
For example, taking VVC Draft 10 as an example, the first flag is added to the sequence-level parameter syntax; the change to the sequence parameter set RBSP syntax is shown in Table 1:
Table 1
where sps_ae_enabled_flag denotes the first flag.
Optionally, the second flag is included in a coding unit syntax element.
According to the above, after decoding the code stream and obtaining the intra prediction mode of the current block, the following step S402 is performed.
S402: if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, decode the code stream to obtain the characteristic information of the current block.
As can be seen from the working principle of the autoencoder above, when the encoding end uses the autoencoder for intra prediction of the current block, after the encoding network of the autoencoder outputs the first characteristic information of the current block, the first characteristic information is rounded to obtain the characteristic information of the current block, and the characteristic information of the current block is encoded into the code stream so that the decoding end can determine the prediction block of the current block from it.
On this basis, the decoding end performs S401, and if it determines that the intra prediction mode of the current block is the autoencoder-based intra prediction mode, it continues to decode the code stream to obtain the characteristic information of the current block.
The embodiments of the present application do not limit the specific position of the characteristic information of the current block in the code stream; optionally, it is located in a coding unit syntax element, and the decoding end decodes the coding unit syntax element to obtain the characteristic information of the current block.
S403: acquire the pixel values of the reconstructed pixels around the current block.
As can be seen from the working principle of the autoencoder above, the rounded characteristic information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder, and the decoding network outputs the prediction block of the current block.
On this basis, when the decoding end predicts the prediction block of the current block based on the autoencoder-based intra prediction mode, it also needs to acquire the pixel values of the reconstructed pixels around the current block.
Exemplarily, the reconstructed pixels around the current block include n rows of pixels above the current block and/or m columns of pixels to its left, where n and m are both positive integers and may or may not be equal. Each of the n rows of pixels may or may not be contiguous, and each of the m columns of pixels may or may not be contiguous; optionally, the n rows may or may not be adjacent, and the m columns may or may not be adjacent.
S404: input the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network.
In some embodiments, the autoencoders corresponding to blocks of different shapes may differ; therefore, the decoding end can select the autoencoder corresponding to the current block from multiple autoencoders according to the size of the current block. Then, as shown in FIG. 11, the obtained characteristic information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network.
After the decoding end obtains the prediction block of the current block according to the above method, it decodes the code stream to obtain the residual block of the current block, and adds the prediction block and the residual block to obtain the reconstruction block of the current block.
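The S401–S404 flow can be summarized by the following schematic sketch, in which parse_flags, parse_side_info, get_reconstructed_ref, predict_other_mode, and parse_residual are hypothetical helpers standing in for the real bitstream parsing and prediction routines; only the control flow mirrors the text above.

```python
def decode_block(bitstream, block, autoencoders):
    mode = parse_flags(bitstream)                 # S401: first/second flag
    if mode == "AE_INTRA":
        side_info = parse_side_info(bitstream)    # S402: characteristic info
        ref = get_reconstructed_ref(block)        # S403: surrounding pixels
        ae = autoencoders[block.shape]            # pick the AE by block size
        pred = ae.decoder(side_info, ref)         # S404: prediction block
    else:
        pred = predict_other_mode(mode, block)
    return pred + parse_residual(bitstream)       # reconstruction block
```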
In the decoding method of the embodiments of the present application, the code stream is decoded to determine the intra prediction mode of the current block; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the characteristic information of the current block; the pixel values of the reconstructed pixels around the current block are acquired; and the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder corresponding to the current block to obtain the prediction block of the current block output by the decoding network. That is, the embodiments of the present application add an autoencoder-based intra prediction mode, providing more choices for intra prediction. If the intra prediction mode of the current block is determined to be the autoencoder-based intra prediction mode, the prediction block of the current block is determined according to the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block; even when the original value of the current block has little correlation with the surrounding reconstructed values, accurate prediction of the current block can be achieved and the accuracy of intra prediction improved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the characteristic information of the current block.
In some embodiments, the current block includes a luma component and/or chroma components, and the above second flag is used to indicate whether the luma component and/or chroma components of the current block use the autoencoder-based intra prediction mode. The decoding method of the embodiments of the present application is introduced below for the different components of the current block.
FIG. 12 is another schematic flowchart of the video decoding method provided by an embodiment of the present application; this embodiment takes as an example the second flag indicating whether the luma component of the current block uses the autoencoder-based intra prediction mode. As shown in FIG. 12, the method includes:
S501: decode the code stream and determine the intra prediction mode of the luma component of the current block.
Implementations of S501 include but are not limited to the following:
Way 1: if the code stream includes the first flag, S501 includes the following S501-A1 and S501-A2:
S501-A1: the decoding end decodes the code stream to obtain the first flag, which indicates whether the current sequence allows the use of the autoencoder-based intra prediction mode;
S501-A2: determine the intra prediction mode of the luma component of the current block according to the first flag.
For example, if the value of the first flag equals the second value (e.g., 0), the current sequence does not allow the use of the autoencoder-based intra prediction mode, and it can be determined that the intra prediction mode of the luma component of the current block is not the autoencoder-based intra prediction mode.
For another example, if the value of the first flag equals the first value (e.g., 1), the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, implementations of S501-A2 include but are not limited to the following:
First implementation: if the value of the first flag equals the first value and the luma component of the current block has an autoencoder-based intra prediction mode, determine that the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode.
Second implementation: the code stream includes the first flag and the second flag. If the value of the first flag equals the first value, the decoding end decodes the code stream to obtain the second flag, which indicates whether the luma component of the current block uses the autoencoder-based intra prediction mode, and determines the intra prediction mode of the luma component of the current block according to the second flag. For example, when the value of the second flag is a certain value (e.g., 1), the intra prediction mode of the luma component of the current block is the autoencoder-based intra prediction mode; otherwise, it is an intra prediction mode other than the autoencoder-based intra prediction mode.
In way 1, the decoding end can determine the intra prediction mode of the luma component of the current block according to the first flag decoded from the code stream, or according to the first flag and the second flag.
Way 2: if the code stream includes the second flag but not the first flag, S501 includes the following S501-B1 and S501-B2:
S501-B1: decode the code stream to obtain the second flag;
S501-B2: determine according to the second flag whether the luma component of the current block uses the autoencoder-based intra prediction mode.
In way 2, when the code stream includes the second flag, which directly indicates whether the luma component of the current block uses the autoencoder-based intra prediction mode, but not the first flag, the decoding end decodes the code stream to obtain the second flag and determines the intra prediction mode of the luma component of the current block directly according to it, in the same manner as described above.
In way 2, the first flag is not written into the code stream; the second flag is written directly to indicate whether the luma component of the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
The embodiments of the present application do not limit the positions of the first flag and the second flag in the code stream.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in a coding unit syntax element.
S502: if the luma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the luma characteristic information of the current block.
The embodiments of the present application do not limit the specific position where the luma characteristic information of the current block is written in the code stream; for example, it can be carried at any position in the code stream.
In one example, the luma characteristic information of the current block is carried in the syntax element corresponding to the luma component of the current block. In this case, the decoding end decodes the syntax element corresponding to the luma component of the current block and obtains the luma characteristic information of the current block from it.
In another example, the luma characteristic information of the current block is carried in a coding unit syntax element.
Optionally, the coding unit syntax element also includes the second flag intra_ae_flag.
Specifically, the decoding end decodes the coding unit syntax code stream to obtain the coding unit syntax elements and reads the second flag intra_ae_flag from them; intra_ae_flag is a coding-unit-level control flag indicating whether the luma component of the current block uses the autoencoder-based intra prediction mode. If intra_ae_flag is 1, the luma component of the current block uses the autoencoder-based intra prediction mode, and the luma characteristic information sideinfo[] of the current block is further read.
In one example, taking VVC Draft 10 as an example, the changes to the relevant syntax table of the coding unit syntax are shown in Table 2:
Table 2
In Table 2 above, si_size indicates how many characteristic information elements need to be coded. sideinfo[] denotes the luma characteristic information; sideinfo[] is decoded with a 1-bit fixed-length code (u(1)) for the case where sideinfo[] takes the value 0 or 1, and it can also be decoded based on a context model.
Optionally, if the value range of sideinfo is larger, multi-bit codewords can be used for decoding; for example, when sideinfo[] takes the values -1, 0, 1, the changes to the relevant syntax table of the coding unit syntax are shown in Table 3:
Table 3
In Table 3 above, abssideinfo[] is the absolute value of sideinfo[]; when it is not 0, its sign is further decoded. Likewise, besides fixed-length-code decoding, a context model can also be used.
Optionally, when the value range of sideinfo[] is even larger, codewords with more bits can be used to represent the luma characteristic information sideinfo[] of the current block.
S503: acquire the luma values of the reconstructed pixels around the current block.
Exemplarily, the reconstructed pixels around the current block include n rows of pixels above the current block and/or m columns of pixels to its left, where n and m are both positive integers and may or may not be equal; each row or column may or may not be contiguous, and the rows or columns may or may not be adjacent.
S504: input the luma characteristic information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block.
In some embodiments, the autoencoders corresponding to blocks of different shapes may differ, and the autoencoders corresponding to the chroma and luma components may also differ; therefore, the decoding end can select the autoencoder corresponding to the luma component of the current block from multiple autoencoders according to the size of the luma component of the current block. Then, as shown in FIG. 13, the obtained luma characteristic information of the current block and the luma values of the reconstructed pixels around the current block are input into the autoencoder corresponding to the luma component of the current block to obtain the luma prediction block of the current block output by the decoding network.
In the decoding method of this embodiment, the code stream is decoded to determine the intra prediction mode of the luma component of the current block; if it is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the luma characteristic information of the current block, the luma values of the reconstructed pixels around the current block are acquired, and the luma characteristic information and those luma values are input into the decoding network of the autoencoder corresponding to the luma component of the current block to obtain the luma prediction block of the current block output by the decoding network. That is, this embodiment adds an autoencoder-based intra prediction mode for luma intra prediction, enriching the luma intra prediction modes. Even when the original value of the luma component of the current block has little correlation with the surrounding reconstructed values, accurate prediction of the luma component can be achieved and the accuracy of luma intra prediction improved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the luma characteristic information of the current block.
FIG. 14 is another schematic flowchart of the video decoding method provided by an embodiment of the present application; this embodiment takes as an example the second flag indicating whether the chroma component of the current block uses the autoencoder-based intra prediction mode. As shown in FIG. 14, the method includes:
S601: decode the code stream and determine the intra prediction mode of the chroma component of the current block.
Implementations of S601 include but are not limited to the following:
Way 1: if the code stream includes the first flag, S601 includes the following S601-A1 and S601-A2:
S601-A1: the decoding end decodes the code stream to obtain the first flag, which indicates whether the current sequence allows the use of the autoencoder-based intra prediction mode;
S601-A2: determine the intra prediction mode of the chroma component of the current block according to the first flag.
For example, if the value of the first flag equals the second value (e.g., 0), the current sequence does not allow the use of the autoencoder-based intra prediction mode, and it can be determined that the intra prediction mode of the chroma component of the current block is not the autoencoder-based intra prediction mode.
For another example, if the value of the first flag equals the first value (e.g., 1), the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, implementations of S601-A2 include but are not limited to the following:
First implementation: if the value of the first flag equals the first value and the chroma component of the current block has an autoencoder-based intra prediction mode, determine that the intra prediction mode of the chroma component of the current block is the autoencoder-based intra prediction mode.
Second implementation: the code stream includes the first flag and the second flag. If the value of the first flag equals the first value, the decoding end decodes the code stream to obtain the second flag, which indicates whether the chroma component of the current block uses the autoencoder-based intra prediction mode, and determines the intra prediction mode of the chroma component of the current block according to the second flag, in the same manner as described above for the luma component.
In way 1, the decoding end can determine the intra prediction mode of the chroma component of the current block according to the first flag decoded from the code stream, or according to the first flag and the second flag.
Way 2: if the code stream includes the second flag but not the first flag, S601 includes the following S601-B1 and S601-B2:
S601-B1: decode the code stream to obtain the second flag;
S601-B2: determine according to the second flag whether the chroma component of the current block uses the autoencoder-based intra prediction mode.
In way 2, when the code stream includes the second flag, which directly indicates whether the chroma component of the current block uses the autoencoder-based intra prediction mode, but not the first flag, the decoding end decodes the code stream to obtain the second flag and determines the intra prediction mode of the chroma component of the current block directly according to it.
In way 2, the first flag is not written into the code stream; the second flag is written directly to indicate whether the chroma component of the current block uses the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
The embodiments of the present application do not limit the positions of the first flag and the second flag in the code stream.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in a coding unit syntax element.
S602: if the chroma component of the current block uses the autoencoder-based intra prediction mode, decode the code stream to obtain the chroma characteristic information of the current block.
The embodiments of the present application do not limit the specific position where the chroma characteristic information of the current block is written in the code stream; for example, it can be carried at any position in the code stream.
In one example, the chroma characteristic information of the current block is carried in the syntax element corresponding to the chroma component of the current block. In this case, the decoding end decodes the syntax element corresponding to the chroma component of the current block and obtains the chroma characteristic information of the current block from it.
In another example, the chroma characteristic information of the current block is carried in a coding unit syntax element.
Optionally, the coding unit syntax element also includes the second flag intra_ae_flag.
Specifically, the decoding end decodes the coding unit syntax code stream to obtain the coding unit syntax elements and reads the second flag intra_ae_flag from them; intra_ae_flag is a coding-unit-level control flag indicating whether the chroma component of the current block uses the autoencoder-based intra prediction mode. If intra_ae_flag is 1, the chroma component of the current block uses the autoencoder-based intra prediction mode, and the chroma characteristic information sideinfo[] of the current block is further read.
In one example, taking VVC Draft 10 as an example, the changes to the relevant syntax table of the coding unit syntax are shown in Table 4:
Table 4
In Table 4 above, si_size indicates how many characteristic information elements need to be coded. sideinfo_cb[] denotes the Cb chroma characteristic information and sideinfo_cr[] the Cr chroma characteristic information; sideinfo_cb[] and sideinfo_cr[] are decoded with a 1-bit fixed-length code (u(1)) for the case where they take the value 0 or 1, and they can also be decoded based on a context model.
Optionally, if the value ranges of sideinfo_cb[] and sideinfo_cr[] are larger, multi-bit codewords can be used for decoding; for an example, refer to Table 3 above.
S603: acquire the chroma values of the reconstructed pixels around the current block.
Exemplarily, the reconstructed pixels around the current block include n rows of pixels above the current block and/or m columns of pixels to its left, where n and m are both positive integers and may or may not be equal; each row or column may or may not be contiguous, and the rows or columns may or may not be adjacent.
S604: input the chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
In some embodiments, the autoencoders corresponding to blocks of different shapes may differ, and the autoencoders corresponding to the luma and chroma components may also differ; therefore, the decoding end can select the autoencoder corresponding to the chroma component of the current block from multiple autoencoders according to the size of the chroma component of the current block. Then, as shown in FIG. 15, the obtained chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block are input into the autoencoder corresponding to the chroma component of the current block to obtain the chroma prediction block of the current block output by the decoding network.
In the decoding method of this embodiment, the code stream is decoded to determine the intra prediction mode of the chroma component of the current block; if it is the autoencoder-based intra prediction mode, the code stream is decoded to obtain the chroma characteristic information of the current block, the chroma values of the reconstructed pixels around the current block are acquired, and the chroma characteristic information and those chroma values are input into the decoding network of the autoencoder corresponding to the chroma component of the current block to obtain the chroma prediction block of the current block output by the decoding network. That is, this embodiment adds an autoencoder-based intra prediction mode for chroma intra prediction, enriching the chroma intra prediction modes. Even when the original value of the chroma component of the current block has little correlation with the surrounding reconstructed values, accurate prediction of the chroma component can be achieved and the accuracy of chroma intra prediction improved, because the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the chroma characteristic information of the current block.
FIG. 16 is another schematic flowchart of the video decoding method provided by an embodiment of the present application; this embodiment takes as an example the second flag indicating whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode. As shown in FIG. 16, the method includes:
S701: decode the code stream and determine the intra prediction modes of the luma component and the chroma component of the current block.
Implementations of S701 include but are not limited to the following:
Way 1: if the code stream includes the first flag, S701 includes the following S701-A1 and S701-A2:
S701-A1: the decoding end decodes the code stream to obtain the first flag, which indicates whether the current sequence allows the use of the autoencoder-based intra prediction mode;
S701-A2: determine the intra prediction modes of the luma component and the chroma component of the current block according to the first flag.
For example, if the value of the first flag equals the second value (e.g., 0), the current sequence does not allow the use of the autoencoder-based intra prediction mode, and it can be determined that neither the intra prediction mode of the luma component nor that of the chroma component of the current block is the autoencoder-based intra prediction mode.
For another example, if the value of the first flag equals the first value (e.g., 1), the current sequence allows the use of the autoencoder-based intra prediction mode. In this case, implementations of S701-A2 include but are not limited to the following:
First implementation: if the value of the first flag equals the first value and the luma component and the chroma component of the current block have an autoencoder-based intra prediction mode, determine that the intra prediction modes of the luma component and the chroma component of the current block are the autoencoder-based intra prediction mode.
Second implementation: the code stream includes the first flag and the second flag. If the value of the first flag equals the first value, the decoding end decodes the code stream to obtain the second flag, which indicates whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, and determines the intra prediction modes of the luma component and the chroma component of the current block according to the second flag. For example, when the value of the second flag is a certain value (e.g., 1), the intra prediction modes of the luma component and the chroma component of the current block are both the autoencoder-based intra prediction mode; otherwise, they are both intra prediction modes other than the autoencoder-based intra prediction mode.
In way 1, the decoding end can determine the intra prediction modes of the luma component and the chroma component of the current block according to the first flag decoded from the code stream, or according to the first flag and the second flag.
Way 2: if the code stream includes the second flag but not the first flag, S701 includes the following S701-B1 and S701-B2:
S701-B1: decode the code stream to obtain the second flag;
S701-B2: determine according to the second flag whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode.
In way 2, when the code stream includes the second flag, which directly indicates whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, but not the first flag, the decoding end decodes the code stream to obtain the second flag and determines directly according to it whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode.
In way 2, the first flag is not written into the code stream; the second flag is written directly to indicate whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, thereby saving codewords and reducing the decoding burden at the decoding end.
The embodiments of the present application do not limit the positions of the first flag and the second flag in the code stream.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in a coding unit syntax element.
S702: if both the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, decode the code stream to obtain the luma characteristic information and the chroma characteristic information of the current block.
The embodiments of the present application do not limit the specific positions where the luma characteristic information and the chroma characteristic information of the current block are written in the code stream; for example, they can be carried at any position in the code stream.
In one example, the luma characteristic information and the chroma characteristic information of the current block are carried in the syntax elements corresponding to the luma component and the chroma component of the current block. In this case, the decoding end decodes the syntax elements corresponding to the luma component and the chroma component of the current block and obtains the luma characteristic information and the chroma characteristic information of the current block from them.
In another example, the luma characteristic information and the chroma characteristic information of the current block are carried in a coding unit syntax element.
Optionally, the coding unit syntax element also includes the second flag intra_ae_flag.
Specifically, the decoding end decodes the coding unit syntax code stream to obtain the coding unit syntax elements and reads the second flag intra_ae_flag from them; intra_ae_flag is a coding-unit-level control flag indicating whether the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode. If intra_ae_flag is 1, the luma component and the chroma component of the current block use the autoencoder-based intra prediction mode, and the luma characteristic information and the chroma characteristic information of the current block are further read.
S703: acquire the pixel values of the reconstructed pixels around the current block.
Exemplarily, the reconstructed pixels around the current block include n rows of pixels above the current block and/or m columns of pixels to its left, where n and m are both positive integers and may or may not be equal; each row or column may or may not be contiguous, and the rows or columns may or may not be adjacent.
The pixel values of the reconstructed pixels around the current block include chroma values and luma values.
S704: input the characteristic information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
In some embodiments, the autoencoders corresponding to blocks of different shapes may differ, and the autoencoders corresponding to the luma component and the chroma component may also differ; therefore, the decoding end can select, from multiple autoencoders according to the sizes of the luma component and the chroma component of the current block, the autoencoder corresponding to the luma component and the autoencoder corresponding to the chroma component. Then, as shown in FIG. 17, the obtained chroma characteristic information of the current block and the chroma values of the reconstructed pixels around the current block are input into the autoencoder corresponding to the chroma component of the current block to obtain the chroma prediction block of the current block output by the decoding network, and the obtained luma characteristic information of the current block and the luma values of the reconstructed pixels around the current block are input into the autoencoder corresponding to the luma component of the current block to obtain the luma prediction block of the current block output by the decoding network.
Optionally, the autoencoders corresponding to the chroma component and the luma component of the current block are the same; in this case, the chroma characteristic information and the luma characteristic information of the current block, together with the pixel values of the reconstructed pixels around the current block, can be input into the autoencoder corresponding to the current block to obtain the chroma prediction block and the luma prediction block of the current block output by the decoding network.
In the decoding method of this embodiment, the code stream is decoded to determine the intra prediction modes of the luma component and the chroma component of the current block; if both are the autoencoder-based intra prediction mode, the code stream is decoded to obtain the luma characteristic information and the chroma characteristic information of the current block, and the chroma characteristic information, the luma characteristic information, and the pixel values of the reconstructed pixels around the current block are input into the decoding network of the autoencoder to obtain the luma prediction block and the chroma prediction block of the current block output by the decoding network. That is, this embodiment can indicate and predict the luma component and the chroma component of the current block at the same time, which can improve the prediction efficiency of the current block.
The decoding method of the embodiments of the present application has been introduced above; on this basis, the encoding method provided by the embodiments of the present application is introduced below.
FIG. 18 is a schematic flowchart of the video encoding method provided by an embodiment of the present application, applied to the video encoder shown in FIG. 1 and FIG. 2. As shown in FIG. 18, the method of this embodiment includes:
S801: determine the intra prediction mode of the current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode.
In the video encoding process, the video encoder receives a video stream composed of a series of picture frames, performs video encoding for each frame of the video stream, and partitions the frame into blocks to obtain the current block.
In some embodiments, the current block is also called the current coding block, current image block, coding block, current coding unit, current block to be coded, current image block to be coded, etc.
In block partitioning, a block partitioned by the traditional method contains both the chroma components and the luma component at the position of the current block. The separation tree (dual tree) technique can partition blocks of a single component, e.g., separate luma blocks and separate chroma blocks, where a luma block can be understood as containing only the luma component at the position of the current block and a chroma block as containing only the chroma components at that position. In this way, the luma component and the chroma components at the same position can belong to different blocks, which gives partitioning greater flexibility. If the separation tree is used in CU partitioning, some CUs contain both luma and chroma components, some contain only the luma component, and some contain only chroma components.
In some embodiments, the current block of the embodiments of the present application contains only chroma components and can be understood as a chroma block.
In some embodiments, the current block contains only the luma component and can be understood as a luma block.
In some embodiments, the current block contains both the luma component and the chroma components.
When performing intra prediction on the current block, the video encoder tries at least one of the N first intra prediction modes, e.g., the autoencoder-based intra prediction mode, the DM mode, the DC mode (Intra_Chroma_DC), the horizontal mode (Intra_Chroma_Horizontal), the vertical mode (Intra_Chroma_Vertical), the Bilinear mode, the PCM mode, and cross-component prediction modes (TSCPM, PMC, and CCLM in VVC).
In this embodiment, the ways in which the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes include but are not limited to the following:
Way 1: the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes according to the characteristics of the current block; for example, if the pixel values of the current block have little correlation with the pixel values of the surrounding reconstructed pixels, the autoencoder-based intra prediction mode can be determined as the intra prediction mode of the current block.
Way 2: the video encoder determines the intra prediction mode of the current block from the N first intra prediction modes through the following S8011:
S8011: determine the intra prediction mode of the current block from the N first intra prediction modes according to the rate-distortion cost.
For example, compute the rate-distortion cost corresponding to each of the N first intra prediction modes, and determine the first intra prediction mode with the smallest rate-distortion cost as the intra prediction mode of the current block.
本实施例中,可以采用已有的计算率失真代价的方式,计算每一个第一帧内预测模式对应的率失真代价。可选的,为例降低计算量,则还可以通过粗筛选或粗筛选加细筛选的方式,从N个第一帧内预测模式中,确定当前块的帧内预测模式,实现方式包括但不限于如下几种示例1和示例2:
示例1,上述S8011包括如下S8011-A1至S8011-A3的步骤:
S8011-A1、确定使用第一帧内预测模式对当前块进行编码时,第一帧内预测模式对应的预测值;
S8011-A2、根据预测值与当前块的原始值之间的失真,以及编码第一帧内预测模式的标志位时所消耗的比特数,确定第一帧内预测模式的第一率失真代价;
S8011-A3、根据第一率失真代价,从N个第一帧内预测模式中,确定当前块的帧内预测模式。
具体的,针对N个第一帧内预测模式中的每一个第一帧内预测模式,使用该第一帧内预测模式对当前块进行预测,得到当前块的预测值,该预测值为该第一帧内预测模式对应的预测值。接着,将该第一帧内预测模式对应的预测值与 当前块原始值进行比较,计算该第一帧内预测模式对应的预测值与当前块的原始值之间的失真D1,同时计算编码该第一帧预测模式的标志位(flag)时所消耗的比特数R2。根据该失真D1和比特数R2,确定该第一帧内预测模式对应的第一率失真代价J1,例如,J1=D1+R1。根据上述方式,可以确定从N个第一帧内预测模式中,每一个帧内预测模式对应的第一率失真代价J1。最后,根据每一个帧内预测模式对应的第一率失真代价J1,从N个第一帧内预测模式中,确定当前块的帧内预测模式。
本申请实施例通过预测值与原始值之间的失真,以及编码标志位所消耗的比特数来确定第一率失真代价,相比于通过重建值与原始值之间的失真,以及整个帧内预测模式对应的编码比特数,确定第一率失真代价,避免计算重建值以及统计整个编码过程的比特数,大大降低了计算量,提升了第一率失真代价的计算速度。基于该第一率失真代价选择当前块的帧内预测模式时,可以有效提升帧内预测模式的选择速度。
Implementations of determining the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion cost in S8011-A3 above include, but are not limited to, the following:
In a possible implementation 1, according to the first rate-distortion costs, the first intra prediction mode with the smallest first rate-distortion cost among the N first intra prediction modes is determined as the intra prediction mode of the current block. This determination process is simple, computationally light, and fast.
In a possible implementation 2, the video encoder determines the intra prediction mode of the current block through the following steps S8011-A31 to S8011-A34:
S8011-A31. Select M second intra prediction modes from the N first intra prediction modes according to the first rate-distortion costs, where M is a positive integer smaller than N. This process can be understood as rough screening: M second intra prediction modes are roughly selected from the N first intra prediction modes according to the first rate-distortion costs as candidates for fine screening.
S8011-A32. Determine the reconstructed value corresponding to a second intra prediction mode when the current block is coded using that second intra prediction mode.
S8011-A33. Determine a second rate-distortion cost of the second intra prediction mode according to the distortion between the reconstructed value and the original value of the current block, and the number of bits consumed in coding the current block using the second intra prediction mode.
S8011-A34. Determine the second intra prediction mode with the smallest second rate-distortion cost among the M second intra prediction modes as the intra prediction mode of the current block.
Specifically, M second intra prediction modes are roughly selected from the N first intra prediction modes according to the first rate-distortion costs, and the intra prediction mode of the current block is then finely screened from the M second intra prediction modes. Concretely, for each of the M second intra prediction modes, the prediction value corresponding to that mode is added to the residual value to obtain the reconstructed value of the current block, recorded as the reconstructed value corresponding to that mode. The distortion D2 between the reconstructed value and the original value of the current block is computed, and the number of bits R2 consumed in coding the current block using the second intra prediction mode is counted; the second rate-distortion cost J2 corresponding to the mode is then determined from D2 and R2, for example J2 = D2 + R2. In this way, the second rate-distortion cost J2 corresponding to each of the M second intra prediction modes can be determined. Finally, the second intra prediction mode with the smallest J2 among the M modes is determined as the intra prediction mode of the current block; a sketch of this two-stage screening is given below.
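A minimal sketch of the two-stage screening, with the cost models J1 and J2 supplied as callables (the function and parameter names are illustrative assumptions, not part of this application):

```python
def select_intra_mode(modes, rough_cost, fine_cost, M):
    """Two-stage mode decision: J1 rough screening, then J2 fine screening.

    rough_cost(m) should return J1 = D1 + R1 (prediction distortion plus the
    bits of the mode's flag); fine_cost(m) should return J2 = D2 + R2
    (reconstruction distortion plus the full coding bits of the mode).
    """
    candidates = sorted(modes, key=rough_cost)[:M]   # rough screening
    return min(candidates, key=fine_cost)            # fine screening
```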
In Example 1 above, the N first intra prediction modes are put through rough screening and fine screening together to determine the intra prediction mode of the current block.
In some embodiments, S8011 above can also be implemented according to the method of Example 2 below.
Example 2: in this example, rough screening is first performed on the first intra prediction modes other than the autoencoder-based intra prediction mode, and the autoencoder-based intra prediction mode is then finely screened together with the roughly selected first intra prediction modes, so as to increase the probability that the autoencoder-based intra prediction mode is used. That is, S8011 above includes the following steps S8011-B1 to S8011-B4:
S8011-B1. Determine a first rate-distortion cost of a third intra prediction mode according to the distortion between the prediction value corresponding to the third intra prediction mode and the original value of the current block, and the number of bits consumed in coding the flag bit of the third intra prediction mode, where the third intra prediction modes are the first intra prediction modes other than the autoencoder-based intra prediction mode among the N first intra prediction modes.
S8011-B2. Select Q third intra prediction modes from the N-1 third intra prediction modes according to the first rate-distortion costs, where Q is a positive integer smaller than N-1.
Specifically, for ease of description, the first intra prediction modes other than the autoencoder-based intra prediction mode among the N first intra prediction modes are recorded as third intra prediction modes, N-1 in total. For each of the N-1 third intra prediction modes, the first rate-distortion cost corresponding to that mode is computed by the same method as the computation of the first rate-distortion cost above. According to the magnitudes of the first rate-distortion costs, the Q third intra prediction modes with the smallest first rate-distortion costs are selected from the N-1 third intra prediction modes.
S8011-B3. Determine P prediction values corresponding to the autoencoder-based intra prediction mode according to a preset rounding range of the first feature information, and select R prediction values from the P prediction values, where P and R are both positive integers and R is smaller than or equal to P.
In this step, to reduce the amount of computation, for the autoencoder-based intra prediction mode the prediction process of the encoding network and the rounding process are skipped. Instead, the P possible values of the rounded first feature information are predicted according to the preset rounding range of the first feature information, and the P prediction values corresponding to the autoencoder-based intra prediction mode are determined from these P possible values of the first feature information.
Optionally, the preset rounding range of the first feature information may be {0, 1}, or {-1, 0, 1}, etc.
In one example, determining the P prediction values corresponding to the autoencoder-based intra prediction mode according to the preset rounding range of the first feature information may be implemented as follows: predict the P possible values of the first feature information output by the encoding network according to the preset rounding range of the first feature information; input the feature information under each of the P values, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the prediction values under the P values output by the decoding network; and determine the prediction values under the P values as the P prediction values corresponding to the autoencoder-based intra prediction mode.
For example, suppose the preset rounding range of the first feature information is {0, 1} and the first feature information is a 1×2 feature vector; then the P possible values of the first feature information are {0,0}, {0,1}, {1,0}, {1,1}, i.e., P = 2^2 = 4. Optionally, if the first feature information is a 1×n feature vector, then P = 2^n, where n is a positive integer greater than or equal to 1. For each of the above 4 values of the first feature information, the first feature information under that value and the pixel values of the reconstructed pixels around the current block are input into the decoding network to obtain the prediction value under that value output by the decoding network, thereby obtaining the prediction values under all P values.
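A minimal sketch of enumerating the P = 2^n candidate rounded feature vectors for the rounding range {0, 1}, assuming the decoding network is available as a plain callable (all names are illustrative):

```python
from itertools import product
import numpy as np

def enumerate_ae_predictions(decoder_net, neighbor_pixels, n):
    """Try every rounded feature vector of length n (P = 2**n candidates)
    and collect the decoder's prediction for each, as described above."""
    preds = []
    for feat in product((0, 1), repeat=n):   # all 2**n combinations
        x = np.concatenate([np.array(feat, dtype=float),
                            np.asarray(neighbor_pixels, dtype=float).ravel()])
        preds.append((feat, decoder_net(x)))  # (feature value, prediction)
    return preds
```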
In one example, R = P; then the prediction values under all P determined values are added to the fine screening process, from which the intra prediction mode of the current block is determined.
In one example, R is smaller than P. In this case, selecting R prediction values from the P prediction values in S8011-B3 above includes the following manners:
Manner 1: randomly select R prediction values from the P prediction values.
Manner 2: select, from the P prediction values, the R prediction values closest to the original value of the current block.
Manner 3: determine fourth rate-distortion costs corresponding to the P prediction values according to the distortion between the P prediction values and the original value of the current block, and select from the P prediction values the R prediction values with the smallest fourth rate-distortion costs. Optionally, the fourth rate-distortion cost corresponding to each of the P prediction values may equal the distortion D1 between that prediction value and the original value of the current block. Optionally, the fourth rate-distortion cost corresponding to a prediction value equals the sum of the distortion D1 between that prediction block and the original value of the current block and the number of bits R1 consumed in coding the flag bit of the autoencoder-based intra mode. After the R prediction values are roughly screened out of the P prediction values, the following S8011-B4 is performed.
S8011-B4. Determine the intra prediction mode of the current block from the N first intra prediction modes according to the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the autoencoder-based intra prediction mode.
Implementations of S8011-B4 above include, but are not limited to, the following:
Manner 1: compare the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the autoencoder-based intra prediction mode with the original value of the current block respectively, and determine the intra prediction mode corresponding to the prediction value closest to the original value as the intra prediction mode of the current block.
Manner 2: select the intra prediction mode of the current block by fine screening; that is, S8011-B4 above includes the following steps S8011-B41 to S8011-B43:
S8011-B41. Determine the Q reconstructed values corresponding to the Q prediction values and the R reconstructed values corresponding to the R prediction values.
Specifically, the residual value corresponding to each of the Q prediction values is determined, and the residual value is added to the prediction value to obtain the reconstructed value corresponding to that prediction value, thereby obtaining Q reconstructed values. Likewise, the residual value corresponding to each of the R prediction values is determined and added to the prediction value to obtain the corresponding reconstructed value, thereby obtaining R reconstructed values.
S8011-B42. Determine third rate-distortion costs according to the distortion between each of the Q+R reconstructed values and the original value of the current block, and the number of bits consumed in coding the current block using the first intra prediction mode corresponding to each of the Q+R reconstructed values.
According to the steps above, Q+R reconstructed values can be obtained. For each of the Q+R reconstructed values, the distortion D3 between the reconstructed value and the original value of the current block is computed, together with the number of bits R3 consumed in coding the current block using the first intra prediction mode corresponding to that reconstructed value, and the sum of D3 and R3 is determined as the third rate-distortion cost corresponding to that reconstructed value.
S8011-B43. Determine the first intra prediction mode with the smallest third rate-distortion cost among the N first intra prediction modes as the intra prediction mode of the current block.
In this implementation, the R prediction values of the autoencoder-based intra prediction mode are added to the fine screening process of the intra prediction modes, which increases the probability that the autoencoder-based intra prediction mode is selected.
After the intra prediction mode of the current block is determined according to the steps above, the following S802 is performed.
S802. If the intra prediction mode of the current block is the autoencoder-based intra prediction mode, obtain the autoencoder corresponding to the current block, where the autoencoder includes an encoding network and a decoding network.
As described above, blocks of different shapes and sizes may correspond to different autoencoders; therefore, the video encoder selects the autoencoder corresponding to the current block from the different autoencoders according to the size of the current block.
S803. Input the original value of the current block into the encoding network to obtain the first feature information of the current block output by the encoding network.
S804. Input the first feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
That is, the original value of the current block (i.e., the original pixel values) is input into the encoding network of the autoencoder to obtain the first feature information of the current block output by the encoding network; then the first feature information and the pixel values of the reconstructed pixels around the current block are input into the decoding network to obtain the prediction block of the current block output by the decoding network.
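A minimal sketch of S803/S804 at the encoder, assuming the encoding and decoding networks are plain callables; np.rint stands in for the rounding step described below (all names are illustrative assumptions, not part of this application):

```python
import numpy as np

def ae_intra_encode(encoder_net, decoder_net, original_block, neighbor_pixels):
    """S803/S804 as one pass: features -> rounding -> prediction -> residual."""
    feat = encoder_net(original_block.ravel())      # S803: first feature information
    feat_q = np.rint(feat)                          # rounded (second) feature information
    x = np.concatenate([feat_q, np.asarray(neighbor_pixels, dtype=float).ravel()])
    pred = decoder_net(x).reshape(original_block.shape)   # S804: prediction block
    residual = original_block - pred                # residual to be transformed/coded
    return feat_q, pred, residual                   # feat_q is written to the bitstream
```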
In some embodiments, the current block includes a luma component and/or a chroma component; in this case, S803 above includes the following manners:
Manner 1: if it is determined that the luma component of the current block uses the autoencoder-based intra prediction mode, input the original luma values of the current block into the encoding network to obtain the first luma feature information of the current block.
Manner 2: if it is determined that the chroma component of the current block uses the autoencoder-based intra prediction mode, input the original chroma values of the current block into the encoding network to obtain the first chroma feature information of the current block.
Manner 3: if it is determined that both the luma and chroma components of the current block use the autoencoder-based intra prediction mode, input the original luma values and original chroma values of the current block into the encoding network to obtain the first luma feature information and the first chroma feature information of the current block. Optionally, the original luma values and original chroma values of the current block may be input into the encoding network at the same time, or one after the other.
On this basis, S804 above correspondingly includes the following manners:
Manner 1: if the first feature information of the current block includes the first luma feature information, input the first luma feature information and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block.
Manner 2: if the first feature information of the current block includes the first chroma feature information, input the first chroma feature information and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
Manner 3: if the first feature information of the current block includes the first luma feature information and the first chroma feature information, input the first luma feature information and the first chroma feature information, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
In some embodiments, as described above, since the feature information output by the encoding network needs to be written into the bitstream, rounding is required. On this basis, S803 above includes S803-A1 and S803-A2:
S803-A1. Round the first feature information of the current block to obtain the second feature information of the current block.
S803-A2. Input the second feature information and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
In some embodiments, if the current block includes a luma component and/or a chroma component, S803-A1 above includes the following manners:
Manner 1: if the first feature information of the current block includes the first luma feature information, round the first luma feature information to obtain the second luma feature information of the current block.
Manner 2: if the first feature information of the current block includes the first chroma feature information, round the first chroma feature information to obtain the second chroma feature information of the current block.
Manner 3: if the first feature information of the current block includes the first luma feature information and the first chroma feature information, round the first luma feature information and the first chroma feature information respectively to obtain the second luma feature information and the second chroma feature information of the current block.
Correspondingly, S803-A2 above includes the following manners:
Manner 1: if the second feature information of the current block includes the second luma feature information, input the second luma feature information and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block.
Manner 2: if the second feature information of the current block includes the second chroma feature information, input the second chroma feature information and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
Manner 3: if the second feature information of the current block includes the second luma feature information and the second chroma feature information, input the second luma feature information and the second chroma feature information, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
In some embodiments, the video encoding device writes the second feature information of the current block into the bitstream.
Optionally, if the second feature information of the current block includes the second luma feature information, write the second luma feature information into the bitstream;
Optionally, if the second feature information of the current block includes the second chroma feature information, write the second chroma feature information into the bitstream;
Optionally, if the second feature information of the current block includes the second luma feature information and the second chroma feature information, write the second luma feature information and the second chroma feature information into the bitstream.
In some embodiments, the video encoding device writes a first flag into the bitstream, where the first flag indicates whether the current sequence is allowed to use the autoencoder-based intra prediction mode.
In some embodiments, if the value of the first flag is a first value, the video encoding device also writes a second flag into the bitstream, where the second flag indicates whether the current block uses the autoencoder-based intra prediction mode and the first value indicates that the current sequence is allowed to use the autoencoder-based intra prediction mode.
In some embodiments, the video encoding device writes the second flag directly into the bitstream without writing the first flag, so as to save codewords.
Optionally, the first flag is included in the sequence-level parameter portion of the bitstream.
Optionally, the second flag is included in the coding unit syntax element.
Optionally, if the current block includes chroma information and luma information, the second flag above indicates whether the luma component and/or the chroma component of the current block use the autoencoder-based intra prediction mode.
In the encoding method of the embodiments of this application, the intra prediction mode of the current block is determined from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include the autoencoder-based intra prediction mode; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the autoencoder corresponding to the current block is obtained, the autoencoder including an encoding network and a decoding network; the original value of the current block is input into the encoding network to obtain the first feature information of the current block output by the encoding network; and the first feature information of the current block and the pixel values of the reconstructed pixels around the current block are input into the decoding network to obtain the prediction block of the current block output by the decoding network. That is, this application adds an autoencoder-based intra prediction mode, enriching the intra prediction modes. When the intra prediction mode of the chroma component of the current block is determined to be the autoencoder-based intra prediction mode, the prediction block of the current block is determined according to the pixel values of the current block and the pixel values of the reconstructed pixels around it. In the case where the original values of the current block are weakly correlated with the surrounding reconstructed values, since the prediction considers not only the pixel values of the reconstructed pixels around the current block but also the feature information of the current block itself, accurate prediction of the current block can be achieved, improving the accuracy of intra prediction.
It should be understood that FIG. 10 to FIG. 18 are only examples of this application and should not be construed as limiting this application.
The preferred embodiments of this application have been described in detail above with reference to the accompanying drawings. However, this application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of this application, various simple variations of the technical solutions of this application can be made, and these simple variations all fall within the protection scope of this application. For example, the specific technical features described in the above specific embodiments can be combined in any suitable manner as long as they do not contradict one another; to avoid unnecessary repetition, this application does not separately describe the various possible combinations. For another example, the different embodiments of this application can also be combined arbitrarily, and as long as they do not violate the idea of this application, such combinations should likewise be regarded as content disclosed by this application.
It should also be understood that in the various method embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application. In addition, in the embodiments of this application the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist. Specifically, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
The method embodiments of this application have been described in detail above with reference to FIG. 10 to FIG. 18; the apparatus embodiments of this application are described in detail below with reference to FIG. 19 to FIG. 21.
FIG. 19 is a schematic block diagram of a video decoder provided by an embodiment of this application.
As shown in FIG. 19, the video decoder 10 includes:
a mode determining unit 11, configured to decode a bitstream and determine the intra prediction mode of the current block;
a feature determining unit 12, configured to decode the bitstream to obtain the feature information of the current block if the intra prediction mode of the current block is an autoencoder-based intra prediction mode;
an obtaining unit 13, configured to obtain the pixel values of reconstructed pixels around the current block; and
a prediction unit 14, configured to input the feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block, to obtain the prediction block of the current block output by the decoding network.
In some embodiments, the mode determining unit 11 is specifically configured to decode the bitstream to obtain a first flag, where the first flag indicates whether the current sequence is allowed to use the autoencoder-based intra prediction mode, and determine the intra prediction mode of the current block according to the first flag.
In some embodiments, the mode determining unit 11 is specifically configured to: if the value of the first flag is a first value, decode the bitstream to obtain a second flag, where the second flag indicates whether the current block uses the autoencoder-based intra prediction mode and the first value indicates that the current sequence is allowed to use the autoencoder-based intra prediction mode; and determine the intra prediction mode of the current block according to the second flag.
In some embodiments, the mode determining unit 11 is specifically configured to decode the bitstream to obtain a second flag, where the second flag indicates whether the current block uses the autoencoder-based intra prediction mode, and determine the intra prediction mode of the current block according to the second flag.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in a coding unit syntax element.
Optionally, the second flag indicates whether the luma component and/or the chroma component of the current block use the autoencoder-based intra prediction mode.
In some embodiments, the mode determining unit 11 is specifically configured to: if the second flag indicates whether the luma and chroma components of the current block use the autoencoder-based intra prediction mode, determine according to the second flag whether the luma and chroma components of the current block use the autoencoder-based intra prediction mode;
if the second flag indicates whether the luma component of the current block uses the autoencoder-based intra prediction mode, determine according to the second flag whether the luma component of the current block uses the autoencoder-based intra prediction mode; and
if the second flag indicates whether the chroma component of the current block uses the autoencoder-based intra prediction mode, determine according to the second flag whether the chroma component of the current block uses the autoencoder-based intra prediction mode.
In some embodiments, the feature determining unit 12 is specifically configured to: if both the luma and chroma components of the current block use the autoencoder-based intra prediction mode, decode the bitstream to obtain the luma feature information and the chroma feature information of the current block;
if the luma component of the current block uses the autoencoder-based intra prediction mode, decode the bitstream to obtain the luma feature information of the current block; and
if the chroma component of the current block uses the autoencoder-based intra prediction mode, decode the bitstream to obtain the chroma feature information of the current block.
In some embodiments, the feature determining unit 12 is specifically configured to decode the bitstream to obtain the syntax element corresponding to the luma component of the current block, and obtain the luma feature information of the current block from that syntax element.
In some embodiments, the feature determining unit 12 is specifically configured to decode the bitstream to obtain the syntax element corresponding to the chroma component of the current block, and obtain the chroma feature information of the current block from that syntax element.
In some embodiments, the prediction unit 14 is specifically configured to: if the feature information of the current block includes the luma feature information and the chroma feature information of the current block, input the luma feature information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block, and input the chroma feature information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block;
if the feature information of the current block includes the luma feature information of the current block, input the luma feature information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block; and
if the feature information of the current block includes the chroma feature information of the current block, input the chroma feature information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
Optionally, the element values in the feature information of the current block are integers.
Optionally, the feature information of the current block is obtained by rounding the feature information output by the last-layer activation function of the encoding network of the autoencoder.
Optionally, the element values in the feature information output by the last-layer activation function of the encoding network range over [a, b], where a and b are integers.
Optionally, a is 0 and b is 1.
Exemplarily, the expression of the last-layer activation function of the encoding network is:
S(x) = 1 / (1 + e^(-x))
where x is the input of the last-layer activation function and S(x) is the feature information output by the last-layer activation function.
Optionally, a is -1 and b is 1.
Exemplarily, the expression of the last-layer activation function of the encoding network is:
S(x) = 2 / (1 + e^(-nx)) - 1
where x is the input of the last-layer activation function, S(x) is the feature information output by the last-layer activation function, and n is a positive integer.
Optionally, n is 10.
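A minimal sketch of the two activation variants described above; the exact expressions were given as figures in the original, so these sigmoid-style forms matching the stated output ranges are assumptions rather than quotations:

```python
import math

def act_0_1(x):
    """Range (0, 1): plain sigmoid, so rounded features land in {0, 1}."""
    return 1.0 / (1.0 + math.exp(-x))

def act_m1_1(x, n=10):
    """Range (-1, 1): scaled sigmoid with steepness n (optionally n = 10),
    so rounded features land in {-1, 0, 1}."""
    return 2.0 / (1.0 + math.exp(-n * x)) - 1.0
```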
In some embodiments, during training of the autoencoder, the original feature information output by the encoding network is processed by adding noise before being input into the decoding network.
In some embodiments, during training of the autoencoder, in forward propagation the original feature information output by the encoding network is rounded before being input into the decoding network, and in backward propagation the derivative is taken with respect to the original feature information output by the encoding network, so as to update the weight parameters in the encoding network.
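A minimal sketch of this training trick as a straight-through estimator, shown in PyTorch for illustration; the application does not name a framework, so this is an assumed analogue rather than the method's definitive implementation:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass gradients straight through in the
    backward pass so the encoding network's weights can still be updated."""

    @staticmethod
    def forward(ctx, feat):
        return torch.round(feat)      # rounded features go to the decoder

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # identity gradient for the encoder

# Usage inside the autoencoder's forward():
#   feat = encoder(block)
#   feat_q = RoundSTE.apply(feat)    # or feat + noise during training
#   pred = decoder(feat_q, neighbors)
```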
It should be understood that the apparatus embodiments correspond to the method embodiments, and similar descriptions can refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the video decoder 10 shown in FIG. 19 can perform the decoding method of the embodiments of this application, and the foregoing and other operations and/or functions of the units in the video decoder 10 respectively implement the corresponding flows of the decoding method and other methods; for brevity, they are not repeated here.
FIG. 20 is a schematic block diagram of a video encoder provided by an embodiment of this application.
As shown in FIG. 20, the video encoder 20 may include:
a mode determining unit 21, configured to determine the intra prediction mode of the current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode;
an obtaining unit 22, configured to obtain the autoencoder corresponding to the current block if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the autoencoder including an encoding network and a decoding network;
a feature determining unit 23, configured to input the original value of the current block into the encoding network to obtain the first feature information of the current block output by the encoding network; and
a prediction unit 24, configured to input the first feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
In some embodiments, the prediction unit 24 is specifically configured to round the first feature information of the current block to obtain the second feature information of the current block, and input the second feature information and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
In some embodiments, the prediction unit 24 is further configured to write the second feature information of the current block into a bitstream.
In some embodiments, the prediction unit 24 is further configured to write a first flag into the bitstream, where the first flag indicates whether the current sequence is allowed to use the autoencoder-based intra prediction mode.
In some embodiments, the prediction unit 24 is further configured to: if the value of the first flag is a first value, write a second flag into the bitstream, where the second flag indicates whether the current block uses the autoencoder-based intra prediction mode and the first value indicates that the current sequence is allowed to use the autoencoder-based intra prediction mode.
In some embodiments, the prediction unit 24 is further configured to write the second flag into the bitstream, where the second flag indicates whether the current block uses the autoencoder-based intra prediction mode.
Optionally, the first flag is included in a sequence-level parameter syntax element.
Optionally, the second flag is included in a coding unit syntax element.
Optionally, the second flag indicates whether the luma component and/or the chroma component of the current block use the autoencoder-based intra prediction mode.
In some embodiments, the mode determining unit 21 is specifically configured to determine the intra prediction mode of the current block from the N first intra prediction modes according to rate-distortion cost.
In some embodiments, the mode determining unit 21 is specifically configured to determine the prediction value corresponding to a first intra prediction mode when the current block is coded using that first intra prediction mode; determine a first rate-distortion cost of the first intra prediction mode according to the distortion between the prediction value and the original value of the current block and the number of bits consumed in coding the flag bit of the first intra prediction mode; and determine the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion costs.
In some embodiments, the mode determining unit 21 is specifically configured to select M second intra prediction modes from the N first intra prediction modes according to the first rate-distortion costs, where M is a positive integer smaller than N; determine the reconstructed value corresponding to a second intra prediction mode when the current block is coded using that second intra prediction mode; determine a second rate-distortion cost of the second intra prediction mode according to the distortion between the reconstructed value and the original value of the current block and the number of bits consumed in coding the current block using the second intra prediction mode; and determine the second intra prediction mode with the smallest second rate-distortion cost among the M second intra prediction modes as the intra prediction mode of the current block.
In some embodiments, the mode determining unit 21 is specifically configured to determine a first rate-distortion cost of a third intra prediction mode according to the distortion between the prediction value corresponding to the third intra prediction mode and the original value of the current block and the number of bits consumed in coding the flag bit of the third intra prediction mode, where the third intra prediction modes are the first intra prediction modes other than the autoencoder-based intra prediction mode among the N first intra prediction modes; select Q third intra prediction modes from the N-1 third intra prediction modes according to the first rate-distortion costs, where Q is a positive integer smaller than N-1; determine P prediction values corresponding to the autoencoder-based intra prediction mode according to the preset rounding range of the first feature information, and select R prediction values from the P prediction values, where P and R are both positive integers and R is smaller than or equal to P; and determine the intra prediction mode of the current block from the N first intra prediction modes according to the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the autoencoder-based intra prediction mode.
In some embodiments, the mode determining unit 21 is specifically configured to determine the Q reconstructed values corresponding to the Q prediction values and the R reconstructed values corresponding to the R prediction values; determine third rate-distortion costs according to the distortion between each of the Q+R reconstructed values and the original value of the current block and the number of bits consumed in coding the current block using the first intra prediction mode corresponding to each of the Q+R reconstructed values; and determine the first intra prediction mode with the smallest third rate-distortion cost among the N first intra prediction modes as the intra prediction mode of the current block.
In some embodiments, the mode determining unit 21 is specifically configured to predict the P possible values of the first feature information output by the encoding network according to the preset rounding range of the first feature information; input the feature information under each of the P values and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction values under the P values output by the decoding network; and determine the prediction values under the P values as the P prediction values corresponding to the autoencoder-based intra prediction mode.
In some embodiments, if R is smaller than P, the mode determining unit 21 is specifically configured to determine fourth rate-distortion costs corresponding to the P prediction values according to the distortion between the P prediction values and the original value of the current block, and select from the P prediction values the R prediction values with the smallest fourth rate-distortion costs.
In some embodiments, the feature determining unit 23 is specifically configured to: if it is determined that the luma component of the current block uses the autoencoder-based intra prediction mode, input the original luma values of the current block into the encoding network to obtain the first luma feature information of the current block;
if it is determined that the chroma component of the current block uses the autoencoder-based intra prediction mode, input the original chroma values of the current block into the encoding network to obtain the first chroma feature information of the current block; and
if it is determined that both the luma and chroma components of the current block use the autoencoder-based intra prediction mode, input the original luma values and original chroma values of the current block into the encoding network to obtain the first luma feature information and the first chroma feature information of the current block.
In some embodiments, the feature determining unit 23 is specifically configured to: if the first feature information of the current block includes the first luma feature information, input the first luma feature information and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block;
if the first feature information of the current block includes the first chroma feature information, input the first chroma feature information and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block; and
if the first feature information of the current block includes the first luma feature information and the first chroma feature information, input the first luma feature information and the first chroma feature information, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
In some embodiments, the feature determining unit 23 is specifically configured to: if the first feature information of the current block includes the first luma feature information, round the first luma feature information to obtain the second luma feature information of the current block;
if the first feature information of the current block includes the first chroma feature information, round the first chroma feature information to obtain the second chroma feature information of the current block; and
if the first feature information of the current block includes the first luma feature information and the first chroma feature information, round the first luma feature information and the first chroma feature information respectively to obtain the second luma feature information and the second chroma feature information of the current block.
In some embodiments, the prediction unit 24 is specifically configured to: if the second feature information of the current block includes the second luma feature information, write the second luma feature information into the bitstream;
if the second feature information of the current block includes the second chroma feature information, write the second chroma feature information into the bitstream; and
if the second feature information of the current block includes the second luma feature information and the second chroma feature information, write the second luma feature information and the second chroma feature information into the bitstream.
Optionally, the element values in the first feature information output by the last-layer activation function of the encoding network range over [a, b], where a and b are integers.
Optionally, a is 0 and b is 1.
Exemplarily, the expression of the last-layer activation function of the encoding network is:
S(x) = 1 / (1 + e^(-x))
where x is the input information of the last-layer activation function and S(x) is the first feature information output by the last-layer activation function.
Optionally, a is -1 and b is 1.
Exemplarily, the expression of the last-layer activation function of the encoding network is:
S(x) = 2 / (1 + e^(-nx)) - 1
where x is the input information of the last-layer activation function, S(x) is the first feature information output by the last-layer activation function, and n is a positive integer.
Optionally, n is 10.
Optionally, during training of the autoencoder, the first feature information output by the encoding network is processed by adding noise before being input into the decoding network.
Optionally, during training of the autoencoder, in forward propagation the first feature information output by the encoding network is rounded before being input into the decoding network, and in backward propagation the derivative is taken with respect to the first feature information output by the encoding network, so as to update the weight parameters in the encoding network.
It should be understood that the apparatus embodiments correspond to the method embodiments, and similar descriptions can refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the video encoder 20 shown in FIG. 20 can correspond to the corresponding subject performing the encoding method of the embodiments of this application, and the foregoing and other operations and/or functions of the units in the video encoder 20 respectively implement the corresponding flows of the encoding method and other methods; for brevity, they are not repeated here.
The apparatus and system of the embodiments of this application have been described above from the perspective of functional units with reference to the accompanying drawings. It should be understood that the functional units can be implemented in hardware form, by instructions in software form, or by a combination of hardware and software units. Specifically, the steps of the method embodiments of this application can be completed by integrated logic circuits of hardware in a processor and/or instructions in software form, and the steps of the methods disclosed in the embodiments of this application can be directly embodied as being performed and completed by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. Optionally, the software unit may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
FIG. 21 is a schematic block diagram of an electronic device provided by an embodiment of this application.
As shown in FIG. 21, the electronic device 30 may be the video encoder or the video decoder described in the embodiments of this application, and the electronic device 30 may include:
a memory 33 and a processor 32, where the memory 33 is configured to store a computer program 34 and transmit the program code 34 to the processor 32. In other words, the processor 32 can call and run the computer program 34 from the memory 33 to implement the methods in the embodiments of this application.
For example, the processor 32 may be configured to perform the steps of the method 200 above according to instructions in the computer program 34.
In some embodiments of this application, the processor 32 may include, but is not limited to:
a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
In some embodiments of this application, the memory 33 includes, but is not limited to:
volatile memory and/or non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synch link DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
In some embodiments of this application, the computer program 34 can be divided into one or more units, which are stored in the memory 33 and executed by the processor 32 to complete the methods provided by this application. The one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 34 in the electronic device 30.
As shown in FIG. 21, the electronic device 30 may further include:
a transceiver 33, where the transceiver 33 can be connected to the processor 32 or the memory 33.
The processor 32 can control the transceiver 33 to communicate with other devices; specifically, it can send information or data to other devices, or receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver, and may further include antennas, the number of which can be one or more.
It should be understood that the components of the electronic device 30 are connected through a bus system, where the bus system includes a power bus, a control bus, and a status signal bus in addition to a data bus.
FIG. 22 is a schematic block diagram of a video coding system provided by an embodiment of this application.
As shown in FIG. 22, the video coding system 40 may include a video encoder 41 and a video decoder 42, where the video encoder 41 is configured to perform the video encoding method involved in the embodiments of this application and the video decoder 42 is configured to perform the video decoding method involved in the embodiments of this application.
This application also provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer is enabled to perform the methods of the above method embodiments. In other words, the embodiments of this application also provide a computer program product containing instructions; when the instructions are executed by a computer, the computer performs the methods of the above method embodiments.
This application also provides a bitstream, which is generated according to the above encoding method; optionally, the bitstream includes the above first flag, or the first flag and the second flag.
When implemented in software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium accessible to the computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital video disc (digital video disc, DVD)), a semiconductor medium (e.g., solid state disk (solid state disk, SSD)), or the like.
Those of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments. For example, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The above is only the specific implementation of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in this application, and they should all be covered within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (55)
- A video decoding method, comprising: decoding a bitstream to determine an intra prediction mode of a current block; if the intra prediction mode of the current block is an autoencoder-based intra prediction mode, decoding the bitstream to obtain feature information of the current block; obtaining pixel values of reconstructed pixels around the current block; and inputting the feature information of the current block and the pixel values of the reconstructed pixels around the current block into a decoding network of an autoencoder corresponding to the current block, to obtain a prediction block of the current block output by the decoding network.
- The method according to claim 1, wherein the decoding a bitstream to determine an intra prediction mode of a current block comprises: decoding the bitstream to obtain a first flag, the first flag indicating whether a current sequence is allowed to use the autoencoder-based intra prediction mode; and determining the intra prediction mode of the current block according to the first flag.
- The method according to claim 2, wherein the determining the intra prediction mode of the current block according to the first flag comprises: if a value of the first flag is a first value, decoding the bitstream to obtain a second flag, the second flag indicating whether the current block uses the autoencoder-based intra prediction mode, and the first value indicating that the current sequence is allowed to use the autoencoder-based intra prediction mode; and determining the intra prediction mode of the current block according to the second flag.
- The method according to claim 2, wherein the first flag is included in a sequence-level parameter syntax element.
- The method according to claim 3, wherein the second flag is included in a coding unit syntax element.
- The method according to claim 3, wherein the second flag indicates whether a luma component and/or a chroma component of the current block use the autoencoder-based intra prediction mode.
- The method according to claim 6, wherein the determining the intra prediction mode of the current block according to the second flag comprises: if the second flag indicates whether the luma and chroma components of the current block use the autoencoder-based intra prediction mode, determining according to the second flag whether the luma and chroma components of the current block use the autoencoder-based intra prediction mode; if the second flag indicates whether the luma component of the current block uses the autoencoder-based intra prediction mode, determining according to the second flag whether the luma component of the current block uses the autoencoder-based intra prediction mode; and if the second flag indicates whether the chroma component of the current block uses the autoencoder-based intra prediction mode, determining according to the second flag whether the chroma component of the current block uses the autoencoder-based intra prediction mode.
- The method according to claim 7, wherein the decoding the bitstream to obtain feature information of the current block comprises: if both the luma and chroma components of the current block use the autoencoder-based intra prediction mode, decoding the bitstream to obtain luma feature information and chroma feature information of the current block; if the luma component of the current block uses the autoencoder-based intra prediction mode, decoding the bitstream to obtain the luma feature information of the current block; and if the chroma component of the current block uses the autoencoder-based intra prediction mode, decoding the bitstream to obtain the chroma feature information of the current block.
- The method according to claim 8, wherein the decoding the bitstream to obtain the luma feature information of the current block comprises: decoding a syntax element corresponding to the luma component of the current block; and obtaining the luma feature information of the current block from the syntax element corresponding to the luma component of the current block.
- The method according to claim 8, wherein the decoding the bitstream to obtain the chroma feature information of the current block comprises: decoding a syntax element corresponding to the chroma component of the current block; and obtaining the chroma feature information of the current block from the syntax element corresponding to the chroma component of the current block.
- The method according to claim 8, wherein the inputting the feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network of the autoencoder corresponding to the current block, to obtain the prediction block of the current block output by the decoding network, comprises: if the feature information of the current block includes the luma feature information and the chroma feature information of the current block, inputting the luma feature information of the current block and luma values of the reconstructed pixels around the current block into the decoding network to obtain a luma prediction block of the current block, and inputting the chroma feature information of the current block and chroma values of the reconstructed pixels around the current block into the decoding network to obtain a chroma prediction block of the current block; if the feature information of the current block includes the luma feature information of the current block, inputting the luma feature information of the current block and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block; and if the feature information of the current block includes the chroma feature information of the current block, inputting the chroma feature information of the current block and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block.
- The method according to any one of claims 1 to 11, wherein element values in the feature information of the current block are integers.
- The method according to claim 12, wherein the feature information of the current block is obtained by rounding feature information output by a last-layer activation function of an encoding network of the autoencoder.
- The method according to claim 13, wherein element values in the feature information output by the last-layer activation function of the encoding network range over [a, b], where a and b are integers.
- The method according to claim 14, wherein a is 0 and b is 1.
- The method according to claim 14, wherein a is -1 and b is 1.
- The method according to claim 18, wherein n is 10.
- The method according to claim 13, wherein during training of the autoencoder, original feature information output by the encoding network is processed by adding noise before being input into the decoding network.
- The method according to claim 13, wherein during training of the autoencoder, in forward propagation original feature information output by the encoding network is rounded before being input into the decoding network, and in backward propagation a derivative is taken with respect to the original feature information output by the encoding network, so as to update weight parameters in the encoding network.
- A video encoding method, comprising: determining an intra prediction mode of a current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode; if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, obtaining an autoencoder corresponding to the current block, the autoencoder including an encoding network and a decoding network; inputting an original value of the current block into the encoding network to obtain first feature information of the current block output by the encoding network; and inputting the first feature information of the current block and pixel values of reconstructed pixels around the current block into the decoding network to obtain a prediction block of the current block output by the decoding network.
- The method according to claim 22, wherein the inputting the first feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network comprises: rounding the first feature information of the current block to obtain second feature information of the current block; and inputting the second feature information and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network.
- The method according to claim 23, further comprising: writing the second feature information of the current block into a bitstream.
- The method according to claim 22, further comprising: writing a first flag into a bitstream, the first flag indicating whether a current sequence is allowed to use the autoencoder-based intra prediction mode.
- The method according to claim 25, further comprising: if a value of the first flag is a first value, writing a second flag into the bitstream, the second flag indicating whether the current block uses the autoencoder-based intra prediction mode, and the first value indicating that the current sequence is allowed to use the autoencoder-based intra prediction mode.
- The method according to claim 25, wherein the first flag is included in a sequence-level parameter syntax element.
- The method according to claim 26, wherein the second flag is included in a coding unit syntax element.
- The method according to claim 26, wherein the second flag indicates whether a luma component and/or a chroma component of the current block use the autoencoder-based intra prediction mode.
- The method according to any one of claims 22 to 29, wherein the determining an intra prediction mode of a current block from N preset first intra prediction modes comprises: determining the intra prediction mode of the current block from the N first intra prediction modes according to rate-distortion cost.
- The method according to claim 30, wherein the determining the intra prediction mode of the current block from the N first intra prediction modes according to rate-distortion cost comprises: determining a prediction value corresponding to a first intra prediction mode when the current block is coded using the first intra prediction mode; determining a first rate-distortion cost of the first intra prediction mode according to distortion between the prediction value and an original value of the current block, and a number of bits consumed in coding a flag bit of the first intra prediction mode; and determining the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion costs.
- The method according to claim 31, wherein the determining the intra prediction mode of the current block from the N first intra prediction modes according to the first rate-distortion costs comprises: selecting M second intra prediction modes from the N first intra prediction modes according to the first rate-distortion costs, where M is a positive integer smaller than N; determining a reconstructed value corresponding to a second intra prediction mode when the current block is coded using the second intra prediction mode; determining a second rate-distortion cost of the second intra prediction mode according to distortion between the reconstructed value and the original value of the current block, and a number of bits consumed in coding the current block using the second intra prediction mode; and determining the second intra prediction mode with the smallest second rate-distortion cost among the M second intra prediction modes as the intra prediction mode of the current block.
- The method according to claim 30, wherein the determining the intra prediction mode of the current block from the N first intra prediction modes according to rate-distortion cost comprises: determining a first rate-distortion cost of a third intra prediction mode according to distortion between a prediction value corresponding to the third intra prediction mode and the original value of the current block, and a number of bits consumed in coding a flag bit of the third intra prediction mode, where the third intra prediction modes are the first intra prediction modes other than the autoencoder-based intra prediction mode among the N first intra prediction modes; selecting Q third intra prediction modes from the N-1 third intra prediction modes according to the first rate-distortion costs, where Q is a positive integer smaller than N-1; determining P prediction values corresponding to the autoencoder-based intra prediction mode according to a preset rounding range of the first feature information, and selecting R prediction values from the P prediction values, where P and R are both positive integers and R is smaller than or equal to P; and determining the intra prediction mode of the current block from the N first intra prediction modes according to the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the autoencoder-based intra prediction mode.
- The method according to claim 33, wherein the determining the intra prediction mode of the current block from the N first intra prediction modes according to the Q prediction values corresponding to the Q third intra prediction modes and the R prediction values corresponding to the autoencoder-based intra prediction mode comprises: determining Q reconstructed values corresponding to the Q prediction values and R reconstructed values corresponding to the R prediction values; determining third rate-distortion costs according to distortion between each of the Q+R reconstructed values and the original value of the current block, and a number of bits consumed in coding the current block using the first intra prediction mode corresponding to each of the Q+R reconstructed values; and determining the first intra prediction mode with the smallest third rate-distortion cost among the N first intra prediction modes as the intra prediction mode of the current block.
- The method according to claim 33, wherein the determining P prediction values corresponding to the autoencoder-based intra prediction mode according to a preset rounding range of the first feature information comprises: predicting P possible values of the first feature information output by the encoding network according to the preset rounding range of the first feature information; inputting the feature information under each of the P values and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain prediction values under the P values output by the decoding network; and determining the prediction values under the P values as the P prediction values corresponding to the autoencoder-based intra prediction mode.
- The method according to claim 33, wherein if R is smaller than P, the selecting R prediction values from the P prediction values comprises: determining fourth rate-distortion costs corresponding to the P prediction values according to distortion between the P prediction values and the original value of the current block; and selecting, from the P prediction values, the R prediction values with the smallest fourth rate-distortion costs.
- The method according to any one of claims 23 to 29, wherein the inputting the original value of the current block into the encoding network to obtain the first feature information output by the encoding network comprises: if it is determined that the luma component of the current block uses the autoencoder-based intra prediction mode, inputting original luma values of the current block into the encoding network to obtain first luma feature information of the current block; if it is determined that the chroma component of the current block uses the autoencoder-based intra prediction mode, inputting original chroma values of the current block into the encoding network to obtain first chroma feature information of the current block; and if it is determined that both the luma and chroma components of the current block use the autoencoder-based intra prediction mode, inputting the original luma values and the original chroma values of the current block into the encoding network to obtain the first luma feature information and the first chroma feature information of the current block.
- The method according to claim 37, wherein the rounding the first feature information of the current block to obtain the second feature information of the current block comprises: if the first feature information of the current block includes the first luma feature information, rounding the first luma feature information to obtain second luma feature information of the current block; if the first feature information of the current block includes the first chroma feature information, rounding the first chroma feature information to obtain second chroma feature information of the current block; and if the first feature information of the current block includes the first luma feature information and the first chroma feature information, rounding the first luma feature information and the first chroma feature information respectively to obtain the second luma feature information and the second chroma feature information of the current block.
- The method according to claim 38, wherein the inputting the second feature information of the current block and the pixel values of the reconstructed pixels around the current block into the decoding network to obtain the prediction block of the current block output by the decoding network comprises: if the second feature information of the current block includes the second luma feature information, inputting the second luma feature information and the luma values of the reconstructed pixels around the current block into the decoding network to obtain the luma prediction block of the current block; if the second feature information of the current block includes the second chroma feature information, inputting the second chroma feature information and the chroma values of the reconstructed pixels around the current block into the decoding network to obtain the chroma prediction block of the current block; and if the second feature information of the current block includes the second luma feature information and the second chroma feature information, inputting the second luma feature information and the second chroma feature information, together with the pixel values of the reconstructed pixels around the current block, into the decoding network to obtain the luma prediction block and the chroma prediction block of the current block.
- The method according to claim 38 or 39, wherein the writing the second feature information of the current block into the bitstream comprises: if the second feature information of the current block includes the second luma feature information, writing the second luma feature information into the bitstream; if the second feature information of the current block includes the second chroma feature information, writing the second chroma feature information into the bitstream; and if the second feature information of the current block includes the second luma feature information and the second chroma feature information, writing the second luma feature information and the second chroma feature information into the bitstream.
- The method according to any one of claims 22 to 29, wherein element values in the first feature information output by a last-layer activation function of the encoding network range over [a, b], where a and b are integers.
- The method according to claim 41, wherein a is 0 and b is 1.
- The method according to claim 41, wherein a is -1 and b is 1.
- The method according to claim 45, wherein n is 10.
- The method according to any one of claims 22 to 29, wherein during training of the autoencoder, the first feature information output by the encoding network is processed by adding noise before being input into the decoding network.
- The method according to any one of claims 22 to 29, wherein during training of the autoencoder, in forward propagation the first feature information output by the encoding network is rounded before being input into the decoding network, and in backward propagation a derivative is taken with respect to the first feature information output by the encoding network, so as to update weight parameters in the encoding network.
- A video decoder, comprising: a mode determining unit, configured to decode a bitstream and determine an intra prediction mode of a current block; a feature determining unit, configured to decode the bitstream to obtain feature information of the current block if the intra prediction mode of the current block is an autoencoder-based intra prediction mode; an obtaining unit, configured to obtain pixel values of reconstructed pixels around the current block; and a prediction unit, configured to input the feature information of the current block and the pixel values of the reconstructed pixels around the current block into a decoding network of an autoencoder corresponding to the current block, to obtain a prediction block of the current block output by the decoding network.
- A video encoder, comprising: a mode determining unit, configured to determine an intra prediction mode of a current block from N preset first intra prediction modes, where N is a positive integer and the N first intra prediction modes include an autoencoder-based intra prediction mode; an obtaining unit, configured to obtain an autoencoder corresponding to the current block if the intra prediction mode of the current block is the autoencoder-based intra prediction mode, the autoencoder including an encoding network and a decoding network; a feature determining unit, configured to input an original value of the current block into the encoding network to obtain first feature information of the current block output by the encoding network; and a prediction unit, configured to input the first feature information of the current block and pixel values of reconstructed pixels around the current block into the decoding network to obtain a prediction block of the current block output by the decoding network.
- A video decoder, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to implement the method according to any one of claims 1 to 21.
- A video encoder, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to implement the method according to any one of claims 22 to 48.
- A video coding system, comprising: the video encoder according to claim 51; and the video decoder according to claim 52.
- A computer-readable storage medium, configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 21 or 22 to 48.
- A bitstream, wherein the bitstream is generated based on the method according to any one of claims 22 to 48.
Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
PCT/CN2021/119164 (WO2023039859A1) | 2021-09-17 | 2021-09-17 | Video encoding and decoding method, device, system, and storage medium
CN202180098997.7A (CN117426088A) | 2021-09-17 | 2021-09-17 | Video encoding and decoding method, device, system, and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
PCT/CN2021/119164 (WO2023039859A1) | 2021-09-17 | 2021-09-17 | Video encoding and decoding method, device, system, and storage medium

Publications (1)

Publication Number | Publication Date
---|---
WO2023039859A1 | 2023-03-23

Family

ID=85602328

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2021/119164 (WO2023039859A1) | Video encoding and decoding method, device, system, and storage medium | 2021-09-17 | 2021-09-17

Country Status (2)

Country | Link
---|---
CN | CN117426088A
WO | WO2023039859A1
Patent Citations (8)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN103096051A * | 2011-11-04 | 2013-05-08 | Huawei Technologies Co., Ltd. | Intra-frame decoding method and apparatus for sample points of image block signal components
US20180332292A1 * | 2015-11-18 | 2018-11-15 | Mediatek Inc. | Method and apparatus for intra prediction mode using intra prediction filter in video and image compression
CN109565593A * | 2016-08-01 | 2019-04-02 | Electronics and Telecommunications Research Institute | Image encoding/decoding method and device, and recording medium storing a bitstream
CN113068027A * | 2018-04-01 | 2021-07-02 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image encoding/decoding method and apparatus using intra prediction
CN113225557A * | 2018-09-07 | 2021-08-06 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image encoding/decoding method and apparatus using intra prediction
CN112840649A * | 2018-09-21 | 2021-05-25 | LG Electronics Inc. | Method and apparatus for decoding an image by using block partitioning in an image coding system
CN112335245A * | 2018-10-12 | 2021-02-05 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Video image component prediction method and apparatus, and computer storage medium
US20210289215A1 * | 2018-12-07 | 2021-09-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder, decoder and methods for enhancing a robustness for calculation of cross-component linear model parameters
Non-Patent Citations (1)

Title
---
JIN ZHIPENG; AN PING; SHEN LIQUAN: "Video intra prediction using convolutional encoder decoder network", Neurocomputing, Elsevier, Amsterdam, NL, vol. 394, 25 June 2019, pages 168-177, XP086153437, ISSN: 0925-2312, DOI: 10.1016/j.neucom.2019.02.064 *
Cited By (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN116805971A * | 2023-04-11 | 2023-09-26 | Tencent Technology (Shenzhen) Co., Ltd. | Image encoding and decoding method, apparatus, and device
CN116456102A * | 2023-06-20 | 2023-07-18 | Shenzhen Transsion Holdings Co., Ltd. | Image processing method, processing device, and storage medium
CN116456102B * | 2023-06-20 | 2023-10-03 | Shenzhen Transsion Holdings Co., Ltd. | Image processing method, processing device, and storage medium
CN116760976A * | 2023-08-21 | 2023-09-15 | Tencent Technology (Shenzhen) Co., Ltd. | Affine prediction decision method, apparatus, device, and storage medium
CN116760976B * | 2023-08-21 | 2023-12-08 | Tencent Technology (Shenzhen) Co., Ltd. | Affine prediction decision method, apparatus, device, and storage medium
Also Published As

Publication number | Publication date
---|---
CN117426088A | 2024-01-19
Similar Documents

Publication | Title
---|---
CN110720218B | Intra filtering applied in conjunction with transform processing in video coding
CN110393010B | Intra filtering flag in video coding
TWI728220B | Multi-type tree framework for video coding
WO2023039859A1 | Video encoding and decoding method, device, system, and storage medium
US11451840B2 | Trellis coded quantization coefficient coding
TW201639364A | Restriction on palette block size in video coding
JP7277586B2 | Method and apparatus for mode- and size-dependent block-level restrictions
WO2024022359A1 | Image encoding and decoding method and apparatus
US20230319267A1 | Video coding method and video decoder
WO2023044868A1 | Video encoding and decoding method, device, system, and storage medium
WO2022174475A1 | Video encoding and decoding method and system, video encoder, and video decoder
CN115866297A | Video processing method, apparatus, device, and storage medium
WO2023092404A1 | Video encoding and decoding method, device, system, and storage medium
WO2022217447A1 | Video encoding and decoding method and system, and video codec
WO2022193389A1 | Video encoding and decoding method and system, and video codec
WO2022155922A1 | Video encoding and decoding method and system, video encoder, and video decoder
WO2022193390A1 | Video encoding and decoding method and system, and video codec
WO2023220969A1 | Video encoding and decoding method, apparatus, device, system, and storage medium
WO2023000182A1 | Image encoding/decoding and processing method, apparatus, and device
WO2023122968A1 | Intra prediction method, device, system, and storage medium
WO2023236113A1 | Video encoding and decoding method, apparatus, device, system, and storage medium
WO2023236936A1 | Image encoding and decoding method and apparatus
WO2023184248A1 | Video encoding and decoding method, apparatus, device, system, and storage medium
WO2024192733A1 | Video encoding and decoding method, apparatus, device, system, and storage medium
WO2024216632A1 | Video encoding and decoding method, apparatus, device, system, and storage medium
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21957140; Country of ref document: EP; Kind code of ref document: A1
| WWE | Wipo information: entry into national phase | Ref document number: 202180098997.7; Country of ref document: CN
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21957140; Country of ref document: EP; Kind code of ref document: A1