EP3987808A1 - A method, an apparatus and a computer program product for video encoding and video decoding - Google Patents
- Publication number
- EP3987808A1 (from application EP20826119.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- block
- syntax elements
- probability
- order
- coding
- Prior art date
- Legal status (an assumption, not a legal conclusion): Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
- H04N19/463—Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
Definitions
- The present solution generally relates to video encoding and video decoding.
Background
- a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form.
- the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
- Hybrid video codecs may encode the video information in two phases. Firstly, sample values (i.e. pixel values) in a certain picture area are predicted, e.g., by motion compensation means or by spatial means. Secondly, the prediction error, i.e. the difference between the prediction block of samples and the original block of samples, is coded.
- The video decoder, on the other hand, reconstructs the output video by applying prediction means similar to those of the encoder, to form a prediction representation of the sample blocks, and prediction error decoding. After applying the prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals to form the output video frame.
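The two-phase structure described above can be sketched in a few lines. The names `encode_block` and `decode_block` are illustrative, and the residual is left untransformed for clarity; a real codec would transform, quantize and entropy-code it:

```python
import numpy as np

def encode_block(original, prediction):
    """Phase two of a hybrid codec: form the prediction error (residual)."""
    return original.astype(np.int16) - prediction.astype(np.int16)

def decode_block(prediction, residual):
    """Decoder side: sum the prediction and prediction error signals."""
    return np.clip(prediction.astype(np.int16) + residual, 0, 255).astype(np.uint8)

original = np.array([[120, 121], [119, 122]], dtype=np.uint8)
prediction = np.array([[118, 120], [119, 121]], dtype=np.uint8)
reconstructed = decode_block(prediction, encode_block(original, prediction))
```

With a lossless residual, as here, the reconstruction equals the original; quantizing the residual would trade that exactness for bitrate.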
- the probability of usage of each coding tool is determined according to a learning process.
- The learning process is based on an already coded group of pictures.
- the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
- the probability model is different for blocks having different sizes.
- the probability model is different for different regions of the source picture.
- The order for syntax elements is defined based on a pre-encoding and decoding process modelling the determined probabilities.
- An apparatus comprising at least one processor and memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent blocks, determine a probability of a usage of each of the various coding tools in any one or more previous blocks; derive a probability model according to the determined probabilities; define an order for syntax elements for a block according to the probability model; and encode syntax elements of the block into a bitstream according to the defined order.
- the apparatus further comprising computer program code to cause the apparatus to determine the probability of usage of each coding tool according to a learning process.
- The learning process is based on an already coded group of pictures.
- the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
- the probability model is different for blocks having different sizes.
- the probability model is different for different regions of the source picture.
- the apparatus comprises means for defining the order for syntax elements based on a pre-encoding and decoding process modelling the determined probabilities.
- A computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent blocks, determine a probability of a usage of each of the various coding tools in any one or more previous blocks; derive a probability model according to the determined probabilities; define an order for syntax elements for a block according to the probability model; and encode syntax elements of the block into a bitstream according to the defined order.
- The computer program product is embodied on a non-transitory computer readable medium.
- FIG. 1 shows an encoding process according to an embodiment
- FIG. 2 shows a decoding process according to an embodiment
- Fig. 3 shows syntax element signaling order of Merge tools in VVC
- Fig. 4 shows an example of previously coded blocks that are used for ordering the syntax elements of the current block
- Fig. 5 shows an adaptive syntax element ordering procedure according to an embodiment
- Fig. 6 shows an example of the present embodiments applied for different block sizes
- Fig. 7 shows an example of an offline training
- Fig. 8 shows an example of an online training and online fine-tuning neural network
- Fig. 9 shows an example of offline training and online fine-tuning neural network
- FIG. 10 is a flowchart illustrating a method according to an embodiment
- FIG. 11 shows an apparatus according to an embodiment.
Description of Example Embodiments
- the invention is not limited to this particular arrangement.
- The invention may be applicable to video coding systems like streaming systems, DVD (Digital Versatile Disc) players, digital television receivers, personal video recorders, systems and computer programs on personal computers, handheld computers and communication devices, as well as network elements such as transcoders and cloud computing arrangements where video data is handled.
- The Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of the International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
- The H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
- High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team - Video Coding (JCT-VC) of VCEG and MPEG.
- The standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
- Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
- Some key definitions, bitstream and coding structures, and concepts of H.264/AVC and HEVC and some of their extensions are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented.
- Some of the key definitions, bitstream and coding structures, and concepts of H.264/AVC are the same as in HEVC standard - hence, they are described below jointly.
- The aspects of various embodiments are not limited to H.264/AVC or HEVC or their extensions, but rather the description is given for one possible basis on top of which the present embodiments may be partly or fully realized.
- a syntax element may be defined as an element of data represented in the bitstream.
- a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
- bitstream syntax and semantics as well as the decoding process for error-free bitstreams are specified in H.264/AVC and HEVC.
- the encoding process is not specified, but encoders must generate conforming bitstreams.
- Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD).
- the standards contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding is optional and no decoding process has been specified for erroneous bitstreams.
- the elementary unit for the input to an H.264/AVC or HEVC encoder and the output of an H.264/AVC or HEVC decoder, respectively, is a picture.
- a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
- the source and decoded pictures may each be comprised of one or more sample arrays, such as one of the following sets of sample arrays, wherein each of the samples may represent one color component:
- Green, Blue and Red (GBR, also known as RGB)
- these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr; regardless of the actual color representation method in use.
- the actual color representation method in use may be indicated e.g. in a coded bitstream e.g. using the Video Usability Information (VUI) syntax of H.264/AVC and/or HEVC.
- A component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma), or the array or a single sample of the array that composes a picture in monochrome format.
- a picture may either be a frame or a field.
- a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
- a field is a set of alternate sample rows of a frame. Fields may be used as encoder input for example when the source signal is interlaced.
- Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or may be subsampled when compared to luma sample arrays.
- Some chroma formats may be summarized as follows:
- In monochrome sampling there is only one sample array, which may be nominally considered the luma array.
- In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
- In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
- In 4:4:4 sampling, each of the two chroma arrays has the same height and width as the luma array.
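The relative chroma array sizes listed above can be computed with a small helper; the function name and format labels are illustrative only, not part of any codec API:

```python
# Illustrative helper giving the chroma array dimensions relative to a
# width x height luma array for the common chroma formats.
def chroma_dims(width, height, fmt):
    if fmt == "monochrome":
        return None                       # no chroma arrays present
    if fmt == "4:2:0":
        return (width // 2, height // 2)  # half width, half height
    if fmt == "4:2:2":
        return (width // 2, height)       # half width, same height
    if fmt == "4:4:4":
        return (width, height)            # same width and height
    raise ValueError("unknown chroma format: " + fmt)

cb_cr_size = chroma_dims(1920, 1080, "4:2:0")  # (960, 540)
```

For a 1920x1080 luma array in 4:2:0 sampling, each of the Cb and Cr arrays is thus 960x540.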
- the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as pre-processing step or as part of encoding).
- the chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
- the source video sequence(s) provided as input for encoding may either represent interlaced source content or progressive source content. Fields of opposite parity have been captured at different times for interlaced source content. Progressive source content contains captured frames.
- An encoder may encode fields of interlaced source content in two ways: a pair of interlaced fields may be coded into a coded frame or a field may be coded as a coded field.
- an encoder may encode frames of progressive source content in two ways: a frame of progressive source content may be coded into a coded frame or a pair of coded fields.
- A field pair or a complementary field pair may be defined as two fields next to each other in decoding and/or output order, having opposite parity (i.e. one being a top field and another being a bottom field).
- Some video coding standards or schemes allow mixing of coded frames and coded fields in the same coded video sequence.
- predicting a coded field from a field in a coded frame and/or predicting a coded frame for a complementary field pair may be enabled in encoding and/or decoding.
- a partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
- a picture partitioning may be defined as a division of a picture into smaller non-overlapping units.
- a block partitioning may be defined as a division of a block into smaller non-overlapping units, such as sub-blocks.
- The term block partitioning may be considered to cover multiple levels of partitioning, for example partitioning of a picture into slices, and partitioning of each slice into smaller units, such as macroblocks of H.264/AVC. It is noted that the same unit, such as a picture, may have more than one partitioning. For example, a coding unit of HEVC may be partitioned into prediction units and separately by another quadtree into transform units.
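A quadtree block partitioning of the kind used in HEVC can be sketched as a simple recursion. Here `split_decision` is a hypothetical callback standing in for the encoder's rate-distortion choice:

```python
# Illustrative recursive quadtree partitioning of a block into smaller
# non-overlapping units, in the spirit of HEVC coding-unit splitting.
def quadtree_partition(x, y, size, min_size, split_decision):
    """Return the leaf blocks as (x, y, size) tuples."""
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):          # recurse into the four quadrants
            for dx in (0, half):
                leaves += quadtree_partition(x + dx, y + dy, half,
                                             min_size, split_decision)
        return leaves
    return [(x, y, size)]             # leaf: no further split

# splitting everything down to 8x8: a 16x16 block yields four 8x8 leaves
leaves = quadtree_partition(0, 0, 16, 8, lambda x, y, s: True)
```

Note that each leaf covers a disjoint region and together they tile the original block, matching the definition of a partitioning above.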
- Hybrid video codecs may encode the video information in two phases. At first, pixel values in a certain picture area (or "block") are predicted, for example, by motion compensation means or by spatial means. In the first phase, predictive coding may be applied, for example, as so-called sample prediction and/or so-called syntax prediction.
- In sample prediction, pixel or sample values in a certain picture area or "block" are predicted. These pixel or sample values can be predicted, for example, using one or more of the following ways:
- Motion compensation mechanisms (which may also be referred to as inter prediction, temporal prediction or motion-compensation temporal prediction or motion-compensated prediction or MCP) involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded. Inter prediction may reduce temporal redundancy.
- Intra prediction mechanisms, where pixel or sample values are predicted by spatial means, involve finding and indicating a spatial region relationship. Intra prediction can be performed in the spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction may be exploited in intra coding, where no inter prediction is applied.
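As a concrete, much simplified instance of spatial sample prediction, DC intra prediction fills the block with the mean of the neighbouring reconstructed samples; the function name and interface below are assumptions for illustration, not from any standard:

```python
import numpy as np

def dc_intra_prediction(left, top):
    """Predict every sample of the block as the rounded mean of the
    reconstructed neighbouring samples to the left and above (DC mode)."""
    dc = int(round((sum(left) + sum(top)) / (len(left) + len(top))))
    return np.full((len(left), len(top)), dc, dtype=np.uint8)

# flat neighbourhoods around value ~102 give a flat DC prediction block
pred = dc_intra_prediction(left=[100, 100, 100, 100],
                           top=[104, 104, 104, 104])
```

Only the residual between this prediction and the original block then needs to be coded.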
- In syntax prediction, which may also be referred to as parameter prediction, syntax elements and/or syntax element values and/or variables derived from syntax elements are predicted from syntax elements (de)coded earlier and/or variables derived earlier.
- Examples of syntax prediction are provided below:
- In motion vector prediction, motion vectors, e.g. for inter and/or inter-view prediction, may be coded differentially with respect to a block-specific predicted motion vector.
- The predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
- Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
- In addition, the reference index of a previously coded/decoded picture can be predicted. Differential coding of motion vectors may be disabled across slice boundaries.
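The median-based motion vector predictor and differential coding described above can be sketched as follows; the component-wise median over three invented neighbour vectors is a simplification of the actual standard-specific derivation:

```python
def median_mv_predictor(neighbor_mvs):
    """Component-wise median of the neighbouring blocks' motion vectors."""
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])

# illustrative neighbour motion vectors (left, above, above-right)
mvp = median_mv_predictor([(4, 0), (6, -2), (5, 1)])
mv = (6, 0)                              # actual MV of the current block
mvd = (mv[0] - mvp[0], mv[1] - mvp[1])   # only this difference is signalled
```

Because neighbouring blocks tend to move similarly, the difference `mvd` is usually small and cheap to entropy-code.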
- In block partitioning prediction, the block partitioning, e.g. from CTU to CUs and down to PUs, may be predicted.
- In filter parameter prediction, the filtering parameters, e.g. for sample adaptive offset, may be predicted.
- Prediction approaches using image information from a previously coded image can also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation.
- Prediction approaches using image information within the same image can also be called intra prediction methods.
- the second phase is coding the error between the prediction block of samples and the original block of samples. This may be accomplished by transforming the difference in sample values using a specified transform. This transform may be e.g. a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized, and entropy coded.
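A minimal sketch of this second phase, assuming an orthonormal DCT-II and a single scalar quantization step `qstep` (real codecs use integer transform approximations and quantization matrices; entropy coding is omitted):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix, built directly from the definition."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2 / n)

def transform_quantize(residual, qstep):
    d = dct_matrix(residual.shape[0])
    return np.round(d @ residual @ d.T / qstep)   # quantization: the lossy step

def dequantize_inverse(levels, qstep):
    d = dct_matrix(levels.shape[0])
    return d.T @ (levels * qstep) @ d             # inverse quantization + IDCT

residual = np.arange(16, dtype=float).reshape(4, 4)
levels = transform_quantize(residual, qstep=0.1)
reconstructed = dequantize_inverse(levels, qstep=0.1)
```

A larger `qstep` produces fewer, coarser levels (smaller bitstream) at the cost of a larger reconstruction error, which is exactly the quality/bitrate balance discussed below.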
- The encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
- the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a prediction representation of the sample blocks (using the motion or spatial information created by the encoder and included in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized error signal in the spatial domain).
- After applying sample prediction and error decoding processes, the decoder combines the prediction and the prediction error signals (the sample values) to form the output video frame.
- FIG. 1 illustrates an image to be encoded (I_n); a prediction representation of an image block (P'_n); a prediction error signal (D_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); a transform (T) and inverse transform (T^-1); a quantization (Q) and inverse quantization (Q^-1); entropy encoding (E); a reference frame memory (RFM); inter prediction (P_inter); intra prediction (P_intra); mode selection (MS) and filtering (F).
- FIG. 2 illustrates a prediction representation of an image block (P'_n); a reconstructed prediction error signal (D'_n); a preliminary reconstructed image (I'_n); a final reconstructed image (R'_n); an inverse transform (T^-1); an inverse quantization (Q^-1); an entropy decoding (E^-1); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
- the decoder may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
- motion information is indicated by motion vectors associated with each motion compensated image block.
- Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures).
- In H.264/AVC and HEVC, as in many other video compression standards, a picture is divided into a mesh of rectangles, for each of which a similar block in one of the reference pictures is indicated for inter prediction. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
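The search for a closely corresponding area can be illustrated with an exhaustive block-matching search minimising the sum of absolute differences (SAD); real encoders use much faster search strategies, and the interface here is an assumption for illustration:

```python
import numpy as np

def full_search(cur_block, ref_frame, bx, by, search_range):
    """Exhaustive motion search: return the (dx, dy) minimising SAD."""
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference picture
            sad = int(np.abs(cur_block.astype(int)
                             - ref_frame[y:y + h, x:x + w].astype(int)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

# a block copied from the reference at a (2, 0) displacement is found exactly
ref = np.random.default_rng(0).integers(0, 256, (32, 32), dtype=np.uint8)
mv, sad = full_search(ref[10:14, 12:16], ref, bx=10, by=10, search_range=4)
```

The returned `mv` is exactly the motion vector that would be coded (differentially) into the bitstream for this block.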
- a neural network is a computation graph that may comprise several layers of successive computation.
- A layer may comprise one or more units ("neurons") performing an elementary computation.
- A unit may be connected to one or more other units, and the connection may have an associated weight. The weight may be used for scaling the signal passing through the associated connection. Weights may be learnable parameters, i.e., values which can be learned from training data.
- Each layer is configured to extract information from the input data.
- Data from lower layers may represent low-level semantics, whereas higher layers represent higher-level semantics.
- Feed-forward and recurrent neural networks represent examples of neural network architectures.
- Feed-forward neural networks do not have a feedback loop: each layer takes input from one or more of the preceding layers and provides its output to one or more of the subsequent layers. Also, units inside a certain layer may take input from units in one or more of the preceding layers, and provide output to units in one or more of the following layers.
- an untrained neural network model has to go through a training phase.
- the training phase is the development phase, where the neural network learns to perform the final task.
- a training data set that is used for training a neural network is supposed to be representative of the data on which the neural network will be used.
- The neural network uses the examples in the training dataset to modify its learnable parameters (e.g., its connections' weights) in order to achieve the desired task. Training may happen by minimizing or decreasing the output's error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, etc.
- training is an iterative process, where at each iteration, the algorithm modifies the weights of the neural network to make a gradual improvement of the network’s output, i.e., to gradually decrease the loss.
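The iterative weight-update loop can be illustrated with plain gradient descent on a single linear unit; the learning rate, iteration count and training pairs below are arbitrary example values:

```python
# Minimal sketch of iterative training: gradient descent on the mean squared
# error loss for one linear unit with one weight and one bias.
def train(data, lr=0.1, iterations=200):
    w, b = 0.0, 0.0                      # learnable parameters
    for _ in range(iterations):
        # gradients of the MSE loss over the training set
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * gw                     # each step gradually decreases the loss
        b -= lr * gb
    return w, b

# supervised training from input-output pairs sampled from y = 2x + 1
w, b = train([(0, 1), (1, 3), (2, 5)])
```

After a few hundred iterations the parameters converge close to the generating values w = 2, b = 1, mirroring how repeated small updates gradually decrease the loss.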
- The trained neural network model is applied to new data during an inference phase, in which the neural network performs the desired task for which it has been trained.
- the inference phase is also known as a testing phase.
- the neural network provides an output which is a result of the inference on the new data.
- the result can be a classification result or a recognition result.
- Training can be performed in several ways. The main ones are supervised, unsupervised, and reinforcement training.
- In supervised training, the neural network model is provided with input-output pairs, where the output may be a label.
- In unsupervised training, the neural network is provided only with input data (and also with raw output data in the case of self-supervised training).
- In reinforcement learning, the supervision is sparser and less precise; instead of input-output pairs, the neural network gets input data and, sometimes, delayed rewards in the form of scores (e.g., -1, 0, or +1).
- the learning is a result of a training algorithm, or of a meta-level neural network providing the training signal.
- the training algorithm comprises changing some properties of the neural network so that its output is as close as possible to a desired output.
- the syntax elements (SEs) in a category of coding tools are coded and transmitted in the bitstream in a certain and fixed order.
- the ordering may be done based on the probability of the usage of each tool in the entire sequence. However, such ordering is sub-optimal since the probability of usage of a certain coding tool depends on different properties such as: content in each image or coding block, temporal distance of each frame to the reference frame, quality of each frame or block relative to its reference, size of the block, etc.
- FIG. 3 illustrates the syntax element order of merge coding tools in the current version of the Versatile Video Coding standard (H.266/VVC).
- FIG. 3 shows merge coding tools 310 which are encoded or decoded in a fixed order. These merge coding tools comprise a normal merge mode 320, a merge with motion vector difference (MMVD) mode 330, a sub-block merge mode 340, a Combined Intra-Inter Prediction (CIIP) mode 350 and a triangle mode 360.
- The decoder has to decode the normal merge flag 320, MMVD flag 330 and sub-block flag 340 syntax elements before parsing and decoding the CIIP syntax element 350.
- If the usage of the CIIP merge mode 350 for the content is higher than the usage of the other tools in the category, then the bitrate will increase, since these syntax elements are coded into the bitstream at the block level. The same issue applies to the other syntax elements in this category, and in general to the entire set of video coding tools.
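The bitrate effect of flag ordering can be quantified with a toy model; the mode names and usage probabilities below are invented for illustration and are not measured VVC statistics:

```python
# Expected number of mode flags signalled per block when modes are tested in a
# fixed order: choosing the i-th mode costs i+1 flags, except the last mode,
# which is inferred once all earlier flags are zero.
def expected_flags(order, p):
    n = len(order)
    return sum(p[mode] * (i + 1 if i < n - 1 else n - 1)
               for i, mode in enumerate(order))

# hypothetical per-block usage probabilities with CIIP dominating
p = {"normal": 0.1, "mmvd": 0.1, "subblock": 0.1, "ciip": 0.6, "triangle": 0.1}
fixed = expected_flags(["normal", "mmvd", "subblock", "ciip", "triangle"], p)
adaptive = expected_flags(sorted(p, key=p.get, reverse=True), p)
```

With these illustrative numbers the fixed order costs 3.4 flags per block on average versus 1.9 when the most probable mode is signalled first, which is the saving the adaptive re-ordering below aims for.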
- the present embodiments are targeted to solve the issue of having a fixed order for coding the syntax elements (SEs) in video coding, and propose an adaptive re-ordering of syntax elements in order to provide an efficient model for coding syntax elements.
- the adaptive order for the syntax elements according to the embodiments may be based on a learning process from the video content.
- the learning process may be implemented according to one or more of the following options:
- a probability of the usage of each coding tool in a coding category in each frame can be determined based on a learning process from the previously encoded/decoded blocks in the picture(s).
- the probability of the usage of each coding tool may be determined based on a certain property. For example, the probability of the usage can be determined based on a block size, the quality of the current block compared to the reference block, the temporal distance of a block inside the picture compared to the reference block, etc.
- the learning may be based on already coded group of pictures (GOP).
- GOP group of pictures
- at least one previously coded picture's coding information is used for learning the syntax element of each block in the current picture. This is adaptively improved when more frames are involved.
- encoding in two steps may be performed to the video.
- a video may be partially or fully coded in order to derive the picture/block level information of the coding tools for the learning process.
- the syntax elements are re-ordered based on the information that is extracted from the first round of encoding.
- the syntax elements (SEs) of a group of coding tools can be implemented according to various embodiments. For example, there can be N prediction tools with distinct syntax elements (SE1, SE2, ..., SEN) for coding a block in the codec.
- these syntax elements may be adaptively re-ordered based on their probability of usage in the codec. This means that the more probably a certain tool is used, the earlier its syntax element is placed in the order of syntax elements. The probability of the usage may be estimated based on a learning process from the previously coded blocks.
- Figure 4 illustrates an example of the current coding block and the previously coded blocks.
- a probability model may be derived, which probability model defines the ordering of the syntax elements.
- the model may be derived based on at least one previously coded block. However, in order to have a more precise probability model, the coding information of more blocks is needed, e.g. one or more CTU rows/columns of the picture.
- the order of syntax elements in a group of coding tools can be modelled based on the probability of the corresponding syntax elements in previous blocks.
- Figure 5 illustrates a procedure for re-ordering syntax elements in the encoding and decoding process, according to an embodiment.
- syntax elements of an input block 510 may be coded in a different order than other blocks.
- the syntax elements may be coded based on a pre-defined and fixed syntax elements order 520.
- its coding information, i.e. the syntax element information, is fed to the syntax element modelling process 540 that models the order based on the probability of the usage of the coding tools in previous block(s).
- the syntax elements modelling 540 is then used for updating 550 the syntax elements order. Consequently, the syntax elements are re-ordered based on the coding information from the previous block(s).
- the re-ordered syntax elements are then used for coding the syntax elements of the next block(s).
- the modelling can be done by calculating the probability of usage of each syntax element or coding tool in previous blocks and by re-ordering the syntax elements from highest probable SE to the lowest probable SE.
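The count-based modelling described above can be sketched as follows. This is an illustrative implementation under the assumption that encoder and decoder keep identical counts of tool usage in previously (de)coded blocks, so both derive the same order without extra signalling; class and method names are hypothetical.

```python
from collections import Counter

class SyntaxOrderModel:
    """Sketch of count-based syntax element re-ordering: tool usage in
    previously (de)coded blocks is counted, and syntax elements are
    sorted from the most to the least frequently used tool."""

    def __init__(self, default_order):
        self.default_order = list(default_order)  # fixed order for the first block(s)
        self.counts = Counter()

    def update(self, chosen_tool):
        # Called after each block is (de)coded, with the tool actually used.
        self.counts[chosen_tool] += 1

    def current_order(self):
        if not self.counts:
            return list(self.default_order)
        # Stable sort: ties keep the pre-defined order.
        return sorted(self.default_order, key=lambda t: -self.counts[t])

model = SyntaxOrderModel(["normal", "mmvd", "subblock", "ciip", "triangle"])
for tool in ["ciip", "ciip", "normal", "ciip", "triangle"]:
    model.update(tool)
print(model.current_order())  # ['ciip', 'normal', 'triangle', 'mmvd', 'subblock']
```

Because the sort is stable, tools that have not yet been observed stay in their pre-defined relative order, matching the fixed fallback used for the first block.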
- the modelling can be done by using a set of neural networks (NNs) for the learning process and deciding the order of SEs based on the previously coded block(s) SE information.
- NNs neural networks
- the neural network can be trained according to one of the following ways:
- Figure 7 shows an example of offline training of a neural network.
- the NN is trained (separately outside of the coding chain with the same or different data as training set) prior to encoding and then it is used for syntax element re-ordering.
- the input block(s) 710 are provided for neural network 720.
- the neural network 720 models the syntax element order 730 for each block based on the probability of the usage of the coding tools in previous block(s). Consequently, the syntax elements are re-ordered based on the coding information from the neural network.
- the syntax elements are then (de)coded according to the computed syntax elements order.
- Figure 8 shows an example of online training and online fine-tuning of neural network.
- the NN is trained in the encoding/decoding process (i.e., online learning) and then used for SE re-ordering.
- more than one block’s coding information may be needed.
- the NN may use the coding information of at least one CTU row or column for the training operation.
- the model is used for SE re-ordering.
- the NN may be later fine-tuned based on the next blocks’ coding information.
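The online training and fine-tuning described above can be illustrated with a deliberately tiny model. The following is a minimal pure-Python softmax classifier used as a stand-in for the NN in the text; the features (log2 block size, temporal ID), layer shape, learning rate and training data are all hypothetical choices for the sketch.

```python
import math, random

class TinyToolRanker:
    """Minimal online-trainable softmax model: block features are mapped
    to per-tool scores, and after each block the model is fine-tuned on
    the tool that was actually chosen (one SGD step on cross-entropy)."""

    def __init__(self, tools, n_features, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.tools = list(tools)
        self.lr = lr
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in tools]
        self.b = [0.0 for _ in tools]

    def probs(self, x):
        scores = [sum(wi * xi for wi, xi in zip(row, x)) + b
                  for row, b in zip(self.w, self.b)]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        return [v / z for v in e]

    def order(self, x):
        # Syntax element order: most probable tool first.
        p = self.probs(x)
        return [t for _, t in sorted(zip(p, self.tools), key=lambda pt: -pt[0])]

    def fine_tune(self, x, chosen):
        # One SGD step on the cross-entropy loss for the observed tool.
        p = self.probs(x)
        k = self.tools.index(chosen)
        for i in range(len(self.tools)):
            g = p[i] - (1.0 if i == k else 0.0)
            self.b[i] -= self.lr * g
            for j in range(len(x)):
                self.w[i][j] -= self.lr * g * x[j]

ranker = TinyToolRanker(["normal", "mmvd", "ciip"], n_features=2)
# Hypothetical training stream: features are [log2 block size, temporal ID].
for _ in range(300):
    ranker.fine_tune([5.0, 2.0], "ciip")    # large blocks tend to pick CIIP
    ranker.fine_tune([3.0, 0.0], "normal")  # small blocks tend to pick normal merge
print(ranker.order([5.0, 2.0])[0])  # 'ciip' ranked first for large blocks
print(ranker.order([3.0, 0.0])[0])  # 'normal' ranked first for small blocks
```

A production codec would use a proper neural network and deterministic, bit-exact inference so that encoder and decoder derive identical orders; the sketch only shows the online fine-tuning loop.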
- Figure 9 shows an example of offline training and online fine-tuning of neural network.
- the NN is trained similarly to the example of Figure 7 (i.e., outside of the coding chain), but the coding information of the blocks in the encoding/decoding process is used for fine-tuning the NN.
- for the fine-tuning process, another NN may be used in the coding chain.
- the process of modelling the SEs’ order may be done differently for different block sizes.
- a distinct SEs ordering model may be defined and the order may be updated based on the previously coded blocks with the same size. This is illustrated in Figure 6.
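Keeping a distinct ordering model per block size can be sketched as a small extension of the count-based model: one counter table per size, each updated only from previously coded blocks of the same size. The class and the example sizes below are illustrative assumptions.

```python
from collections import Counter, defaultdict

class PerSizeOrderModels:
    """Illustrative sketch: one count-based ordering model per block
    size, updated only from previously coded blocks of the same size."""

    def __init__(self, default_order):
        self.default_order = list(default_order)
        self.counts = defaultdict(Counter)  # block size -> tool usage counts

    def update(self, block_size, chosen_tool):
        self.counts[block_size][chosen_tool] += 1

    def order_for(self, block_size):
        c = self.counts[block_size]
        # Stable sort: unseen tools keep the pre-defined order.
        return sorted(self.default_order, key=lambda t: -c[t])

models = PerSizeOrderModels(["normal", "mmvd", "ciip"])
models.update(8, "normal"); models.update(8, "normal")    # small blocks
models.update(64, "ciip"); models.update(64, "ciip")      # large blocks
print(models.order_for(8))   # ['normal', 'mmvd', 'ciip']
print(models.order_for(64))  # ['ciip', 'normal', 'mmvd']
```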
- the syntax elements modelling and ordering may be done based on the coding information from the previous frame(s).
- a syntax elements model may be derived from the coding information of the blocks in the frame.
- the syntax elements model may then be used for ordering the syntax elements of the next frame(s). It is appreciated that there may be more than one frame used for modelling the syntax elements order.
- a separate syntax elements model may be derived.
- the syntax elements of this block are ordered based on the syntax elements model of the reference picture.
- the ordering of syntax elements may be done based on a pre-encoding and decoding process. For example, one or more frames of the video may be pre-encoded and decoded prior to the final encoding/decoding. Accordingly, the syntax elements modelling can be done with coding information and probabilities of the coding tools in the pre-encoding/decoding process. At least one syntax elements order may be derived and stored in the codec for coding the syntax elements in the final encoding/decoding process.
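The two-step scheme above can be sketched as follows. Here the first pass is stubbed out as a list of the tools each block would choose; the function name and the data are illustrative assumptions, not the patent's normative procedure.

```python
from collections import Counter

def two_pass_order(first_pass_tools, default_order):
    """Two-step encoding sketch: a pre-encoding pass records which tool
    each block chose, and the order derived from those statistics is
    stored for use in the final encoding/decoding pass."""
    # Pass 1: pre-encode with the fixed order, collecting tool statistics.
    stats = Counter(first_pass_tools)
    # Derive (and store in the codec) the order for the final pass.
    return sorted(default_order, key=lambda t: -stats[t])

order = two_pass_order(["ciip", "ciip", "triangle", "normal"],
                       ["normal", "mmvd", "ciip", "triangle"])
print(order)  # ['ciip', 'normal', 'triangle', 'mmvd']
```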
- there may be more than one syntax elements order derived for coding the syntax elements of the blocks in the final encoding/decoding process.
- the order may be derived based on different aspects as below:
- the syntax elements order may be derived based on the size of the blocks.
- the quality level of each frame may be different in the video sequence. Moreover, the quality of each block may also differ inside each picture. Hence, the syntax elements order may be derived based on the quality level of each block.
- the syntax elements order may be derived based on the temporal ID of the frame in the group of pictures (GOP).
- the codec may decide to use the one with proper properties, e.g. based on a certain block size, temporal ID or quality.
- the encoder may select the proper order based on a rate-distortion optimization by using all the available syntax elements orders for a certain group of coding tools. In such case, there may be a signaling of the index of the corresponding syntax elements order to the decoder.
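The encoder-side selection among several available orders can be sketched as below. Since the chosen order only affects the rate (the reconstruction is unchanged), the rate-distortion comparison reduces here to expected flag bits; the one-flag-per-tool cost model, candidate orders and probabilities are hypothetical.

```python
def select_order_index(candidate_orders, probs):
    """Sketch of encoder-side selection: evaluate every available
    syntax-element order and return the index of the cheapest one,
    which would then be signalled to the decoder."""
    def expected_bits(order):
        n = len(order)
        # Idealized flag cascade: tool at position i costs i+1 flags,
        # except the last two tools, which share the final flag.
        return sum(probs[t] * min(i + 1, n - 1) for i, t in enumerate(order))
    return min(range(len(candidate_orders)),
               key=lambda i: expected_bits(candidate_orders[i]))

orders = [["normal", "mmvd", "ciip"],   # e.g. an order derived for small blocks
          ["ciip", "normal", "mmvd"]]   # e.g. an order derived for large blocks
probs = {"normal": 0.2, "mmvd": 0.1, "ciip": 0.7}
print(select_order_index(orders, probs))  # 1: the CIIP-first order is cheaper
```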
- the syntax elements order modelling can be done for different regions of the image/video differently.
- the syntax elements modelling and re-ordering may be done differently for each area.
- at least one syntax elements modelling may be applied to polar areas and at least one to the equator areas.
- syntax elements modelling may be done in encoder side and/or decoder side.
- FIG. 10 is a flowchart illustrating a method according to an embodiment.
- a method comprises receiving 1010 a source picture; partitioning 1020 the source picture into a set of non-overlapping blocks; for a first block being coded, coding 1030 syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent blocks, determining 1040 a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving 1050 a probability model according to the determined probabilities; defining 1060 an order for syntax elements for a block according to the probability model; and encoding 1070 syntax elements of the block into a bitstream according to the defined order.
- An apparatus comprises means for receiving a source picture; means for partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, means for coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent blocks, means for determining a probability of a usage of each of the various coding tools in any one or more previous blocks; means for deriving a probability model according to the determined probabilities; means for defining an order for syntax elements for a block according to the probability model; and means for encoding syntax elements of the block into a bitstream according to the defined order.
- the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 10 according to various embodiments.
- An apparatus according to an embodiment is illustrated in Figure 11.
- The apparatus of this embodiment is a camera having multiple lenses and imaging sensors, but other types of cameras may also be used to capture wide view images and/or wide view video.
- wide view image and wide view video mean an image and a video, respectively, which comprise visual information having a relatively large viewing angle, larger than 100 degrees.
- a so-called 360 panorama image/video as well as images/videos captured by using a fisheye lens may also be called a wide view image/video in this specification.
- the wide view image/video may mean an image/video in which some kind of projection distortion may occur when a direction of view changes between successive images or frames of the video, so that a transform may be needed to find out co-located samples from a reference image or a reference frame. This will be described in more detail later in this specification.
- the camera 2700 of Figure 11 comprises two or more camera units 2701 and is capable of capturing wide view images and/or wide view video.
- Each camera unit 2701 is located at a different location in the multi-camera system and may have a different orientation with respect to the other camera units 2701.
- the camera units 2701 may have an omnidirectional constellation so that the camera has a 360-degree viewing angle in a 3D space. In other words, such a camera 2700 may be able to see each direction of a scene so that each spot of the scene around the camera 2700 can be viewed by at least one camera unit 2701.
- the camera 2700 of Figure 11 may also comprise a processor 2704 for controlling the operations of the camera 2700. There may also be a memory 2706 for storing data and computer code to be executed by the processor 2704, and a transceiver 2708 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner.
- the camera 2700 may further comprise a user interface (UI) 2710 for displaying information to the user, for generating audible signals and/or for receiving user input.
- UI user interface
- the camera 2700 need not comprise each feature mentioned above or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the camera units 2701 (not shown).
- Figure 11 also illustrates some operational elements which may be implemented, for example, as a computer code in the software of the processor, in a hardware, or both.
- a focus control element 2714 may perform operations related to adjustment of the optical system of camera unit or units to obtain focus meeting target specifications or some other predetermined criteria.
- An optics adjustment element 2716 may perform movements of the optical system or one or more parts of it according to instructions provided by the focus control element 2714. It should be noted here that the actual adjustment of the optical system need not be performed by the apparatus, but it may be performed manually, wherein the focus control element 2714 may provide information for the user interface 2710 to indicate to a user of the device how to adjust the optical system.
- a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
- a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
- the computer program code comprises one or more operational characteristics.
- Said operational characteristics are being defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises receiving a source picture; partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving a probability model according to the determined probabilities; defining an order for syntax elements for a block according to the probability model; and encoding syntax elements of the block into a bitstream according to the defined order.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962863490P | 2019-06-19 | 2019-06-19 | |
PCT/FI2020/050421 WO2020254723A1 (en) | 2019-06-19 | 2020-06-12 | A method, an apparatus and a computer program product for video encoding and video decoding |
Publications (2)
Publication Number | Publication Date |
---|---|
EP3987808A1 true EP3987808A1 (en) | 2022-04-27 |
EP3987808A4 EP3987808A4 (en) | 2023-07-05 |
Family
ID=74036977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20826119.8A Withdrawn EP3987808A4 (en) | 2019-06-19 | 2020-06-12 | A method, an apparatus and a computer program product for video encoding and video decoding |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3987808A4 (en) |
WO (1) | WO2020254723A1 (en) |
Also Published As
Publication number | Publication date |
---|---|
EP3987808A4 (en) | 2023-07-05 |
WO2020254723A1 (en) | 2020-12-24 |