EP3987808A1 - A method, an apparatus and a computer program product for video encoding and video decoding - Google Patents

A method, an apparatus and a computer program product for video encoding and video decoding

Info

Publication number
EP3987808A1
Authority
EP
European Patent Office
Prior art keywords
block
syntax elements
probability
order
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP20826119.8A
Other languages
German (de)
French (fr)
Other versions
EP3987808A4 (en)
Inventor
Ramin GHAZNAVI YOUVALARI
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of EP3987808A1
Publication of EP3987808A4

Classifications

    • H04N 19/103: Selection of coding mode or of prediction mode (adaptive coding of digital video signals)
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • H04N 19/149: Data rate or code amount at the encoder output, estimated by means of a model, e.g. mathematical or statistical model
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/463: Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission

Definitions

  • The present solution generally relates to video encoding and video decoding.
  • a video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form.
  • the encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
  • Hybrid video codecs may encode the video information in two phases. Firstly, sample values (i.e. pixel values) in a certain picture area are predicted, e.g., by motion compensation means or by spatial means. Secondly, the prediction error, i.e. the difference between the prediction block of samples and the original block of samples, is coded.
  • The video decoder, on the other hand, reconstructs the output video by applying prediction means similar to those used by the encoder, to form a prediction representation of the sample blocks, and by applying prediction error decoding. After applying the prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals to form the output video frame.
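By way of illustration, the following is a minimal sketch of this two-phase structure. The transform and entropy-coding details are abstracted into a plain quantizer, and all names are illustrative rather than taken from any standard.

```python
import numpy as np

def encode_block(original, prediction, qstep):
    """Phase 2: code the prediction error (plain quantization stands in for
    the transform plus entropy coding of a real codec)."""
    residual = original - prediction
    return np.round(residual / qstep)

def decode_block(prediction, levels, qstep):
    """Decoder: sum the prediction and the decoded prediction error signal."""
    return prediction + levels * qstep

block = np.full((4, 4), 100.0)   # original samples
pred = np.full((4, 4), 98.0)     # from motion compensation or spatial prediction
levels = encode_block(block, pred, qstep=1.0)
recon = decode_block(pred, levels, qstep=1.0)  # equals block up to quantization error
```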
  • the probability of usage of each coding tool is determined according to a learning process.
  • the learning process is based on an already coded group of pictures.
  • the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
  • the probability model is different for blocks having different sizes.
  • the probability model is different for different regions of the source picture.
  • the order for syntax elements is defined based on a pre-encoding and decoding process modelling the determined probabilities.
  • an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks; derive a probability model according to the determined probabilities; define an order for syntax elements for a block according to the probability model; and encode syntax elements of the block into a bitstream according to the defined order.
  • the apparatus further comprising computer program code to cause the apparatus to determine the probability of usage of each coding tool according to a learning process.
  • the learning process is based on an already coded group of pictures.
  • the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
  • the probability model is different for blocks having different sizes.
  • the probability model is different for different regions of the source picture.
  • the apparatus comprises means for defining the order for syntax elements based on a pre-encoding and decoding process modelling the determined probabilities.
  • a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks; to derive a probability model according to the determined probabilities; to define an order for syntax elements for a block according to the probability model; and to encode syntax elements of the block into a bitstream according to the defined order.
  • the computer program product is embodied on a non-transitory computer readable medium.
  • FIG. 1 shows an encoding process according to an embodiment
  • FIG. 2 shows a decoding process according to an embodiment
  • Fig. 3 shows syntax element signaling order of Merge tools in VVC
  • Fig. 4 shows an example of previously coded blocks that are used for ordering the syntax elements of the current block
  • Fig. 5 shows an adaptive syntax element ordering procedure according to an embodiment
  • Fig. 6 shows an example of the present embodiments applied for different block sizes
  • Fig. 7 shows an example of an offline training
  • Fig. 8 shows an example of an online training and online fine-tuning neural network
  • Fig. 9 shows an example of offline training and online fine-tuning neural network
  • FIG. 10 is a flowchart illustrating a method according to an embodiment
  • FIG. 11 shows an apparatus according to an embodiment.
  • It is to be noted, however, that the invention is not limited to this particular arrangement.
  • For example, the invention may be applicable to video coding systems like streaming systems, DVD (Digital Versatile Disc) players, digital television receivers, personal video recorders, systems and computer programs on personal computers, handheld computers and communication devices, as well as network elements such as transcoders and cloud computing arrangements where video data is handled.
  • The Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC).
  • The H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC).
  • The High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team on Video Coding (JCT-VC) of VCEG and MPEG.
  • The standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC).
  • Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively.
  • Some key definitions, bitstream and coding structures, and concepts of H.264/AVC and HEVC and some of their extensions are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented.
  • Some of the key definitions, bitstream and coding structures, and concepts of H.264/AVC are the same as in the HEVC standard - hence, they are described below jointly.
  • The aspects of various embodiments are not limited to H.264/AVC or HEVC or their extensions, but rather the description is given for one possible basis on top of which the present embodiments may be partly or fully realized.
  • a syntax element may be defined as an element of data represented in the bitstream.
  • a syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
  • bitstream syntax and semantics as well as the decoding process for error-free bitstreams are specified in H.264/AVC and HEVC.
  • the encoding process is not specified, but encoders must generate conforming bitstreams.
  • Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD).
  • the standards contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding is optional and no decoding process has been specified for erroneous bitstreams.
  • the elementary unit for the input to an H.264/AVC or HEVC encoder and the output of an H.264/AVC or HEVC decoder, respectively, is a picture.
  • a picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
  • the source and decoded pictures may each be comprised of one or more sample arrays, such as one of the following sets of sample arrays, wherein each of the samples may represent one color component:
- Luma (Y) only (monochrome).
- Luma and two chroma (YCbCr or YCgCo).
- Green, Blue and Red (GBR, also known as RGB).
- Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
  • these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr; regardless of the actual color representation method in use.
  • the actual color representation method in use may be indicated e.g. in a coded bitstream e.g. using the Video Usability Information (VUI) syntax of H.264/AVC and/or HEVC.
  • a component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma) or the array or a single sample of the array that compose a picture in monochrome format.
  • a picture may either be a frame or a field.
  • a frame comprises a matrix of luma samples and possibly the corresponding chroma samples.
  • a field is a set of alternate sample rows of a frame. Fields may be used as encoder input for example when the source signal is interlaced.
  • Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or may be subsampled when compared to luma sample arrays.
  • Some chroma formats may be summarized as follows:
- In monochrome sampling there is only one sample array, which may be nominally considered the luma array.
- In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
- In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
- In 4:4:4 sampling when no separate color planes are in use, each of the two chroma arrays has the same height and width as the luma array.
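As a quick numeric check of the sampling ratios above, here is a small helper (names are ours, not from any standard) that returns the chroma array dimensions for a given luma array size:

```python
def chroma_dims(luma_w, luma_h, chroma_format):
    """Return (width, height) of each chroma array for a given luma size."""
    if chroma_format == "monochrome":
        return (0, 0)                      # no chroma arrays at all
    if chroma_format == "4:2:0":
        return (luma_w // 2, luma_h // 2)  # half width, half height
    if chroma_format == "4:2:2":
        return (luma_w // 2, luma_h)       # half width, same height
    if chroma_format == "4:4:4":
        return (luma_w, luma_h)            # same width and height
    raise ValueError(f"unknown chroma format: {chroma_format}")

# For a 1920x1080 luma array, 4:2:0 sampling gives 960x540 chroma arrays.
assert chroma_dims(1920, 1080, "4:2:0") == (960, 540)
```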
  • the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as pre-processing step or as part of encoding).
  • the chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
  • the source video sequence(s) provided as input for encoding may either represent interlaced source content or progressive source content. Fields of opposite parity have been captured at different times for interlaced source content. Progressive source content contains captured frames.
  • An encoder may encode fields of interlaced source content in two ways: a pair of interlaced fields may be coded into a coded frame or a field may be coded as a coded field.
  • an encoder may encode frames of progressive source content in two ways: a frame of progressive source content may be coded into a coded frame or a pair of coded fields.
  • A field pair or a complementary field pair may be defined as two fields next to each other in decoding and/or output order, having opposite parity (i.e. one being a top field and the other being a bottom field).
  • Some video coding standards or schemes allow mixing of coded frames and coded fields in the same coded video sequence.
  • predicting a coded field from a field in a coded frame and/or predicting a coded frame for a complementary field pair may be enabled in encoding and/or decoding.
  • a partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets.
  • a picture partitioning may be defined as a division of a picture into smaller non-overlapping units.
  • a block partitioning may be defined as a division of a block into smaller non-overlapping units, such as sub-blocks.
  • The term block partitioning may be considered to cover multiple levels of partitioning, for example partitioning of a picture into slices, and partitioning of each slice into smaller units, such as macroblocks of H.264/AVC. It is noted that the same unit, such as a picture, may have more than one partitioning. For example, a coding unit of HEVC may be partitioned into prediction units and separately by another quadtree into transform units.
  • Hybrid video codecs may encode the video information in two phases. At first, pixel values in a certain picture area (or "block") are predicted, for example, by motion compensation means or by spatial means. In the first phase, predictive coding may be applied, for example, as so-called sample prediction and/or so-called syntax prediction.
  • In sample prediction, pixel or sample values in a certain picture area or "block" are predicted. These pixel or sample values can be predicted, for example, using one or more of the following ways:
  • Motion compensation mechanisms (which may also be referred to as inter prediction, temporal prediction or motion-compensation temporal prediction or motion-compensated prediction or MCP) involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded. Inter prediction may reduce temporal redundancy.
  • Intra prediction, where pixel or sample values can be predicted by spatial mechanisms, involves finding and indicating a spatial region relationship. Intra prediction can be performed in the spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction may be exploited in intra coding, where no inter prediction is applied.
  • In syntax prediction, which may also be referred to as parameter prediction, syntax elements and/or syntax element values and/or variables derived from syntax elements are predicted from syntax elements (de)coded earlier and/or variables derived earlier. Examples of syntax prediction are provided below:
  • In motion vector prediction, motion vectors e.g. for inter and/or inter-view prediction may be coded differentially with respect to a block-specific predicted motion vector.
  • The predicted motion vectors may be created in a pre-defined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
  • Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
  • The reference index of a previously coded/decoded picture can be predicted. Differential coding of motion vectors may be disabled across slice boundaries.
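To make the median-based motion vector prediction concrete, here is a minimal sketch: the predictor is the component-wise median of three neighbouring blocks' motion vectors, and only the difference to it would be coded. The choice of three neighbours and all names are illustrative, not a normative description of any standard.

```python
def median_mv_predictor(mv_left, mv_above, mv_above_right):
    """Component-wise median of three neighbouring motion vectors."""
    def med3(a, b, c):
        return sorted((a, b, c))[1]
    return (med3(mv_left[0], mv_above[0], mv_above_right[0]),
            med3(mv_left[1], mv_above[1], mv_above_right[1]))

# Encoder side: only the difference to the predictor is coded.
mv = (5, -2)
pred = median_mv_predictor((4, -1), (6, -3), (5, 0))  # -> (5, -1)
mvd = (mv[0] - pred[0], mv[1] - pred[1])              # -> (0, -1)
```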
  • In block partitioning prediction, the block partitioning, e.g. from CTUs to CUs and down to PUs, may be predicted.
  • In filter parameter prediction, the filtering parameters, e.g. for sample adaptive offset, may be predicted.
  • Prediction approaches using image information from a previously coded image can also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation.
  • Prediction approaches using image information within the same image can also be called intra prediction methods.
  • the second phase is coding the error between the prediction block of samples and the original block of samples. This may be accomplished by transforming the difference in sample values using a specified transform. This transform may be e.g. a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized, and entropy coded.
  • The encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
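A bare-bones sketch of this second phase: the residual block is transformed with an orthonormal 2-D DCT, uniformly quantized with a step that trades picture quality against bitrate, and the quantized levels are what the entropy coder would see. This is an illustration under simplified assumptions, not any standard's exact transform or quantizer.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def code_residual(residual, qstep):
    """Transform and quantize a square residual block (encoder side)."""
    d = dct_matrix(residual.shape[0])
    coeffs = d @ residual @ d.T       # 2-D DCT of the prediction error
    return np.round(coeffs / qstep)   # uniform quantization -> entropy coder

def decode_residual(levels, qstep):
    """Inverse quantization and inverse transform (decoder side)."""
    d = dct_matrix(levels.shape[0])
    return d.T @ (levels * qstep) @ d

residual = np.random.randn(8, 8)
levels = code_residual(residual, qstep=0.5)
rec = decode_residual(levels, qstep=0.5)  # close to residual; larger qstep -> coarser
```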
  • the decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a prediction representation of the sample blocks (using the motion or spatial information created by the encoder and included in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized error signal in the spatial domain).
  • After applying sample prediction and error decoding processes, the decoder combines the prediction and the prediction error signals (the sample values) to form the output video frame.
  • FIG. 1 illustrates an image to be encoded (In); a prediction representation of an image block (P'n); a prediction error signal (Dn); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F).
  • FIG. 2 illustrates a prediction representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); an entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
  • the decoder may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
  • motion information is indicated by motion vectors associated with each motion compensated image block.
  • Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures).
  • In H.264/AVC and HEVC, as in many other video compression standards, a picture is divided into a mesh of rectangles, for each of which a similar block in one of the reference pictures is indicated for inter prediction. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
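A minimal sketch of this inter-prediction step, assuming integer-sample motion vectors only (real codecs also interpolate fractional positions); all names are illustrative.

```python
import numpy as np

def motion_compensate(reference, x, y, block_size, mv):
    """Predict a block at (x, y) from `reference` displaced by motion vector `mv`.

    mv = (dx, dy) in whole samples; no fractional-pel interpolation.
    """
    dx, dy = mv
    return reference[y + dy : y + dy + block_size,
                     x + dx : x + dx + block_size]

ref = np.arange(64 * 64, dtype=np.int16).reshape(64, 64)
pred = motion_compensate(ref, x=16, y=16, block_size=8, mv=(3, -2))
# The encoder codes mv plus the residual: current_block - pred.
```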
  • a neural network is a computation graph that may comprise several layers of successive computation.
  • A layer may comprise one or more units ("neurons") performing an elementary computation.
  • A unit may be connected to one or more other units, and the connection may have an associated weight. The weight may be used for scaling the signal passing through the associated connection. Weights may be learnable parameters, i.e., values which can be learned from training data.
  • Each layer is configured to extract data from the input data.
  • Data from lower layers may represent low-level semantics, whereas higher layers represent higher-level semantics.
  • Feed-forward NNs and recurrent NNs represent examples of neural network architectures.
  • Feed-forward neural networks do not have a feedback loop: each layer takes input from one or more of the previous layers and provides its output to one or more of the subsequent layers. Also, units inside a certain layer may take input from units in one or more of the preceding layers, and provide output to units in one or more of the following layers.
  • an untrained neural network model has to go through a training phase.
  • the training phase is the development phase, where the neural network learns to perform the final task.
  • a training data set that is used for training a neural network is supposed to be representative of the data on which the neural network will be used.
  • The neural network uses the examples in the training dataset to modify its learnable parameters (e.g., its connections' weights) in order to achieve the desired task. Training may happen by minimizing or decreasing the output's error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, etc.
  • training is an iterative process, where at each iteration, the algorithm modifies the weights of the neural network to make a gradual improvement of the network’s output, i.e., to gradually decrease the loss.
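As a toy illustration of such an iterative loop, the sketch below fits a single-layer logistic model by gradient descent, so the cross-entropy loss gradually decreases; the data, dimensions and learning rate are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))        # toy inputs
t = (x[:, 0] > 0).astype(float)      # toy binary targets
w = np.zeros(8)                      # learnable parameters (weights)

for step in range(200):
    p = 1.0 / (1.0 + np.exp(-x @ w))                 # network output
    loss = -np.mean(t * np.log(p + 1e-9)
                    + (1 - t) * np.log(1 - p + 1e-9))  # cross-entropy loss
    grad = x.T @ (p - t) / len(t)    # gradient of the loss w.r.t. the weights
    w -= 0.5 * grad                  # gradual improvement: the loss decreases
```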
  • The trained neural network model is applied to new data during an inference phase, in which the neural network performs the desired task for which it has been trained.
  • the inference phase is also known as a testing phase.
  • the neural network provides an output which is a result of the inference on the new data.
  • the result can be a classification result or a recognition result.
  • Training can be performed in several ways. The main ones are supervised, unsupervised, and reinforcement training.
  • In supervised training, the neural network model is provided with input-output pairs, where the output may be a label.
  • In unsupervised training, the neural network is provided only with input data (and also with output raw data in the case of self-supervised training).
  • In reinforcement learning, the supervision is sparser and less precise; instead of input-output pairs, the neural network gets input data and, sometimes, delayed rewards in the form of scores (e.g., -1, 0, or +1).
  • the learning is a result of a training algorithm, or of a meta-level neural network providing the training signal.
  • the training algorithm comprises changing some properties of the neural network so that its output is as close as possible to a desired output.
  • the syntax elements (SEs) in a category of coding tools are coded and transmitted in the bitstream in a certain and fixed order.
  • the ordering may be done based on the probability of the usage of each tool in the entire sequence. However, such ordering is sub-optimal since the probability of usage of a certain coding tool depends on different properties such as: content in each image or coding block, temporal distance of each frame to the reference frame, quality of each frame or block relative to its reference, size of the block, etc.
  • FIG. 3 illustrates the syntax element order of merge coding tools in the current version of the Versatile Video Coding standard (H.266/VVC).
  • The merge coding tools 310 are encoded or decoded in a fixed order. These merge coding tools comprise the normal merge mode 320, the merge with motion vector difference (MMVD) mode 330, the sub-block merge mode 340, the Combined Intra-Inter Prediction (CIIP) mode 350 and the triangle mode 360.
  • The decoder has to decode the normal merge flag 320, the MMVD flag 330 and the sub-block flag 340 syntax elements before parsing and decoding the CIIP syntax element 350.
  • If the usage of the CIIP merge mode 350 for the content is higher than the usage of the other tools in the category, then the bitrate will increase, since these syntax elements are coded into the bitstream at block level. The same issue applies to the other syntax elements in this category, and in general to all video coding tools. The toy example below makes this cost concrete.
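If each merge tool is signalled by a chain of one-bit flags tested in the fixed sequence of Figure 3, the expected number of flags per block depends on where the most-used tool sits in the order. The usage probabilities below are invented for illustration; this is a simplified cost model, not VVC's actual binarization.

```python
def expected_flags(order, usage_prob):
    """Expected flags per block when tools are signalled by a one-bit flag chain."""
    bits = 0.0
    for i, tool in enumerate(order):
        # Reaching position i costs i "off" flags plus one "on" flag;
        # the last position is implied and needs no "on" flag of its own.
        cost = i + 1 if i < len(order) - 1 else i
        bits += usage_prob[tool] * cost
    return bits

usage = {"normal": 0.1, "mmvd": 0.1, "subblock": 0.1, "ciip": 0.6, "triangle": 0.1}
fixed = ["normal", "mmvd", "subblock", "ciip", "triangle"]
by_prob = sorted(usage, key=usage.get, reverse=True)
print(expected_flags(fixed, usage))    # 3.4 flags per block on average
print(expected_flags(by_prob, usage))  # 1.9 flags per block on average
```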
  • the present embodiments are targeted to solve the issue of having a fixed order for coding the syntax elements (SEs) in video coding, and propose an adaptive re-ordering of syntax elements in order to provide an efficient model for coding syntax elements.
  • the adaptive order for the syntax elements according to the embodiments may be based on a learning process from the video content.
  • the learning process may be implemented according to one or more of the following options:
  • a probability of the usage of each coding tool in a coding category in each frame can be determined based on a learning process from the previously encoded/decoded blocks in the picture(s).
  • the probability of the usage of each coding tool may be determined based on a certain property. For example, the probability of the usage can be determined based on a block size, the quality of the current block compared to the reference block, the temporal distance of a block inside the picture compared to the reference block, etc.
  • The learning may be based on an already coded group of pictures (GOP).
  • At least one previously coded picture's coding information is used for learning the syntax element order of each block in the current picture. This is adaptively improved when more frames are involved.
  • Encoding of the video may be performed in two steps.
  • a video may be partially or fully coded in order to derive the picture/block level information of the coding tools for the learning process.
  • the syntax elements are re-ordered based on the information that is extracted from the first round of encoding.
  • The re-ordering of the syntax elements (SEs) of a group of coding tools can be implemented according to various embodiments. For example, there can be N prediction tools with distinct syntax elements (SE1, SE2, ..., SEN) for coding a block in the codec.
  • These syntax elements may be adaptively re-ordered based on their probability of usage in the codec. This means that the more probable it is that a certain tool is used, the earlier its syntax element is placed in the order of syntax elements. The probability of the usage may be estimated based on a learning process from the previously coded blocks.
  • Figure 4 illustrates an example of the current coding block and the previously coded blocks.
  • a probability model may be derived, which probability model defines the ordering of the syntax elements.
  • The model may be derived based on at least one previously coded block. However, in order to have a more precise probability model, the coding information of more blocks is needed, e.g. one or more CTU rows/columns of the picture.
  • the order of syntax elements in a group of coding tools can be modelled based on the probability of the corresponding syntax elements in previous blocks.
  • Figure 5 illustrates a procedure for re-ordering syntax elements in the encoding and decoding process, according to an embodiment.
  • syntax elements of an input block 510 may be coded in a different order than other blocks.
  • For the first block(s), the syntax elements may be coded based on a pre-defined and fixed syntax elements order 520.
  • Once a block has been coded, its coding information, i.e. the syntax element information, is fed to the syntax element modelling process 540, which models the order based on the probability of the usage of the coding tools in the previous block(s).
  • the syntax elements modelling 540 is then used for updating 550 the syntax elements order. Consequently, the syntax elements are re-ordered based on the coding information from the previous block(s).
  • the re-ordered syntax elements are then used for coding the syntax elements of the next block(s).
  • The modelling can be done by calculating the probability of usage of each syntax element or coding tool in previous blocks and by re-ordering the syntax elements from the most probable SE to the least probable SE, as in the sketch below.
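A minimal sketch of this counting-based modelling, assuming the order is simply the observed usage frequencies sorted in decreasing order, with the pre-defined order breaking ties; the class and tool names are ours. Because the decoder decodes the same tool decisions, it can maintain an identical model and re-derive the order without extra signalling.

```python
from collections import Counter

PREDEFINED_ORDER = ["normal", "mmvd", "subblock", "ciip", "triangle"]

class SyntaxOrderModel:
    """Tracks coding-tool usage in previous blocks and derives the SE order."""

    def __init__(self):
        self.counts = Counter()

    def update(self, used_tool):
        """Feed back the tool actually used for a just-coded block."""
        self.counts[used_tool] += 1

    def order(self):
        """Most-used tool first; the pre-defined order breaks ties (and serves
        the very first block, when no counts exist yet)."""
        return sorted(PREDEFINED_ORDER,
                      key=lambda t: (-self.counts[t], PREDEFINED_ORDER.index(t)))

model = SyntaxOrderModel()
for tool in ["ciip", "ciip", "normal", "ciip", "triangle"]:
    model.update(tool)
print(model.order())  # ['ciip', 'normal', 'triangle', 'mmvd', 'subblock']
```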
  • The modelling can be done by using a set of neural networks (NNs) for the learning process and deciding the order of SEs based on the previously coded block(s)' SE information.
  • the neural network can be trained according to one of the following ways:
  • Figure 7 shows an example of offline training of a neural network.
  • the NN is trained (separately outside of the coding chain with the same or different data as training set) prior to encoding and then it is used for syntax element re-ordering.
  • The input block(s) 710 are provided to the neural network 720.
  • the neural network 720 models the syntax element order 730 for each block based on the probability of the usage of the coding tools in previous block(s). Consequently, the syntax elements are re-ordered based on the coding information from the neural network.
  • The syntax elements are then (de)coded according to the computed syntax elements order.
  • Figure 8 shows an example of online training and online fine-tuning of neural network.
  • the NN is trained in the encoding/decoding process (i.e., online learning) and then used for SE re-ordering.
  • more than one block’s coding information may be needed.
  • the NN may use the coding information of at least one CTU row or column for the training operation.
  • the model is used for SE re-ordering.
  • the NN may be later fine-tuned based on the next blocks’ coding information.
  • Figure 9 shows an example of offline training and online fine-tuning of neural network.
  • The NN is trained similarly to the example of Figure 7 (i.e., outside of the coding chain), but the coding information of the blocks in the encoding/decoding process is used for fine-tuning the NN.
  • In the fine-tuning process, another NN may be used in the coding chain.
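One hypothetical realisation of the NN-based variant: a small network maps recent tool-usage statistics to per-tool scores, and the SE order is the tools sorted by score. The architecture, input features and all names are assumptions made for illustration; offline-trained weights would be loaded in place of the random initialization, and online fine-tuning would update them during coding.

```python
import numpy as np

TOOLS = ["normal", "mmvd", "subblock", "ciip", "triangle"]

class OrderNet:
    """Tiny one-hidden-layer network scoring each tool from recent usage stats."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        n = len(TOOLS)
        self.w1 = rng.normal(scale=0.1, size=(n, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, n))

    def scores(self, usage_hist):
        """usage_hist: per-tool frequencies over the last coded block(s)."""
        h = np.maximum(0.0, usage_hist @ self.w1)  # ReLU hidden layer
        return h @ self.w2

    def se_order(self, usage_hist):
        s = self.scores(np.asarray(usage_hist, dtype=float))
        return [TOOLS[i] for i in np.argsort(-s)]  # highest score first

net = OrderNet()                   # offline-trained weights would be loaded here
hist = [0.1, 0.1, 0.1, 0.6, 0.1]   # observed usage in previous block(s)
order = net.se_order(hist)         # order used to (de)code the next block's SEs
```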
  • the process of modelling the SEs’ order may be done differently for different block sizes.
  • For each block size, a distinct SE ordering model may be defined, and the order may be updated based on the previously coded blocks having the same size. This is illustrated in Figure 6 and sketched below.
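The per-block-size variant can then be sketched by keeping one counting model per size class, reusing the SyntaxOrderModel sketch from above; the size classes are illustrative.

```python
# One independent model per block size class (sizes chosen for illustration).
models = {size: SyntaxOrderModel() for size in (8, 16, 32, 64)}

def order_for_block(size):
    """SE order for the next block of this size."""
    return models[size].order()

def after_block_coded(size, used_tool):
    """Only blocks of the same size update the corresponding model."""
    models[size].update(used_tool)
```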
  • the syntax elements modelling and ordering may be done based on the coding information from the previous frame(s).
  • a syntax elements model may be derived from the coding information of the blocks in the frame.
  • The syntax elements model may then be used for ordering the syntax elements of the next frame(s). It is appreciated that more than one frame may be used for modelling the syntax elements order.
  • For each reference picture, a separate syntax elements model may be derived.
  • The syntax elements of a block are then ordered based on the syntax elements model of its reference picture.
  • The ordering of syntax elements may be done based on a pre-encoding and decoding process. For example, one or more frames of the video may be pre-encoded and decoded prior to the final encoding/decoding. Accordingly, the syntax elements modelling can be done with the coding information and probabilities of the coding tools in the pre-encoding/decoding process. At least one syntax elements order may be derived and stored in the codec for coding the syntax elements in the final encoding/decoding process.
  • There may be more than one syntax elements order derived for coding the syntax elements of the blocks in the final encoding/decoding process.
  • the order may be derived based on different aspects as below:
  • the syntax elements order may be derived based on the size of the blocks.
  • the quality level of each frame may be different in the video sequence. Moreover, the quality of each block may also differ inside each picture. Hence, the syntax elements order may be derived based on the quality level of each block.
  • the syntax elements order may be derived based on the temporal ID of the frame in the group of pictures (GOP).
  • When more than one syntax elements order is available, the codec may decide to use the one with the proper properties, e.g. based on a certain block size, temporal ID or quality.
  • The encoder may select the proper order based on a rate-distortion optimization by using all the available syntax elements orders for a certain group of coding tools. In that case, the index of the corresponding syntax elements order may be signaled to the decoder, as in the sketch below.
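A sketch of this selection, reusing the toy flag-chain cost from earlier: the encoder evaluates each candidate order on the block, picks the cheapest, and signals the candidate index so the decoder applies the same order. A real encoder would use actual rate-distortion estimates rather than this flag count, and the candidate orders are invented for the example.

```python
def flag_cost(order, used_tool):
    """Flags needed to signal `used_tool` with a one-bit flag chain (toy model)."""
    i = order.index(used_tool)
    return i if i == len(order) - 1 else i + 1

def choose_order(candidate_orders, used_tool):
    """Encoder side: pick the cheapest order; its index is signalled to the decoder."""
    costs = [flag_cost(order, used_tool) for order in candidate_orders]
    best = min(range(len(candidate_orders)), key=costs.__getitem__)
    return best, candidate_orders[best]

candidates = [
    ["normal", "mmvd", "subblock", "ciip", "triangle"],  # e.g. per-size model
    ["ciip", "normal", "triangle", "mmvd", "subblock"],  # e.g. per-quality model
]
index, order = choose_order(candidates, used_tool="ciip")  # -> index 1
```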
  • the syntax elements order modelling can be done for different regions of the image/video differently.
  • The syntax elements modelling and re-ordering may be done differently for each area.
  • at least one syntax elements modelling may be applied to polar areas and at least one to the equator areas.
  • syntax elements modelling may be done in encoder side and/or decoder side.
  • FIG. 10 is a flowchart illustrating a method according to an embodiment.
  • A method comprises receiving 1010 a source picture; partitioning 1020 the source picture into a set of non-overlapping blocks; for a first block being coded, coding 1030 syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, determining 1040 a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving 1050 a probability model according to the determined probabilities; defining 1060 an order for syntax elements for a block according to the probability model; and encoding 1070 syntax elements of the block into a bitstream according to the defined order.
  • An apparatus comprises means for receiving a source picture; means for partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, means for coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, means for determining a probability of a usage of each of the various coding tools in any one or more previous blocks; means for deriving a probability model according to the determined probabilities; means for defining an order for syntax elements for a block according to the probability model; and means for encoding syntax elements of the block into a bitstream according to the defined order.
  • the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
  • the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 10 according to various embodiments.
  • An apparatus according to an embodiment is illustrated in Figure 11.
  • An apparatus of this embodiment is a camera having multiple lenses and imaging sensors, but also other types of cameras may be used to capture wide view images and/or wide view video.
  • In this specification, wide view image and wide view video mean an image and a video, respectively, which comprise visual information having a relatively large viewing angle, larger than 100 degrees.
  • A so-called 360 panorama image/video as well as images/videos captured by using a fisheye lens may also be called a wide view image/video in this specification.
  • The wide view image/video may mean an image/video in which some kind of projection distortion may occur when a direction of view changes between successive images or frames of the video, so that a transform may be needed to find co-located samples from a reference image or a reference frame. This will be described in more detail later in this specification.
  • the camera 2700 of Figure 11 comprises two or more camera units 2701 and is capable of capturing wide view images and/or wide view video.
  • Each camera unit 2701 is located at a different location in the multi-camera system and may have a different orientation with respect to the other camera units 2701.
  • The camera units 2701 may have an omnidirectional constellation, so that the camera has a 360-degree viewing angle in 3D space. In other words, such a camera 2700 may be able to see each direction of a scene, so that each spot of the scene around the camera 2700 can be viewed by at least one camera unit 2701.
  • the camera 2700 of Figure 11 may also comprise a processor 2704 for controlling the operations of the camera 2700. There may also be a memory 2706 for storing data and computer code to be executed by the processor 2704, and a transceiver 2708 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner.
  • the camera 2700 may further comprise a user interface (UI) 2710 for displaying information to the user, for generating audible signals and/or for receiving user input.
  • the camera 2700 need not comprise each feature mentioned above or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the camera units 2701 (not shown).
  • Figure 11 also illustrates some operational elements which may be implemented, for example, as a computer code in the software of the processor, in a hardware, or both.
  • a focus control element 2714 may perform operations related to adjustment of the optical system of camera unit or units to obtain focus meeting target specifications or some other predetermined criteria.
  • An optics adjustment element 2716 may perform movements of the optical system or one or more parts of it according to instructions provided by the focus control element 2714. It should be noted here that the actual adjustment of the optical system need not be performed by the apparatus, but it may be performed manually, wherein the focus control element 2714 may provide information for the user interface 2710 to indicate a user of the device how to adjust the optical system.
  • a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
  • a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
  • the computer program code comprises one or more operational characteristics.
  • Said operational characteristics are being defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises receiving a source picture; partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving a probability model according to the determined probabilities; defining an order for syntax elements for a block according to the probability model; and encoding syntax elements of the block into a bitstream according to the defined order.


Abstract

A method for video coding comprises receiving a source picture; partitioning the source picture into a set of non-overlapping blocks (510); for a first block being coded, coding syntax elements of various coding tools into the bitstream (530) according to a pre-defined order (520); for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving a probability model according to the determined probabilities (540); defining an order for syntax elements for a block according to the probability model (550); and encoding syntax elements of the block into a bitstream (530) according to the defined order (520). In some embodiments, the probability of usage of each coding tool is determined according to a learning process involving a neural network.

Description

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND VIDEO DECODING
Technical Field
[0001] The present solution generally relates to video encoding and video decoding.
Background
[0002] A video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission, and a decoder that can uncompress the compressed video representation back into a viewable form. The encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
[0003] Hybrid video codecs may encode the video information in two phases. Firstly, sample values (i.e. pixel values) in a certain picture area are predicted, e.g., by motion compensation means or by spatial means. Secondly, the prediction error, i.e. the difference between the prediction block of samples and the original block of samples, is coded. The video decoder on the other hand reconstructs the output video by applying prediction means similar to the encoder to form a prediction representation of the sample blocks and prediction error decoding. After applying prediction and prediction error decoding means, the decoder sums up the prediction and prediction error signals to form the output video frame.
Summary
[0004] The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.
[0005] Various aspects include a method, an apparatus and a computer readable medium comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various embodiments are disclosed in the dependent claims.
[0006] According to a first aspect, there is provided a method comprising
- receiving a source picture;
- partitioning the source picture into a set of non-overlapping blocks;
- for a first block being coded, coding syntax elements of various coding tools according to a pre-defined order into the bitstream;
- for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks;
- deriving a probability model according to the determined probabilities;
- defining an order for syntax elements for a block according to the probability model;
- encoding syntax elements of the block into a bitstream according to the defined order.
[0007] According to an embodiment, the probability of usage of each coding tool is determined according to a learning process.
[0008] According to an embodiment, the learning process is based on an already coded group of pictures.
[0009] According to an embodiment, the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
[0010] According to an embodiment, the probability model is different for blocks having different sizes.
[0011] According to an embodiment, the probability model is different for different regions of the source picture.
[0012] According to an embodiment, the order for syntax elements is defined based on a pre-encoding and decoding process modelling the determined probabilities.
[0013] According to a second aspect, there is provided an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following: receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks; derive a probability model according to the determined probabilities; define an order for syntax elements for a block according to the probability model; and encode syntax elements of the block into a bitstream according to the defined order.
[0014] According to an embodiment, the apparatus further comprising computer program code to cause the apparatus to determine the probability of usage of each coding tool according to a learning process.
[0015] According to an embodiment, the learning process is based on an already coded group of pictures.
[0016] According to an embodiment, the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
[0017] According to an embodiment, the probability model is different for blocks having different sizes.
[0018] According to an embodiment, the probability model is different for different regions of the source picture.
[0019] According to an embodiment, the apparatus comprises means for defining the order for syntax elements based on a pre-encoding and decoding process modelling the determined probabilities.
[0020] According to a third aspect, there is provided a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive a source picture; partition the source picture into a set of non-overlapping blocks; for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks; to derive a probability model according to the determined probabilities; to define an order for syntax elements for a block according to the probability model; and to encode syntax elements of the block into a bitstream according to the defined order.
[0021] According to an embodiment, the computer program product is embodied on a non-transitory computer readable medium.
Description of the Drawings
[0022] In the following, various embodiments will be described in more detail with reference to the appended drawings, in which
[0023] Fig. 1 shows an encoding process according to an embodiment;
[0024] Fig. 2 shows a decoding process according to an embodiment;
[0025] Fig. 3 shows syntax element signaling order of Merge tools in VVC;
[0026] Fig. 4 shows an example of previously coded blocks that are used for ordering the syntax elements of the current block;
[0027] Fig. 5 shows an adaptive syntax element ordering procedure according to an embodiment;
[0028] Fig. 6 shows an example of the present embodiments applied for different block sizes;
[0029] Fig. 7 shows an example of an offline training;
[0030] Fig. 8 shows an example of an online training and online fine-tuning neural network;
[0031] Fig. 9 shows an example of offline training and online fine-tuning neural network;
[0032] Fig. 10 is a flowchart illustrating a method according to an embodiment;
[0033] Fig. 11 shows an apparatus according to an embodiment.
Description of Example Embodiments
[0034] In the following, several embodiments will be described in the context of one video coding arrangement. It is to be noted, however, that the invention is not limited to this particular arrangement. For example, the invention may be applicable to video coding systems like streaming systems, DVD (Digital Versatile Disc) players, digital television receivers, personal video recorders, systems and computer programs on personal computers, handheld computers and communication devices, as well as network elements such as transcoders and cloud computing arrangements where video data is handled.
[0035] In the following, several embodiments are described using the convention of referring to (de)coding, which indicates that the embodiments may apply to decoding and/or encoding.
[0036] The Advanced Video Coding standard (which may be abbreviated AVC or H.264/AVC) was developed by the Joint Video Team (JVT) of the Video Coding Experts Group (VCEG) of the Telecommunications Standardization Sector of International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) of International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC). The H.264/AVC standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10, also known as MPEG-4 Part 10 Advanced Video Coding (AVC). There have been multiple versions of the H.264/AVC standard, each integrating new extensions or features to the specification. These extensions include Scalable Video Coding (SVC) and Multiview Video Coding (MVC).
[0037] The High Efficiency Video Coding standard (which may be abbreviated HEVC or H.265/HEVC) was developed by the Joint Collaborative Team on Video Coding (JCT-VC) of VCEG and MPEG. The standard is published by both parent standardization organizations, and it is referred to as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, also known as MPEG-H Part 2 High Efficiency Video Coding (HEVC). Extensions to H.265/HEVC include scalable, multiview, three-dimensional, and fidelity range extensions, which may be referred to as SHVC, MV-HEVC, 3D-HEVC, and REXT, respectively. The references in this description to H.265/HEVC, SHVC, MV-HEVC, 3D-HEVC and REXT that have been made for the purpose of understanding definitions, structures or concepts of these standard specifications are to be understood to be references to the latest versions of these standards that were available before the date of this application, unless otherwise indicated.
[0038] Some key definitions, bitstream and coding structures, and concepts of H.264/AVC and HEVC and some of their extensions are described in this section as an example of a video encoder, decoder, encoding method, decoding method, and a bitstream structure, wherein the embodiments may be implemented. Some of the key definitions, bitstream and coding structures, and concepts of H.264/AVC are the same as in the HEVC standard - hence, they are described below jointly. The aspects of various embodiments are not limited to H.264/AVC or HEVC or their extensions, but rather the description is given for one possible basis on top of which the present embodiments may be partly or fully realized.
[0039] In the description of existing standards as well as in the description of example embodiments, a syntax element may be defined as an element of data represented in the bitstream. A syntax structure may be defined as zero or more syntax elements present together in the bitstream in a specified order.
[0040] Similarly to many earlier video coding standards, the bitstream syntax and semantics as well as the decoding process for error-free bitstreams are specified in H.264/AVC and HEVC. The encoding process is not specified, but encoders must generate conforming bitstreams. Bitstream and decoder conformance can be verified with the Hypothetical Reference Decoder (HRD). The standards contain coding tools that help in coping with transmission errors and losses, but the use of the tools in encoding is optional and no decoding process has been specified for erroneous bitstreams.
[0041] The elementary unit for the input to an H.264/AVC or HEVC encoder and the output of an H.264/AVC or HEVC decoder, respectively, is a picture. A picture given as an input to an encoder may also be referred to as a source picture, and a picture decoded by a decoder may be referred to as a decoded picture.
[0042] The source and decoded pictures may each be comprised of one or more sample arrays, such as one of the following sets of sample arrays, wherein each of the samples may represent one color component:
- Luma (Y) only (monochrome).
- Luma and two chroma (YCbCr or YCgCo).
- Green, Blue and Red (GBR, also known as RGB).
- Arrays representing other unspecified monochrome or tri-stimulus color samplings (for example, YZX, also known as XYZ).
[0043] In the following, these arrays may be referred to as luma (or L or Y) and chroma, where the two chroma arrays may be referred to as Cb and Cr, regardless of the actual color representation method in use. The actual color representation method in use may be indicated e.g. in a coded bitstream, e.g. using the Video Usability Information (VUI) syntax of H.264/AVC and/or HEVC. A component may be defined as an array or a single sample from one of the three sample arrays (luma and two chroma), or the array or a single sample of the array that composes a picture in monochrome format.
[0044] In H.264/AVC and HEVC, a picture may either be a frame or a field. A frame comprises a matrix of luma samples and possibly the corresponding chroma samples. A field is a set of alternate sample rows of a frame. Fields may be used as encoder input for example when the source signal is interlaced. Chroma sample arrays may be absent (and hence monochrome sampling may be in use) or may be subsampled when compared to luma sample arrays. Some chroma formats may be summarized as follows:
- In monochrome sampling there is only one sample array, which may be nominally considered the luma array.
- In 4:2:0 sampling, each of the two chroma arrays has half the height and half the width of the luma array.
- In 4:2:2 sampling, each of the two chroma arrays has the same height and half the width of the luma array.
- In 4:4:4 sampling when no separate color planes are in use, each of the two chroma arrays has the same height and width as the luma array.
[0045] In H.264/AVC and HEVC, it is possible to code sample arrays as separate color planes into the bitstream and respectively decode separately coded color planes from the bitstream. When separate color planes are in use, each one of them is separately processed (by the encoder and/or the decoder) as a picture with monochrome sampling.
[0046] When chroma subsampling is in use (e.g. 4:2:0 or 4:2:2 chroma sampling), the location of chroma samples with respect to luma samples may be determined in the encoder side (e.g. as pre-processing step or as part of encoding). The chroma sample positions with respect to luma sample positions may be pre-defined for example in a coding standard, such as H.264/AVC or HEVC, or may be indicated in the bitstream for example as part of VUI of H.264/AVC or HEVC.
[0047] Generally, the source video sequence(s) provided as input for encoding may either represent interlaced source content or progressive source content. Fields of opposite parity have been captured at different times for interlaced source content. Progressive source content contains captured frames. An encoder may encode fields of interlaced source content in two ways: a pair of interlaced fields may be coded into a coded frame or a field may be coded as a coded field. Likewise, an encoder may encode frames of progressive source content in two ways: a frame of progressive source content may be coded into a coded frame or a pair of coded fields. A field pair or a complementary field pair may be defined as two fields next to each other in decoding and/or output order, having opposite parity (i.e. one being a top field and another being a bottom field) and neither belonging to any other complementary field pair. Some video coding standards or schemes allow mixing of coded frames and coded fields in the same coded video sequence. Moreover, predicting a coded field from a field in a coded frame and/or predicting a coded frame for a complementary field pair (coded as fields) may be enabled in encoding and/or decoding.
[0048] A partitioning may be defined as a division of a set into subsets such that each element of the set is in exactly one of the subsets. A picture partitioning may be defined as a division of a picture into smaller non-overlapping units. A block partitioning may be defined as a division of a block into smaller non-overlapping units, such as sub-blocks. In some cases, term block partitioning may be considered to cover multiple levels of partitioning, for example partitioning of a picture into slices, and partitioning of each slice into smaller units, such as macroblocks of H.264/AVC. It is noted that the same unit, such as a picture, may have more than one partitioning. For example, a coding unit of HEVC may be partitioned into prediction units and separately by another quadtree into transform units.
[0049] Hybrid video codecs, for example ITU-T H.263, H.264/AVC and HEVC, may encode the video information in two phases. At first, pixel values in a certain picture area (or "block") are predicted for example by motion compensation means or by spatial means. In the first phase, predictive coding may be applied, for example, as so-called sample prediction and/or so-called syntax prediction.
[0050] In the sample prediction, pixel or sample values in a certain picture area or "block" are predicted. These pixel or sample values can be predicted, for example, using one or more of the following ways:
- motion compensation mechanism;
- intra prediction mechanism.
[0051] Motion compensation mechanisms (which may also be referred to as inter prediction, temporal prediction, motion-compensated temporal prediction, motion-compensated prediction or MCP) involve finding and indicating an area in one of the previously encoded video frames that corresponds closely to the block being coded. Inter prediction may reduce temporal redundancy.
[0052] Intra prediction, where pixel or sample values can be predicted by spatial mechanisms, involves finding and indicating a spatial region relationship. Intra prediction can be performed in the spatial or transform domain, i.e., either sample values or transform coefficients can be predicted. Intra prediction may be exploited in intra coding, where no inter prediction is applied.
[0053] In the syntax prediction, which may also be referred to as parameter prediction, syntax elements and/or syntax element values and/or variables derived from syntax elements are predicted from syntax elements (de)coded earlier and/or variables derived earlier. Non-limiting examples of syntax prediction are provided below:
[0054] In motion vector prediction, motion vectors, e.g. for inter and/or inter-view prediction, may be coded differentially with respect to a block-specific predicted motion vector. In many video codecs, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks. Another way to create motion vector predictions, sometimes referred to as advanced motion vector prediction (AMVP), is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and to signal the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values, the reference index of a previously coded/decoded picture can be predicted. Differential coding of motion vectors may be disabled across slice boundaries.
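As a concrete illustration of the median-based prediction mentioned above, the following Python sketch computes a component-wise median predictor from three spatial neighbours and the resulting motion vector difference. The function names and the particular choice of neighbours are illustrative assumptions, not taken from any standard text.

```python
# A minimal sketch of median motion vector prediction, assuming three
# spatial neighbour MVs (left, top, top-right) are available; names and
# neighbour choice are illustrative only.
def median_mv_predictor(mv_left, mv_top, mv_topright):
    """Component-wise median of three neighbouring motion vectors."""
    def median3(a, b, c):
        # median of three scalars without sorting
        return max(min(a, b), min(max(a, b), c))
    return (median3(mv_left[0], mv_top[0], mv_topright[0]),
            median3(mv_left[1], mv_top[1], mv_topright[1]))

# The encoder codes only the difference (MVD) w.r.t. the predictor:
mv = (5, -2)
pred = median_mv_predictor((4, -1), (6, -2), (3, 0))   # -> (4, -1)
mvd = (mv[0] - pred[0], mv[1] - pred[1])               # (1, -1): smaller
                                                       # magnitude than mv
```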
[0055] The block partitioning, e.g. from CTU to CUs and down to PUs, may be predicted.

[0056] In filter parameter prediction, the filtering parameters, e.g. for sample adaptive offset, may be predicted.
[0057] Prediction approaches using image information from a previously coded image may also be called inter prediction methods, which may also be referred to as temporal prediction and motion compensation. Prediction approaches using image information within the same image may also be called intra prediction methods.
[0058] The second phase is coding the error between the prediction block of samples and the original block of samples. This may be accomplished by transforming the difference in sample values using a specified transform. This transform may be e.g. a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized, and entropy coded.
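A minimal sketch of this second phase is given below, assuming a 2-D DCT-II and uniform scalar quantization; the quantization step qstep is an illustrative parameter and the entropy-coding stage is omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_residual(original_block, prediction_block, qstep=8.0):
    """Transform and quantize a prediction residual (entropy coding omitted)."""
    residual = original_block.astype(np.float64) - prediction_block
    coeffs = dctn(residual, norm='ortho')   # DCT of the sample difference
    levels = np.round(coeffs / qstep)       # quantization: the lossy step
    # "levels" would be entropy coded into the bitstream; here we only
    # reconstruct, as a decoder would, to show the effect of quantization:
    reconstructed = idctn(levels * qstep, norm='ortho') + prediction_block
    return levels, reconstructed

block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
pred = np.full((8, 8), 128.0)               # a flat prediction block
levels, rec = code_residual(block, pred)
```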
[0059] By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
[0060] The decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a prediction representation of the sample blocks (using the motion or spatial information created by the encoder and included in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding to recover the quantized error signal in the spatial domain).
[0061] After applying sample prediction and error decoding processes the decoder combines the prediction and the prediction error signals (the sample values) to form the output video frame.
[0062] An example of an encoding process is illustrated in Fig. 1. Fig. 1 illustrates an image to be encoded (In); a prediction representation of an image block (P'n); a prediction error signal (Dn); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); a transform (T) and inverse transform (T⁻¹); a quantization (Q) and inverse quantization (Q⁻¹); entropy encoding (E); a reference frame memory (RFM); inter prediction (Pinter); intra prediction (Pintra); mode selection (MS) and filtering (F). An example of a decoding process is illustrated in Fig. 2. Fig. 2 illustrates a prediction representation of an image block (P'n); a reconstructed prediction error signal (D'n); a preliminary reconstructed image (I'n); a final reconstructed image (R'n); an inverse transform (T⁻¹); an inverse quantization (Q⁻¹); entropy decoding (E⁻¹); a reference frame memory (RFM); a prediction (either inter or intra) (P); and filtering (F).
[0063] The decoder (and encoder) may also apply additional filtering processes in order to improve the quality of the output video before passing it for display and/or storing as a prediction reference for the forthcoming pictures in the video sequence.
[0064] In many video codecs, including H.264/AVC and HEVC, motion information is indicated by motion vectors associated with each motion-compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures). In H.264/AVC and HEVC, as in many other video compression standards, a picture is divided into a mesh of rectangles, for each of which a similar block in one of the reference pictures is indicated for inter prediction. The location of the prediction block is coded as a motion vector that indicates the position of the prediction block relative to the block being coded.
[0065] Many video codecs utilize neural networks to enhance their operation. A neural network (NN) is a computation graph that may comprise several layers of successive computation. A layer may comprise one or more units ("neurons") performing an elementary computation. A unit may be connected to one or more other units, and the connection may have an associated weight. The weight may be used for scaling the signal passing through the associated connection. Weights may be learnable parameters, i.e., values which can be learned from training data.
[0066] Each layer is configured to extract information from the input data. Data from lower layers may represent low-level semantics, whereas higher layers represent higher-level semantics.
[0067] Feed-forward NNs and recurrent NNs represent examples of neural network architectures. Feed-forward neural networks do not have a feedback loop: each layer takes input from one or more of the preceding layers and provides output to one or more of the subsequent layers. Also, units inside a certain layer may take input from units in one or more of the preceding layers, and provide output to one or more of the following layers.
[0068] In order to configure a neural network to perform a task, an untrained neural network model has to go through a training phase. The training phase is the development phase, where the neural network learns to perform the final task. A training data set that is used for training a neural network is supposed to be representative of the data on which the neural network will be used. During training, the neural network uses the examples in the training dataset to modify its learnable parameters (e.g., its connections' weights) in order to achieve the desired task. Training may happen by minimizing or decreasing the output's error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, etc. In recent deep-learning techniques, training is an iterative process, where at each iteration the algorithm modifies the weights of the neural network to make a gradual improvement of the network's output, i.e., to gradually decrease the loss.
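As a minimal illustration of such iterative training, the following PyTorch sketch runs a plain gradient-descent loop on a stand-in regression task; the model shape, data and hyper-parameters are illustrative placeholders, not part of any codec specification.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()                      # one example of a loss

inputs = torch.randn(64, 16)                # stand-in training examples
targets = torch.randn(64, 1)                # stand-in desired outputs

for iteration in range(100):                # training is iterative
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()                         # gradients w.r.t. the weights
    optimizer.step()                        # gradually decrease the loss
```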
[0069] After training, the trained neural network model is applied to new data during an inference phase, in which the neural network performs the desired task for which it has been trained. The inference phase is also known as the testing phase. As a result of the inference phase, the neural network provides an output which is a result of the inference on the new data. For example, the result can be a classification result or a recognition result.
[0070] Training can be performed in several ways. The main ones are supervised, unsupervised, and reinforcement training. In supervised training, the neural network model is provided with input-output pairs, where the output may be a label.

[0071] In unsupervised training, the neural network is provided only with input data (and also with output raw data in the case of self-supervised training). In reinforcement learning, the supervision is sparser and less precise; instead of input-output pairs, the neural network gets input data and, sometimes, delayed rewards in the form of scores (e.g., -1, 0, or +1). The learning is a result of a training algorithm, or of a meta-level neural network providing the training signal. In general, the training algorithm comprises changing some properties of the neural network so that its output is as close as possible to a desired output.
[0072] In image/video coding standards, e.g., AVC, HEVC and VVC, the syntax elements (SEs) in a category of coding tools are coded and transmitted in the bitstream in a certain, fixed order. The ordering may be done based on the probability of the usage of each tool in the entire sequence. However, such ordering is sub-optimal, since the probability of usage of a certain coding tool depends on different properties such as: the content in each image or coding block, the temporal distance of each frame to the reference frame, the quality of each frame or block relative to its reference, the size of the block, etc.
[0073] Figure 3 illustrates the syntax element order of merge coding tools in the current version of the Versatile Video Coding standard (H.266/VVC). There are several merge coding tools 310 which are encoded or decoded in a fixed order. These merge coding tools comprise normal merge mode 320, merge with motion vector difference (MMVD) mode 330, sub-block merge mode 340, Combined Intra-Inter Prediction (CIIP) mode 350 and a triangle mode 360. On the decoder side, in order to understand the usage of any of these tools for a certain block, all the previous syntax elements need to be decoded first. For example, if a block is coded in Combined Intra-Inter (CIIP) mode, the decoder has to decode the Normal merge flag 320, MMVD flag 330 and Sub-block flag 340 syntax elements before parsing and decoding the CIIP syntax element 350. Moreover, if the usage of the CIIP merge mode 350 for the content is higher than the usage of the other tools in the category, then the bitrate will increase, since these syntax elements are coded into the bitstream at a block level. The same issue applies to the other syntax elements in this category, but also in general to the entire set of video coding tools.
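To quantify the cost of a fixed order, the following sketch estimates the expected number of merge flags parsed per block under the fixed order of Figure 3 versus a usage-sorted order. The usage probabilities are invented purely for illustration.

```python
# Expected number of flags parsed before the used tool is identified;
# the usage distribution below is an illustrative assumption only.
order = ["normal", "mmvd", "subblock", "ciip", "triangle"]
usage = {"normal": 0.1, "mmvd": 0.1, "subblock": 0.1,
         "ciip": 0.6, "triangle": 0.1}

def expected_flags(order, usage):
    total = 0.0
    for position, tool in enumerate(order, start=1):
        # the last tool needs no flag once all others are ruled out
        total += usage[tool] * min(position, len(order) - 1)
    return total

print(expected_flags(order, usage))     # fixed order: 3.4 flags per block
reordered = sorted(order, key=lambda t: -usage[t])    # CIIP first
print(expected_flags(reordered, usage))               # re-ordered: 1.9
```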
[0074] The present embodiments are targeted to solve the issue of having a fixed order for coding the syntax elements (SEs) in video coding, and propose an adaptive re-ordering of syntax elements in order to provide an efficient model for coding syntax elements.
[0075] The adaptive order for the syntax elements according to the embodiments may be based on a learning process from the video content. To this end, the learning process may be implemented according to one or more of the following options:
[0076] A probability of the usage of each coding tool in a coding category in each frame can be determined based on a learning process from the previously encoded/decoded blocks in the picture(s).
[0077] The probability of the usage of each coding tool may be determined based on a certain property. For example, the probability of the usage can be determined based on a block size, the quality of the current block compared to the reference block, the temporal distance of a block inside the picture compared to the reference block, etc.
[0078] The learning may be based on an already coded group of pictures (GOP). In such a case, in order to decide the coding order of syntax elements, the coding information of at least one previously coded picture is used for learning the syntax elements of each block in the current picture. This is adaptively improved when more frames are involved.
[0079] In another approach, encoding may be performed on the video in two steps. In the first step, the video may be partially or fully coded in order to derive the picture/block level information of the coding tools for the learning process. In the second round of encoding (i.e., the final encoding), the syntax elements are re-ordered based on the information that is extracted from the first round of encoding.
[0080] In the following, the solution is described in a more detailed manner by means of the following embodiments.
[0081] The syntax elements (SEs) of a group of coding tools can be implemented according to various embodiments. For example, there can be N prediction tools with distinct syntax elements (SE1, SE2, ..., SEN) for coding a block in the codec. In order to efficiently code the syntax elements of a block into the bitstream, these syntax elements may be adaptively re-ordered based on their probability of usage in the codec. This means that the more probable the usage of a certain tool is, the earlier its syntax element is placed in the order of syntax elements. The probability of the usage may be estimated based on a learning process from the previously coded blocks.
[0082] Figure 4 illustrates an example of the current coding block and the previously coded blocks. Based on the learning process, a probability model may be derived, which defines the ordering of the syntax elements. The model may be derived based on at least one previously coded block. However, in order to have a more precise probability model, the coding information of more blocks is needed, e.g. one or more CTU rows/columns of the picture.
[0083] Thus, according to an embodiment, the order of syntax elements in a group of coding tools can be modelled based on the probability of the corresponding syntax elements in previous blocks.
[0084] Figure 5 illustrates a procedure for re-ordering syntax elements in the encoding and decoding process, according to an embodiment. As will be realized, the syntax elements of an input block 510 may be coded in a different order than those of other blocks. For the first input block 510, the syntax elements may be coded based on a pre-defined and fixed syntax elements order 520. When the first input block is coded 530, its coding information, i.e., the syntax element information, is fed to the syntax element modelling process 540, which models the order based on the probability of the usage of the coding tools in the previous block(s). The syntax elements modelling 540 is then used for updating 550 the syntax elements order. Consequently, the syntax elements are re-ordered based on the coding information from the previous block(s). The re-ordered syntax elements are then used for coding the syntax elements of the next block(s).
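A minimal sketch of this per-block loop, assuming the merge tools of Figure 3 and plain frequency counting as the probability model; all names are illustrative.

```python
from collections import Counter

TOOLS = ["normal", "mmvd", "subblock", "ciip", "triangle"]

def encode_picture(blocks, code_block):
    """code_block(block, order) codes one block and returns the used tool."""
    counts = Counter()            # usage statistics of previous blocks
    order = list(TOOLS)           # pre-defined, fixed order for block 1
    for block in blocks:
        used_tool = code_block(block, order)
        counts[used_tool] += 1    # feed SE info to the modelling process
        # update: most probable SE first, least probable last
        order = sorted(TOOLS, key=lambda t: -counts[t])
    return order
```

Because the decoder sees the same coding information block by block, it can maintain the identical counters and reproduce the same order without extra signaling.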
[0085] The process of modelling the syntax elements order may be done in different ways. A few examples are given in the following:
[0086] The modelling can be done by calculating the probability of usage of each syntax element or coding tool in the previous blocks and by re-ordering the syntax elements from the most probable SE to the least probable SE, as in the sketch above.
[0087] The modelling can be done by using a set of neural networks (NNs) for the learning process and deciding the order of SEs based on the SE information of the previously coded block(s).
[0088] When the modelling is done by using a set of neural networks, the neural network can be trained according to one of the following ways:
[0089] Figure 7 shows an example of offline training of a neural network. In this example, the NN is trained (separately, outside of the coding chain, with the same or different data as the training set) prior to encoding, and then it is used for syntax element re-ordering. As will be realized, the input block(s) 710 are provided to the neural network 720. The neural network 720 models the syntax element order 730 for each block based on the probability of the usage of the coding tools in the previous block(s). Consequently, the syntax elements are re-ordered based on the coding information from the neural network. The syntax elements are then (de)coded according to the computed syntax elements order. A sketch of this approach is given below, after the description of the training options.
[0090] Figure 8 shows an example of online training and online fine-tuning of a neural network. In this example, the NN is trained in the encoding/decoding process (i.e., online learning) and then used for SE re-ordering. In this case, for the NN to be more effective, the coding information of more than one block may be needed. For example, the NN may use the coding information of at least one CTU row or column for the training operation. After the training, the model is used for SE re-ordering. The NN may later be fine-tuned based on the coding information of the next blocks.
[0091] Figure 9 shows an example of offline training and online fine-tuning of a neural network. In this example, the NN is trained similarly to the example of Figure 7 (i.e., outside of the coding chain), but the coding information of the blocks in the encoding/decoding process is used for fine-tuning the NN. For the fine-tuning process, another NN may be used in the coding chain.
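As referenced above, the following sketch shows how an offline-trained network could produce the order, assuming the network maps context features of previous blocks to per-tool scores. The feature layout and network shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_TOOLS = 5        # e.g. normal, MMVD, sub-block, CIIP, triangle
N_FEATURES = 8     # e.g. block size, QP, temporal distance, past usage

# Trained offline, outside of the coding chain; weights loaded at run time.
ordering_net = nn.Sequential(
    nn.Linear(N_FEATURES, 32), nn.ReLU(), nn.Linear(32, N_TOOLS))

def se_order_from_context(features):
    """Return tool indices sorted from most to least probable."""
    with torch.no_grad():                   # inference only, no training
        scores = ordering_net(features)     # higher score = more probable
    return torch.argsort(scores, descending=True).tolist()

context = torch.randn(N_FEATURES)           # stand-in coding context
print(se_order_from_context(context))       # e.g. [3, 0, 4, 1, 2]
```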
[0092] According to an embodiment, the process of modelling the SEs' order may be done differently for different block sizes. In other words, for a specific block size, a distinct SE ordering model may be defined, and the order may be updated based on the previously coded blocks of the same size. This is illustrated in Figure 6.
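A sketch of such size-specific ordering models, assuming one frequency counter per block size as in the counting example above; all names are illustrative.

```python
from collections import Counter, defaultdict

TOOLS = ["normal", "mmvd", "subblock", "ciip", "triangle"]
models = defaultdict(Counter)       # block size -> tool usage counts

def order_for_size(size):
    counts = models[size]
    # stable sort: with no statistics yet, the pre-defined order is kept
    return sorted(TOOLS, key=lambda t: -counts[t])

def update_model(size, used_tool):
    models[size][used_tool] += 1    # learn only from same-size blocks

update_model((16, 16), "ciip")
print(order_for_size((16, 16)))     # CIIP first for 16x16 blocks
print(order_for_size((8, 8)))       # unchanged pre-defined order
```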
[0093] In video content, the syntax elements modelling and ordering may be done based on the coding information from the previous frame(s). In this case, when a frame of a video is (de)coded, a syntax elements model may be derived from the coding information of the blocks in the frame. The syntax elements model may then be used for ordering the syntax elements of the next frame(s). It is appreciated that more than one frame may be used for modelling the syntax elements order.
[0094] According to an embodiment, a separate syntax elements model may be derived for a past frame (an already coded and reconstructed frame). When coding a block in a future frame, the syntax elements of this block are ordered based on the syntax elements model of the reference picture.
[0095] According to an embodiment, the ordering of syntax elements may be done based on a pre-encoding and decoding process. For example, one or more frames of the video may be pre-encoded and decoded prior to the final encoding/decoding. Accordingly, the syntax elements modelling can be done with the coding information and probabilities of the coding tools in the pre-encoding/decoding process. At least one syntax elements order may be derived and stored in the codec for coding the syntax elements in the final encoding/decoding process.
[0096] In such a case, there may be more than one syntax elements order derived for coding the syntax elements of the blocks in the final encoding/decoding process. The order may be derived based on different aspects, as below:
[0097] The syntax elements order may be derived based on the size of the blocks.
[0098] The quality level of each frame may be different in the video sequence. Moreover, the quality of each block may also differ inside each picture. Hence, the syntax elements order may be derived based on the quality level of each block.
[0099] The syntax elements order may be derived based on the temporal ID of the frame in the group of pictures (GOP).
[0100] In the case of multiple syntax elements orders being derived for a group of coding tools, the codec may decide to use the one with the proper properties, e.g. based on a certain block size, temporal ID or quality, as in the sketch below.
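One possible realization is a lookup table of derived orders keyed by such properties, with a fall-back to the pre-defined order. The keys and entries below are illustrative assumptions; the orders themselves would come from the pre-encoding/decoding pass.

```python
DEFAULT_ORDER = ["normal", "mmvd", "subblock", "ciip", "triangle"]

# (block size class, temporal ID, quality level) -> derived SE order
derived_orders = {
    ("large", 0, "high"): ["ciip", "normal", "subblock", "mmvd", "triangle"],
    ("small", 2, "low"):  ["normal", "mmvd", "ciip", "subblock", "triangle"],
}

def select_order(size_class, temporal_id, quality):
    key = (size_class, temporal_id, quality)
    return derived_orders.get(key, DEFAULT_ORDER)

print(select_order("large", 0, "high"))   # order matched to properties
print(select_order("small", 1, "high"))   # no match: pre-defined order
```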
[0101] According to another embodiment, the encoder may select the proper order based on a rate-distortion optimization by evaluating all the available syntax elements orders for a certain group of coding tools. In such a case, the index of the corresponding syntax elements order may be signaled to the decoder.
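A sketch of such rate-distortion-based selection, assuming a cost callback that returns the bit cost of coding the block's syntax elements under a given order (only the rate term differs between the candidates here); the chosen index would be signaled to the decoder.

```python
def choose_order_rdo(available_orders, cost):
    """Pick the candidate SE order with the lowest coding cost."""
    best_index, best_cost = 0, float("inf")
    for index, order in enumerate(available_orders):
        c = cost(order)           # e.g. estimated bits for the block's SEs
        if c < best_cost:
            best_index, best_cost = index, c
    # best_index is signaled in the bitstream so the decoder can
    # parse the syntax elements in the same order
    return best_index, available_orders[best_index]
```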
[0102] According to another embodiment, the syntax elements order modelling can be done differently for different regions of the image/video. For example, in the case of 360-degree content, due to the characteristics of the projection format of the 2D image resulting from the 3D mapping process, there are certain sampling characteristics in different parts. In the equirectangular projection format, there are over-sampling and stretching in the areas close to the poles of the projected content. Hence, for these areas, the probability of usage of tools may be different than in other parts (e.g. areas at the equator). Thus, the syntax elements modelling and re-ordering may be done differently for each area. For example, at least one syntax elements model may be applied to the polar areas and at least one to the equator areas.
[0103] In general, based on the characteristics of the content, there may be more than one syntax elements model for modelling the syntax elements order for a content.

[0104] The syntax elements modelling may be done on the encoder side and/or the decoder side.
[0105] Fig. 10 is a flowchart illustrating a method according to an embodiment. A method comprises receiving 1010 a source picture; partitioning 1020 the source picture into a set of non-overlapping blocks; for a first block being coded, coding 1030 syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent blocks, determining 1040 a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving 1050 a probability model according to the determined probabilities; defining 1060 an order for syntax elements for a block according to the probability model; and encoding 1070 syntax elements of the block into a bitstream according to the defined order.
[0106] It is appreciated that the term "receiving" is understood here to mean that the information is read from a memory or received over a communications connection.
[0107] An apparatus according to an embodiment comprises means for receiving a source picture; means for partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, means for coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, means for determining a probability of a usage of each of the various coding tools in any one or more previous blocks; means for deriving a probability model according to the determined probabilities; means for defining an order for syntax elements for a block according to the probability model; and means for encoding syntax elements of the block into a bitstream according to the defined order. The means comprise at least one processor and a memory including computer program code, wherein the processor may further comprise processor circuitry. The memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Figure 10 according to various embodiments.
[0108] An apparatus according to an embodiment is illustrated in Figure 11. An apparatus of this embodiment is a camera having multiple lenses and imaging sensors, but also other types of cameras may be used to capture wide view images and/or wide view video.
[0109] The terms wide view image and wide view video mean an image and a video, respectively, which comprise visual information having a relatively large viewing angle, larger than 100 degrees. Hence, a so-called 360-degree panorama image/video as well as images/videos captured by using a fisheye lens may also be called a wide view image/video in this specification. More generally, the wide view image/video may mean an image/video in which some kind of projection distortion may occur when a direction of view changes between successive images or frames of the video, so that a transform may be needed to find out co-located samples from a reference image or a reference frame. This will be described in more detail later in this specification.
[0110] The camera 2700 of Figure 11 comprises two or more camera units 2701 and is capable of capturing wide view images and/or wide view video. Each camera unit 2701 is located at a different location in the multi-camera system and may have a different orientation with respect to the other camera units 2701. As an example, the camera units 2701 may have an omnidirectional constellation so that it has a 360-degree viewing angle in a 3D space. In other words, such a camera 2700 may be able to see each direction of a scene so that each spot of the scene around the camera 2700 can be viewed by at least one camera unit 2701.
[0111] The camera 2700 of Figure 11 may also comprise a processor 2704 for controlling the operations of the camera 2700. There may also be a memory 2706 for storing data and computer code to be executed by the processor 2704, and a transceiver 2708 for communicating with, for example, a communication network and/or other devices in a wireless and/or wired manner. The camera 2700 may further comprise a user interface (UI) 2710 for displaying information to the user, for generating audible signals and/or for receiving user input. However, the camera 2700 need not comprise each feature mentioned above or may comprise other features as well. For example, there may be electric and/or mechanical elements for adjusting and/or controlling optics of the camera units 2701 (not shown).
[0112] Figure 11 also illustrates some operational elements which may be implemented, for example, as computer code in the software of the processor, in hardware, or both. A focus control element 2714 may perform operations related to adjustment of the optical system of a camera unit or units to obtain focus meeting target specifications or some other predetermined criteria. An optics adjustment element 2716 may perform movements of the optical system or one or more parts of it according to instructions provided by the focus control element 2714. It should be noted here that the actual adjustment of the optical system need not be performed by the apparatus, but it may be performed manually, wherein the focus control element 2714 may provide information for the user interface 2710 to indicate to a user of the device how to adjust the optical system.
[0113] The various embodiments can be implemented with the help of computer program code that resides in a memory and causes the relevant apparatuses to carry out the method. For example, a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment. Yet further, a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment. The computer program code comprises one or more operational characteristics. Said operational characteristics are defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises receiving a source picture; partitioning the source picture into a set of non-overlapping blocks; for a first block being coded, coding syntax elements of various coding tools according to a pre-defined order into the bitstream; for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks; deriving a probability model according to the determined probabilities; defining an order for syntax elements for a block according to the probability model; and encoding syntax elements of the block into a bitstream according to the defined order.
[0114] If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions and embodiments may be optional or may be combined.
[0115] Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
[0116] It is also noted herein that while the above describes example embodiments, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications, which may be made without departing from the scope of the present disclosure as defined in the appended claims.

Claims:
1. A method, comprising:
- receiving a source picture;
- partitioning the source picture into a set of non-overlapping blocks;
- for a first block being coded, coding syntax elements of various coding tools according to a pre-defined order into the bitstream;
- for any one or more subsequent block, determining a probability of a usage of each of the various coding tools in any one or more previous blocks;
- deriving a probability model according to the determined probabilities;
- defining an order for syntax elements for a block according to the probability model; and
- encoding syntax elements of the block into a bitstream according to the defined order.
2. The method according to claim 1, further comprising determining the probability of usage of each coding tool according to a learning process.
3. The method according to claim 2, wherein the learning process is based on already coded group of pictures.
4. The method according to claim 1, wherein the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
5. The method according to claim 1, wherein the probability model is different for blocks having different sizes.
6. The method according to claim 1, wherein the probability model is different for different regions of the source picture.
7. The method according to claim 1, wherein the order for syntax elements is defined based on a pre-encoding and decoding process modelling the determined probabilities.
8. An apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
- receive a source picture;
- partition the source picture into a set of non-overlapping blocks;
- for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream;
- for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks;
- derive a probability model according to the determined probabilities;
- define an order for syntax elements for a block according to the probability model; and
- encode syntax elements of the block into a bitstream according to the defined order.
9. The apparatus according to claim 8, further comprising computer program code to cause the apparatus to determine the probability of usage of each coding tool according to a learning process.
10. The apparatus according to claim 9, wherein the learning process is based on already coded group of pictures.
11. The apparatus according to claim 8, wherein the probability of usage is determined according to one or more of the following: a block size, a quality of the current block compared to a reference block, temporal distance of a block compared to the reference block.
12. The apparatus according to claim 8, wherein the probability model is different for blocks having different sizes.
13. The apparatus according to claim 8, wherein the probability model is different for different regions of the source picture.
14. The apparatus according to claim 8, further comprising means for defining the order for syntax elements based on a pre-encoding and decoding process modelling the determined probabilities.
15. An apparatus according to claim 8, further comprising at least one processor and a memory including computer program code.
16. A computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to
- receive a source picture;
- partition the source picture into a set of non-overlapping blocks;
- for a first block being coded, to code syntax elements of various coding tools according to a pre-defined order into the bitstream;
- for any one or more subsequent block, to determine a probability of a usage of each of the various coding tools in any one or more previous blocks;
- to derive a probability model according to the determined probabilities;
- to define an order for syntax elements for a block according to the probability model; and
- to encode syntax elements of the block into a bitstream according to the defined order.
17. A computer program product according to claim 16, wherein the computer program product is embodied on a non-transitory computer readable medium.