CN113242426A - Video encoding and decoding method, device and storage medium - Google Patents
- Publication number
- CN113242426A (application number CN202110369008.5A)
- Authority
- CN
- China
- Prior art keywords
- decoding
- filtering
- coding
- video
- sao
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 43
- 238000001914 filtration Methods 0.000 claims abstract description 53
- 238000012805 post-processing Methods 0.000 claims abstract description 38
- 238000004590 computer program Methods 0.000 claims description 11
- 238000013135 deep learning Methods 0.000 claims description 11
- 238000013136 deep learning model Methods 0.000 claims description 2
- 238000003062 neural network model Methods 0.000 claims description 2
- 238000004364 calculation method Methods 0.000 abstract description 14
- 238000004422 calculation algorithm Methods 0.000 description 27
- 238000005457 optimization Methods 0.000 description 14
- 230000000694 effects Effects 0.000 description 12
- 238000012360 testing method Methods 0.000 description 10
- 230000008859 change Effects 0.000 description 9
- 238000012545 processing Methods 0.000 description 9
- 238000005516 engineering process Methods 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 5
- 230000001965 increasing effect Effects 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 238000005192 partition Methods 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 3
- 230000004927 fusion Effects 0.000 description 3
- 238000013139 quantization Methods 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 230000002708 enhancing effect Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012806 monitoring device Methods 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/96—Tree coding, e.g. quad-tree coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video coding and decoding method, a device and a storage medium. DBF filtering and/or SAO filtering are applied at the coding end: for DBF filtering, according to the depth of each coding unit, the DBF is skipped for coding units whose depth is smaller than a depth threshold; for SAO filtering, according to the difference between a coding unit's original image data and its reconstructed data before SAO, SAO is not performed on coding units whose difference is smaller than a difference threshold. Decoding and post-processing are then carried out at the decoding end. The method greatly shortens the calculation time and effectively reduces the computational complexity of encoding and decoding while keeping the quality loss of the decoded video small.
Description
Technical Field
The invention relates to the technical field of computers and communication, in particular to a video coding and decoding method, video coding and decoding equipment and a storage medium.
Background
In recent years, video encoding and decoding, as the most common form of video compression, have developed rapidly: the standards are continuously updated, algorithm complexity keeps rising, and with it the amount of calculation and the calculation time.
With technological progress, cross-disciplinary research has also received wider attention, and post-processing methods such as deep learning have achieved notable results in enhancing video sequences after encoding, decoding and compression. However, current combined video decoding and post-processing pipelines have high overall complexity and excessive calculation time, cannot meet the requirements of real-time applications, and therefore have not been adopted in industry.
For example, the widely used H.265/HEVC standard reference codec HM implements a relatively new standard with high algorithmic complexity, which limits its joint application with post-processing technologies such as deep learning. Moreover, the in-loop filtering stages of the H.265/HEVC codec, DBF and SAO, already provide deblocking and de-ringing quality enhancement, while many current compressed-video post-processing technologies, such as STDF (Spatio-Temporal Deformable Fusion), also include filtering-like enhancement algorithms for image quality. Codec and post-processing therefore repeat much of the same quality-improvement work, which is computationally complex and time-consuming.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art, and provides a video coding and decoding method that reduces coding and decoding time and computational complexity, facilitates the joint application of video decoding and post-processing technology, and is convenient to popularize and use.
According to a first aspect of the present invention, a video coding and decoding method includes: applying DBF filtering and/or SAO filtering at a coding end, and skipping the DBF filtering on coding units with the depth smaller than a depth threshold value according to the depth of the coding units for the DBF filtering; for the SAO filtering, according to the difference value between the original image data of the coding unit and the reconstruction data before SAO, the SAO filtering is not carried out on the coding unit with the difference value smaller than the difference value threshold; and decoding and post-processing are carried out at a decoding end.
By using coding unit information and residual data, DBF and/or SAO filtering is skipped early during coding, reducing the coding calculation amount; the repeated work at the decoding end is likewise skipped, effectively reducing the overall amount of calculation. Post-processing during decoding enhances quality and compensates for the loss caused by the fast algorithm, so that coding and decoding complexity and time are reduced while the decoding quality is preserved.
According to some embodiments of the invention, the coding unit is a CTU.
The method is suitable for the H.265/HEVC coding standard or newer coding standards based on the H.265 framework, such as H.266/VVC. For example, for SAO filtering, the difference value may be chosen from the sum of luma residuals and the sums of residuals of the two chroma components Cb and Cr — for instance, the sum of the luma residuals between the CTU's original pixels and its reconstructed pixels before SAO, and the average of the corresponding residual sums on chroma Cb and Cr.
According to some embodiments of the invention, the post-processing is a deep learning based post-processing. Illustratively, the deep learning model includes, but is not limited to, a deep neural network model such as STDF (Spatio-Temporal Deformable Fusion), and the like.
A video codec device according to an embodiment of a second aspect of the present invention includes:
at least one memory for storing a computer program; and at least one processor for executing the computer program stored in the at least one memory to implement the above-mentioned video encoding and decoding method.
According to some embodiments of the invention, a video capture terminal is also included.
A computer-readable storage medium according to an embodiment of the third aspect of the present invention has a computer program stored thereon; when the computer program is executed by a processor, the above-mentioned video encoding and decoding method is implemented.
One or more embodiments of the present invention have at least the following beneficial effects:
under the condition that the quality loss of the decoded video remains small, the calculation time is greatly shortened and the computational complexity of encoding and decoding is effectively reduced. Experiments show that, with early skipping in the DBF and SAO stages of H.265/HEVC coding and decoding and with STDF-based video post-processing enhancement, the decoding algorithm alone can reduce computational complexity by more than 8% compared with the original H.265/HEVC, at an objective quality loss of only 0.03%.
Drawings
Fig. 1 shows the CU partition structure of one frame of image produced by an H.265/HEVC encoder;
FIG. 2 illustrates an algorithm flow for early skipping DBF in an embodiment;
FIG. 3 illustrates test results of an early skip DBF algorithm of an embodiment at different avgDepth thresholds;
FIG. 4 is a flowchart of an early skip SAO algorithm according to an embodiment;
FIG. 5 is a distY and distC distribution of all CTUs of a test sequence in a particular embodiment;
FIG. 6 is a block diagram of an apparatus of one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a video coding and decoding method, which carries out early skip optimization on DBF filtering and/or SAO filtering at a coding end, wherein: for DBF filtering, according to the depth of a coding unit, skipping the DBF filtering of the coding unit with the depth smaller than a depth threshold value; for SAO filtering, according to the difference value between the original image data of the coding unit and the reconstruction data before SAO, the SAO filtering is not carried out on the coding unit with the difference value smaller than the difference value threshold; and decoding and post-processing are carried out at a decoding end.
In the method, the depth threshold is used to judge which coding units can skip the DBF; its value is therefore generally not higher than the maximum depth of the coding units. The difference threshold is used to determine which coding units can skip the SAO; similarly, its value is generally not higher than the maximum difference between a coding unit's original image data and its reconstructed data before SAO. Existing coding methods rarely consider skip optimization of coding units in the filtering stage, for fear of affecting decoded video quality. Yet as coding standards are updated and coding methods grow more complex, the amount of codec calculation increases, and the filtering operations weigh on the computational efficiency and processing time of encoding and decoding.
In view of this problem, the inventors found that the two in-loop filtering stages, DBF and SAO, differ widely in filtering benefit across coding units: some filtering tasks are useless or of low benefit, and by setting corresponding thresholds to skip these inefficient tasks, the amount of coding computation and the repeated work at both the coding and decoding ends can be reduced. Combined with post-processing at the decoding end — for example, quality enhancement by a deep neural network — the method can still accomplish filtering, relative contrast enhancement and similar effects on the video image, and recover and compensate lost motion detail. Since the post-processing further restores the video image, the quality loss caused by the fast algorithm is compensated, and decoding complexity and decoding time are reduced while objective quality is essentially preserved.
It can be understood that the larger the thresholds, the more coding units skip filtering, and the quality loss may increase to some extent. In practical applications, threshold points are usually chosen within the depth range and the difference range of the coding units according to the post-processing mode and the objective quality requirements, so as to better balance decoding efficiency and decoding quality.
The following describes an application of the above method, taking a currently widely used h.265/HEVC standard reference encoder HM16.9 as an example, to perform video post-processing enhancement based on an STDF deep learning neural network.
The device platform in this embodiment is a PC running the Ubuntu 18.04 operating system, with a GeForce RTX 2080 GPU with 8 GB of video memory. A virtual Python 3.7 development environment is built with the Anaconda 3.5 package management tool, and the deep learning framework of the experiment is PyTorch, widely used in academia. Since the whole experiment is trained on the GPU, the CUDA driver and other dependency libraries, including tqdm, lmdb, PyYAML, opencv-python and scikit-image, are also installed. The video codec is HM16.9 (HEVC Test Model), the reference codec of the H.265/HEVC video coding standard. Table 1 shows the overall device configuration.
TABLE 1
(1) Early skip DBF filtering
In order to find the optimal CU depth, the H.265/HEVC encoder applies a quadtree (QT) partition to the Coding Units (CUs) and recursively traverses the CU depths, covering 64 × 64, 32 × 32, 16 × 16 and 8 × 8 CU blocks (4 depth layers in total); a 64 × 64 CU block has depth 0 and an 8 × 8 CU block has depth 3. Regarding the encoder's depth selection for a CU, Fig. 1 shows the partitioning of the CUs in one frame of image: for a smooth region with simple texture information, the encoder readily selects a large CU as the optimal coding block, i.e. a shallower depth / smaller depth value; for regions with rich texture information, the encoder prefers smaller blocks and finer CU coding, i.e. larger depth values.
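The size-to-depth relation described above can be sketched as a small helper (an illustrative sketch, not part of the HM code; `cu_depth` is a hypothetical name):

```python
import math

def cu_depth(cu_size):
    """Quadtree depth of a CU in H.265/HEVC: a 64x64 CU has depth 0,
    and each quadtree split halves the side length and adds 1 to the depth."""
    assert cu_size in (64, 32, 16, 8), "H.265/HEVC CU sides are 64/32/16/8"
    return int(math.log2(64 // cu_size))
```

So `cu_depth(64)` gives 0 and `cu_depth(8)` gives 3, matching the four depth layers of the recursive partition.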
CTUs (Coding Tree Units) in H.265/HEVC are the root nodes of the quadtree (QT) partition, i.e. the 64 × 64 CU blocks. The internal implementation of the decoder's DBF operates at the CTU level, so the CTU depth is calculated as the coding unit depth. The codec divides the CTU into 256 (16 × 16) minimum 4 × 4 blocks and stores a depth value for each of them; the average depth (avgDepth) of the 256 4 × 4 blocks is taken as the CTU depth, as shown in equation (1), where blockDepth(i) denotes the depth value of the i-th 4 × 4 block:

avgDepth = Σ_{i=1..256} blockDepth(i) / 256    equation (1)
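Equation (1) can be sketched directly (a minimal illustration, assuming the 256 per-4×4-block depth values are already available as a flat list):

```python
def ctu_avg_depth(block_depths):
    """Equation (1): average of the 256 stored 4x4-block depth values
    of one 64x64 CTU, used as the CTU depth."""
    assert len(block_depths) == 256
    return sum(block_depths) / 256
```

For example, a CTU coded as a single 64 × 64 CU has avgDepth 0, while one coded entirely as 8 × 8 CUs has avgDepth 3.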
The "filtering benefit" of the DBF on a CTU can be measured by the simple average SAD (Sum of Absolute Differences), which represents how much the CTU changes before and after filtering. A high filtering benefit means the CTU changes strongly after DBF filtering; otherwise the benefit is low. The average SAD (avgSAD) is calculated as in equation (2), where p(x, y) and p'(x, y) are the pixel values of the CTU at (x, y) before and after DBF filtering, respectively.
avgSAD = Σ_{(x,y)∈CTU} |p'(x, y) − p(x, y)| / 256    equation (2)
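A minimal sketch of equation (2), assuming the CTU planes are available as 64 × 64 nested lists (note that, following the formula as given, the sum is normalised by 256 — the number of 4 × 4 blocks — rather than by the 4096 pixels):

```python
def ctu_avg_sad(before, after):
    """Equation (2): avgSAD = sum over the CTU of |p'(x,y) - p(x,y)| / 256.

    before/after: 64x64 nested lists of pixel values of one CTU,
    prior to and following DBF filtering."""
    total = sum(abs(pa - pb)
                for row_b, row_a in zip(before, after)
                for pb, pa in zip(row_b, row_a))
    return total / 256
```

A CTU whose every pixel changes by 1 after DBF thus yields avgSAD = 4096 / 256 = 16.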
With the Low_Delay_P configuration of HM16.9, 5 video sequences (shown in Table 2) were tested, 5 frames each, for a total of 37640 CTUs. The depth value range 0–3 is divided into 6 intervals with a step of 0.5, and the DBF filtering benefit — i.e. the distribution of avgSAD — is counted on each interval; the statistics are shown in Table 3.
TABLE 2
Video sequence | Resolution |
BasketballDrill | 832x480 |
BasketballDrive | 1920x1080 |
BasketballPass | 416x240 |
KristenAndSara | 1280x720 |
Traffic | 2560x1600 |
TABLE 3
As can be seen from Table 3, among CTUs with average depth avgDepth in the interval 0–0.5, almost all (97.79%) have avgSAD below 3.0, indicating that the DBF filtering benefit of CTUs with avgDepth in 0–0.5 is relatively low and skipping can be considered. According to subsequent test results, the CTU depth threshold can be further raised to increase the proportion of skipped CTUs, balancing quality against efficiency and delivering a large practical speed-up.
To find the avgDepth value that best trades off speed and quality, the performance of the proposed early CTU-level DBF skipping algorithm was tested with different avgDepth thresholds (0.5, 1.0, 1.5, 2.0, 2.5, 3.0) on the 11 HEVC standard test sequences of different resolutions listed in Table 4.
TABLE 4
The algorithm for skipping the CTU-level DBF early traverses all CTUs in the current video frame and performs the following processing for each CTU, as shown in Fig. 2:
S101, calculate the average depth (avgDepth) of the current CTU according to equation (1); if avgDepth is less than the avgDepth threshold, go to step S103, otherwise go to step S102;
S102, perform DBF filtering on the current CTU;
S103, skip DBF filtering of the current CTU.
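Steps S101–S103 can be sketched as a simple loop over the frame's CTUs (an illustrative sketch; the function name and the representation of a CTU as its avgDepth value are assumptions, not HM code):

```python
def dbf_with_early_skip(ctu_depths, threshold=1.5):
    """Steps S101-S103 over all CTUs of one frame.

    ctu_depths: iterable of per-CTU average depths (equation (1)).
    Returns the indices of CTUs that receive DBF filtering; CTUs whose
    average depth is below the threshold are skipped early."""
    filtered = []
    for i, avg_depth in enumerate(ctu_depths):
        if avg_depth < threshold:   # S101 true -> S103: skip DBF
            continue
        filtered.append(i)          # S102: apply DBF to this CTU
    return filtered
```

With the threshold of 1.5 chosen below, a frame with CTU depths [0.2, 1.5, 2.8] would filter only the second and third CTUs.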
Fig. 3 shows the test results of the early CTU DBF skipping algorithm at different avgDepth thresholds. In the figure, time (in seconds) and PSNR are averages over all video sequences; the abscissa "original" corresponds to not skipping the DBF; PSNR_enhanced is the quality after decoding plus post-processing enhancement, and PSNR_origin is the image quality without post-processing. It can be observed that post-processing effectively compensates for the quality loss caused by skipping the filtering. Compared with the original codec, at an avgDepth threshold of 1.5 the bitrate is still in the early-to-middle stage of its increase — the rate growth is not yet large — while the time reduction is already in its middle-to-late stage, and after post-processing quality enhancement the loss caused by the fast algorithm is essentially compensated. Subsequent experiments show that the overall effect is best at an avgDepth threshold of 1.5.
(2) Early skip SAO filtering
SAO is used to remove ringing effects in video. Its main modes are: OFF mode (no compensation), EO or BO mode (edge-offset or band-offset compensation chosen according to the pixel characteristics of the CTU), and Merge mode (compensation reusing the SAO parameters of the upper or left CTU). The ringing effect — ripples appearing at block edges, which degrade the subjective quality of the video — is rooted in the loss of high-frequency information during transformation and quantization. SAO takes the CTU as its processing unit and operates on the pixels of the rippled curve, adding negative offsets at peaks and positive offsets at troughs, thereby removing the ringing effect.
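The peak/trough compensation can be illustrated with a simplified sketch of the HEVC edge-offset (EO) classification: each pixel is compared against its two neighbours along one of the four SAO directions, and local peaks receive a negative offset while local troughs receive a positive one (function names are illustrative; actual offsets are selected and signalled by the encoder):

```python
def eo_category(a, c, b):
    """Edge-offset category of pixel c against its two neighbours a and b
    along one SAO direction (simplified sketch of the HEVC EO classes)."""
    if c < a and c < b:
        return 1          # local trough: a positive offset restores it
    if (c < a and c == b) or (c == a and c < b):
        return 2          # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3          # convex corner
    if c > a and c > b:
        return 4          # local peak: a negative offset restores it
    return 0              # flat or monotonic: no compensation

def eo_compensate(a, c, b, offsets):
    """offsets: mapping category -> signed offset; category 0 gets none."""
    return c + offsets.get(eo_category(a, c, b), 0)
```

For instance, a pixel of value 5 between two neighbours of value 10 is a trough (category 1) and is raised, while a pixel of 10 between two 5s is a peak (category 4) and is lowered — exactly the ripple-flattening behaviour described above.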
Since the purpose of SAO is to remove the ringing caused by the loss of high-frequency information in transformation and quantization, the SAO stage of the in-loop filtering can be skipped for blocks with little high-frequency loss and performed only on blocks with substantial high-frequency loss, thereby skipping SAO as much as possible at low video quality cost and reducing the decoding time.
To this end, the sum of the residuals between a CTU's original pixels and its pre-SAO reconstructed pixels on luma, and the corresponding residual sums on chroma Cb and Cr, can be chosen to reflect the degree of high-frequency loss. Specifically, the luma residual sum and the average of the Cb and Cr residual sums may be used, with thresholds set separately for the luma and chroma components.
Fig. 4 shows the algorithm flow of early CTU SAO skipping; taking the current CTU as an example, the steps are as follows:
S201, compute distY, the sum of the residuals on luma between the CTU's reconstructed pixels before SAO and its original pixels;
S202, compute distC, the average of the residual sums on chroma Cb and Cr between the CTU's reconstructed pixels before SAO and its original pixels;
S203, if distY is smaller than the distY threshold, skip SAO on the CTU's luma, otherwise perform luma SAO on the CTU;
S204, if distC is smaller than the distC threshold, skip SAO on the CTU's chroma, otherwise perform chroma SAO on the CTU.
The calculation formulas of distY and distC are respectively:

distY = Σ_{(x,y)} |CTU_ori,Y(x, y) − CTU_rec,Y(x, y)|    equation (3)

distC = (1/2) Σ_{(x,y)} ( |CTU_ori,Cb(x, y) − CTU_rec,Cb(x, y)| + |CTU_ori,Cr(x, y) − CTU_rec,Cr(x, y)| )    equation (4)

where the luma sum runs over the heightY × widthY luma pixels of the CTU and the chroma sums over its heightC × widthC chroma pixels; CTU_ori,Y(x, y) and CTU_rec,Y(x, y) are the luma values of the original and reconstructed CTU at point (x, y), and CTU_ori,Cb, CTU_rec,Cb, CTU_ori,Cr and CTU_rec,Cr are defined analogously for the two chroma components.
To select the most suitable distY and distC thresholds, the distY and distC values of every CTU of a test sequence were analysed statistically, as shown in Fig. 5, where the abscissa is the CTU index. Most CTUs have a distC below 2000, and the curve is relatively smooth in that range; once distC exceeds 2000, it fluctuates violently. The distY curve becomes volatile from 12000 onward, and CTUs with distY below 12000 cover 80% of the total. The thresholds for distY and distC can therefore be set to 12000 and 2000, respectively.
Subsequent experiments show that the overall effect is best with distY and distC thresholds of 12000 and 2000, respectively. At these values, 2.35% of the decoding time is saved, the influence on image quality and bitrate is negligible, and after post-processing quality enhancement the decoded video reaches the same image quality as the stream coded by the original encoder.
After suitable thresholds were selected, the 11 uncompressed H.265 standard test sequences provided by the Joint Collaborative Team on Video Coding (JCT-VC) were tested, as shown in Table 4. With the Low_Delay_P configuration of HM16.9 and a Quantization Parameter (QP) of 37, the 11 test sequences were encoded and decoded, the early DBF and/or SAO skipping optimization algorithm was applied during encoding to obtain the coded stream, the stream was decoded, and post-processing enhancement was performed with the STDF (Spatio-Temporal Deformable Fusion) scheme proposed by Jianing Deng et al. in the paper "Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement".
The peak signal-to-noise ratio (PSNR) is adopted as the evaluation index of the objective quality of the video image. The PSNR before optimization, PSNRo, is compared with the PSNR after optimization, PSNR', and ΔPSNR measures the quality change; a small ΔPSNR indicates that the early skipping algorithm of the method performs well. PSNR is calculated as in equation (5), ΔPSNR as in equation (6), and PS — the relative change of the decoded objective quality after early-skip optimization with respect to the objective quality without it — as in equation (7):

PSNR = 10 · lg(255² / MSE)    equation (5)
ΔPSNR = PSNRo − PSNR'    equation (6)

PS = ΔPSNR / PSNRo × 100%    equation (7)
For the time performance evaluation index, the total decoding time t_opt after early-skip optimization and the total decoding time t_ori without early-skip optimization are counted respectively, and TS indicates the degree of decoding-time reduction before and after early-skip optimization:

TS = (t_ori − t_opt) / t_ori × 100%    equation (8)
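The evaluation metrics above can be sketched as follows (a minimal illustration; equation (5) assumes 8-bit video, hence the peak value of 255, and the function names are illustrative):

```python
import math

def psnr_8bit(mse):
    """Equation (5): PSNR = 10 * log10(255^2 / MSE) for 8-bit video."""
    return 10 * math.log10(255 ** 2 / mse)

def ps_percent(psnr_orig, psnr_opt):
    """Equations (6)-(7): relative objective-quality change in percent,
    PS = (PSNRo - PSNR') / PSNRo * 100."""
    return (psnr_orig - psnr_opt) / psnr_orig * 100

def ts_percent(t_orig, t_opt):
    """Equation (8): decoding-time saving TS = (t_ori - t_opt) / t_ori * 100."""
    return (t_orig - t_opt) / t_orig * 100
```

For instance, a decode that drops from 100 s to 92 s gives TS = 8%, and a PSNR drop of 0.009 dB from 30 dB gives PS = 0.03%, matching the magnitudes reported in the results below.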
Tables 5 and 6 show the comparison results before and after the early DBF skipping optimization. On average, early skipping of the DBF reduces computational complexity by 5.96% compared with the un-optimized decoding algorithm, at a loss of only 0.03% of objective quality (0.009 dB).
TABLE 5
TABLE 6
Tables 7 and 8 show the comparison results before and after early SAO skipping. On average, compared with the un-optimized decoding algorithm, early skipping of SAO filtering effectively reduces decoding complexity with almost no change in objective quality, and extensive tests show it reduces decoding time by 1.39% on average.
TABLE 7
TABLE 8
Tables 9 and 10 show the comparison results when early DBF skipping and early SAO skipping are applied simultaneously (joint optimization for short). On average, the jointly optimized algorithm reduces computational complexity by more than 8% compared with the un-optimized decoding algorithm, at a loss of only 0.03% of objective quality (0.009 dB).
At present, codecs such as the existing H.265 standard reference encoder HM and post-processing technologies such as deep neural networks cannot be combined well. The H.265/HEVC standard is relatively new and algorithmically complex, deep learning network models are large and computationally heavy, and current deep-learning post-processing networks run relatively independently of the H.265/HEVC decoding process. When the decoder decodes, quality-improving stages such as in-loop filtering already remove blocking, ringing, colour deviation, blurring and other quality losses; the deep-learning post-processing then applies similar filtering-style quality enhancement, so part of the quality-enhancement work is done twice. This redundancy adds useless workload, raises overall algorithm complexity, wastes computing resources and increases calculation time. The present method skips DBF and/or SAO filtering early at the encoding end, effectively reducing the amount of codec and post-processing calculation, and uses post-processing quality enhancement to compensate for the loss caused by the fast algorithm, effectively reducing coding and decoding complexity and time while preserving decoding quality.
It is noted that the H.265 coding standard is used in the above example, with the CTU as the coding unit, but the method is not limited to this coding scheme; for example, for H.264 and other macroblock-based coding standards, the coding unit is the macroblock.
The post-processing manner is not particularly limited; besides the exemplary STDF-based post-processing enhancement, other deep learning post-processing or conventional post-processing methods may be employed.
For example, efficient denoising methods such as BM3D have been proposed to optimize video processing in terms of denoising, sharpening, and frame-rate increase. Researchers have also proposed exploiting the consistency of video frames, using information from surrounding regions to improve the quality of the current image region (Borkowski D, Jakubowski A, Jańczak-Borkowska K. Feynman-Kac Formula and Restoration of High ISO Images [C]// International Conference on Computer Vision and Graphics. Springer International Publishing, 2014).
Dong et al. proposed a four-layer convolutional neural network, named AR-CNN, to enhance video image quality in 2015 (C. Dong, Y. Deng, C. Change Loy, and X. Tang, "Compression artifacts reduction by a deep convolutional network," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 576-584). Yang et al. first proposed a deep learning based multi-frame quality enhancement method, MFQE (Yang R, Xu M, Wang Z, et al. Multi-frame Quality Enhancement for Compressed Video [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018), in which frames of higher quality are used as key frames, i.e., reference frames for other frames of relatively lower quality; they further proposed a new multi-frame convolutional neural network (MF-CNN) that aggregates key frames with lower-quality neighboring frames, improving the quality of the related video frames more effectively in MFQE 2.0 (Guan Z, Xing Q, Xu M, et al. MFQE 2.0: A New Approach for Multi-Frame Quality Enhancement on Compressed Video [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, PP(99): 1-1).
An embodiment of the present invention further provides a video encoding and decoding device, including: at least one memory for storing a computer program; and at least one processor for executing the computer program stored in the at least one memory to implement the above-described video encoding and decoding method. The device may be a computer or a server; for its configuration, reference may be made to the above embodiments. Fig. 6 illustrates a processor 100 and a memory 200, where software programs, instructions, data sets, and other information are stored in the memory 200, and the processor 100 carries out the video encoding and decoding method by operating on the data stored in the memory 200.
In a possible implementation, the device further comprises a video capture terminal communicatively connected to the device and used for capturing the video to be encoded. The video capture terminal may be a smartphone, a tablet computer, a personal computer, or a monitoring device equipped with a camera, so that existing video data can be processed and real-time applications such as webcast live streaming, video on demand, and online classrooms can be realized.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, for example by the processor 100 in Fig. 6, causes the one or more processors to execute the video encoding and decoding method of the above method embodiment.
The above embodiments are illustrative of the present invention, but the present invention is not limited to the details of the above embodiments, and various equivalent substitutions or simple modifications within the technical spirit of the present invention by those skilled in the art should be included in the scope of the present invention.
Claims (7)
1. A video encoding and decoding method, comprising:
performing DBF filtering and/or SAO filtering at an encoding end, wherein for the DBF filtering, the DBF filtering is skipped for coding units whose depth is smaller than a depth threshold, according to the depth of the coding units; and for the SAO filtering, the SAO filtering is not performed on coding units for which the difference value between the original image data of the coding unit and the reconstruction data before SAO is smaller than a difference threshold;
and performing decoding and post-processing at a decoding end.
2. The video coding and decoding method according to claim 1, wherein the coding unit is a CTU.
3. The video coding and decoding method according to claim 1, wherein the post-processing is deep learning based post-processing.
4. The video coding and decoding method according to claim 3, wherein the deep learning model is a deep neural network model.
5. A video coding and decoding device, comprising:
at least one memory for storing a computer program;
at least one processor configured to execute a computer program stored in the at least one memory to implement the video encoding and decoding method according to any one of claims 1 to 4.
6. The video coding and decoding device according to claim 5, further comprising a video capture terminal.
7. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the video encoding and decoding method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110369008.5A CN113242426B (en) | 2021-04-06 | 2021-04-06 | Video encoding and decoding method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113242426A true CN113242426A (en) | 2021-08-10 |
CN113242426B CN113242426B (en) | 2024-02-13 |
Family
ID=77131094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110369008.5A Active CN113242426B (en) | 2021-04-06 | 2021-04-06 | Video encoding and decoding method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113242426B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2878440A1 (en) * | 2012-07-06 | 2014-01-09 | Telefonaktiebolaget L M Ericsson (Publ) | Restricted intra deblocking filtering for video coding |
CN106068650A (en) * | 2014-03-04 | 2016-11-02 | 萨热姆通信宽带简易股份有限公司 | For the method revising binary video stream |
US20170054976A1 (en) * | 2014-04-29 | 2017-02-23 | Microsoft Technology Licensing, Llc | Encoder-side decisions for sample adaptive offset filtering |
CN107431816A (en) * | 2015-03-06 | 2017-12-01 | 高通股份有限公司 | Low complex degree sample adaptively offsets (SAO) decoding |
CN108322740A (en) * | 2018-01-10 | 2018-07-24 | 宁波大学 | A kind of coding method that encoder complexity is controllable |
WO2020183849A1 (en) * | 2019-03-08 | 2020-09-17 | Sony Corporation | Information processing device, information processing method, and program |
US20220141458A1 (en) * | 2019-03-08 | 2022-05-05 | Sony Group Corporation | Information processing device, information processing method, and program |
US20200296364A1 (en) * | 2019-03-12 | 2020-09-17 | Qualcomm Incorporated | Combined in-loop filters for video coding |
Non-Patent Citations (2)
Title |
---|
HAO ZHANG: "Fast intra mode decision and block matching for HEVC screen content compression", 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
SHAN NANA; ZHOU WEI; DUAN ZHEMIN: "Fast Sample Adaptive Offset Algorithm in High Efficiency Video Coding", Computer Engineering, no. 10 *
Also Published As
Publication number | Publication date |
---|---|
CN113242426B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110087087B (en) | VVC inter-frame coding unit prediction mode early decision and block division early termination method | |
CN108184129A (en) | A kind of video coding-decoding method, device and the neural network for image filtering | |
CN110036637B (en) | Method and device for denoising and vocalizing reconstructed image | |
CN112104868B (en) | Quick decision-making method for VVC intra-frame coding unit division | |
CN111711824A (en) | Loop filtering method, device and equipment in video coding and decoding and storage medium | |
CN113766249B (en) | Loop filtering method, device, equipment and storage medium in video coding and decoding | |
CN106170093B (en) | Intra-frame prediction performance improving coding method | |
CN106303521B (en) | A kind of HEVC Rate-distortion optimization method based on sensitivity of awareness | |
US20220279194A1 (en) | Multi-stage block coding | |
CN111586405B (en) | Prediction mode rapid selection method based on ALF filtering in multifunctional video coding | |
CN112001854A (en) | Method for repairing coded image and related system and device | |
CN110300302B (en) | Video coding method, device and storage medium | |
CN112106365B (en) | Method and apparatus for adaptive context modeling in video encoding and decoding | |
CN113992635B (en) | Multimedia data processing method, device, equipment and storage medium | |
Shao et al. | PTR-CNN for in-loop filtering in video coding | |
US11395008B2 (en) | Video compression with in-loop sub-image level controllable noise generation | |
CN113422959A (en) | Video encoding and decoding method and device, electronic equipment and storage medium | |
US12075093B2 (en) | Media object compression/decompression with adaptive processing for block-level sub-errors and/or decomposed block-level sub-errors | |
CN113242426B (en) | Video encoding and decoding method, device and storage medium | |
CN112954350B (en) | Video post-processing optimization method and device based on frame classification | |
CN116016937A (en) | Sample self-adaptive compensation method and device in video coding | |
CN109889838A (en) | A kind of HEVC fast encoding method based on ROI region | |
CN110446042B (en) | Coding method for improving P frame quality in H.264 | |
CN113055670B (en) | HEVC/H.265-based video coding method and system | |
CN115278249B (en) | Video block-level rate distortion optimization method and system based on visual self-attention network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||