CN113556567B - Method and device for inter-frame prediction - Google Patents

Method and device for inter-frame prediction

Info

Publication number
CN113556567B
Authority
CN
China
Prior art keywords
motion field
prediction
resolution
frame
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010330793.9A
Other languages
Chinese (zh)
Other versions
CN113556567A (en)
Inventor
贾川民
马思伟
王晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Huawei Technologies Co Ltd
Original Assignee
Peking University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Huawei Technologies Co Ltd filed Critical Peking University
Priority to CN202010330793.9A priority Critical patent/CN113556567B/en
Publication of CN113556567A publication Critical patent/CN113556567A/en
Application granted granted Critical
Publication of CN113556567B publication Critical patent/CN113556567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop

Abstract

The application relates to video coding and decoding technology in the field of artificial intelligence, and provides an inter-frame prediction method comprising the following steps: obtaining a prediction motion field and a motion field residual, where the prediction motion field is the motion field of a reference frame and the resolution of the motion field residual is smaller than that of the reference frame; generating a reconstructed motion field from the prediction motion field and the motion field residual; up-sampling the reconstructed motion field to generate a target motion field; and generating a prediction frame from the target motion field. Because a lower-resolution motion field residual contains less information, the subsequent entropy encoding or entropy decoding of the motion field residual is more efficient. Furthermore, up-sampling increases the resolution of the reconstructed motion field, so that the resolution of the prediction frame is the same as that of the encoded image. The method can therefore improve coding efficiency and decoding efficiency without affecting the inter-frame prediction effect.

Description

Method and device for inter-frame prediction
Technical Field
The application relates to a video coding and decoding technology in artificial intelligence, in particular to a method and a device for inter-frame prediction.
Background
Video coding reduces redundant information in video data and is therefore important for improving the storage and transmission efficiency of video. End-to-end video coding is a newer approach that uses a neural network to establish a globally optimized model between the reconstructed video and the original video, overcoming the limitation that traditional video coding models can only be optimized locally.
In end-to-end video coding, after the encoding end encodes the original video, it needs to transmit a code stream containing the encoding result to the decoding end; after receiving the code stream, the decoding end decodes it, recovers the coding information, and reconstructs the video from that information. The coding efficiency and decoding efficiency of existing schemes need to be improved.
Disclosure of Invention
The application provides a method and a device for inter-frame prediction, which can improve the coding efficiency and the decoding efficiency of a motion field.
In a first aspect, a method for inter-frame prediction is provided, including: obtaining a prediction motion field and a motion field residual, where the prediction motion field is the motion field of a reference frame and the resolution of the motion field residual is smaller than that of the reference frame; generating a reconstructed motion field from the prediction motion field and the motion field residual; up-sampling the reconstructed motion field to generate a target motion field; and generating a prediction frame from the target motion field.
The above method may be performed by an encoding end or a decoding end. For the encoding end, the resolution of the motion field residual is smaller than both that of the current frame and that of the reference frame; for the decoding end, the resolution of the motion field residual is smaller than that of the reference frame. Because a lower-resolution motion field residual contains less information than a higher-resolution one, entropy encoding the lower-resolution residual in the subsequent entropy coding of the motion field residual is more efficient; correspondingly, entropy decoding it is also more efficient. Furthermore, up-sampling increases the resolution of the reconstructed motion field and hence of the prediction frame, e.g. to the same resolution as the encoded image. The method can therefore improve coding efficiency and decoding efficiency without affecting the inter-frame prediction effect.
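For illustration only, the following PyTorch sketch composes these steps. The tensor shapes, the factor-of-2 up-sampling, the flow-magnitude rescaling, and the function name are assumptions made for this example, not the patent's reference implementation; warping the reference frame into the prediction frame is sketched later in the description.

```python
import torch
import torch.nn.functional as F

def reconstruct_target_motion_field(pred_mf: torch.Tensor,     # (N, 2, H/2, W/2)
                                    mf_residual: torch.Tensor  # (N, 2, H/2, W/2)
                                    ) -> torch.Tensor:
    """Combine the prediction motion field with the low-resolution motion
    field residual, then up-sample the result to frame resolution."""
    # Generate the reconstructed motion field from prediction + residual.
    reconstructed_mf = pred_mf + mf_residual
    # Up-sample to frame resolution; nearest-neighbour interpolation is one
    # of the options the description names. Scaling the flow magnitudes by
    # the same factor is an assumption: the patent does not spell this out.
    target_mf = F.interpolate(reconstructed_mf, scale_factor=2, mode='nearest') * 2.0
    return target_mf  # (N, 2, H, W): the target motion field of S330
```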
Optionally, the obtaining of the prediction motion field and the motion field residual comprises: acquiring a real motion field and the prediction motion field, where the real motion field is the motion field of the current frame and the resolution of the real motion field is smaller than that of the current frame; and generating the motion field residual from the real motion field and the prediction motion field.
The optional implementation mode is executed by the encoding end, and the encoding efficiency can be improved while the inter-frame prediction effect is not influenced.
Optionally, the obtaining of the prediction motion field and the motion field residual includes: acquiring the motion field residual from the code stream.
The optional implementation mode is executed by a decoding end, and the decoding efficiency can be improved while the inter-frame prediction effect is not influenced.
Optionally, the resolution of the real motion field is one fourth of the resolution of the current frame and/or the resolution of the prediction motion field is one fourth of the resolution of the reference frame.
When the height and the width of the real motion field are respectively H/2 and W/2, the resolution of the real motion field is one fourth of that of the current frame; when the height and width of the prediction motion field are H/2 and W/2, respectively, the resolution of the prediction motion field is one-fourth of the resolution of the reference frame. This embodiment can achieve better compression than using motion fields of other resolutions.
In a second aspect, the present application provides an inter prediction apparatus comprising several functional units for implementing any one of the methods of the first aspect. For example, the inter prediction apparatus may include:
an acquisition unit configured to acquire a prediction motion field and a motion field residual, the prediction motion field being a motion field of a reference frame, the motion field residual having a resolution smaller than a resolution of the reference frame;
a reconstruction unit for generating a reconstructed motion field from the prediction motion field and the motion field residual;
an up-sampling unit for up-sampling the reconstructed motion field to generate a target motion field;
and an inter-frame prediction unit for generating a prediction frame according to the target motion field.
In a third aspect, the present application provides a video encoder comprising:
the inter-frame prediction apparatus of the second aspect, configured to generate the motion field residual;
a transform neural network, configured to transform-code the motion field residual and output a transform motion field residual;
a quantization module, configured to quantize the transform motion field residual and output a quantized transform motion field residual;
an entropy coding module, configured to entropy-encode the quantized transform motion field residual and output a code stream;
an inverse quantization module, configured to inverse-quantize the quantized transform motion field residual and output a restored transform motion field residual;
an inverse transform neural network, configured to inverse-transform the restored transform motion field residual and output a restored motion field residual;
where the inter-frame prediction apparatus is further configured to generate the prediction frame from the restored motion field residual.
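Schematically, the encoder modules of this aspect chain together as in the following sketch; the callables passed in are stand-ins (assumptions) for the transform / inverse-transform neural networks and the quantization and entropy-coding modules, which the patent specifies only at the module level.

```python
def encode_motion_field_residual(mf_residual,
                                 transform, quantize, entropy_encode,  # stand-in callables
                                 dequantize, inverse_transform):
    # Forward path: transform coding -> quantization -> entropy coding.
    transform_mf_residual = transform(mf_residual)
    quantized = quantize(transform_mf_residual)
    bitstream = entropy_encode(quantized)
    # Reconstruction path: the encoder mirrors the decoder so that it
    # predicts from the same restored residual the decoder will see.
    restored_transform_mf_residual = dequantize(quantized)
    restored_mf_residual = inverse_transform(restored_transform_mf_residual)
    return bitstream, restored_mf_residual
```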
In a fourth aspect, the present application provides a video decoder comprising:
an entropy decoding module, configured to decode a quantized transform motion field residual from the code stream;
an inverse quantization module, configured to inverse-quantize the quantized transform motion field residual and output a restored transform motion field residual;
an inverse transform neural network, configured to inverse-transform the restored transform motion field residual and output a restored motion field residual;
and the inter-frame prediction apparatus of the second aspect, configured to generate the prediction frame from the restored motion field residual and the prediction motion field.
In a fifth aspect, the present application provides an encoding device comprising a non-volatile memory and a processor coupled to each other, the processor calling program code stored in the non-volatile memory to perform part or all of the steps of any one of the methods of the first aspect.
In a sixth aspect, the present application provides a decoding device comprising a non-volatile memory and a processor coupled to each other, the processor calling program code stored in the non-volatile memory to perform part or all of the steps of any one of the methods of the first aspect.
In a seventh aspect, the present application provides a computer readable storage medium storing program code, wherein the program code comprises instructions for performing some or all of the steps of any one of the methods of the first aspect.
In an eighth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a video encoding system suitable for use in the present application;
FIG. 2 is a schematic diagram of a video decoding system suitable for use in the present application;
FIG. 3 is a diagram illustrating an inter-frame prediction method according to the present application;
FIG. 4 is a schematic diagram of a neural network for video encoding provided herein;
FIG. 5 is a diagram illustrating the coding effect of several video coding methods;
FIG. 6 is a diagram of an inter-frame prediction apparatus provided in the present application;
FIG. 7 is a schematic diagram of a video encoding apparatus or a video decoding apparatus provided in the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
FIG. 1 shows a video coding system (also referred to as an encoder) suitable for use in the present application.
The video coding system comprises modules such as intra prediction (intra prediction), inter prediction (inter prediction), transform (transform), quantization (quantization), entropy coding (entropy encoding), in-loop filtering (in-loop filtering), and the like.
The video is composed of a plurality of original frames, including the current frame to be encoded, F_n. After the current frame is input into the video coding system, it is fed to an intra-frame prediction neural network or an inter-frame prediction neural network according to the prediction mode, and intra-frame prediction or inter-frame prediction is then performed. In FIG. 1, P denotes prediction information, D_n denotes the residual information of the current frame (e.g. the motion field residual and the image residual), uF_n' denotes the reconstruction information before filtering, and D_n' denotes the restored residual information.
Intra-frame prediction means that the pixel values of pixels in the region to be reconstructed in the current frame are predicted from the pixel values of pixels in the already reconstructed region of the current frame. The neural networks implementing intra-frame coding, intra-frame prediction, and intra-frame decoding may be jointly trained; performing transfer learning with the weights of existing modules as the starting point can greatly reduce model training time.
Inter-frame prediction means that the pixel values of pixels in the current frame are predicted from a reconstructed frame (i.e. a reference frame); for example, the previous reconstructed frame F_{n-1}' can be used as the reference frame of the current frame F_n for inter-frame prediction. Optionally, deep neural networks (DNNs) may be used for inter-frame prediction. For example, the current frame and the reference frame may be input to a DNN (e.g. FlowNet) to obtain the motion field of the current frame and the motion field of the reference frame, respectively, and a motion field residual may be generated from the two motion fields. The motion field residual can be obtained by directly taking the difference between the motion field of the current frame and that of the reference frame, or by other methods. The motion field residual can then be transform-coded by a neural network, and the transform-coded motion field residual can be quantized, entropy-coded, and so on, finally yielding a code stream.
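As a minimal sketch of the residual computation just described, assuming a FlowNet-style estimator `flow_net` that returns both motion fields in one call — the exact interface and the sign convention of the subtraction are assumptions for illustration, not fixed by the description:

```python
import torch

def motion_field_residual(flow_net,                       # hypothetical DNN, e.g. FlowNet-style
                          current_frame: torch.Tensor,    # (N, 3, H, W)
                          reference_frame: torch.Tensor   # (N, 3, H, W)
                          ) -> torch.Tensor:
    # The DNN produces the real motion field (of the current frame) and the
    # prediction motion field (of the reference frame), both at reduced
    # resolution, e.g. (N, 2, H/2, W/2) as in the example given below.
    real_mf, pred_mf = flow_net(current_frame, reference_frame)
    # Direct subtraction is one construction the description names; the
    # sign convention here is an assumption.
    return real_mf - pred_mf
```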
During processing at the encoding end, the quantized transform motion field residual can further be inverse-quantized to obtain a restored transform motion field residual, and the restored transform motion field residual can be inverse-transform coded by a neural network to obtain the restored motion field residual. The restored motion field residual, together with the prediction motion field, can be used to generate a reconstructed prediction frame, which, after loop filtering by a neural network, is used to reconstruct the current frame as F_n'; F_n' can serve as a reference frame for subsequent inter-frame coding (e.g. inter-frame coding of the original frame F_{n+1}).
The decoding end's processing of the code stream is essentially the inverse of the encoding end's encoding of the image. FIG. 2 shows a video decoding system (also referred to as a decoder) suitable for use in the present application.
As shown in FIG. 2, the decoding end first obtains the quantized transform motion field residual from the code stream by entropy decoding; the quantized transform motion field residual is inverse-quantized to generate a restored transform motion field residual, which is inverse-transform coded by a neural network to obtain the restored motion field residual. The decoding end can then generate a reconstructed prediction frame from the restored motion field residual and the prediction motion field (generated from the reference frame reconstructed in the previous decoding pass), and use the prediction frame to obtain the reconstruction information before filtering, uF_n'. For intra-frame prediction, the prediction information P may be the pixel values of pixels in the already reconstructed region of the current frame; for inter-frame prediction, the prediction information P is the reconstructed prediction frame.
uF_n' is processed by neural-network loop filtering to obtain the reconstruction information of the current frame, i.e. the reconstructed frame F_n', and F_n' may be used as a reference frame for subsequent inter-frame prediction.
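The decoder-side recovery of the motion field residual can be summarized as the following sketch; `entropy_decode`, `dequantize`, and `inverse_transform` are stand-in callables (assumptions) for the entropy decoding module, the inverse quantization module, and the inverse-transform neural network.

```python
def decode_motion_field_residual(bitstream,
                                 entropy_decode, dequantize, inverse_transform):
    # Entropy decoding recovers the quantized transform motion field residual.
    quantized = entropy_decode(bitstream)
    # Inverse quantization yields the restored transform motion field residual.
    restored_transform_mf_residual = dequantize(quantized)
    # The inverse-transform neural network yields the restored motion field
    # residual, which then enters the prediction steps S320-S340 below.
    return inverse_transform(restored_transform_mf_residual)
```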
As can be seen from FIG. 1 and FIG. 2, after the encoding end encodes the original video, it needs to transmit a code stream containing the motion field residual to the decoding end, and the decoding end performs inter-frame prediction according to the motion field residual and generates a prediction frame. FIG. 3 shows an inter-frame prediction method provided by the present application, which may be performed by an encoding end or a decoding end. The method 300 includes:
S310, obtaining a prediction motion field and a motion field residual, wherein the prediction motion field is a motion field of a reference frame, and the resolution of the motion field residual is smaller than that of the reference frame.
When S310 is performed by the encoding side, the encoding side may generate a real motion field from the current frame and a prediction motion field from the reference frame, and then generate a motion field residual from the real motion field and the prediction motion field, and how the encoding side generates the motion field residual will be described in detail below.
When S310 is executed by the decoding end, the decoding end may receive the code stream, perform entropy decoding on the code stream, and generate the motion field residual; the prediction motion field may be obtained by processing the reference frame through a neural network.
Motion fields are related to the pixels of an image and can therefore also be described in terms of resolution. In the method 300, the resolution of the real motion field is smaller than that of the current frame, and the resolution of the prediction motion field is smaller than that of the reference frame, so the resolution of the motion field residual generated from the real motion field and the prediction motion field is also smaller than that of the current frame. Because a lower-resolution motion field residual contains less information than a higher-resolution one, entropy encoding the lower-resolution residual in the subsequent entropy coding of the motion field residual is more efficient; correspondingly, entropy decoding it is also more efficient.
The method for acquiring the real motion field and the prediction motion field at the encoding end is described as follows.
The encoding end may divide the video into groups of pictures (GOPs), perform image encoding on the first frame of a GOP, and perform inter-frame encoding on the other frames of the GOP, where the first frame refers to the first frame to be encoded, which may or may not be the temporally earliest frame in the GOP.
After the first frame is image-encoded, a frame may be determined as a current frame from the original frames not encoded in the GOP, and the current frame may be inter-predicted using the first frame as a reference frame of the current frame.
The current frame and the reference frame may be processed separately by the neural network shown in FIG. 4 to obtain the real motion field and the prediction motion field. The neural network that generates the prediction motion field and the real motion field in FIG. 4 is a trained autoencoder (for example, FlowNet); the arrows between the neural network layers of the autoencoder denote skip connections, and the numbers below the convolution and deconvolution layers denote the number of convolution kernels each layer contains. The neural network shown in FIG. 4 is one example applicable to the present application; neural networks capable of generating the real motion field and the prediction motion field are not limited to it.
The real motion field and the prediction motion field are, for example, pixel-wise optical flow fields. The motion field is a concept describing the motion of objects in space, but it is difficult to derive a motion field directly from the images that make up a video. An optical flow field is a two-dimensional vector field that reflects the grayscale variation trend of each pixel in an image; it can be regarded as the instantaneous velocity field produced by the movement of pixels carrying grayscale values across the image plane, so the motion field can be represented by an optical flow field.
Optionally, if the dimensions of the current frame and the reference frame are H × W, the dimensions of the real motion field and the prediction motion field may be (H/2) × (W/2), where H denotes height and W denotes width. When the height and width of the real motion field are H/2 and W/2, respectively, its resolution is one quarter of that of the current frame; when the height and width of the prediction motion field are H/2 and W/2, respectively, its resolution is one quarter of that of the reference frame. Table 1 compares the effect of this example with other schemes.
TABLE 1
(Table 1 is provided as an image in the original publication; its data are not reproduced here.)
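As a quick check of the quarter-resolution relationship (the 1080p frame size below is chosen only for illustration):

```python
H, W = 1080, 1920                # current frame / reference frame dimensions
mf_h, mf_w = H // 2, W // 2      # motion field dimensions: (H/2) x (W/2) = 540 x 960
ratio = (mf_h * mf_w) / (H * W)  # 518400 / 2073600
print(ratio)                     # 0.25 -> one quarter of the frame resolution
```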
After the encoding end obtains the real motion field and the prediction motion field, the motion field residual can be obtained by directly taking the difference between the real motion field and the prediction motion field, or in other ways. Optionally, the encoding end may compress and quantize the motion field residual through the neural network shown in FIG. 4 and then perform entropy encoding to generate the code stream; correspondingly, after receiving the code stream, the decoding end needs to perform entropy decoding, decompression, and inverse quantization to obtain the motion field residual.
After obtaining the motion field residual, the encoding end or the decoding end may further perform the following steps.
S320, generating a reconstructed motion field according to the prediction motion field and the motion field residual;
S330, up-sampling the reconstructed motion field to generate a target motion field;
S340, generating a prediction frame according to the target motion field.
The encoding end or the decoding end can directly add the motion field residual and the prediction motion field to generate the reconstructed motion field, and up-sample the reconstructed motion field to generate the target motion field; warping (warp) processing can then be performed according to the target motion field to generate the prediction frame.
The up-sampling may use nearest-neighbor interpolation or other up-sampling methods; this is not limited in this application.
Up-sampling increases the resolution of the reconstructed motion field and hence the resolution of the prediction frame, e.g. to the same resolution as the encoded image. Therefore, the method 300 can improve encoding efficiency and decoding efficiency without affecting the inter-frame prediction effect.
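A minimal warping sketch using backward warping with `torch.nn.functional.grid_sample` follows; the coordinate normalization and bilinear sampling are assumptions made for this example, since the description only names the warp operation.

```python
import torch
import torch.nn.functional as F

def warp(reference_frame: torch.Tensor,  # (N, C, H, W)
         target_mf: torch.Tensor         # (N, 2, H, W), per-pixel (dx, dy) in pixels
         ) -> torch.Tensor:
    """Backward-warp the reference frame with the target motion field to
    form the prediction frame."""
    n, _, h, w = reference_frame.shape
    # Base sampling grid of integer pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=reference_frame.dtype),
                            torch.arange(w, dtype=reference_frame.dtype),
                            indexing='ij')
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)  # (1, 2, H, W)
    coords = base + target_mf                         # displaced sample positions
    # Normalize to [-1, 1] as grid_sample expects, then sample bilinearly.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)              # (N, H, W, 2)
    return F.grid_sample(reference_frame, grid, mode='bilinear',
                         align_corners=True)
```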
FIG. 5 shows the coding effect of the method 300 and several other video coding methods. As can be seen in FIG. 5, the peak signal-to-noise ratio (PSNR) of the method 300 is slightly greater than that of HM-16.9 and much greater than those of x264, x265, and a 16 × 16 motion vector (MV) block scheme.
Examples of the inter prediction methods provided by the present application are described above in detail. It is understood that the corresponding apparatus contains hardware structures and/or software modules corresponding to the respective functions for implementing the functions described above. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed in hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application may divide the apparatus into functional units according to the method examples described above; for example, each function may be assigned its own functional unit, or two or more functions may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the present application is schematic and is only one kind of logical function division; other division manners are possible in actual implementation.
Fig. 6 shows a schematic structural diagram of an inter-frame prediction apparatus provided in the present application. The apparatus 600 includes an acquisition unit 610, a reconstruction unit 620, an upsampling unit 630, and an inter prediction unit 640.
The obtaining unit 610 is configured to obtain a prediction motion field and a motion field residual, where the prediction motion field is a motion field of a reference frame, and a resolution of the motion field residual is smaller than a resolution of the reference frame;
the reconstruction unit 620 is configured to generate a reconstructed motion field according to the prediction motion field and the motion field residual;
the upsampling unit 630 is configured to upsample the reconstructed motion field to generate a target motion field;
and the inter-frame prediction unit 640 is configured to generate a prediction frame according to the target motion field.
Optionally, the obtaining unit 610 is specifically configured to: acquire a real motion field and the prediction motion field, where the real motion field is the motion field of the current frame and the resolution of the real motion field is smaller than that of the current frame; and generate the motion field residual from the real motion field and the prediction motion field.
Optionally, the resolution of the real motion field is one fourth of the resolution of the current frame, and the resolution of the prediction motion field is one fourth of the resolution of the reference frame.
Optionally, the obtaining unit is specifically configured to acquire the motion field residual from the code stream.
The specific manner in which the apparatus 600 performs the inter prediction method and the resulting beneficial effects can be seen in the related description of the method embodiments.
Fig. 7 shows a schematic structural diagram of an encoding device or a decoding device provided by the present application. The dashed lines in fig. 7 indicate that the unit or the module is optional. The apparatus 700 may be used to implement the methods described in the method embodiments above. The device 700 may be a terminal device or a server or a chip.
The device 700 includes one or more processors 701, and the one or more processors 701 may enable the device 700 to implement the methods in the method embodiments. The processor 701 may be a general-purpose processor or a special-purpose processor. For example, the processor 701 may be a Central Processing Unit (CPU). The CPU may be configured to control the apparatus 700, execute software programs, and process data of the software programs. The device 700 may also include a communication unit 705 to enable input (reception) and output (transmission) of signals, such as codestreams.
For example, the device 700 may be a chip and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, and the chip may be a component of a terminal device or a network device or other electronic devices.
Also for example, the device 700 may be a terminal device or a server, and the communication unit 705 may be a transceiver of the terminal device or the server, or the communication unit 705 may be a transceiver circuit of the terminal device or the server.
The apparatus 700 may comprise one or more memories 702, on which programs 704 are stored, and the programs 704 may be executed by the processor 701 to generate instructions 703, so that the processor 701 executes the method described in the above method embodiments according to the instructions 703. Optionally, the memory 702 may also store data (such as video or code stream to be encoded). Alternatively, the processor 701 may also read data stored in the memory 702, the data may be stored at the same memory address as the program 704, or the data may be stored at a different memory address from the program 704.
The processor 701 and the memory 702 may be provided separately or integrated together, for example, on a System On Chip (SOC) of the terminal device.
The specific manner in which the processor 701 executes the method embodiments may be referred to in the description of the method embodiments.
It should be understood that the steps of the above-described method embodiments may be performed by logic circuits in the form of hardware or instructions in the form of software in the processor 701. The processor 701 may be a CPU, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, such as a discrete gate, a transistor logic device, or discrete hardware components.
The application also provides a computer program product, which when executed by a processor 701 implements the method according to any of the method embodiments of the application.
The computer program product may be stored in the memory 702, for example, as the program 704, and the program 704 is finally converted into an executable object file capable of being executed by the processor 701 through preprocessing, compiling, assembling, linking and the like.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, implements the method of any of the method embodiments of the present application. The computer program may be a high-level language program or an executable object program.
The computer-readable storage medium may be, for example, the memory 702. The memory 702 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. Volatile memory can be random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process and the generated technical effect of the apparatus and the device described above may refer to the corresponding process and technical effect in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the disclosed system, apparatus and method can be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not performed. The above-described embodiments of the apparatus are merely exemplary, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, and a plurality of units or components may be combined or integrated into another system. In addition, the coupling between the units or the coupling between the components may be direct coupling or indirect coupling, and the coupling includes electrical, mechanical or other connections.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. An inter-frame prediction method, comprising:
obtaining a prediction motion field and a motion field residual, wherein the prediction motion field is a motion field of a reference frame, and the resolution of the motion field residual is smaller than that of the reference frame;
generating a reconstructed motion field from the prediction motion field and the motion field residual;
up-sampling the reconstructed motion field to generate a target motion field;
and generating a prediction frame according to the target motion field.
2. The method of claim 1, wherein obtaining the prediction motion field and the motion field residual comprises:
acquiring a real motion field and the prediction motion field, wherein the real motion field is a motion field of a current frame, and the resolution of the real motion field is less than that of the current frame;
generating the motion field residual from the real motion field and the prediction motion field.
3. The method of claim 2, wherein the resolution of the real motion field is one-fourth of the resolution of the current frame, and wherein the resolution of the prediction motion field is one-fourth of the resolution of the reference frame.
4. The method of claim 1, wherein obtaining the prediction motion field and the motion field residual comprises:
acquiring the motion field residual from the code stream.
5. An inter-frame prediction apparatus, comprising:
an acquisition unit configured to acquire a prediction motion field and a motion field residual, the prediction motion field being a motion field of a reference frame, the motion field residual having a resolution smaller than a resolution of the reference frame;
a reconstruction unit for generating a reconstructed motion field from the prediction motion field and the motion field residual;
an up-sampling unit for up-sampling the reconstructed motion field to generate a target motion field;
and an inter-frame prediction unit for generating a prediction frame according to the target motion field.
6. The apparatus according to claim 5, wherein the obtaining unit is specifically configured to:
acquiring a real motion field and the prediction motion field, wherein the real motion field is a motion field of a current frame, and the resolution of the real motion field is less than that of the current frame;
generating the motion field residual from the real motion field and the prediction motion field.
7. The apparatus of claim 6, wherein the resolution of the real motion field is one-fourth of the resolution of the current frame, and wherein the resolution of the prediction motion field is one-fourth of the resolution of the reference frame.
8. The apparatus according to claim 5, wherein the obtaining unit is specifically configured to:
acquiring the motion field residual from the code stream.
9. A video encoder, comprising:
an inter prediction apparatus as claimed in any one of claims 5 to 7, for generating the motion field residual;
a transform neural network, configured to transform-code the motion field residual and output a transform motion field residual;
a quantization module, configured to quantize the transform motion field residual and output a quantized transform motion field residual;
an entropy coding module, configured to entropy-encode the quantized transform motion field residual and output a code stream;
an inverse quantization module, configured to inverse-quantize the quantized transform motion field residual and output a restored transform motion field residual;
an inverse transform neural network, configured to inverse-transform the restored transform motion field residual and output a restored motion field residual;
wherein the inter-frame prediction apparatus is further configured to generate the prediction frame from the restored motion field residual.
10. A video decoder, comprising:
an entropy decoding module, configured to decode a quantized transform motion field residual from the code stream;
an inverse quantization module, configured to inverse-quantize the quantized transform motion field residual and output a restored transform motion field residual;
an inverse transform neural network, configured to inverse-transform the restored transform motion field residual and output a restored motion field residual;
and an inter-frame prediction apparatus as claimed in claim 5, configured to generate the prediction frame from the restored motion field residual and the prediction motion field.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method of any one of claims 1 to 4.
12. An encoding device, characterized in that the device comprises a processor and a memory for storing a computer program, the processor being adapted to invoke and run the computer program from the memory such that the device performs the method of any of claims 1 to 3.
13. A decoding device, characterized in that the device comprises a processor and a memory, the memory being adapted to store a computer program, the processor being adapted to call and run the computer program from the memory, so that the device performs the method of claim 1 or 4.
CN202010330793.9A 2020-04-24 2020-04-24 Method and device for inter-frame prediction Active CN113556567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010330793.9A CN113556567B (en) 2020-04-24 2020-04-24 Method and device for inter-frame prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010330793.9A CN113556567B (en) 2020-04-24 2020-04-24 Method and device for inter-frame prediction

Publications (2)

Publication Number Publication Date
CN113556567A CN113556567A (en) 2021-10-26
CN113556567B true CN113556567B (en) 2022-11-25

Family

ID=78129524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010330793.9A Active CN113556567B (en) 2020-04-24 2020-04-24 Method and device for inter-frame prediction

Country Status (1)

Country Link
CN (1) CN113556567B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6636615B2 (en) * 2015-08-24 2020-01-29 ホアウェイ・テクノロジーズ・カンパニー・リミテッド Motion vector field encoding method, decoding method, encoding device, and decoding device
CN107318024A (en) * 2017-06-27 2017-11-03 北京奇艺世纪科技有限公司 Method for video coding and device based on sports ground
US11025950B2 (en) * 2017-11-20 2021-06-01 Google Llc Motion field-based reference frame rendering for motion compensated prediction in video coding
CN109600615A (en) * 2018-11-12 2019-04-09 建湖云飞数据科技有限公司 A method of video is decoded based on motion information
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network

Also Published As

Publication number Publication date
CN113556567A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
TWI709329B (en) Method and apparatus of neural network for video coding
Liu et al. Neural video coding using multiscale motion compensation and spatiotemporal context model
US11610283B2 (en) Apparatus and method for performing scalable video decoding
TWI729378B (en) Method and apparatus of neural network for video coding
CN104685874A (en) Devices and methods for processing of partition mode in high efficiency video coding
US8594189B1 (en) Apparatus and method for coding video using consistent regions and resolution scaling
CN111901596B (en) Video hybrid coding and decoding method, device and medium based on deep learning
US20140119456A1 (en) Encoding video into lower resolution streams
JP6042899B2 (en) Video encoding method and device, video decoding method and device, program and recording medium thereof
CN110740319B (en) Video encoding and decoding method and device, electronic equipment and storage medium
CN108848377B (en) Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, computer device, and storage medium
TWI779161B (en) Method and apparatus of neural networks with grouping for video coding
CN110741638A (en) Motion vector coding using residual block energy distribution
CN113766249A (en) Loop filtering method, device and equipment in video coding and decoding and storage medium
Ayzik et al. Deep image compression using decoder side information
US20170150166A1 (en) System and method for efficient multi-bitrate and multi-spatial resolution media encoding
GB2519289A (en) Method and apparatus for displacement vector component transformation in video coding and decoding
JP2020198463A (en) Encoding program, decoding program, encoding device, decoding device, encoding method, and decoding method
CN115604485A (en) Video image decoding method and device
CN113556567B (en) Method and device for inter-frame prediction
CN115643406A (en) Video decoding method, video encoding device, storage medium, and storage apparatus
US20140118460A1 (en) Video Coding
CN116939218A (en) Coding and decoding method and device of regional enhancement layer
Brand et al. Generalized difference coder: a novel conditional autoencoder structure for video compression
CN114422805B (en) Video coding and decoding method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant