CN111314698A

CN111314698A - Image coding processing method and device

Info

Publication number: CN111314698A
Application number: CN202010125221.7A
Authority: CN
Inventors: 亢润龙; 陆金刚; 方伟
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2020-06-19

Abstract

The invention provides an image coding processing method and device. The method comprises the following steps: carrying out down-sampling processing on an original image frame to obtain a target image frame; determining a target residual error of an image block of the target image frame; inputting the target residual error into a pre-trained target residual error network model to obtain the probability, output by the target residual error network model, of each quantization parameter corresponding to the target residual error, and determining the quantization parameter with a probability greater than a preset threshold value as a target quantization parameter; generating a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image block and the original image frame; and carrying out image coding on the original resolution image and the quantization parameter table. This solves the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map a plurality of features extracted from the texture information of an image block, is inaccurate, so that errors are generated in subsequent image coding.

Description

Image coding processing method and device

Technical Field

The present invention relates to the field of video processing, and in particular, to a method and an apparatus for processing image coding.

Background

Nowadays, high-resolution and high-frame-rate videos are increasingly popular, and growth in storage capacity and transmission bandwidth has not kept up with the demand for storing and transmitting high-resolution video. Video coding technology has therefore continued to advance and innovate.

The goal of video coding is to reduce the number of coded bits as much as possible while ensuring image quality acceptable to the human eye. In the encoding process, different coding blocks differ in characteristics such as texture and shape, and the coding modes of adjacent blocks and adjacent frames also differ, so it is difficult to ensure that the image quality of every coding block is optimal. The bit allocation and QP calculation for a coding block are based on the already coded reconstructed image and the actual number of coded bits of the already coded blocks. Although this approach can control the bit rate within a certain quality range, the bit allocation and QP calculation do not take into account information such as the actual coding mode and residual of the encoder, so an optimal quantization parameter is not calculated. Although current encoders can reduce or increase the number of bits for some local coding blocks by means of feature information such as texture and color, it is difficult to distribute the coded bits of the whole frame reasonably.

In the related art, a plurality of features capturing the texture information of an image block are extracted, a neural network regressor is trained to map the extracted features to an optimal quantization parameter, and the image block is encoded using the determined optimal quantization parameter.

No solution has yet been proposed for the problem in the related art that the optimal quantization parameter determined in this way is inaccurate, so that errors are generated in subsequent image coding.

Disclosure of Invention

The embodiments of the invention provide an image coding processing method and device, which at least solve the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map a plurality of features extracted from the texture information of an image block, is inaccurate, so that errors are generated in subsequent image coding.

According to an embodiment of the present invention, there is provided an image encoding processing method including:

carrying out down-sampling processing on an original image frame to obtain a target image frame;

determining a target residual error of an image block of the target image frame;

inputting the target residual error into a pre-trained target residual error network model to obtain the probability of each quantization parameter output by the target residual error network model and corresponding to the target residual error, wherein the quantization parameter with the probability larger than a preset threshold value is determined as a target quantization parameter;

generating a quantization parameter table of the original image frame by the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;

and carrying out image coding on the original resolution image and the quantization parameter table.

Optionally, before down-sampling the original image frame to obtain the target image frame, the method further includes:

acquiring a preset number of image frames and quantization parameters actually corresponding to the image frames;

and training an original residual error network model by using the preset number of image frames and the quantization parameters corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input into the original residual error network model, the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.

Optionally, determining a target residual of an image block of the target image frame comprises:

if the target image frame is an intra-frame prediction coding frame, determining a plurality of residual errors of each image block in the target image frame, and determining the minimum residual error in the residual errors as the target residual error;

if the target image frame is an inter-frame prediction coding frame, determining a space domain residual error and a time domain residual error of each image block in the target image frame, and determining a smaller residual error in the space domain residual error and the time domain residual error as the target residual error.

Optionally, determining a plurality of residuals for each image block in the target image frame, and determining a minimum residual of the plurality of spatial residuals as the target residual comprises:

determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode;

determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode;

and determining a residual error corresponding to the optimal intra mode as the target residual error.

Optionally, determining the spatial residual and the temporal residual of each image block in the target image frame comprises:

determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;

determining a target matching block of a reference frame of the target image frame, determining the position of the target matching block as the position of the minimum residual, and determining the residual corresponding to the position of the minimum residual as the time domain residual.

Optionally, determining a target matching block of a reference frame of the target image frame comprises:

performing motion estimation on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search;

and determining the best matching block obtained by motion estimation as the target matching block.

Optionally, before determining the target residuals for the image blocks of the target image frame, the method further comprises:

and dividing the target image frame into the image blocks with preset sizes according to the speed and quality of coding.

Optionally, before generating the quantization parameter table of the original image frame according to the target quantization parameter at the corresponding position of the image block and the original image frame, the method further includes:

extending the quantization parameter into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.

According to another embodiment of the present invention, there is also provided an image encoding processing apparatus including:

the down-sampling module is used for performing down-sampling processing on the original image frame to obtain a target image frame;

a determining module for determining a target residual of an image block of the target image frame;

the input module is used for inputting the target residual error into a pre-trained target residual error network model to obtain the probability of each quantization parameter output by the target residual error network model and corresponding to the target residual error, wherein the quantization parameter with the probability greater than a preset threshold value is determined as a target quantization parameter;

the generating module is used for generating a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;

and the image coding module is used for carrying out image coding on the original resolution image and the quantization parameter table.

Optionally, the apparatus further comprises:

the acquisition module is used for acquiring a preset number of image frames and quantization parameters actually corresponding to the image frames;

the training module is used for training an original residual error network model by using the preset number of image frames and quantization parameters corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input to the original residual error network model, the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.

Optionally, the determining module includes:

the first determining sub-module is used for determining a plurality of residual errors of each image block in the target image frame if the target image frame is an intra-frame prediction coding frame, and determining the minimum residual error in the plurality of residual errors as the target residual error;

and the second determining submodule is used for determining a space domain residual error and a time domain residual error of each image block in the target image frame if the target image frame is an inter-frame prediction coding frame, and determining a smaller residual error in the space domain residual error and the time domain residual error as the target residual error.

Optionally, the first determining sub-module includes:

a first determining unit, configured to determine a residual of the target image frame in an intra mode, wherein the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode;

the second determining unit is used for determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode;

and a third determining unit, configured to determine a residual corresponding to the optimal intra mode as the target residual.

Optionally, the second determining sub-module includes:

a fourth determining unit, configured to determine a residual of the target image frame in an intra mode, where the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;

a fifth determining unit, configured to determine a target matching block of a reference frame of the target image frame, determine the position of the target matching block as the position of the minimum residual error, and determine the residual error corresponding to the position of the minimum residual error as the time-domain residual error.

Optionally, the fifth determining unit is further configured to

Optionally, the apparatus further comprises:

and the dividing module is used for dividing the target image frame into the image blocks with preset sizes according to the speed and the quality of coding.

Optionally, the apparatus further comprises:

an extension module for extending the quantization parameter into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.

According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.

According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.

According to the invention, the original image frame is subjected to down-sampling processing to obtain a target image frame; a target residual error of an image block of the target image frame is determined; the target residual error is input into a pre-trained target residual error network model to obtain the probability, output by the target residual error network model, of each quantization parameter corresponding to the target residual error, wherein the quantization parameter with a probability greater than a preset threshold value is determined as a target quantization parameter; a quantization parameter table of the original image frame is generated from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame; and image coding is carried out on the original resolution image and the quantization parameter table. This can solve the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map a plurality of features extracted from the texture information of an image block, is inaccurate, so that errors are generated in subsequent image coding. The optimal quantization parameter is selected for coding each image block and an optimal number of coding bits is allocated to each coding unit, so that the total distortion after video coding is minimized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of the hardware configuration of a mobile terminal for an image encoding processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method of image encoding processing according to an embodiment of the present invention;

FIG. 3 is a flow diagram of a neural network-based encoding method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a residual network model according to an embodiment of the invention;

FIG. 5 is a block diagram of a video encoding device according to an embodiment of the present invention;

FIG. 6 is a block diagram of an image encoding processing apparatus according to an embodiment of the present invention.

Detailed Description

The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Example 1

The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, FIG. 1 is a block diagram of the hardware structure of a mobile terminal for an image coding processing method according to an embodiment of the present invention. As shown in FIG. 1, the mobile terminal 10 may include one or more processors 102 (only one is shown in FIG. 1) and a memory 104 for storing data; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA). Optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the image coding processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet wirelessly.

In this embodiment, an image encoding processing method operating in the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of an image encoding processing method according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:

step S202, carrying out downsampling processing on an original image frame to obtain a target image frame;

step S204, determining a target residual error of an image block of the target image frame;

further, if the target image frame is an intra-frame prediction coding frame, determining a plurality of residuals of each image block in the target image frame, and determining a minimum residual among the plurality of residuals as the target residual; specifically, a residual error of the target image frame in an intra-frame mode is determined, wherein the intra-frame mode includes: DC mode, planar mode, horizontal mode, vertical mode; determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode; and determining a residual error corresponding to the optimal intra mode as the target residual error.

If the target image frame is an inter-frame prediction coding frame, determining a spatial domain residual error and a time domain residual error of each image block in the target image frame, and further determining a residual error of the target image frame in an intra-frame mode, wherein the intra-frame mode comprises: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual; determining a target matching block of a reference frame of the target image frame, determining the position of the target matching block as the position of the minimum residual, and determining the residual corresponding to the position of the minimum residual as the time domain residual. Specifically, each image block in the target image frame is subjected to motion estimation through one of the following modes: diamond search, hexagonal search, full search, logarithmic search; determining the best matching block obtained by motion estimation as the target matching block; and then determining the smaller residual error of the space domain residual error and the time domain residual error as the target residual error.

Step S206, inputting the target residual error into a pre-trained target residual error network model to obtain the probability of each quantization parameter output by the target residual error network model and corresponding to the target residual error, wherein the quantization parameter with the probability greater than a preset threshold value is determined as a target quantization parameter;

step S208, generating a quantization parameter table of the original image frame according to the target quantization parameter at the corresponding position of the image block and the original image frame;

step S210, performing image coding on the original resolution image and the quantization parameter table.
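The threshold selection in step S206 can be sketched as follows. This is a minimal illustration, assuming the model emits one probability per quantization parameter in the range 0 to 51; the function name select_target_qps, the QP range, the example logits, and the threshold value are all hypothetical and are not taken from the embodiment.

```python
import numpy as np

def select_target_qps(probabilities, threshold):
    """Return the QP indices whose predicted probability exceeds the threshold."""
    probabilities = np.asarray(probabilities, dtype=float)
    return [int(qp) for qp in np.flatnonzero(probabilities > threshold)]

# Hypothetical model output for one image block: a softmax over 52 QP values.
logits = np.zeros(52)
logits[30] = 5.0
logits[31] = 4.0
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Only QPs 30 and 31 carry enough probability mass to pass a 0.2 threshold.
target_qps = select_target_qps(probs, threshold=0.2)
print(target_qps)
```

More than one quantization parameter may exceed the threshold; how ties are resolved is not specified in the text, so the sketch simply returns all candidates.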

Through the steps S202 to S210, the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map a plurality of features extracted from the texture information of an image block, is inaccurate and causes errors in subsequent image coding can be solved. The optimal quantization parameter is selected for coding each image block and an optimal number of coding bits is allocated to each coding unit, so that the total distortion after video coding is minimized.

Optionally, before downsampling an original image frame to obtain a target image frame, acquiring a predetermined number of image frames and quantization parameters corresponding to the image frames; and training an original residual error network model by using the preset number of image frames and the quantization parameters actually corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input into the original residual error network model, and the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.

In the embodiment of the invention, before determining the target residual of the image blocks of the target image frame, the target image frame is divided into the image blocks with the preset sizes according to the speed and quality of coding.

Optionally, before the target quantization parameter is generated into the quantization parameter table of the original image frame according to the corresponding position of the image block and the original image frame, the quantization parameter is expanded into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
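The expansion of a quantization parameter into an interval can be sketched as below. The function name extend_qp and the clipping to the range 0 to 51 are assumptions made for illustration; the text only states that the parameter is increased and decreased by a predetermined value.

```python
def extend_qp(qp, delta, qp_min=0, qp_max=51):
    """Expand a single QP into the interval [qp - delta, qp + delta].

    Clipping to [qp_min, qp_max] is an assumption; 0..51 is used here
    only as an illustrative valid range.
    """
    low = max(qp_min, qp - delta)
    high = min(qp_max, qp + delta)
    return low, high

print(extend_qp(30, 2))
print(extend_qp(50, 3))
```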

FIG. 3 is a flowchart of an encoding method based on a neural network according to an embodiment of the present invention. As shown in FIG. 3, the specific steps are as follows:

Step S301, a set of video sequences is received, and the down-sampling operation is performed on each image frame in sequence, starting from the first image frame. The down-sampling methods that may be used include, but are not limited to: nearest neighbor interpolation, bilinear interpolation, mean interpolation, median interpolation, and the like. The Y component of the image frame is processed using any one of these down-sampling algorithms.
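One of the listed options, mean down-sampling of the Y component, can be sketched as below. This is a minimal example assuming that mean interpolation means averaging each factor × factor neighbourhood; the function name downsample_mean and the factor of 2 are hypothetical.

```python
import numpy as np

def downsample_mean(y_plane, factor=2):
    """Down-sample a luma plane by averaging each factor x factor neighbourhood.

    Rows and columns that do not fill a complete neighbourhood are cropped,
    which is a simplification chosen for this sketch.
    """
    h, w = y_plane.shape
    h_crop, w_crop = h - h % factor, w - w % factor
    cropped = y_plane[:h_crop, :w_crop].astype(np.float64)
    blocks = cropped.reshape(h_crop // factor, factor, w_crop // factor, factor)
    return blocks.mean(axis=(1, 3))

y = np.arange(16).reshape(4, 4)
print(downsample_mean(y))
```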

Step S302, the Y component of each frame is down-sampled in coding order, and the down-sampled image frame is divided into blocks of the same size. The block size may be any one of 4 × 4, 8 × 8, 16 × 16, and 32 × 32; which block size is adopted is determined according to the required coding performance and quality. Step S303 is then performed for each image block in turn, in raster scan order, according to the predetermined block size.
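The block division and raster scan traversal can be sketched as follows. The generator name iter_blocks is hypothetical, and the sketch assumes the frame dimensions are multiples of the block size; incomplete edge blocks are skipped.

```python
import numpy as np

def iter_blocks(frame, block_size):
    """Yield (top, left, block) for equally sized blocks in raster scan order."""
    h, w = frame.shape
    for top in range(0, h - block_size + 1, block_size):
        for left in range(0, w - block_size + 1, block_size):
            yield top, left, frame[top:top + block_size, left:left + block_size]

frame = np.arange(64).reshape(8, 8)
for top, left, block in iter_blocks(frame, 4):
    print(top, left, block.shape)
```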

Step S303, the image block is coded in DC, planar, horizontal, and vertical modes, the mode with the minimum distortion is selected, and its residual is used as the input of the deep residual network. Regardless of whether the frame is an I frame or a P frame, the residual of the image block in the spatial domain needs to be computed. The residual calculation method, which includes but is not limited to mean squared error (MSE), mean absolute difference (MAD), and sum of absolute differences (SAD), is chosen according to the required coding speed and coding quality. For each image block, the residual values of the DC mode, planar mode, horizontal mode, and vertical mode are calculated. The intra prediction modes that may be used include, but are not limited to, these four; this proposal is explained using the 4 modes. The mode with the smallest residual among the 4 modes is selected as the optimal intra mode of the image block.
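The intra mode selection can be sketched as below, using SAD as the residual measure. The predictors are simplified illustrations rather than the exact predictors of any coding standard: in particular, the planar predictor assumes that the last top sample stands in for the top-right neighbour and the last left sample for the bottom-left neighbour. All function names are hypothetical.

```python
import numpy as np

def intra_predictions(top, left):
    """Build simplified DC, horizontal, vertical and planar predictions
    from the row of samples above the block and the column to its left."""
    n = len(top)
    top = np.asarray(top, dtype=np.float64)
    left = np.asarray(left, dtype=np.float64)
    x = np.arange(n)[None, :]
    y = np.arange(n)[:, None]
    # Simplified planar: blend of a horizontal and a vertical linear ramp.
    horiz_interp = (n - 1 - x) * left[:, None] + (x + 1) * top[-1]
    vert_interp = (n - 1 - y) * top[None, :] + (y + 1) * left[-1]
    return {
        "DC": np.full((n, n), (top.sum() + left.sum()) / (2 * n)),
        "horizontal": np.repeat(left[:, None], n, axis=1),
        "vertical": np.repeat(top[None, :], n, axis=0),
        "planar": (horiz_interp + vert_interp) / (2 * n),
    }

def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return float(np.abs(np.asarray(block, dtype=np.float64) - prediction).sum())

def best_intra_mode(block, top, left):
    """Return the mode with the smallest SAD and that SAD value."""
    costs = {mode: sad(block, pred)
             for mode, pred in intra_predictions(top, left).items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# A block of vertical stripes is predicted exactly by the vertical mode.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = np.tile(np.array(top), (4, 1))
print(best_intra_mode(block, top, left))
```

MSE or MAD could be substituted for SAD by replacing the sad function, trading computation for accuracy as the step describes.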

Step S304, whether the frame type of the current frame is an I frame is judged according to the GOP structure. If so, the residual value corresponding to the optimal intra mode of the image block from step S303 is taken as the input of the deep residual network, and step S307 is executed; otherwise (the current frame is a P frame or a B frame), step S305 is executed.

Step S305, if the current frame is a P frame or a B frame, motion estimation is performed on the image block: the best matching block on the reference frame is searched for, and the search position with the minimum distortion is selected as the optimal inter mode. The reference frame is obtained by down-sampling the reconstructed image, using the same down-sampling method as in step S301. The motion estimation methods that may be used include, but are not limited to: diamond search, hexagonal search, full search, logarithmic search, and the like. Using any one of these search methods, motion estimation is performed on the image block to obtain the position of the best matching block, that is, the position of the minimum residual, which is used as the optimal inter mode of the image block.
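Of the listed search methods, full search is the simplest to illustrate. The sketch below assumes SAD as the matching cost and an integer-pixel search window; the function name full_search and the search_range parameter are hypothetical.

```python
import numpy as np

def full_search(block, reference, top, left, search_range=4):
    """Exhaustively search the reference frame around (top, left).

    Returns (dy, dx, cost) for the candidate with the smallest SAD,
    where (dy, dx) is the displacement from the block's own position.
    """
    n = block.shape[0]
    h, w = reference.shape
    block = block.astype(np.int64)
    best = (None, None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate falls outside the reference frame
            candidate = reference[y:y + n, x:x + n].astype(np.int64)
            cost = int(np.abs(block - candidate).sum())
            if cost < best[2]:
                best = (dy, dx, cost)
    return best

reference = np.arange(256).reshape(16, 16)
block = reference[6:10, 7:11]  # content that sits at (6, 7) in the reference
print(full_search(block, reference, top=4, left=5))
```

Diamond, hexagonal, and logarithmic searches visit fewer candidates than this exhaustive loop and are therefore faster, at the risk of settling on a local minimum.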

Step S306, the mode with the minimum distortion is selected from the optimal intra mode and the optimal inter mode. The residual value of the optimal intra mode calculated in step S303 is compared with the residual value of the optimal inter mode calculated in step S305, and the mode with the smaller residual value is taken as the optimal coding mode of the image block. The residual value corresponding to the optimal coding mode of the image block is used as the input of the deep residual network.
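The choice between the spatial and temporal residual can be sketched as below. The function name choose_target_residual is hypothetical, and comparing the residuals by their sum of absolute values is an assumption; any of the residual measures mentioned in step S303 could be used instead.

```python
import numpy as np

def choose_target_residual(frame_type, intra_residual, inter_residual=None):
    """Pick the residual block to feed to the network.

    I frames only have a spatial residual; for P and B frames the residual
    with the smaller sum of absolute values is kept.
    """
    if frame_type == "I" or inter_residual is None:
        return "intra", intra_residual
    intra_cost = np.abs(intra_residual).sum()
    inter_cost = np.abs(inter_residual).sum()
    if inter_cost < intra_cost:
        return "inter", inter_residual
    return "intra", intra_residual

intra = np.full((4, 4), 3)
inter = np.ones((4, 4))
print(choose_target_residual("P", intra, inter)[0])
print(choose_target_residual("I", intra, inter)[0])
```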

Step S307, based on the trained deep residual network, the optimal quantization parameter ① is predicted by the network model from the residual information input in step S304 or step S306. To prevent the quantization parameter from being too large or too small, this proposal uses a rate control algorithm to calculate the predicted quantization parameter ② of the frame, and then limits quantization parameter ① to an interval that must not be exceeded, namely [quantization parameter ② - 5, quantization parameter ② + 5]. MSE has high calculation accuracy but a large amount of computation; when the MSE residual is used as the input of the residual network, the obtained quantization parameter is more accurate, and the limiting interval of the quantization parameter can be widened appropriately.
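The interval limit described above amounts to a clamp. A minimal sketch follows; the function name clamp_qp and the margin parameter are hypothetical, with the default of 5 taken from the interval given in the text.

```python
def clamp_qp(predicted_qp, rate_control_qp, margin=5):
    """Limit the network-predicted QP to [rate_control_qp - margin,
    rate_control_qp + margin]."""
    low = rate_control_qp - margin
    high = rate_control_qp + margin
    return max(low, min(high, predicted_qp))

print(clamp_qp(40, 30))  # prediction too high: limited to the upper bound
print(clamp_qp(32, 30))  # prediction already inside the interval: unchanged
```

A larger margin corresponds to the widened interval mentioned for the MSE-based residual.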

Step S308, judging whether the image block is the last image block of the image frame, if so, executing step S309; otherwise, returning to execute step S303.

Step S309, sending the QpMap of the image frame to the encoder, that is, synthesizing the quantization parameters obtained by the above calculation into the QpMap according to the block sequence divided in step S302, and sending the QpMap to the encoder.
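Assembling the per-block quantization parameters into a QpMap can be sketched as below. The function names are hypothetical, and the expansion step assumes that each block of the down-sampled frame covers scale × scale QP units of the original-resolution frame; the encoder interface that consumes the map is not shown because it is not described in the text.

```python
import numpy as np

def build_qp_map(block_qps, blocks_per_row):
    """Arrange QPs collected in raster scan order into a 2-D table."""
    qps = np.asarray(block_qps, dtype=np.int32)
    return qps.reshape(-1, blocks_per_row)

def expand_to_original(qp_map, scale):
    """Repeat each entry so the table lines up with the original frame grid."""
    return np.repeat(np.repeat(qp_map, scale, axis=0), scale, axis=1)

qp_map = build_qp_map([30, 31, 32, 33], blocks_per_row=2)
print(expand_to_original(qp_map, scale=2))
```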

As a traditional neural network becomes deeper, it becomes more difficult to train and to optimize. The reason is that the deeper the network, the more it needs to learn, the slower it converges, and the more pronounced the vanishing gradient phenomenon becomes, so that the training effect is worse than that of a relatively shallow network. FIG. 4 is a schematic diagram of a residual network model according to an embodiment of the present invention. As shown in FIG. 4, the training model adopted in the embodiment of the present invention is a residual neural network, which can effectively solve the above problems: the deeper the residual network, the better the training effect. Once optimal performance is reached, the redundant network layers perform identity mapping. Therefore, this proposal employs a residual neural network as the training model.
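The identity mapping property can be illustrated with a single fully connected residual block. This is a generic sketch of a skip connection, not the architecture of FIG. 4; the layer shapes, the ReLU activation, and the function names are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is two linear layers with a ReLU between."""
    return relu(x + relu(x @ w1) @ w2)

x = np.array([[1.0, 2.0, 3.0]])
zero = np.zeros((3, 3))
# With all-zero weights the learned branch F(x) vanishes, so the block
# passes its (non-negative) input through unchanged.
print(residual_block(x, zero, zero))
```

Because a redundant block only has to drive its weights toward zero to become an identity mapping, adding such blocks does not by itself degrade what the shallower layers have learned.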

In the encoder, the functional relationship between the image residual D and the quantization parameter QP may be expressed as

D = f(QP, w), where w is a weighting coefficient. Specifically, the relationship between the image residual and the quantization parameter QP is as follows:

Rate-distortion optimization in video coding seeks a group of optimal coding parameters, subject to a limit on the number of coded bits, with which the best reconstructed video quality is obtained. This is done by minimizing the rate-distortion cost function of formula (1):

where D_i is the residual of the current image, R_i is the number of coded bits, and λ is the Lagrange factor.

R_i and λ satisfy the relationship of formula (2):

where α_i and ω_i are related to the content of the video, ω_i is also related to the temporal prediction structure of the video, and λ is related to the number of coded bits.

A JCT-VC proposal gives the relationship between the Lagrange factor λ and the quantization parameter, which is applied in the HM model. The formula is as follows:

λ = α · ω_k · 2^{(QP - 12)/3.0}    (3)

According to formulas (1), (2) and (3) above, it can be determined that the image residual D and the quantization parameter QP have the functional relationship D = f(QP, w), where w is the weight value.
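Formula (3) can be evaluated directly, and inverted to recover QP from λ. The sketch below assumes that α and ω_k are already known; in practice they depend on the content and the prediction structure, and the values used in the example are placeholders. The function names are introduced here for illustration.

```python
import math


def lambda_from_qp(qp: float, alpha: float, omega_k: float) -> float:
    """Formula (3): lambda = alpha * omega_k * 2 ** ((QP - 12) / 3.0)."""
    return alpha * omega_k * 2.0 ** ((qp - 12) / 3.0)


def qp_from_lambda(lam: float, alpha: float, omega_k: float) -> float:
    """Inverse of formula (3), solved for QP."""
    return 12.0 + 3.0 * math.log2(lam / (alpha * omega_k))


# Placeholder weights alpha = omega_k = 1.0, chosen only for illustration.
lam_12 = lambda_from_qp(12, 1.0, 1.0)
lam_15 = lambda_from_qp(15, 1.0, 1.0)
```

With fixed weights, raising QP by 3 doubles λ, which follows from the exponent (QP - 12)/3.0.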

Variable video content and complex encoding algorithms make it impossible to obtain the relevant parameters accurately in practice. In practical applications, the rate control algorithm allocates target bit numbers to the coding units according to a target bit rate, and determines the quantization parameter of each coding unit according to the relationship models among R, λ and QP. With such a method it is difficult to allocate an optimal number of coded bits to each coding unit; it can only keep fluctuations in image quality within a certain range.

Because the weight of each image frame is related to the video content characteristics, the temporal prediction structure and the number of coded bits, this complex functional mapping is difficult to solve optimally with a traditional rate control algorithm alone. Training a neural network yields a better model, which can then provide the optimal quantization parameter for encoding each image block.

For the Class A to Class E sequences, all video sequences are encoded with the optimal parameters so that the best image quality is obtained under a limited bit rate. The residual information D of the encoded sequences and the corresponding quantization parameters QP are extracted to generate the training set of the residual neural network. Based on the trained residual neural network model, the optimal quantization parameter of each image block of an image frame input to the encoder is predicted, and the image block is encoded with that optimal quantization parameter.
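One possible way to organize the extracted pairs of residual information D and quantization parameter QP into a training set is sketched below. It assumes that all residual blocks have the same size and that each block is flattened into one feature vector; the function name build_training_set and the array layout are assumptions, not details given in the text.

```python
import numpy as np


def build_training_set(samples):
    """Turn a list of (residual_block, qp) pairs into feature and label arrays.

    Assumes every residual block has the same shape.
    """
    features = np.stack([np.asarray(residual, dtype=np.float32).ravel()
                         for residual, _ in samples])
    labels = np.array([qp for _, qp in samples], dtype=np.int64)
    return features, labels


# Two hypothetical 2x2 residual blocks with their QP labels.
features, labels = build_training_set([(np.zeros((2, 2)), 30),
                                       (np.ones((2, 2)), 35)])
```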

Fig. 5 is a block diagram of a video encoding apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes a down sampler 52, a residual generator 54, a network trainer 56, a converter 58 and a main encoder 510, wherein,

the down sampler 52 is configured to receive the original resolution image, down-sample it to obtain a low resolution image, and send the low resolution image to the residual generator 54.

The residual generator 54 is configured to calculate residual information for the low resolution image frame block by block. For each image block, both the spatial-domain and the temporal-domain residual information are calculated, the mode with the smaller residual is selected as the optimal prediction mode of the image block, and the residual information of the optimal mode is sent to the network trainer 56.

The network trainer 56 is configured to predict the optimal quantization parameter of the image block from the residual information generated by the residual generator 54, based on the trained target neural network model, and to send the predicted optimal quantization parameter to the converter 58.

The converter 58 is configured to generate the QpMap table of the original image frame from the quantization parameters produced by the network trainer 56, according to the corresponding positions in the original resolution image, and to send the QpMap table to the main encoder 510.

The main encoder 510 is configured to encode the input original resolution image using the corresponding QpMap table, and to generate information such as encoding-related parameters and the code stream.
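The data flow through the down sampler, the residual generator and the converter can be sketched as follows. Several details are assumptions made only for illustration: down-sampling is done by 2x2 averaging, the residual size is compared by mean squared error, the predictions are supplied by the caller, and the encoder is assumed to read one QP per block on a grid that is finer than the low resolution grid by an integer factor. All function names are hypothetical.

```python
import numpy as np


def downsample_2x(frame):
    """Down sampler: average each 2x2 pixel group (odd trailing row/column dropped)."""
    h, w = frame.shape
    h2, w2 = h - h % 2, w - w % 2
    cropped = frame[:h2, :w2].astype(np.float64)
    return cropped.reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))


def select_residual(block, spatial_pred, temporal_pred):
    """Residual generator: keep the prediction mode with the smaller residual (MSE)."""
    block = block.astype(np.float64)
    spatial_res = block - spatial_pred
    temporal_res = block - temporal_pred
    if np.mean(spatial_res ** 2) <= np.mean(temporal_res ** 2):
        return "spatial", spatial_res
    return "temporal", temporal_res


def expand_qp_map(low_res_qp_map, scale):
    """Converter: repeat each low resolution QP over the matching original positions."""
    return np.repeat(np.repeat(low_res_qp_map, scale, axis=0), scale, axis=1)


# Each low resolution block QP covers a 2x2 group of entries in the original grid.
full_map = expand_qp_map(np.array([[30, 32]]), 2)
```

In this sketch the network trainer sits between select_residual and expand_qp_map and supplies one QP per low resolution block.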

Example 2

According to another embodiment of the present invention, there is also provided an image encoding processing apparatus, and fig. 6 is a block diagram of the image encoding processing apparatus according to the embodiment of the present invention, as shown in fig. 6, including:

a down-sampling module 62, configured to perform down-sampling on an original image frame to obtain a target image frame;

a determining module 64, configured to determine a target residual of an image block of the target image frame;

an input module 66, configured to input the target residual into a pre-trained target residual network model, so as to obtain a probability of each quantization parameter output by the target residual network model and corresponding to the target residual, where a quantization parameter with the probability greater than a predetermined threshold is determined as a target quantization parameter;

a generating module 68, configured to generate a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding position of the image block and the original image frame;

an image encoding module 610, configured to perform image encoding on the original resolution image and the quantization parameter table.
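The selection performed by the input module 66 can be sketched as a threshold test on the probabilities output by the network. The text does not say what happens when several quantization parameters exceed the threshold or when none does; the sketch below assumes that the most probable candidate is taken and that None is returned when no probability exceeds the threshold. It also assumes that the index into the probability list is the quantization parameter value.

```python
def select_target_qp(probabilities, threshold):
    """Return the most probable QP if its probability exceeds the threshold.

    probabilities[qp] is the network output for quantization parameter qp.
    Returning None when nothing passes the threshold is an assumption.
    """
    best_qp = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best_qp] > threshold:
        return best_qp
    return None


# With a threshold of 0.5, the QP at index 1 (probability 0.7) is selected.
target_qp = select_target_qp([0.1, 0.7, 0.2], 0.5)
```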

Optionally, the apparatus further comprises:

the training module is used for training an original residual error network model by using the preset number of image frames and the quantization parameters corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input to the original residual error network model, the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.

Optionally, the determining module 64 includes:

Optionally, the first determining sub-module includes:

Optionally, the second determining sub-module includes:

Optionally, the fifth determining unit is further configured to

Optionally, the apparatus further comprises:

It should be noted that the above modules may be implemented by software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.

Example 3

Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.

Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

s1, carrying out down-sampling processing on the original image frame to obtain a target image frame;

s2, determining a target residual error of the image block of the target image frame;

s3, inputting the target residual error into a pre-trained target residual error network model to obtain the probability of each quantization parameter output by the target residual error network model and corresponding to the target residual error, wherein the quantization parameter with the probability greater than a preset threshold value is determined as a target quantization parameter;

s4, generating a quantization parameter table of the original image frame by the target quantization parameter according to the corresponding position of the image block and the original image frame;

s5, image coding the original resolution image and the quantization parameter table.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Example 4

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be separately fabricated into individual integrated circuit modules, or several of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An image encoding processing method, comprising:

2. The method of claim 1, wherein before the downsampling the original image frame to obtain the target image frame, the method further comprises:

3. The method of claim 1, wherein determining a target residual for an image block of the target image frame comprises:

4. The method of claim 3, wherein determining a plurality of residuals for each image block in the target image frame, wherein determining a smallest residual of the plurality of spatial residuals as the target residual comprises:

5. The method of claim 3, wherein determining the spatial and temporal residuals for each image block in the target image frame comprises:

6. The method of claim 5, wherein the determining a target matching block for a reference frame of the target image frame comprises:

7. The method according to any of claims 1 to 6, wherein prior to said determining a target residual for an image block of said target image frame, said method further comprises:

8. The method according to any one of claims 1 to 6, wherein before generating the target quantization parameter table for the original image frame according to the corresponding positions of the image blocks and the original image frame, the method further comprises:

9. An image encoding processing apparatus characterized by comprising:

10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 8 when executed.

11. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 8.