CN111314698A - Image coding processing method and device

Publication number
CN111314698A
Authority
CN
China
Prior art keywords
target
image
residual error
determining
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010125221.7A
Other languages
Chinese (zh)
Inventor
亢润龙
陆金刚
方伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202010125221.7A
Publication of CN111314698A
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction

Abstract

The invention provides an image coding processing method and device. The method comprises: down-sampling an original image frame to obtain a target image frame; determining a target residual of each image block of the target image frame; inputting the target residual into a pre-trained target residual network model to obtain the probability, output by the model, of each quantization parameter corresponding to the target residual, and determining the quantization parameter whose probability is greater than a preset threshold as the target quantization parameter; generating a quantization parameter table for the original image frame from the target quantization parameters, according to the positions in the original image frame that correspond to the image blocks; and encoding the original-resolution image according to the quantization parameter table. This addresses a problem in the related art, where the optimal quantization parameter is determined by training a neural network regressor to map multiple texture features extracted from an image block, the resulting quantization parameter is inaccurate, and errors arise in subsequent image coding.

Description

Image coding processing method and device
Technical Field
The present invention relates to the field of video processing, and in particular, to a method and an apparatus for processing image coding.
Background
High-resolution, high-frame-rate video is increasingly widespread, and growth in storage capacity and transmission bandwidth has not kept pace with the demand for storing and transmitting it. Video coding technology has therefore continued to advance.
Video coding aims to minimize the number of coded bits while keeping image quality acceptable to the human eye. During encoding, coding blocks differ in characteristics such as texture and shape, and adjacent blocks and adjacent frames use different coding modes, so it is difficult to ensure optimal image quality for every coding block. Bit allocation and QP calculation for a coding block are based on the encoded reconstructed image and the actual number of bits of the already-encoded blocks. Although this approach can keep the bit rate within a certain quality range, the bit allocation and QP calculation do not take into account information such as the encoder's actual coding mode and residual, so the resulting quantization parameter is not optimal. Current encoders can decrease or increase the number of bits for some local coding blocks using feature information such as texture and color, but it remains difficult to distribute the coding bits of a whole frame reasonably.
In the related art, multiple features capturing the texture information of an image block are extracted, a neural network regressor is trained to map these features to an optimal quantization parameter, and the image block is encoded using that quantization parameter.
No solution has yet been proposed for the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map multiple texture features extracted from an image block, is inaccurate and causes errors in subsequent image coding.
Disclosure of Invention
Embodiments of the invention provide an image coding processing method and device, which at least address the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map multiple texture features extracted from an image block, is inaccurate and causes errors in subsequent image coding.
According to an embodiment of the present invention, there is provided an image encoding processing method including:
carrying out down-sampling processing on an original image frame to obtain a target image frame;
determining a target residual error of an image block of the target image frame;
inputting the target residual into a pre-trained target residual network model to obtain the probability, output by the target residual network model, of each quantization parameter corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
generating a quantization parameter table for the original image frame from the target quantization parameters, according to the positions in the original image frame that correspond to the image blocks;
and encoding the original-resolution image according to the quantization parameter table.
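The threshold-based selection in the third step can be sketched as follows. This is only an illustration under stated assumptions: the model output is represented as a plain mapping from candidate QP to probability, the function name `select_target_qp` is hypothetical, and the tie-breaking rule (keep the most probable candidate above the threshold) and the `None` fallback are not specified by the text.

```python
from typing import Dict, Optional


def select_target_qp(qp_probs: Dict[int, float], threshold: float) -> Optional[int]:
    """Return the most probable QP whose probability exceeds `threshold`.

    Returns None when no candidate clears the threshold; the source text does
    not say what happens in that case, so the caller must choose a fallback.
    """
    candidates = {qp: p for qp, p in qp_probs.items() if p > threshold}
    if not candidates:
        return None
    # Assumption: if several QPs clear the threshold, keep the most probable one.
    return max(candidates, key=candidates.get)


# Hypothetical per-QP probabilities for one image block.
probs = {26: 0.10, 27: 0.65, 28: 0.25}
target_qp = select_target_qp(probs, threshold=0.5)
```

With a threshold above 0.5, at most one candidate can qualify when the probabilities sum to one, so the tie-breaking rule only matters for lower thresholds.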
Optionally, before down-sampling the original image frame to obtain the target image frame, the method further includes:
acquiring a preset number of image frames and quantization parameters actually corresponding to the image frames;
and training an original residual network model using the preset number of image frames and the quantization parameters actually corresponding to the image frames to obtain the target residual network model, wherein the preset number of image frames are input into the original residual network model, and the target quantization parameter output by the trained target residual network model for a target residual and the quantization parameter actually corresponding to that target residual satisfy a preset objective function.
Optionally, determining a target residual of an image block of the target image frame comprises:
if the target image frame is an intra-frame prediction coding frame, determining a plurality of residual errors of each image block in the target image frame, and determining the minimum residual error in the residual errors as the target residual error;
if the target image frame is an inter-frame prediction coding frame, determining a space domain residual error and a time domain residual error of each image block in the target image frame, and determining a smaller residual error in the space domain residual error and the time domain residual error as the target residual error.
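The frame-type rule above can be summarized in a short sketch. It assumes each residual has already been reduced to a scalar cost (for example a sum of absolute differences) and that the spatial value is already the minimum over the intra modes; the function and parameter names are hypothetical.

```python
from typing import Optional


def select_target_residual(is_intra_frame: bool,
                           spatial_residual: float,
                           temporal_residual: Optional[float] = None) -> float:
    """Pick the residual cost that represents one image block.

    Intra-coded frames only have a spatial (intra-prediction) residual.
    Inter-coded frames compare the spatial and temporal residuals and keep
    the smaller of the two.
    """
    if is_intra_frame:
        return spatial_residual
    if temporal_residual is None:
        raise ValueError("inter-coded frames need a temporal residual")
    return min(spatial_residual, temporal_residual)


intra_choice = select_target_residual(True, 120.0)
inter_choice = select_target_residual(False, 120.0, 80.0)
```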
Optionally, determining a plurality of residuals for each image block in the target image frame, and determining a minimum residual of the plurality of spatial residuals as the target residual comprises:
determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode;
determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode;
and determining a residual error corresponding to the optimal intra mode as the target residual error.
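A minimal sketch of the four-mode comparison is given below. It is not the encoder's actual implementation: the planar and DC formulas follow the common HEVC-style definitions, the reference samples `top` and `left` (each with one extra sample for the top-right and bottom-left neighbour) are assumed to be available, and SAD is used as the residual measure as one of several possible choices.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]


def intra_predictions(top: List[int], left: List[int], n: int) -> Dict[str, Block]:
    """Build DC, planar, horizontal and vertical predictions for an n x n block.

    `top` and `left` hold n + 1 neighbouring samples each; the extra sample is
    the top-right / bottom-left neighbour used by planar prediction.
    """
    dc = (sum(top[:n]) + sum(left[:n]) + n) // (2 * n)
    return {
        "dc": [[dc] * n for _ in range(n)],
        "horizontal": [[left[y]] * n for y in range(n)],
        "vertical": [list(top[:n]) for _ in range(n)],
        "planar": [[((n - 1 - x) * left[y] + (x + 1) * top[n]
                     + (n - 1 - y) * top[x] + (y + 1) * left[n] + n) // (2 * n)
                    for x in range(n)] for y in range(n)],
    }


def sad(block: Block, pred: Block) -> int:
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b) for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))


def best_intra_mode(block: Block, top: List[int], left: List[int]) -> Tuple[str, int]:
    """Return the intra mode with the smallest SAD and that SAD."""
    n = len(block)
    costs = {mode: sad(block, pred)
             for mode, pred in intra_predictions(top, left, n).items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# A block whose rows repeat the left neighbours is predicted exactly by horizontal mode.
left = [10, 20, 30, 40, 40]
top = [50, 50, 50, 50, 50]
block = [[left[y]] * 4 for y in range(4)]
mode, cost = best_intra_mode(block, top, left)
```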
Optionally, determining the spatial residual and the temporal residual of each image block in the target image frame comprises:
determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;
determining a target matching block in a reference frame of the target image frame, taking the position of the target matching block as the position of the minimum residual, and determining the residual at that position as the time domain residual.
Optionally, determining a target matching block of a reference frame of the target image frame comprises:
performing motion estimation on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search;
and determining the best matching block obtained by motion estimation as the target matching block.
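Of the listed search strategies, full search is the simplest to illustrate. The sketch below is an assumption-laden example rather than the patented procedure: frames are plain lists of luma rows, SAD is the matching cost, and the names `full_search`, `sad_at` and `radius` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad_at(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                n: int, radius: int) -> Tuple[Tuple[int, int], int]:
    """Test every displacement within +/- radius that stays inside `ref`
    and return the motion vector and cost of the best matching block."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue  # candidate block would fall outside the reference frame
            cost = sad_at(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def frame_with_patch(px: int, py: int, size: int = 6, n: int = 2) -> Frame:
    """Black frame with one bright n x n patch at (px, py)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(n):
        for x in range(n):
            frame[py + y][px + x] = 100
    return frame


ref = frame_with_patch(3, 1)   # patch position in the reference frame
cur = frame_with_patch(2, 2)   # same patch, moved, in the current frame
mv, cost = full_search(cur, ref, bx=2, by=2, n=2, radius=2)
```

Diamond, hexagonal and logarithmic searches visit only a subset of these displacements, trading some matching accuracy for speed.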
Optionally, before determining the target residuals for the image blocks of the target image frame, the method further comprises:
and dividing the target image frame into the image blocks with preset sizes according to the speed and quality of coding.
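The block division can be sketched as a generator that walks the frame in raster-scan order. It assumes the frame dimensions are multiples of the block size (padding is left out), and the name `iter_blocks` is hypothetical.

```python
from typing import Iterator, List, Tuple

Frame = List[List[int]]


def iter_blocks(frame: Frame, block_size: int) -> Iterator[Tuple[int, int, Frame]]:
    """Yield (x, y, block) for each block_size x block_size block,
    left to right and then top to bottom (raster-scan order).

    Assumption: frame width and height are multiples of block_size.
    """
    height, width = len(frame), len(frame[0])
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            yield x, y, [row[x:x + block_size] for row in frame[y:y + block_size]]


frame = [[r * 8 + c for c in range(8)] for r in range(8)]
origins = [(x, y) for x, y, _ in iter_blocks(frame, 4)]
```

A smaller block size gives finer-grained quantization parameters at the cost of more residual computations, which is the speed and quality trade-off mentioned above.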
Optionally, before generating the quantization parameter table of the original image frame according to the target quantization parameter at the corresponding position of the image block and the original image frame, the method further includes:
the quantization parameter is extended to a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
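As a small worked example, the expansion can be written as follows. The clipping bounds 0 and 51 are an assumption based on the usual 8-bit H.264/HEVC QP range and are not stated in the text; `expand_qp` and `delta` are hypothetical names.

```python
from typing import Tuple


def expand_qp(qp: int, delta: int, qp_min: int = 0, qp_max: int = 51) -> Tuple[int, int]:
    """Expand a single QP into the interval [qp - delta, qp + delta].

    The interval is clipped to [qp_min, qp_max]; the default bounds are an
    assumption (common 8-bit H.264/HEVC range), not taken from the source.
    """
    low = max(qp_min, qp - delta)
    high = min(qp_max, qp + delta)
    return low, high


interval = expand_qp(30, 2)
clipped = expand_qp(50, 3)
```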
According to another embodiment of the present invention, there is also provided an image encoding processing apparatus including:
the down-sampling module is used for performing down-sampling processing on the original image frame to obtain a target image frame;
a determining module for determining a target residual of an image block of the target image frame;
the input module is used for inputting the target residual into a pre-trained target residual network model to obtain the probability, output by the target residual network model, of each quantization parameter corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
the generating module is used for generating a quantization parameter table for the original image frame from the target quantization parameters, according to the positions in the original image frame that correspond to the image blocks;
and the image coding module is used for encoding the original-resolution image according to the quantization parameter table.
Optionally, the apparatus further comprises:
an acquisition module for acquiring a preset number of image frames and the quantization parameters actually corresponding to the image frames;
the training module is used for training an original residual error network model by using the preset number of image frames and quantization parameters corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input to the original residual error network model, the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.
Optionally, the determining module includes:
the first determining sub-module is used for determining a plurality of residual errors of each image block in the target image frame if the target image frame is an intra-frame prediction coding frame, and determining the minimum residual error in the plurality of residual errors as the target residual error;
and the second determining submodule is used for determining a space domain residual error and a time domain residual error of each image block in the target image frame if the target image frame is an inter-frame prediction coding frame, and determining a smaller residual error in the space domain residual error and the time domain residual error as the target residual error.
Optionally, the first determining sub-module includes:
a first determining unit, configured to determine a residual of the target image frame in an intra mode, wherein the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode;
the second determining unit is used for determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode;
and a third determining unit, configured to determine a residual corresponding to the optimal intra mode as the target residual.
Optionally, the second determining sub-module includes:
a fourth determining unit, configured to determine a residual of the target image frame in an intra mode, where the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;
a fifth determining unit, configured to determine a target matching block of a reference frame of the target image frame, determine a position of a minimum residual error from the position of the target matching block, and determine a residual error corresponding to the position of the minimum residual error as the time-domain residual error.
Optionally, the fifth determining unit is further configured for:
performing motion estimation on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search;
and determining the best matching block obtained by motion estimation as the target matching block.
Optionally, the apparatus further comprises:
and the dividing module is used for dividing the target image frame into the image blocks with preset sizes according to the speed and the quality of coding.
Optionally, the apparatus further comprises:
an extension module for extending the quantization parameter into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, an original image frame is down-sampled to obtain a target image frame; a target residual of each image block of the target image frame is determined; the target residual is input into a pre-trained target residual network model to obtain the probability, output by the model, of each quantization parameter corresponding to the target residual, and the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter; and a quantization parameter table for the original image frame is generated from the target quantization parameters according to the positions in the original image frame that correspond to the image blocks. This addresses the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map multiple texture features extracted from an image block, is inaccurate and causes errors in subsequent image coding. An optimal quantization parameter is selected for encoding each image block and an optimal number of coding bits is allocated to each coding unit, so that the total distortion after video coding is minimized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware configuration of a mobile terminal for an image encoding processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of image encoding processing according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a neural network-based encoding method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a residual network model according to an embodiment of the invention;
FIG. 5 is a block diagram of a video encoding device according to an embodiment of the present invention;
FIG. 6 is a block diagram of an image encoding processing apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a hardware structure block diagram of a mobile terminal of an image coding processing method according to an embodiment of the present invention, as shown in fig. 1, a mobile terminal 10 may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for communication function and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example software programs and modules of application software, such as the computer program corresponding to the image encoding processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thereby implementing the method described above. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal 10. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which is used for communicating with the internet wirelessly.
In this embodiment, an image encoding processing method operating in the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of an image encoding processing method according to an embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, carrying out downsampling processing on an original image frame to obtain a target image frame;
step S204, determining a target residual error of an image block of the target image frame;
Further, if the target image frame is an intra-frame prediction coding frame, a plurality of residuals are determined for each image block in the target image frame, and the minimum of these residuals is determined as the target residual. Specifically, the residual of the target image frame is determined in each intra mode, wherein the intra modes comprise: DC mode, planar mode, horizontal mode, vertical mode; the intra mode corresponding to the minimum residual is determined as the optimal intra mode; and the residual corresponding to the optimal intra mode is determined as the target residual.
If the target image frame is an inter-frame prediction coding frame, a spatial domain residual and a time domain residual are determined for each image block in the target image frame. For the spatial domain residual, the residual of the target image frame is determined in each intra mode, wherein the intra modes comprise: DC mode, planar mode, horizontal mode, vertical mode; the intra mode corresponding to the minimum residual value is determined as the optimal intra mode, and the residual corresponding to the optimal intra mode is determined as the spatial domain residual. For the time domain residual, a target matching block is determined in a reference frame of the target image frame, the position of the target matching block is taken as the position of the minimum residual, and the residual at that position is determined as the time domain residual. Specifically, motion estimation is performed on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search; and the best matching block obtained by motion estimation is determined as the target matching block. The smaller of the spatial domain residual and the time domain residual is then determined as the target residual.
step S206, inputting the target residual into a pre-trained target residual network model to obtain the probability, output by the target residual network model, of each quantization parameter corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
step S208, generating a quantization parameter table for the original image frame from the target quantization parameters, according to the positions in the original image frame that correspond to the image blocks;
step S210, encoding the original-resolution image according to the quantization parameter table.
Steps S202 to S210 address the problem in the related art that the optimal quantization parameter, determined by training a neural network regressor to map multiple texture features extracted from an image block, is inaccurate and causes errors in subsequent image coding. An optimal quantization parameter is selected for encoding each image block and an optimal number of coding bits is allocated to each coding unit, so that the total distortion after video coding is minimized.
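Step S208 maps per-block quantization parameters, computed on the down-sampled frame, back to positions in the original frame. One possible layout is sketched below under several assumptions that the text does not fix: the down-sampling factor `scale` is an integer, the table stores one QP per block footprint in the original frame, and blocks without a prediction fall back to a hypothetical `default_qp`.

```python
from typing import Dict, List, Tuple


def build_qp_table(block_qps: Dict[Tuple[int, int], int],
                   orig_width: int, orig_height: int,
                   block_size: int, scale: int,
                   default_qp: int) -> List[List[int]]:
    """Build a 2-D QP table for the original frame.

    `block_qps` maps the (x, y) origin of each block in the down-sampled frame
    to its target QP. A block of block_size pixels in the down-sampled frame
    covers block_size * scale pixels in the original frame, so both frames
    share the same block grid and only the footprint size differs.
    """
    unit = block_size * scale                 # footprint in the original frame
    cols = -(-orig_width // unit)             # ceiling division
    rows = -(-orig_height // unit)
    table = [[default_qp] * cols for _ in range(rows)]
    for (x, y), qp in block_qps.items():
        table[y // block_size][x // block_size] = qp
    return table


# 32 x 16 original frame, down-sampled by 2 to 16 x 8 and split into 8 x 8 blocks.
qps = {(0, 0): 30, (8, 0): 34}
table = build_qp_table(qps, orig_width=32, orig_height=16,
                       block_size=8, scale=2, default_qp=32)
```

Entry `table[r][c]` then applies to the original-frame region whose top-left corner is at `(c * unit, r * unit)`.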
Optionally, before downsampling an original image frame to obtain a target image frame, acquiring a predetermined number of image frames and quantization parameters corresponding to the image frames; and training an original residual error network model by using the preset number of image frames and the quantization parameters actually corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input into the original residual error network model, and the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.
In the embodiment of the invention, before determining the target residual of the image blocks of the target image frame, the target image frame is divided into the image blocks with the preset sizes according to the speed and quality of coding.
Optionally, before the target quantization parameter is generated into the quantization parameter table of the original image frame according to the corresponding position of the image block and the original image frame, the quantization parameter is expanded into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
Fig. 3 is a flowchart of an encoding method based on a neural network according to an embodiment of the present invention, as shown in fig. 3, the specific steps are as follows:
Step S301, receiving a video sequence and down-sampling its image frames in order, starting from the first image frame. The down-sampling method may include, but is not limited to: nearest-neighbor interpolation, bilinear interpolation, mean interpolation, median interpolation, and the like. Any one of these down-sampling algorithms is applied to the Y component of the image frame.
Step S302, the Y component of each frame is down-sampled in coding order, and the down-sampled image frame is divided into blocks of equal size. The block size may be any one of 4 × 4, 8 × 8, 16 × 16, 32 × 32; which size is adopted depends on the required coding performance and quality. Step S303 is then performed for each image block in raster-scan order according to the chosen block size.
Step S303, each image block is predicted in the DC, planar, horizontal, and vertical modes, the mode with the minimum distortion is selected, and its residual is used as the input of the deep residual network. Whether the frame is an I frame or a P frame, the residual of the image block in the spatial domain must be computed. The residual measure, which includes but is not limited to mean squared error (MSE), mean absolute difference (MAD), and sum of absolute differences (SAD), is chosen according to the required coding speed and coding quality. For each image block, the residual values of the DC mode, planar mode, horizontal mode, and vertical mode are calculated. The intra prediction modes may include, but are not limited to, these four; this proposal is described using the 4 modes. The mode with the smallest residual among the 4 modes is selected as the optimal intra mode of the image block.
Step S304, whether the current frame is an I frame is determined according to the GOP structure. If so, the residual value corresponding to the optimal intra mode of the image block obtained in step S303 is used as the input of the deep residual network, and step S307 is executed; otherwise, step S305 is executed.
Step S305, if the current frame is a P frame or a B frame, motion estimation is performed on the image block: the best matching block is searched for on the reference frame, and the search position with the minimum distortion is selected as the optimal inter mode. The reference frame is obtained by downsampling the reconstructed image with the same method as in step S301. The motion estimation method may include, but is not limited to: diamond search, hexagonal search, full search, and logarithmic search. Using any one of these search methods, motion estimation is performed on the image block to obtain the position of the best matching block, that is, the position of the minimum residual, which is used as the optimal inter mode of the image block.
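Of the search methods named in step S305, full search is the simplest to illustrate. The sketch below assumes SAD as the matching cost and a square search window; the function names and the `search_range` parameter are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b) for a, b in zip(ra, rb))


def crop(plane, top, left, size):
    """Cut a size x size block out of a plane at (top, left)."""
    return [row[left:left + size] for row in plane[top:top + size]]


def full_search(cur_block, ref_plane, top, left, search_range):
    """Test every displacement within +/- search_range around (top, left).

    Returns (best_sad, (dy, dx)). Candidates that fall outside the
    reference plane are skipped.
    """
    size = len(cur_block)
    height, width = len(ref_plane), len(ref_plane[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur_block, crop(ref_plane, ry, rx, size))
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

Diamond, hexagonal, and logarithmic searches visit only a subset of these candidates, trading some matching accuracy for speed.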
Step S306, the mode with the minimum distortion is selected from the optimal intra mode and the optimal inter mode. The residual value of the optimal intra mode calculated in step S303 is compared with the residual value of the optimal inter mode calculated in step S305, and the mode with the smaller residual value is taken as the optimal coding mode of the image block. The residual value corresponding to the optimal coding mode of the image block is used as the input of the deep residual network.
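The decision made across steps S304 to S306 can be summarized in a short sketch, following the rule that intra-coded frames use the spatial residual directly while inter-coded frames take the smaller of the spatial and temporal residuals. The function name, the `(distortion, residual)` tuple layout, and the tie-break in favour of the intra mode are assumptions.

```python
def select_target_residual(frame_type, intra, inter=None):
    """Pick the residual that is fed to the network.

    intra and inter are (distortion, residual) pairs; frame_type is one of
    "I", "P", or "B". Ties go to the intra mode (an assumption of this sketch).
    """
    if frame_type == "I" or inter is None:
        return "intra", intra[1]
    if inter[0] < intra[0]:
        return "inter", inter[1]
    return "intra", intra[1]
```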
Step S307, based on the trained deep residual network, the optimal quantization parameter ① is predicted by the network model from the residual information input in step S304 or step S306. To prevent the quantization parameter from being too large or too small, the proposal uses a rate control algorithm to calculate the predicted quantization parameter ② of the frame, and then limits quantization parameter ① to an interval that it may not exceed, namely [quantization parameter ② - 5, quantization parameter ② + 5]. MSE has high calculation precision but a large calculation amount; when a residual calculated with MSE is used as the input of the residual network, the obtained quantization parameter is more accurate, and the limiting interval of the quantization parameter can be widened appropriately.
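The limiting of quantization parameter ① by quantization parameter ② amounts to a clamp. A minimal sketch, with hypothetical names `qp_net` for ① and `qp_rc` for ②, and the margin of 5 taken from the text:

```python
def clamp_predicted_qp(qp_net, qp_rc, margin=5):
    """Limit the network-predicted QP to [qp_rc - margin, qp_rc + margin]."""
    return max(qp_rc - margin, min(qp_rc + margin, qp_net))
```

Passing a larger `margin` corresponds to widening the interval, as the text suggests is acceptable when the residual is computed with MSE.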
Step S308, whether the image block is the last image block of the image frame is determined. If so, step S309 is executed; otherwise, the procedure returns to step S303.
Step S309, the quantization parameters obtained by the above calculation are assembled into a QpMap according to the order of the blocks divided in step S302, and the QpMap of the image frame is sent to the encoder.
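Assembling the per-block quantization parameters into a QpMap could be sketched as follows. The two-dimensional list layout, the function names, and the `scale` parameter (the downsampling factor used to map a block of the target frame back to the original frame) are assumptions; the actual QpMap format depends on the encoder.

```python
def build_qp_map(block_qps, blocks_per_row):
    """Arrange per-block QPs given in raster order into a 2-D QpMap."""
    if len(block_qps) % blocks_per_row:
        raise ValueError("block count is not a multiple of blocks_per_row")
    return [block_qps[i:i + blocks_per_row]
            for i in range(0, len(block_qps), blocks_per_row)]


def original_region(block_y, block_x, block_size, scale):
    """Return (top, left, height, width) of a block in the original frame.

    A block_size block of the downsampled frame covers block_size * scale
    samples per side in the original frame.
    """
    side = block_size * scale
    return block_y * side, block_x * side, side, side
```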
Training a traditional neural network becomes harder as the network becomes deeper, and optimizing the network becomes increasingly difficult. The reason is that a deeper network has more to learn, converges more slowly, and suffers more and more from vanishing gradients, so that its training effect is worse than that of a relatively shallow network. Fig. 4 is a schematic diagram of a residual network model according to an embodiment of the present invention. As shown in Fig. 4, the training model adopted in the embodiment of the present invention is a residual neural network, which can effectively solve the above problems: the deeper the residual network, the better the training effect. Once the optimal performance is reached, the redundant network layers perform identity mapping. Therefore, the present proposal adopts a residual neural network as the training model.
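The identity-mapping property mentioned above can be illustrated with a single residual block, y = x + F(x). The sketch below is a simplified, dependency-free form (linear, ReLU, linear on the branch, with no activation after the addition); the layer shapes and function names are illustrative assumptions and do not describe the network of Fig. 4.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(v, w1, b1, w2, b2):
    """Compute y = v + F(v), where F is linear -> ReLU -> linear."""
    hidden = relu(linear(w1, b1, v))
    branch = linear(w2, b2, hidden)
    return [x + f for x, f in zip(v, branch)]
```

When the branch weights are all zero, F(v) is zero and the block passes its input through unchanged, which is how a redundant layer can fall back to an identity mapping.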
In the encoder, the functional relationship between the image residual D and the quantization parameter QP may be expressed as $D = f(QP, w)$, where $w$ is a weighting coefficient. Specifically, the relationship between the image residual and the quantization parameter QP is derived as follows:
Rate-distortion optimization in video coding seeks a set of optimal coding parameters, subject to a limit on the number of coded bits, with which the best reconstructed video quality is obtained. That is, the rate-distortion cost function is minimized:
Figure BDA0002394199250000121 (1)
where $D_i$ is the residual of the current image, $R_i$ is the number of coded bits, and $\lambda$ is the Lagrange factor.
$R_i$ and $\lambda$ satisfy the following relationship:
Figure BDA0002394199250000122 (2)
where $\alpha_i$ and $\omega_i$ are related to the content of the video, $\omega_i$ is also related to the temporal prediction structure of the video, and $\lambda$ is related to the number of coded bits.
A JCT-VC proposal gives the relationship between the Lagrange factor $\lambda$ and the quantization parameter, which is applied in the HM model. The formula is as follows:
$\lambda = \alpha \cdot \omega_k \cdot 2^{(QP - 12)/3.0}$ (3)
According to the above formulas (1), (2), and (3), it can be determined that the image residual D and the quantization parameter QP have the functional relationship $D = f(QP, w)$, where $w$ is the weight value.
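Formula (3) can be evaluated directly, and solving it for QP gives QP = 12 + 3·log2(λ / (α·ω_k)). The sketch below implements both directions; the function names are hypothetical, and the values of α and ω_k must be supplied by the caller because the source does not give them.

```python
import math


def lambda_from_qp(qp, alpha, omega_k):
    """Formula (3): lambda = alpha * omega_k * 2^((QP - 12) / 3.0)."""
    return alpha * omega_k * 2.0 ** ((qp - 12) / 3.0)


def qp_from_lambda(lam, alpha, omega_k):
    """Inverse of formula (3), obtained by taking log2 of both sides."""
    return 12.0 + 3.0 * math.log2(lam / (alpha * omega_k))
```

Under formula (3), every increase of QP by 3 doubles λ.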
Variable video content and complex encoding algorithms make it impossible to obtain the relevant parameters accurately in practice. In practical applications, the rate control algorithm allocates target bit numbers to different coding units according to the target bit rate, and determines the quantization parameters of the coding units according to the relationship model among R, λ, and QP. Such a method can hardly allocate the optimal number of coded bits to each coding unit, and can only keep the fluctuation of image quality within a certain range.
Because the weight of each frame is related to the video content characteristics, the temporal prediction structure, and the number of coded bits, the optimal solution of this complex functional mapping is difficult to reach with a traditional rate control algorithm alone. Training with a neural network yields a better model, which in turn provides the optimal quantization parameter for encoding the image block.
For the Class A to Class E sequences, all video sequences are encoded with the optimal parameters, so that the best image quality is obtained at a limited bit rate. The residual information D of the encoded sequences and the corresponding quantization parameters QP are extracted to generate the training set of the residual neural network. Based on the trained residual neural network model, the optimal quantization parameter of each image block of an image frame input to the encoder is predicted, and the image block is encoded with this optimal quantization parameter.
Fig. 5 is a block diagram of a video encoding apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes a downsampler 52, a residual generator 54, a network trainer 56, a converter 58, and a main encoder 510, wherein:
The downsampler 52 is configured to receive the original-resolution image, downsample it to obtain a low-resolution image, and send the low-resolution image to the residual generator 54.
The residual generator 54 is configured to calculate residual information for the low-resolution image frame block by block. For each image block, the residual information of the spatial domain and of the time domain is calculated, the mode with the smaller residual is selected as the optimal prediction mode of the image block, and the residual information of the optimal mode is sent to the network trainer 56.
The network trainer 56 is configured to predict, based on the trained target neural network model, the optimal quantization parameter of the image block from the residual information generated by the residual generator 54, and to send the generated optimal quantization parameter to the converter 58.
The converter 58 is configured to generate the QpMap table of the original image frame from the quantization parameters generated by the network trainer 56, according to the corresponding positions in the original-resolution image, and to send the QpMap table to the main encoder 510.
The main encoder 510 is configured to encode the input original-resolution image using the corresponding QpMap table, and to generate information such as coding-related parameters and the code stream.
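The data flow between the five components of Fig. 5 can be sketched as a chain of callables. All parameter names below are placeholders for the components named in the text; their internals are not specified here and are passed in by the caller.

```python
def encode_frame(original, downsampler, residual_generator, qp_predictor,
                 converter, main_encoder):
    """Wire the components of Fig. 5 together for one frame.

    Every argument after `original` is a caller-supplied callable standing
    in for the corresponding component (hypothetical interfaces).
    """
    low_res = downsampler(original)              # downsampler 52
    residuals = residual_generator(low_res)      # residual generator 54, one entry per block
    block_qps = [qp_predictor(r) for r in residuals]  # network trainer 56
    qp_map = converter(block_qps)                # converter 58
    return main_encoder(original, qp_map)        # main encoder 510
```

Note that the main encoder receives the original-resolution frame, not the downsampled one; the low-resolution frame is used only to derive the QpMap.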
Example 2
According to another embodiment of the present invention, an image encoding processing apparatus is also provided. Fig. 6 is a block diagram of the image encoding processing apparatus according to the embodiment of the present invention. As shown in Fig. 6, the apparatus includes:
a down-sampling module 62, configured to perform down-sampling on an original image frame to obtain a target image frame;
a determining module 64, configured to determine a target residual of an image block of the target image frame;
an input module 66, configured to input the target residual into a pre-trained target residual network model, so as to obtain a probability of each quantization parameter output by the target residual network model and corresponding to the target residual, where a quantization parameter with the probability greater than a predetermined threshold is determined as a target quantization parameter;
a generating module 68, configured to generate a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding position of the image block and the original image frame;
an image encoding module 610, configured to perform image encoding on the original resolution image and the quantization parameter table.
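The selection performed by the input module can be sketched as a threshold test on the per-QP probabilities output by the model. The dictionary input format and the rule of taking the most probable candidate when several exceed the threshold are assumptions; the source only states that a quantization parameter whose probability exceeds the threshold is chosen.

```python
def select_target_qp(probabilities, threshold):
    """Pick the target QP from a mapping {candidate QP: probability}.

    Returns None when no candidate exceeds the threshold. If several do,
    the most probable one is returned (an assumption of this sketch).
    """
    candidates = {qp: p for qp, p in probabilities.items() if p > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```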
Optionally, the apparatus further comprises:
an acquisition module, configured to acquire a preset number of image frames and the quantization parameters actually corresponding to the image frames;
a training module, configured to train an original residual network model using the preset number of image frames and the quantization parameters corresponding to the image frames to obtain the target residual network model, wherein the preset number of image frames are the input of the original residual network model, the trained target residual network model outputs the target quantization parameter corresponding to the target residual, and the target quantization parameter and the quantization parameter actually corresponding to the target residual satisfy a preset objective function.
Optionally, the determining module 64 includes:
a first determining sub-module, configured to, if the target image frame is an intra-frame prediction coded frame, determine a plurality of residuals of each image block in the target image frame and determine the minimum residual among them as the target residual;
a second determining sub-module, configured to, if the target image frame is an inter-frame prediction coded frame, determine a spatial domain residual and a time domain residual of each image block in the target image frame and determine the smaller of the two as the target residual.
Optionally, the first determining sub-module includes:
a first determining unit, configured to determine a residual of the target image frame in an intra mode, wherein the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode;
a second determining unit, configured to determine the intra mode corresponding to the minimum residual as the optimal intra mode;
and a third determining unit, configured to determine a residual corresponding to the optimal intra mode as the target residual.
Optionally, the second determining sub-module includes:
a fourth determining unit, configured to determine a residual of the target image frame in an intra mode, where the intra mode includes: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;
a fifth determining unit, configured to determine a target matching block of a reference frame of the target image frame, determine the position of the target matching block as the position of the minimum residual, and determine the residual corresponding to the position of the minimum residual as the time domain residual.
Optionally, the fifth determining unit is further configured to:
perform motion estimation on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search;
and determine the best matching block obtained by motion estimation as the target matching block.
Optionally, the apparatus further comprises:
a dividing module, configured to divide the target image frame into the image blocks of a preset size according to the speed and quality of coding.
Optionally, the apparatus further comprises:
an extension module, configured to extend the quantization parameter into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
It should be noted that the above modules may be implemented in software or hardware. For the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, performing down-sampling processing on an original image frame to obtain a target image frame;
S2, determining a target residual of an image block of the target image frame;
S3, inputting the target residual into a pre-trained target residual network model to obtain the probability of each quantization parameter output by the target residual network model and corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
S4, generating a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;
S5, performing image coding on the original resolution image and the quantization parameter table.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, performing down-sampling processing on an original image frame to obtain a target image frame;
S2, determining a target residual of an image block of the target image frame;
S3, inputting the target residual into a pre-trained target residual network model to obtain the probability of each quantization parameter output by the target residual network model and corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
S4, generating a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;
S5, performing image coding on the original resolution image and the quantization parameter table.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. An image encoding processing method, comprising:
carrying out down-sampling processing on an original image frame to obtain a target image frame;
determining a target residual error of an image block of the target image frame;
inputting the target residual error into a pre-trained target residual error network model to obtain the probability of each quantization parameter output by the target residual error network model and corresponding to the target residual error, wherein the quantization parameter with the probability larger than a preset threshold value is determined as a target quantization parameter;
generating a quantization parameter table of the original image frame by the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;
and carrying out image coding on the original resolution image and the quantization parameter table.
2. The method of claim 1, wherein before the downsampling the original image frame to obtain the target image frame, the method further comprises:
acquiring a preset number of image frames and quantization parameters actually corresponding to the image frames;
and training an original residual error network model by using the preset number of image frames and the quantization parameters corresponding to the image frames to obtain the target residual error network model, wherein the preset number of image frames are input into the original residual error network model, the trained target residual error network model outputs target quantization parameters corresponding to the target residual error, and the target quantization parameters and the quantization parameters actually corresponding to the target residual error meet a preset target function.
3. The method of claim 1, wherein determining a target residual for an image block of the target image frame comprises:
if the target image frame is an intra-frame prediction coded frame, determining a plurality of residuals of each image block in the target image frame, and determining the minimum residual among the plurality of residuals as the target residual;
if the target image frame is an inter-frame prediction coded frame, determining a spatial domain residual and a time domain residual of each image block in the target image frame, and determining the smaller of the spatial domain residual and the time domain residual as the target residual.
4. The method of claim 3, wherein determining a plurality of residuals of each image block in the target image frame and determining the minimum residual among the plurality of residuals as the target residual comprises:
determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode;
determining the intra-frame mode corresponding to the minimum residual error as the optimal intra-frame mode;
and determining a residual error corresponding to the optimal intra mode as the target residual error.
5. The method of claim 3, wherein determining the spatial and temporal residuals for each image block in the target image frame comprises:
determining a residual of the target image frame in an intra mode, wherein the intra mode comprises: DC mode, planar mode, horizontal mode, vertical mode; determining an intra-frame mode corresponding to the minimum residual value as an optimal intra-frame mode, and determining a residual corresponding to the optimal intra-frame mode as the spatial domain residual;
determining a target matching block of a reference frame of the target image frame, determining the position of the target matching block as the position of the minimum residual, and determining the residual corresponding to the position of the minimum residual as the time domain residual.
6. The method of claim 5, wherein the determining a target matching block for a reference frame of the target image frame comprises:
performing motion estimation on each image block in the target image frame by one of the following methods: diamond search, hexagonal search, full search, logarithmic search;
and determining the best matching block obtained by motion estimation as the target matching block.
7. The method according to any of claims 1 to 6, wherein prior to said determining a target residual for an image block of said target image frame, said method further comprises:
and dividing the target image frame into the image blocks with preset sizes according to the speed and quality of coding.
8. The method according to any one of claims 1 to 6, wherein before generating the quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame, the method further comprises:
extending the quantization parameter into a quantization parameter interval by increasing and decreasing the quantization parameter by a predetermined value.
9. An image encoding processing apparatus characterized by comprising:
a down-sampling module, configured to perform down-sampling processing on an original image frame to obtain a target image frame;
a determining module, configured to determine a target residual of an image block of the target image frame;
an input module, configured to input the target residual into a pre-trained target residual network model to obtain the probability of each quantization parameter output by the target residual network model and corresponding to the target residual, wherein the quantization parameter whose probability is greater than a preset threshold is determined as the target quantization parameter;
a generating module, configured to generate a quantization parameter table of the original image frame from the target quantization parameter according to the corresponding positions of the image blocks and the original image frame;
and an image coding module, configured to perform image coding on the original resolution image and the quantization parameter table.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 8 when executed.
11. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 8.
CN202010125221.7A 2020-02-27 2020-02-27 Image coding processing method and device Withdrawn CN111314698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010125221.7A CN111314698A (en) 2020-02-27 2020-02-27 Image coding processing method and device

Publications (1)

Publication Number Publication Date
CN111314698A true CN111314698A (en) 2020-06-19

Family

ID=71147848

Country Status (1)

Country Link
CN (1) CN111314698A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200619