CN109068138B - Video image processing method and device, electronic equipment and storage medium

Publication number
CN109068138B
Authority
CN
China
Prior art keywords
network
frame
moment
processing
time
Prior art date
Legal status
Active
Application number
CN201810892284.8A
Other languages
Chinese (zh)
Other versions
CN109068138A (en)
Inventor
鲁国
欧阳万里
徐东
张小云
高志勇
孙明庭
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201810892284.8A
Publication of CN109068138A
Application granted
Publication of CN109068138B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The present disclosure relates to a video image processing method and apparatus, an electronic device, and a storage medium. The method includes: for a target decoded frame to be restored at the current time, performing state estimation processing on a reference frame associated with the target decoded frame to obtain a prior estimate of the current time, where the reference frame includes at least the restored frame of the previous time; and obtaining the restored frame of the current time at least according to the prior estimate of the current time. The embodiments of the present disclosure can reduce the distortion rate of restored frames.

Description

Video image processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing a video image, an electronic device, and a storage medium.
Background
Video compression technology is widely used in the transmission of various videos and can improve transmission efficiency. However, compression introduces a certain degree of distortion into the video, which degrades the viewing experience.
In the related art, a video image may be processed in combination with temporal information, for example by using the decoded frame of the previous time when processing the current time, so as to reduce the distortion of the video image. However, in the related art the decoded frames at different times of the video are processed independently of one another, and the restored frames obtained after processing still suffer from severe distortion.
Disclosure of Invention
The disclosure provides a video image processing method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a video image processing method, including:
for a target decoded frame to be restored at the current time, performing state estimation processing on a reference frame associated with the target decoded frame to obtain a prior estimate of the current time, where the reference frame includes at least the restored frame of the previous time;
and obtaining the restored frame of the current time at least according to the prior estimate of the current time.
In one possible implementation, the method further includes:
determining a measurement value of the target decoding frame according to the target decoding frame;
wherein the obtaining of the restored frame of the current time at least according to the prior estimate of the current time comprises:
and carrying out fusion processing on the prior estimation at the current moment and the measured value of the target decoding frame to obtain a recovery frame at the current moment.
In one possible implementation, the method further includes:
performing matrix transformation processing on the reference frame to obtain a state transition matrix at the current moment;
determining Kalman gain at the current moment according to the state transition matrix and the covariance matrix at the previous moment;
the fusion processing of the prior estimation at the current time and the measurement value of the target decoding frame to obtain a restored frame at the current time includes:
and according to the Kalman gain of the current moment, carrying out weighted fusion processing on the prior estimation of the current moment and the measurement value of the target decoding frame to obtain a recovery frame of the current moment.
In a possible implementation manner, the performing state estimation processing on the reference frame associated with the target decoded frame to obtain an a priori estimate of the current time includes:
and inputting the recovery frame of the previous moment into a first state estimation network for processing to obtain prior estimation of the current moment.
In a possible implementation manner, the performing matrix transformation processing on the reference frame to obtain a state transition matrix at the current time includes:
and inputting the recovery frame at the previous moment into a first matrix transformation network to obtain a state transition matrix at the current moment.
In one possible implementation, the reference frame further includes the target decoded frame,
wherein, the performing state estimation processing on the reference frame associated with the target decoding frame to obtain a prior estimation corresponding to the current time includes:
and inputting the recovery frame and the target decoding frame at the previous moment into the second state estimation network to obtain the prior estimation corresponding to the current moment.
In one possible implementation, the reference frame further includes the target decoded frame,
the matrix transformation processing on the reference frame to obtain the state transition matrix at the current time includes:
and inputting the recovery frame and the target decoding frame at the previous moment into a second matrix transformation network to obtain a state transition matrix at the current moment.
In a possible implementation manner, the determining, according to the target decoding frame, a measurement value of the target decoding frame includes:
obtaining a prediction residual error and a prediction frame from the target decoding frame;
inputting the target decoding frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual;
and carrying out fusion processing on the recovered prediction residual and the prediction frame to obtain a measured value of the target decoding frame.
In one possible implementation, the first state estimation network includes a first convolutional network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolutional network,
inputting the restored frame of the previous moment into a first state estimation network for processing to obtain a prior estimation of the current moment, wherein the method comprises the following steps:
and processing the restored frame at the previous moment in sequence through a first convolution network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network and a second convolution network of the first state estimation network to obtain the prior estimation at the current moment.
In one possible implementation, the first matrix transformation network includes: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transform network and a first fusion network,
wherein, the inputting the restored frame of the previous moment into the first matrix transformation network to obtain the state transition matrix of the current moment includes:
and processing the restored frame at the previous moment in sequence through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
In one possible implementation, the second state estimation network includes: a first splicing network, a first convolution network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolution network, and a second fusion network,
inputting the restored frame at the previous moment and the target decoding frame into the second state estimation network to obtain a prior estimation corresponding to the current moment, wherein the method comprises the following steps:
and processing the restored frame and the target decoding frame at the previous moment in turn through the first splicing network, the first convolution network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolution network and the second fusion network to obtain the prior estimation at the current moment.
In one possible implementation, the second matrix transformation network includes: a first splicing network, a third convolution network, a second linear rectification ReLU network, a second residual block network, a fourth convolution network, a matrix transformation network and a first fusion network,
wherein, the inputting the restored frame at the previous moment and the target decoding frame into a second matrix transformation network to obtain the state transition matrix at the current moment includes:
and processing the restored frame and the target decoding frame at the previous moment in sequence through the first splicing network, the third convolution network, the second linear rectification ReLU network, the second residual block network, the fourth convolution network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
In one possible implementation, the residual error recovery network includes: a second stitching network, a fifth convolutional network, a third linearly rectified ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
inputting the target decoded frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual, wherein the method comprises the following steps:
and processing the target decoding frame and the prediction residual in sequence through the second splicing network, the fifth convolution network, the third linear rectification ReLU network, the normalization network, the third residual block network, the inverse normalization network and the sixth convolution network to obtain the recovered prediction residual.
In one possible implementation, the method further includes:
inputting a recovery frame of a second moment into a first state estimation network for processing to obtain a first training prior estimation of the first moment, wherein the second moment is a previous moment of the first moment;
determining a first network loss of the first state estimation network according to a first training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the first state estimation network according to the first network loss.
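By way of illustration only, one such training step can be sketched as follows (PyTorch is assumed; the mean-squared-error loss is likewise an assumed concrete choice, since this disclosure only states that the first network loss is determined from the first training prior estimation and the original frame):

```python
import torch

def train_step_first_state_net(f1_net, optimizer, restored_frame_t2, original_frame_t1):
    """Sketched training step for the first state estimation network.

    restored_frame_t2: restored frame of the second moment (the previous moment)
    original_frame_t1: original frame of the first moment
    The L2 loss below is an assumption, not fixed by this disclosure.
    """
    prior_estimate_t1 = f1_net(restored_frame_t2)  # first training prior estimation
    loss = torch.mean((prior_estimate_t1 - original_frame_t1) ** 2)  # first network loss
    optimizer.zero_grad()
    loss.backward()   # backpropagate the loss
    optimizer.step()  # adjust the parameter values of the network
    return loss.item()
```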
In one possible implementation, the method further includes:
inputting a recovery frame corresponding to a second moment into a first matrix transformation network for processing to obtain a first training state transition matrix of the first moment, wherein the second moment is a moment before the first moment;
determining a second network loss of the first matrix transformation network according to a first training state transition matrix at the first moment and prior estimation at the first moment;
and adjusting the parameter value of the first matrix transformation network according to the second network loss.
In one possible implementation, the method further includes:
inputting a decoding frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second state estimation network for processing to obtain a second training prior estimation of the first moment, wherein the second moment is a previous moment of the first moment;
determining a third network loss of the second state estimation network according to a second training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the second state estimation network according to the third network loss.
In one possible implementation, the method further includes:
inputting a decoding frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second matrix transformation network for processing to obtain a second training state transition matrix of the first moment, wherein the second moment is a moment before the first moment;
determining a fourth network loss of the second matrix transformation network according to a second training state transition matrix at the first moment and prior estimation at the first moment;
and adjusting the parameter value of the second matrix transformation network according to the fourth network loss.
In one possible implementation, the method further includes:
acquiring a prediction frame and a prediction residual corresponding to a decoding frame at a first moment;
determining an unquantized residual of the first moment according to the original frame and the predicted frame of the first moment;
inputting the decoded frame at the first moment and the prediction residual corresponding to the decoded frame at the first moment into the residual recovery network for processing, to obtain a recovered training prediction residual of the first moment;
determining a fifth network loss of the residual recovery network according to the recovered training prediction residual of the first moment, the prediction residual corresponding to the decoded frame at the first moment, and the unquantized residual of the first moment;
and adjusting the parameter values of the residual recovery network according to the fifth network loss.
In one possible implementation, the method further includes:
and training the first state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
In one possible implementation, the method further includes:
and training the first matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
In one possible implementation, the method further includes:
and training the second state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
In one possible implementation, the method further includes:
and training the second matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
In one possible implementation, the method further includes:
and training the residual recovery network according to the measurement value of the current moment, the decoded frame of the current moment, and the original frame of the current moment.
In a possible implementation manner, the determining, according to the state transition matrix and the covariance matrix at the previous time, a kalman gain at the current time includes:
updating the covariance matrix at the previous moment according to the state transition matrix to obtain a prior covariance matrix at the current moment;
and determining the Kalman gain of the current moment according to the prior covariance matrix of the current moment.
According to an aspect of the present disclosure, there is provided a video image processing apparatus including:
the first processing module is used for carrying out state estimation processing on a reference frame associated with a target decoding frame to be restored at the current moment so as to obtain prior estimation at the current moment, wherein the reference frame at least comprises a restoration frame at the previous moment;
and the second processing module is used for obtaining the recovery frame at the current moment at least according to the prior estimation at the current moment.
In one possible implementation, the apparatus further includes:
a first determining module, configured to determine, according to the target decoded frame, a measurement value of the target decoded frame;
the second processing module is further configured to perform fusion processing on the prior estimation at the current time and the measurement value of the target decoding frame to obtain a restored frame at the current time.
In one possible implementation, the apparatus further includes:
the third processing module is used for carrying out matrix transformation processing on the reference frame to obtain a state transition matrix at the current moment;
the second determination module is used for determining the Kalman gain at the current moment according to the state transition matrix and the covariance matrix at the previous moment;
the second processing module is further configured to perform weighted fusion processing on the prior estimation at the current time and the measurement value of the target decoding frame according to the kalman gain at the current time to obtain a restored frame at the current time.
In one possible implementation manner, the first processing module includes:
and the first processing submodule is used for inputting the restored frame at the previous moment into a first state estimation network for processing to obtain the prior estimation at the current moment.
In one possible implementation manner, the third processing module includes:
and the second processing submodule is used for inputting the recovery frame at the previous moment into the first matrix transformation network to obtain the state transition matrix at the current moment.
In one possible implementation, the reference frame further includes the target decoded frame,
the first processing module comprises:
and the third processing sub-module is used for inputting the restored frame at the previous moment and the target decoding frame into the second state estimation network to obtain the prior estimation corresponding to the current moment.
In one possible implementation, the reference frame further includes the target decoded frame,
the third processing module comprises:
and the fourth processing submodule is used for inputting the restored frame at the previous moment and the target decoding frame into a second matrix transformation network to obtain a state transition matrix at the current moment.
In one possible implementation manner, the first determining module includes:
a first obtaining sub-module, configured to obtain a prediction residual and a prediction frame from the target decoded frame;
the second obtaining submodule is used for inputting the target decoding frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual;
and the first fusion submodule is used for carrying out fusion processing on the recovered prediction residual and the prediction frame to obtain a measured value of the target decoding frame.
In one possible implementation, the first state estimation network includes a first convolutional network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolutional network,
the first processing submodule is configured to sequentially process the restored frame at the previous time through a first convolution network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolution network of the first state estimation network, so as to obtain a priori estimation at the current time.
In one possible implementation, the first matrix transformation network includes: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transform network and a first fusion network,
and the second processing submodule is used for sequentially processing the restored frame at the previous moment through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
In one possible implementation, the second state estimation network includes: a first splicing network, a first convolution network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolution network, and a second fusion network,
and the third processing submodule is used for sequentially processing the restored frame and the target decoding frame at the previous moment through the first splicing network, the first convolution network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolution network and the second fusion network to obtain the prior estimation of the current moment.
In one possible implementation, the second matrix transformation network includes: a first splicing network, a third convolution network, a second linear rectification ReLU network, a second residual block network, a fourth convolution network, a matrix transformation network and a first fusion network,
and the fourth processing submodule is used for sequentially processing the restored frame and the target decoding frame at the previous moment through the first splicing network, the third convolution network, the second linear rectification ReLU network, the second residual block network, the fourth convolution network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
In one possible implementation, the residual error recovery network includes: a second stitching network, a fifth convolutional network, a third linearly rectified ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
the second obtaining submodule is used for sequentially processing the target decoding frame and the prediction residual through the second splicing network, the fifth convolution network, the third linear rectification ReLU network, the normalization network, the third residual block network, the inverse normalization network and the sixth convolution network to obtain a recovered prediction residual.
In one possible implementation, the apparatus further includes:
a fourth processing module, configured to input a recovered frame at a second time into a first state estimation network for processing, so as to obtain a first training prior estimate at the first time, where the second time is a time before the first time;
a third determining module, configured to determine a first network loss of the first state estimation network according to the first training prior estimation at the first time and an original frame at the first time;
and the first adjusting module is used for adjusting the parameter value of the first state estimation network according to the first network loss.
In one possible implementation, the apparatus further includes:
a fifth processing module, configured to input a recovered frame corresponding to a second time into the first matrix transformation network for processing, so as to obtain a first training state transition matrix at the first time, where the second time is a time before the first time;
a fourth determining module, configured to determine a second network loss of the first matrix transformation network according to the first training state transition matrix at the first time and the prior estimation at the first time;
and the second adjusting module is used for adjusting the parameter value of the first matrix transformation network according to the second network loss.
In one possible implementation, the apparatus further includes:
a sixth processing module, configured to input a decoded frame corresponding to a first time and a restored frame corresponding to a second time into the second state estimation network for processing, so as to obtain a second training prior estimation of the first time, where the second time is a time before the first time;
a fifth determining module, configured to determine a third network loss of the second state estimation network according to a second training prior estimation at the first time and an original frame at the first time;
and the third adjusting module is used for adjusting the parameter value of the second state estimation network according to the third network loss.
In one possible implementation, the apparatus further includes:
a seventh processing module, configured to input a decoded frame corresponding to a first time and a restored frame corresponding to a second time into the second matrix transformation network for processing, so as to obtain a second training state transition matrix at the first time, where the second time is a time before the first time;
a sixth determining module, configured to determine a fourth network loss of the second matrix transformation network according to the second training state transition matrix at the first time and the prior estimation at the first time;
and the fourth adjusting module is used for adjusting the parameter value of the second matrix transformation network according to the fourth network loss.
In one possible implementation, the apparatus further includes:
the acquisition module is used for acquiring a prediction frame and a prediction residual error corresponding to the decoding frame at the first moment;
a seventh determining module, configured to determine an unquantized residual of the first time according to the original frame and the predicted frame of the first time;
an eighth processing module, configured to input the decoded frame at the first time and the prediction residual corresponding to the decoded frame at the first time into the residual recovery network for processing, so as to obtain a recovered training prediction residual of the first time;
an eighth determining module, configured to determine a fifth network loss of the residual recovery network according to the recovered training prediction residual of the first time, the prediction residual corresponding to the decoded frame at the first time, and the unquantized residual of the first time;
and a fifth adjusting module, configured to adjust the parameter values of the residual recovery network according to the fifth network loss.
In one possible implementation, the apparatus further includes:
and the first training module is used for training the first state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
In one possible implementation, the apparatus further includes:
and the second training module is used for training the first matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
In one possible implementation, the apparatus further includes:
and the third training module is used for training the second state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
In one possible implementation, the apparatus further includes:
and the fourth training module is used for training the second matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
In one possible implementation, the apparatus further includes:
and the fifth training module is used for training the residual recovery network according to the measurement value of the current time, the decoded frame of the current time, and the original frame of the current time.
In one possible implementation manner, the second determining module includes:
the fifth processing submodule is used for updating the covariance matrix at the previous moment according to the state transition matrix to obtain a prior covariance matrix at the current moment;
and the determining submodule is used for determining the Kalman gain of the current moment according to the prior covariance matrix of the current moment.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above-described video image processing method.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method of processing video images.
The video image processing method and apparatus, electronic device, and storage medium provided in the embodiments of the present disclosure can combine the restored frame of the previous time with the target decoded frame to restore the target decoded frame of the current time. In other words, the embodiments of the present disclosure can recursively restore the target decoded frame of the current time using the restored frames of all times before the current time in the video, and can thereby reduce the distortion rate of the restored frame.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method of processing a video image in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method of processing video images in accordance with an exemplary embodiment;
FIG. 3 is a flow chart illustrating a method of processing video images in accordance with an exemplary embodiment;
FIG. 4 is a flow chart illustrating a method of processing video images in accordance with an exemplary embodiment;
FIG. 5 is a schematic network diagram of an exemplary first state estimation network;
FIG. 6 is a network diagram of an exemplary second state estimation network;
FIG. 7 is a schematic network diagram of an exemplary first matrix transformation network;
FIG. 8 is a schematic network diagram of an exemplary second matrix transformation network;
FIG. 9 is a flow chart illustrating a method of processing video images in accordance with an exemplary embodiment;
FIG. 10 is a schematic diagram of an exemplary network architecture for a residual error recovery network;
FIG. 11 is a schematic diagram of a network for video image processing, according to an exemplary embodiment;
FIG. 12 is a block diagram illustrating a network for video image processing in accordance with an exemplary embodiment;
FIG. 13 is a schematic diagram of a network for video image processing, according to an exemplary embodiment;
FIG. 14 is a block diagram illustrating a network for video image processing in accordance with an exemplary embodiment;
fig. 15 is a schematic configuration diagram illustrating a video image processing apparatus according to an exemplary embodiment;
fig. 16 is a schematic configuration diagram showing a video image processing apparatus according to an exemplary embodiment;
FIG. 17 is a block diagram illustrating an electronic device 800 in accordance with an exemplary embodiment;
FIG. 18 is a block diagram illustrating an electronic device 1900 according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 is a flowchart illustrating a method of processing a video image according to an exemplary embodiment. The method can be applied to terminal equipment or a server. As shown in fig. 1, the method for processing the video image may include:
step 11, performing state estimation processing on a reference frame associated with a target decoding frame to be restored at the current moment to obtain prior estimation at the current moment, wherein the reference frame at least comprises a restoration frame at the previous moment.
The target decoded frame is the decoded frame of the video at the current time, and a restored frame is an image frame obtained by performing video image processing on a decoded frame.
For example, when the target decoded frame to be restored at the current time is processed, the restored frame of the previous time may be obtained (after the decoded frame of the previous time is processed to obtain its restored frame, that restored frame may be cached in a buffer). State estimation processing is then performed on the restored frame of the previous time to obtain the prior estimate of the current time.
For example, the state estimation processing may be performed on the recovered frame at the previous time through a prior state estimation network of the kalman model, so as to obtain a prior estimation at the current time. Or, a Kalman model can be trained through a sample video to obtain a deep Kalman model, the deep Kalman model can include a state estimation network, a matrix transformation network and a residual error recovery network, and the state estimation processing can be performed on the recovered frame at the previous moment through the state estimation network in the deep Kalman model to obtain the prior estimation at the current moment.
And step 12, obtaining a recovery frame of the current time at least according to the prior estimation of the current time.
For example, the priori estimate of the current time and the target decoding frame of the current time may be fused to obtain the restored frame of the current time. Or, a measurement value corresponding to the target decoded frame at the current time (the measurement value may be obtained by performing residual error repairing processing on the target decoded frame) may be determined, and the priori estimation at the current time and the measurement value corresponding to the target decoded frame may be subjected to fusion processing to obtain a restored frame at the current time.
In this embodiment, when the target decoded frame of the current time is processed, state estimation processing may be performed on the reference frame associated with the target decoded frame (the restored frame of the previous time) to obtain the prior estimate of the current time, and the restored frame of the current time is obtained according to that prior estimate.
The video image processing method provided by the embodiments of the present disclosure can obtain the prior estimate of the current time from the restored frame of the previous time and use it to restore the target decoded frame. In other words, the target decoded frame of the current time is restored recursively from the restored frames of all earlier times in the video; because restored frames carry richer and more accurate reference information, the distortion rate of the restored frame can be reduced.
Fig. 2 is a flowchart illustrating a method of processing a video image according to an exemplary embodiment.
In one possible implementation manner, referring to fig. 2, the method may further include:
and step 13, determining the measurement value of the target decoding frame according to the target decoding frame.
For example, the target decoded frame itself may be taken as the measurement; or the target decoded frame may be processed with a model such as DnCNN (a denoising convolutional neural network), ARCNN (Artifacts Reduction CNN), or MemNet to obtain the measurement of the target decoded frame; or the target decoded frame may be processed through the residual recovery network in the deep Kalman model to obtain the measurement of the target decoded frame.
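As a concrete illustration of the residual recovery route, the measurement can be sketched as a plain Python function; `residual_net` stands in for the residual recovery network described later, and the prediction frame and prediction residual are assumed to be available from the bitstream of the target decoded frame:

```python
def measurement_of_decoded_frame(decoded_frame, prediction_frame, prediction_residual, residual_net):
    """Sketch: measurement Z_t of the target decoded frame via residual recovery.

    decoded_frame:       target decoded frame of the current time
    prediction_frame:    prediction frame obtained from the target decoded frame
    prediction_residual: prediction residual obtained from the target decoded frame
    residual_net:        the residual recovery network (a callable; assumed here)
    """
    recovered_residual = residual_net(decoded_frame, prediction_residual)
    # fusion processing of the recovered prediction residual and the prediction frame
    return prediction_frame + recovered_residual
```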
In this embodiment, the step 12 can be implemented by the following step 121:
and step 121, performing fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame to obtain a recovery frame at the current moment.
In this embodiment, when the target decoded frame of the current time is processed, state estimation processing may be performed on the reference frame associated with the target decoded frame (the restored frame of the previous time) to obtain the prior estimate of the current time; after the measurement of the target decoded frame is determined from the target decoded frame, the prior estimate of the current time and the measurement are fused to obtain the restored frame of the current time.
The video image processing method provided by the embodiments of the present disclosure can thus combine the restored frame of the previous time with the target decoded frame to restore the target decoded frame of the current time, that is, restore it recursively from the restored frames of all earlier times in the video, which reduces the distortion rate of the restored frame.
Fig. 3 is a flowchart illustrating a method of processing a video image according to an exemplary embodiment.
In one possible implementation manner, referring to fig. 3, the method may further include:
and 14, performing matrix transformation processing on the reference frame to obtain a state transition matrix at the current moment.
Matrix transformation processing may be performed on the restored frame of the previous time to obtain the state transition matrix of the current time. For example, the restored frame of the previous time may be input into a Jacobian matrix computation, through which the state transition matrix of the current time is calculated; or the restored frame of the previous time may be processed through the matrix transformation network in the deep Kalman model to obtain the state transition matrix of the current time.
And step 15, determining the Kalman gain at the current moment according to the state transition matrix and the covariance matrix at the previous moment.
Fig. 4 is a flowchart illustrating a video image processing method according to an exemplary embodiment, in which, as illustrated in fig. 4, a method of determining a kalman gain at a current time may include:
and 151, updating the covariance matrix at the previous moment according to the state transition matrix to obtain a prior covariance matrix at the current moment.
And 152, determining the Kalman gain of the current moment according to the prior covariance matrix of the current moment.
The covariance matrix of the previous time can be determined from the Kalman gain of the previous time and the prior covariance matrix of the previous time, as in formula (1):

$$P_{t-1} = \left(I - K_{t-1}H\right)P^{-}_{t-1} \tag{1}$$

where $t$ denotes the current time, $t-1$ the previous time, $P_{t-1}$ the covariance matrix of the previous time, $K_{t-1}$ the Kalman gain of the previous time, $P^{-}_{t-1}$ the prior covariance matrix of the previous time, $I$ the identity matrix, and $H$ the measurement matrix, a constant matrix that does not change over time.
After the covariance matrix of the previous time is obtained, the prior covariance matrix of the current time can be determined from the state transition matrix and the covariance matrix of the previous time, as in formula (2):

$$P^{-}_{t} = A_{t}\,P_{t-1}\,A_{t}^{\mathsf{T}} + Q_{t-1} \tag{2}$$

where $P^{-}_{t}$ denotes the prior covariance matrix of the current time, $A_{t}$ the state transition matrix of the current time, and $Q_{t-1}$ the covariance matrix of the process noise, a constant matrix that does not change over time.
After the prior covariance matrix of the current time is obtained, the Kalman gain of the current time can be determined from it, as in formula (3):

$$K_{t} = P^{-}_{t}H^{\mathsf{T}}\left(H P^{-}_{t} H^{\mathsf{T}} + U_{t}\right)^{-1} \tag{3}$$

where $K_{t}$ denotes the Kalman gain of the current time and $U_{t}$ the covariance matrix of the measurement noise, a constant matrix that does not change over time.
In the embodiment of the present disclosure, the step 121 may be implemented by the following step 1211:
and 1211, performing weighted fusion processing on the prior estimation value at the current time and the measurement value of the target decoding frame according to the kalman gain at the current time to obtain a restored frame at the current time.
For example, given the Kalman gain of the current time, the weighted fusion of the prior estimate of the current time and the measurement of the target decoded frame can be written as formula (4):

$$\hat{x}_{t} = \hat{x}^{-}_{t} + K_{t}\left(Z_{t} - H\hat{x}^{-}_{t}\right) \tag{4}$$

where $\hat{x}_{t}$ denotes the restored frame of the current time, $\hat{x}^{-}_{t}$ the prior estimate of the current time, and $Z_{t}$ the measurement of the target decoded frame.
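Formulas (1) to (4) together make up one step of the Kalman recursion. The following is a minimal NumPy sketch of that step, assuming frames flattened into vectors; the function and argument names are illustrative, and the state transition matrix, prior estimate, and measurement are assumed to be supplied by the networks described below:

```python
import numpy as np

def kalman_step(x_prior, z_t, A_t, P_prior_prev, K_prev, H, Q, U):
    """One step of the recursion in formulas (1)-(4), frames flattened to vectors.

    x_prior:      prior estimate of the current time (from a state estimation network)
    z_t:          measurement of the target decoded frame
    A_t:          state transition matrix of the current time
    P_prior_prev: prior covariance matrix of the previous time
    K_prev:       Kalman gain of the previous time
    H, Q, U:      measurement matrix, process-noise covariance and
                  measurement-noise covariance (constant over time)
    """
    n = P_prior_prev.shape[0]
    # formula (1): covariance matrix of the previous time
    P_prev = (np.eye(n) - K_prev @ H) @ P_prior_prev
    # formula (2): prior covariance matrix of the current time
    P_prior = A_t @ P_prev @ A_t.T + Q
    # formula (3): Kalman gain of the current time
    K_t = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + U)
    # formula (4): weighted fusion of the prior estimate and the measurement
    x_t = x_prior + K_t @ (z_t - H @ x_prior)
    return x_t, P_prior, K_t
```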
In this embodiment, when the target decoded frame of the current time is processed, state estimation processing may be performed on the reference frame associated with the target decoded frame (the restored frame of the previous time) to obtain the prior estimate of the current time, and matrix transformation processing may be performed on the restored frame of the previous time to obtain the state transition matrix of the current time. The Kalman gain of the current time is determined from the state transition matrix and the covariance matrix of the previous time; after the measurement of the target decoded frame is determined, the prior estimate of the current time and the measurement are weighted and fused according to the Kalman gain of the current time to obtain the restored frame of the current time.
The video image processing method provided by the embodiments of the present disclosure can thus restore the target decoded frame of the current time by recursively combining the restored frames of all earlier times in the video; in the recursive restoration process, the Kalman gain addresses the problem of error accumulation, so that video image processing is achieved while the distortion rate of the restored frame is reduced.
In one possible implementation, the step 11 may include:
and inputting the recovery frame of the previous moment into a first state estimation network for processing to obtain prior estimation of the current moment.
The first state estimation network may be configured to process the restored frame of the previous time, extracting features such as pixel features and motion trajectories from it, so as to obtain the prior estimate of the current time based on the restored frame of the previous time. The first state estimation network can be expressed as formula (5):

$$\hat{x}^{-}_{t} = f_{1}\!\left(\hat{x}_{t-1};\,\theta_{f_{1}}\right) \tag{5}$$

where $\hat{x}_{t-1}$ denotes the restored frame of the previous time, $\hat{x}^{-}_{t}$ the prior estimate of the current time, and $\theta_{f_{1}}$ the network parameters.
In one possible implementation, the first state estimation network may include a first convolution network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolution network,
the above inputting the restored frame at the previous time into the first state estimation network for processing to obtain the prior estimation at the current time may include:
and processing the restored frame at the previous moment in sequence through a first convolution network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network and a second convolution network of the first state estimation network to obtain the prior estimation at the current moment.
The first convolution network may be a convolution network including a plurality of convolution filters, for example: the first convolution network may include 64 convolution filters of size 3 x 3.
The first linear rectification ReLU network may be any one of a leaky linear rectification network, a leaky random linear rectification network, and a noisy linear rectification network.
The first residual block network may be a network including a plurality of residual block units, where each residual block unit may include two convolution functions and two ReLU functions. For example, the first residual block network may include 6 residual block units.
The second convolution network may be a convolution network including a convolution filter, such as: the second convolutional network may include 1 convolutional filter of size 3 × 3.
Fig. 5 is a schematic diagram of an exemplary network structure of the first state estimation network. Referring to fig. 5, inputting the restored frame of the previous time into the first state estimation network for processing to obtain the prior estimate of the current time may include the following steps:
a first convolution network (64 convolution filters with the size of 3 multiplied by 3) performs convolution processing on the recovery frame at the previous moment to obtain a recovery frame after the convolution processing; the first linear rectification ReLU network performs linear processing on the convolution-processed restoration frame to obtain a linear-processed restoration frame; the normalization network performs normalization processing on the linearly processed recovery frame to obtain a recovery frame after the normalization processing; the first residual block network (6 residual block units) carries out residual processing on the restored frame after the normalization processing to obtain a restored frame after the residual processing; the inverse normalization network carries out inverse normalization processing on the restored frame after the residual error processing to obtain an inverse normalized restored frame; and the second convolution network performs convolution processing on the restored frame after the inverse normalization to obtain prior estimation of the current moment.
Since the restored frame at the previous time has more accurate and robust time information (e.g., motion trajectory) relative to the decoded frame at the previous time, the embodiment of the present disclosure may obtain a more accurate a priori estimate of the current time by processing the restored frame at the previous time through the first state estimation network.
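By way of illustration, the forward pass just described can be sketched in PyTorch as follows. The single input channel, the per-sample standardization used as the normalization/inverse-normalization pair, and all names are assumptions; the description above fixes only the layer order, the 64 and 1 filter counts, the 3×3 kernel size, and the 6 residual block units:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block unit: two convolutions and two ReLUs (as stated above)."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class FirstStateEstimationNet(nn.Module):
    """Sketch of the first state estimation network f1 of formula (5)."""
    def __init__(self, num_blocks=6):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)  # first convolution network: 64 3x3 filters
        self.relu = nn.ReLU(inplace=True)            # first linear rectification ReLU network
        self.blocks = nn.Sequential(*[ResidualBlock() for _ in range(num_blocks)])
        self.conv2 = nn.Conv2d(64, 1, 3, padding=1)  # second convolution network: one 3x3 filter

    def forward(self, restored_prev):
        h = self.relu(self.conv1(restored_prev))
        # normalization / inverse normalization: per-sample standardization is
        # assumed here, since the exact scheme is not fixed by the description
        mean = h.mean(dim=(2, 3), keepdim=True)
        std = h.std(dim=(2, 3), keepdim=True) + 1e-6
        h = self.blocks((h - mean) / std) * std + mean
        return self.conv2(h)  # prior estimate of the current time
```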
In a possible implementation manner, the reference frame may further include the target decoded frame, and step 11 may include:
and inputting the recovery frame and the target decoding frame at the previous moment into the second state estimation network to obtain the prior estimation corresponding to the current moment.
The second state estimation network may be configured to process the restored frame of the previous time together with the target decoded frame of the current time, extracting features such as pixel features and motion trajectories from both, so as to obtain the prior estimate of the current time based on the restored frame of the previous time and the target decoded frame of the current time. The second state estimation network can be expressed as formula (6):

$$\hat{x}^{-}_{t} = f_{2}\!\left(\hat{x}_{t-1},\,y_{t};\,\theta_{f_{2}}\right) \tag{6}$$

where $y_{t}$ denotes the target decoded frame of the current time and $\theta_{f_{2}}$ the network parameters.
In one possible implementation, the second state estimation network may include: a first splicing network, a first convolution network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolution network, and a second fusion network,
inputting the restored frame at the previous time and the target decoding frame into the second state estimation network to obtain a priori estimation corresponding to the current time, where the method may include:
and processing the restored frame and the target decoding frame at the previous moment in turn through the first splicing network, the first convolution network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolution network and the second fusion network to obtain the prior estimation at the current moment.
The first splicing network may be configured to splice a target decoded frame at a current time and a restored frame at a previous time.
The second fusion network may be configured to add the target decoded frame at the current time and the output of the second convolution network to obtain a priori estimate of the current time.
Fig. 6 is a network architecture diagram of an exemplary second state estimation network.
For example, referring to fig. 6, the above inputting the restored frame at the previous time and the target decoded frame into the second state estimation network to obtain the prior estimate corresponding to the current time may include the following steps:
the first splicing network can splice the restored frame and the target decoding frame at the previous moment to obtain a spliced frame; the first convolution network (64 convolution filters with the size of 3 multiplied by 3) performs convolution processing on the spliced frame to obtain the spliced frame after the convolution processing; the first linear rectification ReLU network performs linear processing on the spliced frame after the convolution processing to obtain a spliced frame after the linear processing; the normalization network performs normalization processing on the linearly processed spliced frame to obtain a spliced frame after the normalization processing; the first residual block network (6 residual block units) carries out residual processing on the spliced frame after the normalization processing to obtain a spliced frame after the residual processing; the inverse normalization network carries out inverse normalization processing on the spliced frame after the residual error processing to obtain an inverse normalized spliced frame; the second convolution network performs convolution processing on the restored frame after the inverse normalization to obtain an inverse normalized spliced frame after the convolution processing; and the second fusion network adds the inverse normalized splicing frame and the target decoding frame at the current moment to obtain prior estimation at the current moment.
Because the restored frame at the previous time carries more accurate and robust temporal information (for example, motion trajectories) than the decoded frame at the previous time, and the target decoded frame at the current time can compensate for complex scenes that are occluded in the restored frame at the previous time, the embodiment of the disclosure processes the restored frame at the previous time and the target decoded frame at the current time through the second state estimation network, so that a more accurate prior estimate can be obtained.
In a possible implementation manner, the step 14 may include:
and inputting the recovery frame at the previous moment into a first matrix transformation network to obtain a state transition matrix at the current moment.
The first matrix transformation network may be configured to process the recovered frame at the previous time to obtain a state transition matrix at the current time based on the recovered frame at the previous time. The first matrix transformation network may refer to the following formula (seven).
A_t = m_1(\hat{x}_{t-1}; \theta_{m1})    (seven)

Wherein A_t represents the state transition matrix at the current time, \hat{x}_{t-1} represents the restored frame at the previous time, and \theta_{m1} represents the network parameters.
In a possible implementation manner, the first matrix transformation network may include: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transform network and a first fusion network,
wherein, the inputting the restored frame of the previous moment into the first matrix transformation network to obtain the state transition matrix of the current moment includes:
and processing the restored frame at the previous moment in sequence through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
The third convolutional network may be a convolutional network including convolutional filters, for example: the third convolutional network may include 64 convolutional filters of size 3 × 3. The third convolutional network may be the same as the first convolutional network or different from the first convolutional network (for example, the first convolutional network may include 64 convolutional filters of size 3 × 3, while the third convolutional network includes 32 convolutional filters of size 3 × 3), which is not limited in the embodiments of the disclosure.
The second linear rectification ReLU network may be any one of a leaky linear rectification network, a leaky random linear rectification network, and a noisy linear rectification network.
The second residual block network may be a network including a plurality of residual block units, where each residual block unit may include two convolution functions and two ReLU functions. For example, the second residual block network may be a network comprising 3 residual block units. It should be noted that the second residual block network may be the same as the first residual block network (for example, the first residual block network and the second residual block network both have 6 residual block units), or may be different from the first residual block network, which is not limited in this disclosure.
The fourth convolutional network may be a convolutional network including a convolutional filter, for example: the fourth convolutional network may include a plurality of convolutional filters of size 3 × 3. The fourth convolutional network may be the same as the second convolutional network or different from the second convolutional network, and the embodiment of the present disclosure does not limit this.
The above matrix transformation network may be configured to matrix-transform the output of the fourth convolutional network according to the size of the recovered frame at the previous time. For example: if the size of the restored frame at the previous time is mxn, the matrix transform network may perform matrix transform on the output of the fourth convolutional network to obtain a matrix of mn × mn.
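As a toy illustration of this reshaping step (the channel layout of the fourth convolutional network's output is a hypothetical choice that makes the element count work out):

```python
import torch

m, n = 4, 4                          # size of the restored frame at the previous time
out = torch.randn(1, m * n, m, n)    # assumed: fourth convolution emits m*n feature maps
A = out.reshape(1, m * n, m * n)     # matrix transform: an mn x mn matrix (here 16 x 16)
print(A.shape)                       # torch.Size([1, 16, 16])
```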
The first fusion network may be configured to multiply the output of the matrix transformation network by the restored frame at the previous time, so as to obtain the state transition matrix at the current time.
Fig. 7 is a schematic diagram of an exemplary first matrix transformation network.
For example, referring to fig. 7, the inputting the recovered frame at the previous time into the first matrix transformation network to obtain the state transition matrix at the current time may include the following steps:
the third convolution network (64 convolution filters of size 3 × 3) performs convolution processing on the restored frame at the previous time to obtain the restored frame after convolution processing; the second linear rectification ReLU network performs linear processing on the restored frame after convolution processing to obtain the restored frame after linear processing; the second residual block network (3 residual block units) performs residual processing on the restored frame after linear processing to obtain the restored frame after residual processing; the fourth convolution network performs convolution processing on the restored frame after residual processing to obtain the convolved features; the matrix transformation network processes the convolved features to obtain an output matrix; and the first fusion network multiplies the output matrix by the restored frame at the previous time to obtain the state transition matrix at the current time.
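A minimal sketch of this pipeline follows, under the same caveats as before: the names are invented, single-channel frames are assumed so that a frame flattens to an mn-vector, and since the text describes the fusion step only as a multiplication, the broadcast scaling below is just one possible reading.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block unit, same as in the earlier sketch."""
    def __init__(self, features: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class FirstMatrixTransformNet(nn.Module):
    """Sketch of the first matrix transformation network (equation (seven))."""
    def __init__(self, features: int = 64, blocks: int = 3,
                 height: int = 4, width: int = 4):
        super().__init__()
        self.mn = height * width
        self.conv3 = nn.Conv2d(1, features, 3, padding=1)        # third convolution network
        self.relu2 = nn.ReLU(inplace=True)                       # second linear rectification ReLU network
        self.res = nn.Sequential(*[ResidualBlock(features) for _ in range(blocks)])  # second residual block network
        self.conv4 = nn.Conv2d(features, self.mn, 3, padding=1)  # fourth convolution network

    def forward(self, restored_prev):
        x = self.relu2(self.conv3(restored_prev))
        x = self.res(x)
        x = self.conv4(x)                                        # convolved features
        M = x.reshape(x.size(0), self.mn, self.mn)               # matrix transformation network
        v = restored_prev.reshape(restored_prev.size(0), 1, self.mn)
        return M * v  # first fusion network; "multiply" read here as broadcast scaling (assumption)
```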
Since the restored frame at the previous time has more accurate and robust time information (e.g., motion trajectory) relative to the decoded frame at the previous time, the embodiment of the disclosure processes the restored frame at the previous time through the first matrix transformation network, so as to obtain a more accurate state transition matrix and reduce the computational complexity.
In one possible implementation, the reference frame may further include the target decoding frame, where the step 14 may include:
and inputting the recovery frame and the target decoding frame at the previous moment into a second matrix transformation network to obtain a state transition matrix at the current moment.
The second matrix transformation network may be configured to process the previous-time restored frame and the current-time decoded frame, so as to obtain the current-time state transition matrix based on the previous-time restored frame and the current-time decoded frame. The second matrix transformation network may refer to the following equation (eight).
A_t = m_2(\hat{x}_{t-1}, x_t; \theta_{m2})    (eight)

Wherein \theta_{m2} represents the network parameters.

In a possible implementation manner, the second matrix transformation network may include: a first splicing network, a third convolution network, a second linear rectification ReLU network, a second residual block network, a fourth convolution network, a matrix transformation network and a first fusion network,
the inputting the restored frame at the previous time and the target decoded frame into a second matrix transformation network to obtain the state transition matrix at the current time may include:
and processing the restored frame and the target decoding frame at the previous moment in sequence through the first splicing network, the third convolution network, the second linear rectification ReLU network, the second residual block network, the fourth convolution network, the matrix transformation network and the first fusion network to obtain the state transition matrix at the current moment.
Fig. 8 is a schematic diagram of an exemplary second matrix transformation network.
For example, referring to fig. 8, inputting the recovered frame at the previous time and the target decoded frame into a second matrix transformation network to obtain a state transition matrix at the current time, may include the following steps:
the first splicing network splices the restored frame at the previous time and the target decoded frame to obtain a spliced frame; the third convolution network (64 convolution filters of size 3 × 3) performs convolution processing on the spliced frame to obtain the spliced frame after convolution processing; the second linear rectification ReLU network performs linear processing on the spliced frame after convolution processing to obtain the spliced frame after linear processing; the second residual block network (3 residual block units) performs residual processing on the spliced frame after linear processing to obtain the spliced frame after residual processing; the fourth convolution network performs convolution processing on the spliced frame after residual processing to obtain the convolved features; the matrix transformation network processes the convolved features to obtain an output matrix; and the first fusion network multiplies the output matrix by the restored frame at the previous time to obtain the state transition matrix at the current time.
Because the restored frame at the previous time carries more accurate and robust temporal information (for example, motion trajectories) than the decoded frame at the previous time, and the target decoded frame at the current time can compensate for complex scenes that are occluded in the restored frame at the previous time, the embodiment of the disclosure processes the restored frame at the previous time and the target decoded frame at the current time through the second matrix transformation network, so that a more accurate state transition matrix can be obtained.
Fig. 9 is a flow chart illustrating a method of processing a video image according to an exemplary embodiment.
In one possible implementation, referring to fig. 9, the step 13 may include:
step 131, obtaining a prediction residual and a prediction frame from the target decoding frame.
Step 132, inputting the target decoded frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual.
And step 133, performing fusion processing on the restored prediction residual and the prediction frame to obtain a measurement value of the target decoding frame.
The target decoded frame may include a prediction residual and a predicted frame. The target decoded frame is parsed according to the encoding standard of the encoder, and the prediction residual and the prediction frame can be obtained from the target decoded frame. The prediction residual is a quantized residual, and the terminal may input the prediction residual into a residual recovery network for recovery processing to obtain a recovered prediction residual, which is a non-quantized residual. The residual error recovery network can refer to the following equation (nine).
\hat{r}_t = z(x_t, r_t; \theta_z)    (nine)

Wherein \hat{r}_t represents the recovered prediction residual, x_t represents the target decoded frame at the current time, r_t represents the prediction residual at the current time, and \theta_z represents the network parameters.
And after the recovered prediction residual is obtained, the recovered prediction residual and the prediction frame are subjected to fusion processing, so that the measured value of the target decoding frame can be obtained.
In a possible implementation manner, the residual error recovery network may include: a second stitching network, a fifth convolutional network, a third linearly rectified ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
the above inputting the target decoded frame and the prediction residual into the residual recovery network to obtain the recovered prediction residual may include:
and sequentially processing the target decoding frame and the prediction residual error through the second splicing network, the fifth convolution network, the third linear rectification ReLU network, the normalization network, the third residual error block network, the inverse normalization network and the sixth convolution network to obtain a measured value of the target decoding frame.
The second splicing network may be configured to splice the target decoded frame and the prediction residual to obtain a spliced frame.
The fifth convolutional network may be a convolutional network including a convolutional filter, for example: the fifth convolutional network may include 64 convolutional filters of size 3 × 3. The fifth convolutional network may be the same as the first convolutional network, or may be different from the first convolutional network (for example, the first convolutional network may be a network including 64 convolutional filters with a size of 3 × 3, and the fifth convolutional network may be a network including 32 convolutional filters with a size of 3 × 3), and the embodiment of the present disclosure does not limit this.
The third linear rectification ReLU network may be any one of a leaky linear rectification network, a leaky random linear rectification network, and a noisy linear rectification network.
The third residual block network may be a network including a plurality of residual block units. Wherein the residual block network may comprise two convolution functions and two ReLU functions. For example, the third residual block network may be a network including 6 residual block units. It should be noted that the third residual block network may be the same as the first residual block network (for example, the first residual block network and the third residual block network are both networks having 6 residual block units), or may be different from the first residual block network, which is not limited in this disclosure.
The sixth convolutional network may be a convolutional network including a convolutional filter, for example: the sixth convolutional network may include 1 convolutional filter of size 3 × 3. It should be noted that the sixth convolutional network may be the same as the second convolutional network or different from the second convolutional network, and details of the embodiment of the present disclosure are not repeated herein.
Fig. 10 is a network structure diagram of an exemplary residual error recovery network.
For example, referring to fig. 10, the above inputting the target decoded frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual may include the following steps:
the second splicing network splices the target decoded frame and the prediction residual to obtain a spliced frame; the fifth convolution network (64 convolution filters of size 3 × 3) performs convolution processing on the spliced frame to obtain the spliced frame after convolution processing; the third linear rectification ReLU network performs linear processing on the spliced frame after convolution processing to obtain the spliced frame after linear processing; the normalization network performs normalization processing on the linearly processed spliced frame to obtain the spliced frame after normalization processing; the third residual block network (6 residual block units) performs residual processing on the spliced frame after normalization processing to obtain the spliced frame after residual processing; the inverse normalization network performs inverse normalization processing on the spliced frame after residual processing to obtain the inverse-normalized spliced frame; and the sixth convolution network performs convolution processing on the inverse-normalized spliced frame to obtain the recovered prediction residual.
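The following sketch mirrors fig. 10 in PyTorch-style pseudocode. Single-channel (for example, luma-only) frames are assumed, which matches the single 3 × 3 filter of the sixth convolution network; the normalization pair is again assumed to be per-frame mean and standard deviation scaling.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block unit, same as in the earlier sketches."""
    def __init__(self, features: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class ResidualRecoveryNet(nn.Module):
    """Sketch of the residual recovery network (equation (nine))."""
    def __init__(self, features: int = 64, blocks: int = 6):
        super().__init__()
        self.conv5 = nn.Conv2d(2, features, 3, padding=1)   # fifth convolution network
        self.relu3 = nn.ReLU(inplace=True)                  # third linear rectification ReLU network
        self.res = nn.Sequential(*[ResidualBlock(features) for _ in range(blocks)])  # third residual block network
        self.conv6 = nn.Conv2d(features, 1, 3, padding=1)   # sixth convolution network (1 filter)

    def forward(self, decoded_cur, pred_residual):
        x = torch.cat([decoded_cur, pred_residual], dim=1)  # second splicing network
        x = self.relu3(self.conv5(x))
        mu = x.mean(dim=(2, 3), keepdim=True)               # normalization network (assumed)
        sd = x.std(dim=(2, 3), keepdim=True) + 1e-5
        x = self.res((x - mu) / sd)
        x = x * sd + mu                                     # inverse normalization network
        return self.conv6(x)                                # recovered prediction residual

# Step 133 then fuses the recovered residual with the predicted frame:
#     measurement = predicted_frame + recovered_residual
```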
According to the embodiment of the disclosure, the residual recovery network restores the prediction residual in the target decoded frame from its quantized form to a non-quantized form, so that the measurement value of the target decoded frame (the target decoded frame without quantization effects) is closer to the original frame, further reducing the distortion rate of the resulting restored frame.
It should be noted that the network for video image processing may include a state estimation network, a matrix transformation network, a measurement value network and an update network. The state estimation network may be the first state estimation network or the second state estimation network, and the matrix transformation network may be the first matrix transformation network or the second matrix transformation network. The measurement value network may be the first measurement value network, which directly takes the target decoded frame at the current time as the measurement value at the current time, or the second measurement value network, which may include the residual recovery network.
In order that those skilled in the art will better understand the present disclosure, embodiments of the present disclosure are described below by way of several examples.
FIG. 11 is a schematic diagram of a network for video image processing, according to an exemplary embodiment; FIG. 12 is a block diagram illustrating a network for video image processing in accordance with an exemplary embodiment; FIG. 13 is a schematic diagram of a network for video image processing, according to an exemplary embodiment; fig. 14 is a schematic diagram illustrating a network for video image processing according to an exemplary embodiment.
Example one, referring to fig. 11, a network for video image processing may include: a first state estimation network, a first matrix transformation network, a first measurement network, and an update network.
In example one, the first state estimation network performs state processing on the restored frame at the previous time to obtain the prior estimate at the current time, and the first matrix transformation network performs transformation processing on the restored frame at the previous time to obtain the state transition matrix at the current time. The update network obtains the Kalman gain at the current time according to the state transition matrix, and performs weighted fusion on the target decoded frame at the current time and the prior estimate at the current time according to the Kalman gain at the current time to obtain the restored frame at the current time.
Example two, referring to fig. 12, a network for video image processing may include: a first state estimation network, a first matrix transformation network, a second measured value network and an updating network.
In example two, the first state estimation network performs state processing on the restored frame at the previous time to obtain the prior estimate at the current time, and the first matrix transformation network performs transformation processing on the restored frame at the previous time to obtain the state transition matrix at the current time. The residual recovery network processes the prediction residual of the target decoded frame at the current time to obtain the recovered prediction residual, and the second measurement value network obtains the measurement value of the target decoded frame according to the recovered prediction residual and the predicted frame included in the target decoded frame. The update network obtains the Kalman gain at the current time according to the state transition matrix, and performs weighted fusion on the measurement value at the current time and the prior estimate at the current time according to the Kalman gain at the current time to obtain the restored frame at the current time.
Example three, referring to fig. 13, a network for video image processing includes: a second state estimation network, a second matrix transformation network, a first measurement network, and an update network.
In example three, the second state estimation network performs state processing on the restored frame at the previous time and the target decoded frame at the current time to obtain the prior estimate at the current time, and the second matrix transformation network performs transformation processing on the restored frame at the previous time and the target decoded frame at the current time to obtain the state transition matrix at the current time. The update network obtains the Kalman gain at the current time according to the state transition matrix, and performs weighted fusion on the target decoded frame at the current time and the prior estimate at the current time according to the Kalman gain at the current time to obtain the restored frame at the current time.
Example four, referring to fig. 14, a network for video image processing includes: a second state estimation network, a second matrix transformation network, a second measurement network, and an update network.
In example four, the second state estimation network performs state processing on the restored frame at the previous time and the target decoded frame at the current time to obtain the prior estimate at the current time, and the second matrix transformation network performs transformation processing on the restored frame at the previous time and the target decoded frame at the current time to obtain the state transition matrix at the current time. The residual recovery network processes the prediction residual of the target decoded frame at the current time to obtain the recovered prediction residual, and the second measurement value network obtains the measurement value of the target decoded frame according to the recovered prediction residual and the predicted frame included in the target decoded frame. The update network obtains the Kalman gain at the current time according to the state transition matrix, and performs weighted fusion on the measurement value at the current time and the prior estimate at the current time according to the Kalman gain at the current time to obtain the restored frame at the current time.
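Across all four examples the update network plays the same role. A hedged sketch of one update step is shown below; it assumes the classical Kalman filter equations with an identity observation matrix and externally supplied noise covariances Q and R, since the text states only that the Kalman gain is obtained from the state transition matrix and the covariance matrix at the previous time.

```python
import torch

def kalman_restore_step(prior, measurement, A, P_prev, Q, R):
    """One update step (assumption: classical Kalman equations, H = I).

    prior, measurement : flattened frames of length mn
    A                  : (mn, mn) state transition matrix at the current time
    P_prev, Q, R       : (mn, mn) covariance matrices (Q, R are assumed noise terms)
    """
    P_prior = A @ P_prev @ A.T + Q                 # prior covariance at the current time
    K = P_prior @ torch.linalg.inv(P_prior + R)    # Kalman gain at the current time
    restored = prior + K @ (measurement - prior)   # weighted fusion -> restored frame
    P = (torch.eye(A.size(0)) - K) @ P_prior       # covariance carried to the next time
    return restored, P
```

In examples one and three the measurement is the target decoded frame itself; in examples two and four it is the output of the second measurement value network.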
In one possible implementation, the step of training the first state estimation network in the video image processing method may include:
inputting a recovery frame of a second moment into a first state estimation network for processing to obtain a first training prior estimation of the first moment, wherein the second moment is a previous moment of the first moment;
determining a first network loss of the first state estimation network according to a first training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the first state estimation network according to the first network loss.
In the embodiments of the present disclosure, the first state estimation network may be trained on a plurality of sample frames, where each sample may include an original frame at a first time and a restored frame at a second time.
The restored frame at the second time is input into the first state estimation network for processing, and the output of the first state estimation network is the first training prior estimate. The first network loss of the first state estimation network is determined according to the first training prior estimate and the original frame at the first time; when the first network loss meets the adjustment condition (the first network loss is greater than the threshold), the parameter θ_f1 of the first state estimation network is adjusted. The process of inputting the restored frame at the second time into the adjusted first state estimation network and training it is repeated until the first network loss no longer meets the adjustment condition (the first network loss is less than the threshold, at which point the first training prior estimate can be considered to closely approximate the original frame); the adjusted first state estimation network is then the trained first state estimation network.
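As an illustration, the loop below sketches this procedure for the first state estimation network. The L2 loss and the Adam optimizer are assumptions; the text requires only some first network loss between the training prior estimate and the original frame, compared against a threshold.

```python
import torch
import torch.nn.functional as F

def train_first_state_net(net, samples, lr=1e-4, threshold=1e-3, max_steps=10_000):
    """Sketch of the training loop described above.

    `samples` yields (restored_prev, original_cur) pairs; loss and optimizer
    choices are assumptions not fixed by the text.
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for step, (restored_prev, original_cur) in enumerate(samples):
        prior = net(restored_prev)              # first training prior estimate
        loss = F.mse_loss(prior, original_cur)  # first network loss
        if loss.item() < threshold or step >= max_steps:
            break                               # loss below threshold: network is trained
        opt.zero_grad()
        loss.backward()
        opt.step()                              # adjust the parameter theta_f1
    return net
```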
In a possible implementation, the step of training the second state estimation network in the video image processing method may include:
inputting a decoding frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second state estimation network for processing to obtain a second training prior estimation of the first moment, wherein the second moment is a previous moment of the first moment;
determining a third network loss of the second state estimation network according to a second training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the second state estimation network according to the third network loss.
In the embodiments of the present disclosure, the second state estimation network may be trained on a plurality of sample frames, where each sample may include an original frame and a decoded frame at a first time, and a restored frame at a second time.
The decoded frame at the first time and the restored frame at the second time are input into the second state estimation network for processing, and the output of the second state estimation network is the second training prior estimate. The third network loss of the second state estimation network is determined according to the second training prior estimate and the original frame at the first time; when the third network loss meets the adjustment condition (the third network loss is greater than the threshold) or the preset number of adjustments has not been reached, the parameter θ_f2 of the second state estimation network is adjusted. The process of inputting the decoded frame at the first time and the restored frame at the second time into the adjusted second state estimation network and training it is repeated until the third network loss no longer meets the adjustment condition (the third network loss is less than the threshold, at which point the second training prior estimate can be considered to closely approximate the original frame); the adjusted second state estimation network is then the trained second state estimation network.
In a possible implementation manner, the step of training the first matrix transformation network in the video image processing method may include:
inputting a recovery frame corresponding to a second moment into a first matrix transformation network for processing to obtain a first training state transition matrix of the first moment, wherein the second moment is a moment before the first moment;
determining a second network loss of the first matrix transformation network according to a first training state transition matrix at the first moment and prior estimation at the first moment;
and adjusting the parameter value of the first matrix transformation network according to the second network loss.
In the embodiments of the present disclosure, the first matrix transformation network may be trained on a plurality of sample frames, where each sample may include a prior estimate at a first time and a restored frame at a second time.
The restored frame at the second time is input into the first matrix transformation network for processing, and the output of the first matrix transformation network is the first training state transition matrix. The second network loss of the first matrix transformation network is determined according to the first training state transition matrix and the prior estimate at the first time; when the second network loss meets the adjustment condition (the second network loss is greater than the threshold), the parameter θ_m1 of the first matrix transformation network is adjusted. The process of inputting the restored frame at the second time into the adjusted first matrix transformation network and training it is repeated until the second network loss no longer meets the adjustment condition (the second network loss is less than the threshold, at which point the first training state transition matrix can be considered to closely approximate the prior estimate at the first time); the adjusted first matrix transformation network is then the trained first matrix transformation network.
In a possible implementation manner, the step of training the second matrix transformation network in the video image processing method may include:
inputting a decoding frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second matrix transformation network for processing to obtain a second training state transition matrix of the first moment, wherein the second moment is a moment before the first moment;
determining a fourth network loss of the second matrix transformation network according to a second training state transition matrix at the first moment and prior estimation at the first moment;
and adjusting the parameter value of the second matrix transformation network according to the fourth network loss.
In the embodiments of the present disclosure, the second matrix transformation network may be trained on a plurality of sample frames, where each sample may include a decoded frame and a prior estimate at a first time, and a restored frame at a second time.
The decoded frame at the first time and the restored frame at the second time are input into the second matrix transformation network for processing, and the output of the second matrix transformation network is the second training state transition matrix. The fourth network loss of the second matrix transformation network is determined according to the second training state transition matrix and the prior estimate at the first time; when the fourth network loss meets the adjustment condition (the fourth network loss is greater than the threshold), the parameter θ_m2 of the second matrix transformation network is adjusted. The process of inputting the decoded frame at the first time and the restored frame at the second time into the adjusted second matrix transformation network and training it is repeated until the fourth network loss no longer meets the adjustment condition (the fourth network loss is less than the threshold, at which point the second training state transition matrix can be considered to closely approximate the prior estimate at the first time); the adjusted second matrix transformation network is then the trained second matrix transformation network.
In a possible implementation manner, the step of training the residual error recovery network in the method for processing a video image may include:
acquiring a prediction frame and a prediction residual corresponding to a decoding frame at a first moment;
determining a non-quantized residual error of the first moment according to the original frame and the predicted frame of the first moment;
inputting the decoded frame at the first moment and the prediction residual error corresponding to the decoded frame at the first moment into the residual error recovery network for processing to obtain a training prediction residual error recovered at the first moment;
determining a fifth network loss of the residual error recovery model network according to the training prediction residual error recovered at the first moment, the prediction residual error corresponding to the decoded frame at the first moment and the quantization-free residual error at the first moment;
and adjusting the parameter value of the residual error recovery model network according to the fifth network loss.
In the embodiments of the present disclosure, the residual recovery network may be trained on a plurality of sample frames, where each sample may include a decoded frame and an original frame at a first time.
The decoded frame may be decomposed to obtain the corresponding predicted frame and prediction residual, and the non-quantized residual at the first time may be determined from the original frame and the predicted frame. The decoded frame and the prediction residual at the first time are input into the residual recovery network to obtain the training prediction residual recovered at the first time. The sum of the recovered training prediction residual and the prediction residual at the first time is determined as the second training prediction residual, and the difference between the second training prediction residual and the non-quantized residual at the first time is determined as the fifth network loss. When the fifth network loss meets the adjustment condition (the fifth network loss is greater than the threshold), the parameter θ_z of the residual recovery network is adjusted. The process of inputting the decoded frame and the prediction residual at the first time into the adjusted residual recovery network to obtain the recovered training prediction residual and the second training prediction residual is repeated until the fifth network loss no longer meets the adjustment condition (the fifth network loss is less than the threshold, at which point the second training prediction residual can be considered to closely approximate the non-quantized residual at the first time); the adjusted residual recovery network is then the trained residual recovery network.
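A short sketch of this fifth network loss, assuming an L2 distance for the "difference value" (the text does not name the norm):

```python
import torch.nn.functional as F

def fifth_network_loss(recovered_residual, pred_residual, original, predicted):
    """Loss for the residual recovery network, as described above."""
    unquantized = original - predicted                    # non-quantized residual at the first time
    second_training = recovered_residual + pred_residual  # second training prediction residual
    return F.mse_loss(second_training, unquantized)       # fifth network loss (assumed L2)
```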
In a possible implementation manner, after completing the processing of the target decoding frame at the current time, the method may further include:
and training the first state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
For example, the network loss of the first state estimation network may be determined according to a difference between the prior estimation at the current time and the original frame at the current time, and the parameter of the first state estimation network may be adjusted according to the network loss of the first state estimation network.
According to the embodiment of the disclosure, in the process of processing a video, a first state estimation network can be trained in real time according to the prior estimation of the current moment and the original frame of the current moment, and when a decoded frame of the next moment is processed, the prior estimation of the next moment is determined through the adjusted first state estimation network, so that the accuracy of the first state training network can be further improved, and the distortion rate of a recovered frame obtained at the next moment is further reduced.
In a possible implementation manner, after completing the processing of the target decoding frame at the current time, the method may further include:
and training the second state estimation network according to the prior estimation of the current moment and the original frame of the current moment.
For example, the network loss of the second state estimation network may be determined according to the prior estimation at the current time and the original frame at the current time, and the parameter of the second state estimation network may be adjusted according to the network loss of the second state estimation network.
According to the embodiment of the disclosure, in the process of processing a video, a second state estimation network can be trained in real time according to the prior estimation of the current moment and the original frame of the current moment, and when a decoded frame of the next moment is processed, the prior estimation of the next moment is determined through the adjusted second state estimation network, so that the precision of the second state training network can be improved, and further the distortion rate of a recovered frame obtained at the next moment is reduced.
In a possible implementation manner, after completing the processing of the target decoding frame at the current time, the method may further include:
and training the first matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
For example, the network loss of the first matrix transformation network may be determined according to the state transition matrix at the current time and the prior estimation at the current time, and the parameter of the first matrix transformation network may be adjusted according to the network loss of the first matrix transformation network.
According to the embodiment of the disclosure, in the process of video processing, the first matrix transformation network can be trained in real time according to the state transformation matrix at the current moment and the prior estimation at the current moment, and when the decoded frame at the next moment is processed, the state transformation matrix at the next moment is determined through the adjusted first matrix transformation network, so that the precision of the first matrix transformation network can be improved, and the distortion rate of the recovered frame at the next moment is reduced.
In a possible implementation manner, after completing the processing of the target decoding frame at the current time, the method may further include:
and training the second matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
For example, the network loss of the second matrix transformation network may be determined according to the state transition matrix at the current time and the prior estimation at the current time, and the parameter of the second matrix transformation network may be adjusted according to the network loss of the second matrix transformation network.
According to the embodiment of the disclosure, in the process of video processing, the second matrix transformation network can be trained in real time according to the state transformation matrix at the current moment and the prior estimation at the current moment, and when the decoded frame at the next moment is processed, the state transformation matrix at the next moment is determined through the adjusted second matrix transformation network, so that the precision of the second matrix transformation network can be improved, and the distortion rate of the recovered frame at the next moment is reduced.
In a possible implementation manner, after completing the processing of the target decoding frame at the current time, the method may further include:
and training the residual error recovery model network according to the measured value of the current moment, the target decoding frame of the current moment and the original frame of the current moment.
For example, the prediction residual and the prediction frame of the target decoding frame may be obtained by parsing the target decoding frame, the unquantized residual at the current time may be determined according to the original frame and the prediction frame at the current time, the network loss of the residual error recovery network may be determined according to the recovered measurement value and the unquantized prediction residual, and the parameter of the residual error recovery network may be adjusted according to the network loss of the residual error recovery network.
According to the embodiment of the disclosure, in the process of processing a video, a residual error recovery network can be trained in real time according to a measurement value at the current moment, a target decoded frame at the current moment and an original frame at the current moment, and when a decoded frame at the next moment is processed, the measurement value at the next moment is determined through the adjusted residual error recovery network, so that the precision of the measurement value can be improved, and the distortion rate of a recovered frame obtained at the next moment is reduced.
Fig. 15 is a schematic structural diagram illustrating a video image processing apparatus according to an exemplary embodiment. As shown in fig. 15, the video image processing apparatus may include:
a first processing module 1501, configured to perform state estimation processing on a reference frame associated with a target decoding frame to be restored at a current time to obtain a priori estimation of the current time, where the reference frame may include at least a restored frame at a previous time;
the second processing module 1502 may be configured to obtain a recovered frame at the current time according to at least the prior estimation at the current time.
The processing apparatus for video images according to the embodiments of the present disclosure may combine the previous-time restored frame and the target decoded frame to restore the target decoded frame at the current time, that is, the embodiments of the present disclosure may recursively restore the current-time target decoded frame using the restored frames at all times before the current time in the video, and may reduce the distortion rate of the restored frame.
Fig. 16 is a schematic structural diagram illustrating a video image processing apparatus according to an exemplary embodiment.
In one possible implementation, as shown in fig. 16, the apparatus may further include:
a first determining module 1503, which may be configured to determine a measurement value of the target decoded frame according to the target decoded frame;
the second processing module 1502 may be further configured to perform fusion processing on the prior estimation at the current time and the measurement value of the target decoding frame, so as to obtain a recovered frame at the current time.
In one possible implementation, referring to fig. 16, the apparatus may further include:
a third processing module 1504, configured to perform matrix transformation processing on the reference frame to obtain a state transition matrix at the current time;
a second determining module 1505, which may be configured to determine a kalman gain at a current time according to the state transition matrix and the covariance matrix at a previous time;
the second processing module 1502 may be further configured to perform weighted fusion processing on the apriori estimate at the current time and the measurement value of the target decoding frame according to the kalman gain at the current time to obtain a recovered frame at the current time.
In a possible implementation manner, the second determining module may include:
the fifth processing submodule can be used for updating the covariance matrix at the previous moment according to the state transition matrix to obtain the prior covariance matrix at the current moment;
a determining sub-module operable to determine a kalman gain at the current time based on the a priori covariance matrix at the current time.
In a possible implementation manner, the first processing module 1501 may include:
the first processing sub-module may be configured to input the restored frame at the previous time into a first state estimation network for processing, so as to obtain a priori estimate of the current time.
In a possible implementation manner, the third processing module 1504 may include:
the second processing sub-module may be configured to input the restored frame at the previous time into the first matrix transformation network, so as to obtain a state transition matrix at the current time.
In a possible implementation manner, the reference frame may further include the target decoded frame, and the first processing module 1501 may include:
the third processing sub-module may be configured to input the restored frame at the previous time and the target decoded frame into the second state estimation network, so as to obtain a prior estimate corresponding to the current time.
In one possible implementation, the reference frame may further include the target decoded frame,
the third processing module may include:
and the fourth processing submodule may be configured to input the restored frame at the previous time and the target decoded frame into a second matrix transformation network, so as to obtain a state transition matrix at the current time.
In a possible implementation manner, the first determining module 1503 may include:
a first obtaining sub-module, configured to obtain a prediction residual and a prediction frame from the target decoded frame;
the second obtaining sub-module is used for inputting the target decoding frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual;
the first fusion submodule may be configured to perform fusion processing on the restored prediction residual and the prediction frame to obtain a measurement value of the target decoded frame.
In one possible implementation, the first state estimation network may include a first convolutional network, a first linearly rectified ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolutional network,
the first processing sub-module may be configured to sequentially process the restored frame at the previous time through a first convolution network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolution network of the first state estimation network, so as to obtain a priori estimation at the current time.
In one possible implementation, the first matrix transformation network may include: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transform network and a first fusion network,
the second processing sub-module may be configured to sequentially process the restored frame at the previous time through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network, so as to obtain a state transition matrix at the current time.
In one possible implementation, the second state estimation network may include: a first splicing network, a first convolution network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolution network, and a second fusion network,
the third processing sub-module may be configured to sequentially process the restored frame and the target decoded frame at the previous time through the first splicing network, the first convolution network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolution network, and the second fusion network, so as to obtain a priori estimate of the current time.
In one possible implementation, the second matrix transformation network may include: a first splicing network, a third convolution network, a second linear rectification ReLU network, a second residual block network, a fourth convolution network, a matrix transformation network and a first fusion network,
the fourth processing submodule may be configured to sequentially process the restored frame and the target decoded frame at the previous time through the first splicing network, the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network, so as to obtain a state transition matrix at the current time.
In one possible implementation, the residual error recovery network may include: a second stitching network, a fifth convolutional network, a third linearly rectified ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
the second obtaining sub-module may be configured to sequentially process the target decoded frame and the prediction residual through the second concatenation network, the fifth convolutional network, the third linearly rectified ReLU network, the normalization network, the third residual block network, the inverse normalization network, and the sixth convolutional network, so as to obtain a restored prediction residual.
In a possible implementation manner, the apparatus may further include:
a fourth processing module, configured to input a recovered frame at a second time into a first state estimation network for processing, to obtain a first training prior estimate at the first time, where the second time is a previous time to the first time;
a third determining module, configured to determine a first network loss of the first state estimation network according to a first training prior estimation at the first time and an original frame at the first time;
a first adjusting module may be configured to adjust a parameter value of the first state estimation network according to the first network loss.
In a possible implementation manner, the apparatus may further include:
a fifth processing module, configured to input a recovered frame corresponding to a second time into the first matrix transformation network for processing, so as to obtain a first training state transition matrix at the first time, where the second time is a time before the first time;
a fourth determining module, configured to determine a second network loss of the first matrix transformation network according to a first training state transition matrix at the first time and the prior estimation at the first time;
a second adjusting module may be configured to adjust a parameter value of the first matrix transformation network according to the second network loss.
In a possible implementation manner, the apparatus may further include:
a sixth processing module, configured to input a decoded frame corresponding to a first time and a restored frame corresponding to a second time into the second state estimation network for processing, so as to obtain a second training prior estimation of the first time, where the second time is a previous time to the first time;
a fifth determining module, configured to determine a third network loss of the second state estimation network according to a second training prior estimation at the first time and an original frame at the first time;
a third adjusting module, configured to adjust a parameter value of the second state estimation network according to the third network loss.
In a possible implementation manner, the apparatus may further include:
a seventh processing module, configured to input a decoded frame corresponding to a first time and a restored frame corresponding to a second time into the second matrix transformation network for processing, so as to obtain a second training state transition matrix at the first time, where the second time is a time before the first time;
a sixth determining module, configured to determine a fourth network loss of the second matrix transformation network according to a second training state transition matrix at the first time and the prior estimation at the first time;
a fourth adjusting module, configured to adjust a parameter value of the second matrix transformation network according to the fourth network loss.
In a possible implementation manner, the apparatus may further include:
the acquisition module is used for acquiring a prediction frame and a prediction residual error corresponding to the decoding frame at the first moment;
a seventh determining module, configured to determine a quantization-free residual of the first time according to the original frame and the predicted frame of the first time;
an eighth processing module, configured to input the decoded frame at the first time and the prediction residual corresponding to the decoded frame at the first time into the residual recovery network for processing, so as to obtain a training prediction residual recovered at the first time;
an eighth determining module, configured to determine a fifth network loss of the residual recovery model network according to the training prediction residual restored at the first time, the prediction residual corresponding to the decoded frame at the first time, and the quantization-free residual at the first time;
a fifth adjusting module, configured to adjust a parameter value of the residual error recovery model network according to the fifth network loss.
In a possible implementation manner, the apparatus may further include:
a first training module, configured to train the first state estimation network according to the prior estimation at the current time and an original frame at the current time.
In a possible implementation manner, the apparatus may further include:
the second training module may be configured to train the first matrix transformation network according to the state transition matrix at the current time and the prior estimation at the current time.
In a possible implementation manner, the apparatus may further include:
a third training module, configured to train the second state estimation network according to the prior estimation at the current time and an original frame at the current time.
In a possible implementation manner, the apparatus may further include:
the fourth training module may be configured to train the second matrix transformation network according to the state transition matrix at the current time and the prior estimation at the current time.
In a possible implementation manner, the apparatus may further include:
a fifth training module, configured to train the residual error recovery model network according to the measured value at the current time, the decoded frame at the current time, and the original frame at the current time.
It is understood that the above method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principle and logic; due to space limitations, the details are not repeated in the present disclosure.
In addition, the present disclosure also provides a video image processing apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any one of the video image processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
It will be understood by those skilled in the art that, in the above methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here. Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 17 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another such terminal.
Referring to fig. 17, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
FIG. 18 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 18, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, and a mechanically encoded device, such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (46)

1. A method for processing video images, comprising:
for a target decoding frame to be restored at a current moment, performing state estimation processing on a reference frame associated with the target decoding frame to obtain a prior estimation of the current moment, wherein the reference frame at least comprises a recovery frame at a previous moment;
determining a measurement value of the target decoding frame according to the target decoding frame, wherein the measurement value of the target decoding frame is the target decoding frame with quantization influence eliminated;
performing fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame to obtain a recovery frame at the current moment;
wherein the method further comprises:
performing matrix transformation processing on the reference frame to obtain a state transition matrix at the current moment;
determining a Kalman gain at the current moment according to the state transition matrix and a covariance matrix at the previous moment;
wherein the performing fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame to obtain the recovery frame at the current moment comprises:
performing, according to the Kalman gain at the current moment, weighted fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame to obtain the recovery frame at the current moment.
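For illustration only, and not part of the claims: the gain-weighted fusion recited in claim 1 follows the standard Kalman update form. A minimal Python sketch, assuming an element-wise gain and toy arrays standing in for real frames:

    import numpy as np

    def fuse(prior_estimate, measurement, kalman_gain):
        # Posterior = prior + K * (measurement - prior): the recovery frame
        # is a gain-weighted blend of the prior estimation (propagated from
        # the previous recovery frame) and the measurement value (the
        # decoded frame with the quantization influence removed).
        return prior_estimate + kalman_gain * (measurement - prior_estimate)

    prior = np.array([[0.4, 0.5], [0.6, 0.7]])   # toy prior estimation
    meas = np.array([[0.5, 0.5], [0.5, 0.5]])    # toy measurement value
    gain = np.full_like(prior, 0.3)              # element-wise Kalman gain
    print(fuse(prior, meas, gain))               # lies between prior and measurement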
2. The method of claim 1, wherein the performing state estimation processing on the reference frame associated with the target decoding frame to obtain the prior estimation of the current moment comprises:
inputting the recovery frame at the previous moment into a first state estimation network for processing to obtain the prior estimation of the current moment.
3. The method according to claim 1, wherein the performing matrix transformation processing on the reference frame to obtain the state transition matrix at the current moment comprises:
inputting the recovery frame at the previous moment into a first matrix transformation network to obtain the state transition matrix at the current moment.
4. The method of claim 1 or 3, wherein the reference frame further comprises the target decoding frame,
wherein the performing state estimation processing on the reference frame associated with the target decoding frame to obtain the prior estimation corresponding to the current moment comprises:
inputting the recovery frame at the previous moment and the target decoding frame into a second state estimation network to obtain the prior estimation corresponding to the current moment.
5. The method of claim 1 or 2, wherein the reference frame further comprises the target decoding frame,
wherein the performing matrix transformation processing on the reference frame to obtain the state transition matrix at the current moment comprises:
inputting the recovery frame at the previous moment and the target decoding frame into a second matrix transformation network to obtain the state transition matrix at the current moment.
6. The method according to any one of claims 1 to 3, wherein the determining the measurement value of the target decoding frame according to the target decoding frame comprises:
obtaining a prediction residual and a prediction frame from the target decoding frame;
inputting the target decoding frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual;
and performing fusion processing on the recovered prediction residual and the prediction frame to obtain the measurement value of the target decoding frame.
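For illustration only, a sketch of the measurement-value step in claim 6. The fusion of the recovered residual with the prediction frame is assumed here to be plain addition (the usual prediction-plus-residual reconstruction), and the identity "network" is a stand-in for the residual recovery network:

    import numpy as np

    def measurement_value(prediction_frame, prediction_residual,
                          decoded_frame, residual_recovery_net):
        # Refine the quantized prediction residual using the decoded frame
        # as context, then add it back onto the prediction frame.
        recovered = residual_recovery_net(decoded_frame, prediction_residual)
        return prediction_frame + recovered

    dummy_net = lambda decoded, residual: residual   # placeholder network
    pred = np.zeros((4, 4))
    res = np.full((4, 4), 0.1)
    dec = pred + res                                 # toy decoded frame
    meas = measurement_value(pred, res, dec, dummy_net)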
7. The method of claim 2, wherein the first state estimation network comprises a first convolutional network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolutional network,
wherein the inputting the recovery frame at the previous moment into the first state estimation network for processing to obtain the prior estimation of the current moment comprises:
processing the recovery frame at the previous moment sequentially through the first convolutional network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, and the second convolutional network of the first state estimation network to obtain the prior estimation at the current moment.
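For concreteness, a minimal PyTorch sketch of the layer ordering recited in claim 7. The channel widths, kernel sizes, block count, and the per-sample standardization used to model the normalization / inverse normalization pair are editorial assumptions, not specified by the claim:

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))
        def forward(self, x):
            return x + self.body(x)

    class FirstStateEstimationNet(nn.Module):
        def __init__(self, ch=64, n_blocks=4):
            super().__init__()
            self.conv1 = nn.Conv2d(3, ch, 3, padding=1)   # first convolutional network
            self.relu = nn.ReLU(inplace=True)             # first linear rectification ReLU network
            self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
            self.conv2 = nn.Conv2d(ch, 3, 3, padding=1)   # second convolutional network
        def forward(self, prev_recovery_frame):
            x = self.relu(self.conv1(prev_recovery_frame))
            mean = x.mean(dim=(2, 3), keepdim=True)
            std = x.std(dim=(2, 3), keepdim=True) + 1e-5
            x = (x - mean) / std        # normalization network
            x = self.blocks(x)          # first residual block network
            x = x * std + mean          # inverse normalization network
            return self.conv2(x)        # prior estimation at the current moment

    prior = FirstStateEstimationNet()(torch.rand(1, 3, 64, 64))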
8. The method of claim 3, wherein the first matrix transformation network comprises: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transformation network, and a first fusion network,
wherein the inputting the recovery frame at the previous moment into the first matrix transformation network to obtain the state transition matrix at the current moment comprises:
processing the recovery frame at the previous moment sequentially through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network to obtain the state transition matrix at the current moment.
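Claim 8 ends with a matrix transformation network and a first fusion network, whose internal form the claim does not fix. One speculative reading, sketched in PyTorch under the assumption that pooled convolutional features are projected into a compact k x k state transition matrix:

    import torch
    import torch.nn as nn

    class MatrixTransformHead(nn.Module):
        # Maps a feature map to a k x k state transition matrix per sample.
        # The pooling-plus-linear projection here is an assumption standing
        # in for the claimed "matrix transformation network".
        def __init__(self, ch=64, k=8):
            super().__init__()
            self.k = k
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.proj = nn.Linear(ch, k * k)
        def forward(self, features):
            v = self.pool(features).flatten(1)             # (B, ch)
            return self.proj(v).view(-1, self.k, self.k)   # (B, k, k)

    A_t = MatrixTransformHead()(torch.rand(2, 64, 32, 32))  # toy features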
9. The method of claim 4, wherein the second state estimation network comprises: a first splicing network, a first convolutional network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolutional network, and a second fusion network,
wherein the inputting the recovery frame at the previous moment and the target decoding frame into the second state estimation network to obtain the prior estimation corresponding to the current moment comprises:
processing the recovery frame at the previous moment and the target decoding frame sequentially through the first splicing network, the first convolutional network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolutional network, and the second fusion network to obtain the prior estimation at the current moment.
10. The method of claim 5, wherein the second matrix transformation network comprises: a first splicing network, a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transformation network, and a first fusion network,
wherein the inputting the recovery frame at the previous moment and the target decoding frame into the second matrix transformation network to obtain the state transition matrix at the current moment comprises:
processing the recovery frame at the previous moment and the target decoding frame sequentially through the first splicing network, the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network to obtain the state transition matrix at the current moment.
11. The method of claim 6, wherein the residual recovery network comprises: a second splicing network, a fifth convolutional network, a third linear rectification ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
wherein the inputting the target decoding frame and the prediction residual into the residual recovery network to obtain the recovered prediction residual comprises:
processing the target decoding frame and the prediction residual sequentially through the second splicing network, the fifth convolutional network, the third linear rectification ReLU network, the normalization network, the third residual block network, the inverse normalization network, and the sixth convolutional network to obtain the recovered prediction residual.
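A companion sketch for claim 11, again with assumed channel counts: the distinguishing step is the second splicing network, modeled here as channel-wise concatenation of the decoded frame with its prediction residual before the convolutional trunk (which mirrors the claim 7 sketch above):

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.f = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1))
        def forward(self, x):
            return x + self.f(x)

    class ResidualRecoveryNet(nn.Module):
        def __init__(self, ch=64, n_blocks=4):
            super().__init__()
            self.conv_in = nn.Conv2d(6, ch, 3, padding=1)   # fifth convolutional network
            self.relu = nn.ReLU(inplace=True)               # third linear rectification ReLU network
            self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
            self.conv_out = nn.Conv2d(ch, 3, 3, padding=1)  # sixth convolutional network
        def forward(self, decoded_frame, prediction_residual):
            # Second splicing network: concatenate along the channel axis.
            x = torch.cat([decoded_frame, prediction_residual], dim=1)
            x = self.relu(self.conv_in(x))
            mean = x.mean(dim=(2, 3), keepdim=True)
            std = x.std(dim=(2, 3), keepdim=True) + 1e-5
            x = (x - mean) / std        # normalization network
            x = self.blocks(x)          # third residual block network
            x = x * std + mean          # inverse normalization network
            return self.conv_out(x)     # recovered prediction residual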
12. The method of claim 3, further comprising:
inputting a recovery frame at a second moment into a first state estimation network for processing to obtain a first training prior estimation at a first moment, wherein the second moment is a moment before the first moment;
determining a first network loss of the first state estimation network according to the first training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the first state estimation network according to the first network loss.
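A hedged sketch of the training loop in claim 12: the claim only requires that the loss compare the first training prior estimation with the original frame; the L2 loss and the Adam optimizer below are editorial assumptions:

    import torch
    import torch.nn.functional as F

    def train_step(net, optimizer, recovery_frame_t2, original_frame_t1):
        # Forward: a first training prior estimation at the first moment,
        # produced from the recovery frame at the second (previous) moment.
        prior_t1 = net(recovery_frame_t2)
        loss = F.mse_loss(prior_t1, original_frame_t1)  # first network loss (assumed L2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   # adjust the parameter values of the network
        return loss.item()

    # Usage with the FirstStateEstimationNet sketch above:
    #   net = FirstStateEstimationNet()
    #   opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    #   train_step(net, opt, torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))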
13. The method according to any one of claims 1 to 3, further comprising:
inputting a recovery frame corresponding to a second moment into a first matrix transformation network for processing to obtain a first training state transition matrix at a first moment, wherein the second moment is a moment before the first moment;
determining a second network loss of the first matrix transformation network according to the first training state transition matrix at the first moment and a prior estimation at the first moment;
and adjusting the parameter value of the first matrix transformation network according to the second network loss.
14. The method of claim 9, further comprising:
inputting a decoded frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second state estimation network for processing to obtain a second training prior estimation at the first moment, wherein the second moment is a moment before the first moment;
determining a third network loss of the second state estimation network according to the second training prior estimation at the first moment and an original frame at the first moment;
and adjusting the parameter value of the second state estimation network according to the third network loss.
15. The method of claim 5, further comprising:
inputting a decoded frame corresponding to a first moment and a recovery frame corresponding to a second moment into the second matrix transformation network for processing to obtain a second training state transition matrix at the first moment, wherein the second moment is a moment before the first moment;
determining a fourth network loss of the second matrix transformation network according to the second training state transition matrix at the first moment and a prior estimation at the first moment;
and adjusting the parameter value of the second matrix transformation network according to the fourth network loss.
16. The method of claim 11, further comprising:
acquiring a prediction frame and a prediction residual corresponding to a decoded frame at a first moment;
determining a quantization-free residual at the first moment according to an original frame at the first moment and the prediction frame;
inputting the decoded frame at the first moment and the prediction residual corresponding to the decoded frame at the first moment into the residual recovery network for processing to obtain a training prediction residual recovered at the first moment;
determining a fifth network loss of the residual recovery network according to the training prediction residual recovered at the first moment, the prediction residual corresponding to the decoded frame at the first moment, and the quantization-free residual at the first moment;
and adjusting the parameter value of the residual recovery network according to the fifth network loss.
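A sketch of the residual recovery training in claim 16. The quantization-free residual is computed exactly as the claim states (original frame minus prediction frame); the claim leaves the form of the fifth network loss open, so a plain L2 distance to the quantization-free target is assumed here:

    import torch
    import torch.nn.functional as F

    def residual_train_step(res_net, optimizer, decoded_t1, prediction_frame_t1,
                            prediction_residual_t1, original_t1):
        # Quantization-free residual: the residual the encoder would have
        # produced without quantization.
        target = original_t1 - prediction_frame_t1
        recovered = res_net(decoded_t1, prediction_residual_t1)  # training prediction residual
        loss = F.mse_loss(recovered, target)  # assumed form of the fifth network loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()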
17. The method according to claim 2 or 7, further comprising:
training the first state estimation network according to the prior estimation at the current moment and an original frame at the current moment.
18. The method according to claim 3 or 8, further comprising:
training the first matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
19. The method of claim 4, further comprising:
training the second state estimation network according to the prior estimation at the current moment and an original frame at the current moment.
20. The method of claim 5, further comprising:
training the second matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
21. The method of claim 6, further comprising:
training the residual recovery network according to the measurement value at the current moment, the decoded frame at the current moment, and the original frame at the current moment.
22. The method according to any one of claims 1, 2, 3 and 8, wherein the determining the Kalman gain at the current moment according to the state transition matrix and the covariance matrix at the previous moment comprises:
updating the covariance matrix at the previous moment according to the state transition matrix to obtain a prior covariance matrix at the current moment;
and determining the Kalman gain at the current moment according to the prior covariance matrix at the current moment.
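For reference, the two steps of claim 22 match the standard Kalman covariance propagation and gain computation. A NumPy sketch, assuming an identity observation model and explicit process and measurement noise covariances Q and R (which the claim does not name):

    import numpy as np

    def kalman_gain(A, P_prev, Q, R):
        # Step 1: prior covariance at the current moment, obtained by
        # propagating the previous covariance through the state transition
        # matrix A and adding process noise Q.
        P_prior = A @ P_prev @ A.T + Q
        # Step 2: Kalman gain from the prior covariance, with the
        # observation matrix taken as identity.
        K = P_prior @ np.linalg.inv(P_prior + R)
        return K, P_prior

    A = np.eye(2)
    P_prev = 0.5 * np.eye(2)
    Q, R = 0.01 * np.eye(2), 0.1 * np.eye(2)
    K, P_prior = kalman_gain(A, P_prev, Q, R)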
23. A video image processing apparatus, comprising:
a first processing module, configured to perform state estimation processing on a reference frame associated with a target decoding frame to be restored at a current moment to obtain a prior estimation of the current moment, wherein the reference frame at least comprises a recovery frame at a previous moment;
a first determining module, configured to determine, according to the target decoding frame, a measurement value of the target decoding frame, wherein the measurement value of the target decoding frame is the target decoding frame from which the quantization influence is removed;
a second processing module, configured to perform fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame to obtain a recovery frame at the current moment;
wherein the apparatus further comprises:
a third processing module, configured to perform matrix transformation processing on the reference frame to obtain a state transition matrix at the current moment;
a second determining module, configured to determine a Kalman gain at the current moment according to the state transition matrix and a covariance matrix at the previous moment;
wherein the second processing module is further configured to perform weighted fusion processing on the prior estimation at the current moment and the measurement value of the target decoding frame according to the Kalman gain at the current moment to obtain the recovery frame at the current moment.
24. The apparatus of claim 23, wherein the first processing module comprises:
a first processing submodule, configured to input the recovery frame at the previous moment into a first state estimation network for processing to obtain the prior estimation at the current moment.
25. The apparatus of claim 23, wherein the third processing module comprises:
a second processing submodule, configured to input the recovery frame at the previous moment into a first matrix transformation network to obtain the state transition matrix at the current moment.
26. The apparatus according to claim 23 or 25, wherein the reference frame further comprises the target decoding frame,
and the first processing module comprises:
a third processing submodule, configured to input the recovery frame at the previous moment and the target decoding frame into a second state estimation network to obtain the prior estimation corresponding to the current moment.
27. The apparatus according to claim 23 or 24, wherein the reference frame further comprises the target decoding frame,
and the third processing module comprises:
a fourth processing submodule, configured to input the recovery frame at the previous moment and the target decoding frame into a second matrix transformation network to obtain the state transition matrix at the current moment.
28. The apparatus according to any one of claims 23 to 25, wherein the first determining module comprises:
a first obtaining submodule, configured to obtain a prediction residual and a prediction frame from the target decoding frame;
a second obtaining submodule, configured to input the target decoding frame and the prediction residual into a residual recovery network to obtain a recovered prediction residual;
and a first fusion submodule, configured to perform fusion processing on the recovered prediction residual and the prediction frame to obtain the measurement value of the target decoding frame.
29. The apparatus of claim 24, wherein the first state estimation network comprises a first convolutional network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, and a second convolutional network,
wherein the first processing submodule is configured to sequentially process the recovery frame at the previous moment through the first convolutional network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, and the second convolutional network of the first state estimation network to obtain the prior estimation at the current moment.
30. The apparatus of claim 25, wherein the first matrix transformation network comprises: a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transformation network, and a first fusion network,
wherein the second processing submodule is configured to sequentially process the recovery frame at the previous moment through the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network to obtain the state transition matrix at the current moment.
31. The apparatus of claim 26, wherein the second state estimation network comprises: a first splicing network, a first convolutional network, a first linear rectification ReLU network, a normalization network, a first residual block network, an inverse normalization network, a second convolutional network, and a second fusion network,
wherein the third processing submodule is configured to sequentially process the recovery frame at the previous moment and the target decoding frame through the first splicing network, the first convolutional network, the first linear rectification ReLU network, the normalization network, the first residual block network, the inverse normalization network, the second convolutional network, and the second fusion network to obtain the prior estimation at the current moment.
32. The apparatus of claim 27, wherein the second matrix transformation network comprises: a first splicing network, a third convolutional network, a second linear rectification ReLU network, a second residual block network, a fourth convolutional network, a matrix transformation network, and a first fusion network,
wherein the fourth processing submodule is configured to sequentially process the recovery frame at the previous moment and the target decoding frame through the first splicing network, the third convolutional network, the second linear rectification ReLU network, the second residual block network, the fourth convolutional network, the matrix transformation network, and the first fusion network to obtain the state transition matrix at the current moment.
33. The apparatus of claim 28, wherein the residual recovery network comprises: a second splicing network, a fifth convolutional network, a third linear rectification ReLU network, a normalization network, a third residual block network, an inverse normalization network, and a sixth convolutional network,
wherein the second obtaining submodule is configured to sequentially process the target decoding frame and the prediction residual through the second splicing network, the fifth convolutional network, the third linear rectification ReLU network, the normalization network, the third residual block network, the inverse normalization network, and the sixth convolutional network to obtain the recovered prediction residual.
34. The apparatus of claim 25, further comprising:
a fourth processing module, configured to input a recovery frame at a second time into the first state estimation network for processing, so as to obtain a first training prior estimation at a first time, wherein the second time is a time before the first time;
a third determining module, configured to determine a first network loss of the first state estimation network according to the first training prior estimation at the first time and an original frame at the first time;
and a first adjusting module, configured to adjust a parameter value of the first state estimation network according to the first network loss.
35. The apparatus of any one of claims 23 to 25, further comprising:
a fifth processing module, configured to input a recovery frame corresponding to a second time into the first matrix transformation network for processing, so as to obtain a first training state transition matrix at a first time, wherein the second time is a time before the first time;
a fourth determining module, configured to determine a second network loss of the first matrix transformation network according to the first training state transition matrix at the first time and the prior estimation at the first time;
and a second adjusting module, configured to adjust a parameter value of the first matrix transformation network according to the second network loss.
36. The apparatus of claim 31, further comprising:
a sixth processing module, configured to input a decoded frame corresponding to a first time and a recovery frame corresponding to a second time into the second state estimation network for processing, so as to obtain a second training prior estimation at the first time, wherein the second time is a time before the first time;
a fifth determining module, configured to determine a third network loss of the second state estimation network according to a second training prior estimation at the first time and an original frame at the first time;
and a third adjusting module, configured to adjust a parameter value of the second state estimation network according to the third network loss.
37. The apparatus of claim 27, further comprising:
a seventh processing module, configured to input a decoded frame corresponding to a first time and a recovery frame corresponding to a second time into the second matrix transformation network for processing, so as to obtain a second training state transition matrix at the first time, wherein the second time is a time before the first time;
a sixth determining module, configured to determine a fourth network loss of the second matrix transformation network according to the second training state transition matrix at the first time and the prior estimation at the first time;
and a fourth adjusting module, configured to adjust a parameter value of the second matrix transformation network according to the fourth network loss.
38. The apparatus of claim 33, further comprising:
an acquiring module, configured to acquire a prediction frame and a prediction residual corresponding to a decoded frame at a first time;
a seventh determining module, configured to determine a quantization-free residual at the first time according to an original frame at the first time and the prediction frame;
an eighth processing module, configured to input the decoded frame at the first time and the prediction residual corresponding to the decoded frame at the first time into the residual recovery network for processing, so as to obtain a training prediction residual recovered at the first time;
an eighth determining module, configured to determine a fifth network loss of the residual recovery network according to the training prediction residual recovered at the first time, the prediction residual corresponding to the decoded frame at the first time, and the quantization-free residual at the first time;
and a fifth adjusting module, configured to adjust a parameter value of the residual recovery network according to the fifth network loss.
39. The apparatus of claim 24 or 29, further comprising:
a first training module, configured to train the first state estimation network according to the prior estimation at the current moment and an original frame at the current moment.
40. The apparatus of claim 25 or 30, further comprising:
a second training module, configured to train the first matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
41. The apparatus of claim 26, further comprising:
a third training module, configured to train the second state estimation network according to the prior estimation at the current moment and an original frame at the current moment.
42. The apparatus of claim 27, further comprising:
a fourth training module, configured to train the second matrix transformation network according to the state transition matrix at the current moment and the prior estimation at the current moment.
43. The apparatus of claim 28, further comprising:
a fifth training module, configured to train the residual recovery network according to the measurement value at the current moment, the decoded frame at the current moment, and the original frame at the current moment.
44. The apparatus of claim 23, wherein the second determining module comprises:
a fifth processing submodule, configured to update the covariance matrix at the previous moment according to the state transition matrix to obtain a prior covariance matrix at the current moment;
and a determining submodule, configured to determine the Kalman gain at the current moment according to the prior covariance matrix at the current moment.
45. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1 to 22.
46. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 22.
CN201810892284.8A 2018-08-07 2018-08-07 Video image processing method and device, electronic equipment and storage medium Active CN109068138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810892284.8A CN109068138B (en) 2018-08-07 2018-08-07 Video image processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109068138A CN109068138A (en) 2018-12-21
CN109068138B true CN109068138B (en) 2021-12-24

Family

ID=64678605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810892284.8A Active CN109068138B (en) 2018-08-07 2018-08-07 Video image processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109068138B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111010566A (en) * 2019-12-04 2020-04-14 杭州皮克皮克科技有限公司 Non-local network-based video compression distortion restoration method and system
CN113048626A (en) * 2021-01-27 2021-06-29 哈尔滨工业大学(深圳) Building energy consumption optimization method and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014088446A1 (en) * 2012-12-05 2014-06-12 Intel Corporation Recovering motion vectors from lost spatial scalability layers
CN106791273A (en) * 2016-12-07 2017-05-31 重庆大学 A kind of video blind restoration method of combination inter-frame information
CN106909738A (en) * 2017-02-24 2017-06-30 北京工商大学 A kind of model parameter identification method

Similar Documents

Publication Publication Date Title
CN110287874B (en) Target tracking method and device, electronic equipment and storage medium
CN109922372B (en) Video data processing method and device, electronic equipment and storage medium
CN110378976B (en) Image processing method and device, electronic equipment and storage medium
CN109816611B (en) Video repair method and device, electronic equipment and storage medium
CN110060215B (en) Image processing method and device, electronic equipment and storage medium
CN110798630B (en) Image processing method and device, electronic equipment and storage medium
CN107944409B (en) Video analysis method and device capable of distinguishing key actions
CN109819229B (en) Image processing method and device, electronic equipment and storage medium
CN109859144B (en) Image processing method and device, electronic equipment and storage medium
CN111445414B (en) Image processing method and device, electronic equipment and storage medium
CN108881952B (en) Video generation method and device, electronic equipment and storage medium
CN110543849B (en) Detector configuration method and device, electronic equipment and storage medium
CN111340731A (en) Image processing method and device, electronic equipment and storage medium
CN111369482B (en) Image processing method and device, electronic equipment and storage medium
CN109068138B (en) Video image processing method and device, electronic equipment and storage medium
CN110415258B (en) Image processing method and device, electronic equipment and storage medium
CN109903252B (en) Image processing method and device, electronic equipment and storage medium
CN114202562A (en) Video processing method and device, electronic equipment and storage medium
CN110675355B (en) Image reconstruction method and device, electronic equipment and storage medium
CN110826463B (en) Face recognition method and device, electronic equipment and storage medium
CN110121115B (en) Method and device for determining wonderful video clip
WO2023115859A1 (en) Compressed image restoration method and apparatus, electronic device, storage medium, and program product
CN110858921A (en) Program video processing method and device
CN112651880B (en) Video data processing method and device, electronic equipment and storage medium
CN109635926B (en) Attention feature acquisition method and device for neural network and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant