CN112616052A - Method for reconstructing video compression signal - Google Patents

Method for reconstructing video compression signal

Info

Publication number
CN112616052A
CN112616052A (Application CN202011461038.0A, granted as CN112616052B)
Authority
CN
China
Prior art keywords
image
key frame
residual
value
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011461038.0A
Other languages
Chinese (zh)
Other versions
CN112616052B (en)
Inventor
周涛
李琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai IC R&D Center Co Ltd
Shanghai IC Equipment Material Industry Innovation Center Co Ltd
Original Assignee
Shanghai IC R&D Center Co Ltd
Shanghai IC Equipment Material Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai IC R&D Center Co Ltd, Shanghai IC Equipment Material Industry Innovation Center Co Ltd filed Critical Shanghai IC R&D Center Co Ltd
Priority to CN202011461038.0A
Publication of CN112616052A
Application granted
Publication of CN112616052B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method for reconstructing a video compression signal, comprising the following steps: performing a preliminary prediction of each non-key-frame image from the image observation values of the key frame and the non-key frames output by the acquisition end, to generate an image prediction value for the non-key frame; calculating the observation residual and the prediction residual of the non-key frame from its image prediction value, its image observation value, and the random sampling matrix used by the acquisition end; calculating the average residual-signal energy of the observation residual of the non-key frame, selecting an adaptive residual reconstruction algorithm according to that energy, and reconstructing the prediction residual to generate a reconstructed residual; and calculating the image reconstruction value of the non-key frame from its image prediction value and its reconstructed residual. This reconstruction method effectively improves the accuracy of residual reconstruction, thereby improving how well frame-to-frame differences in the video image are preserved and ultimately achieving high-quality reconstruction of the video image.

Description

Method for reconstructing video compression signal
Technical Field
The present invention relates to the technical field of video compression processing, and in particular, to a method for reconstructing a video compression signal.
Background
Traditional representative video compression coding techniques such as H.26x and MPEG are based on the Nyquist sampling theorem: the signal is sampled at a rate at least twice its bandwidth, the video signal is then compressed with high-complexity encoding, and finally the result is transmitted and stored.
The compressed sensing theory proposed by Donoho, Candès, Tao, et al. in 2006 provides a completely new signal sampling technique. It states that a signal that is sparse, or nearly sparse, in some transform domain can be sampled at a rate well below the Nyquist rate and still be reconstructed accurately with high probability. Compressed sensing is innovative in two respects: it breaks through the traditional sampling-rate bottleneck, and it samples the signal directly through a designed observation matrix, so that the sampled data is already compressed. Sampling and compression are thus completed simultaneously, avoiding wasted sampling resources, while the high-complexity signal reconstruction is moved to the decoding end, effectively relieving the performance bottleneck of the sampling device. Compressed sensing attracted wide academic attention from its inception, and in practical applications it is favored in fields such as wireless sensing and multimedia sensor networks.
With the development of compressed sensing theory, many scholars have studied video compressed sensing in depth and obtained remarkable results; in particular, distributed compressed-sensing frameworks for video reconstruction based on prediction plus residual are widely adopted. However, current prediction-residual video reconstruction algorithms focus only on improving the precision of the prediction algorithm and ignore the influence of residual reconstruction on the final performance. Since the reconstruction of most image frames is closely tied to the residual, if the accuracy of residual reconstruction is low, the overall performance of the video compressed-sensing algorithm is difficult to improve.
Disclosure of Invention
The present invention is directed to overcoming the above-mentioned drawbacks of the prior art and providing a method for reconstructing a compressed video signal.
To achieve the above object, the technical solution of the invention is as follows:
a method of reconstructing a compressed video signal, comprising:
s01: according to the image observation values of the key frame and the non-key frame output by the acquisition end, carrying out preliminary prediction on the image of the non-key frame to generate an image prediction value of the non-key frame;
s02: calculating an observation residual error of the non-key frame and a prediction residual error of the non-key frame according to the image prediction value of the non-key frame, the image observation value of the non-key frame and a random sampling matrix used by an acquisition end;
s03: calculating the average energy of residual error signals of the observation residual errors of the non-key frames, selecting an adaptive residual error reconstruction algorithm according to the energy of the residual error signals, reconstructing the predicted residual errors and generating reconstructed residual errors;
s04: and calculating the image reconstruction value of the non-key frame according to the image prediction value of the non-key frame and the reconstruction residual error of the non-key frame.
Further, in step S01, performing the preliminary prediction on the image of the non-key frame and generating the image prediction value of the non-key frame includes:
calculating the image prediction value of the non-key frame using the following formula:

x̄_{k,j} = Σ_{p,q} w_{p,q} H_{p,q}

where x̄_{k,j} is the image prediction value of the non-key frame, H_{p,q} is a component block used for estimating the image target block, w_{p,q} is the weight coefficient of the component block, k and p are frame indices of image blocks (k corresponds to the non-key frame and p to the key and non-key frames of the same image group), and j and q are image-block indices within a frame; the weight coefficients w_{p,q} are calculated using the following formula:

{w_{p,q}} = argmin_w ( Σ_{p1,q} ‖ y_{k,j} − Φ w_{p1,q} H_{p1,q} ‖₂² + λ Σ_{p2,q} ‖ y_{k,j} − Φ w_{p2,q} H_{p2,q} ‖₂² )

where y_{k,j} is the image observation value of the non-key frame, λ is the temporal-prior weight factor that adjusts the influence between associated frames, Φ is the random sampling matrix, p1 is the frame index of image blocks of associated frames (the frames adjacent to the non-key frame and the key frame of the same image group), and p2 is the frame index of image blocks of non-associated frames.
Further, λ ≥ 0.5 for static-class images and λ < 0.5 for dynamic-class images.
Further, in step S02, calculating the observation residual of the non-key frame includes:
performing domain transformation processing on the image prediction value of the non-key frame based on the random sampling matrix to obtain the image prediction value in the transform domain;
and subtracting the image predicted value of the transform domain from the image observed value of the non-key frame to obtain an observation residual error.
Further, in step S02, the calculating the prediction residual of the non-key frame includes:
and performing domain inverse transformation processing on the observation residual error based on the random sampling matrix to obtain a prediction residual error of the non-key frame.
Further, in step S03, calculating the average residual-signal energy of the observation residual of the non-key frame includes:
calculating the average residual-signal energy according to the following formula:

R = (1/L) Σ_{l=1}^{L} | r_{k,j}(l) |²

where R is the average energy of the residual signal, r_{k,j} is the observation residual of the non-key frame, k is the frame index of the image block, j is the image-block index within the frame, l is the index of a non-zero residual component, and L is the number of non-zero components in the residual signal.
Further, in step S03, selecting an adaptive residual reconstruction algorithm according to the magnitude of the average residual-signal energy and reconstructing the prediction residual of the non-key frame to generate the reconstructed residual of the non-key frame includes:
according to the relationship between preset decision thresholds T1 and T2 and the average energy R of the residual signal, selecting one of the following formulas to calculate the sparse representation coefficients θ̂_r of the reconstructed residual:

if R ≤ T1:        θ̂_r = argmin_θ ( ‖ r_{k,j} − ΦΨθ ‖₂² + λ1 ‖θ‖₂² )

if T1 < R ≤ T2:   θ̂_r = argmin_θ ( ‖ r_{k,j} − ΦΨθ ‖₂² + λ1 ‖θ‖₂² + λ2 ‖θ‖₁ )

if R > T2:        θ̂_r = argmin_θ ( ‖ r_{k,j} − ΦΨθ ‖₂² + λ2 ‖θ‖₁ )

where θ̂_r denotes the sparse representation coefficients of the reconstructed residual, θ denotes the sparse representation coefficients of the prediction residual of the non-key frame, r_{k,j} is the observation residual of the non-key frame, and λ1 and λ2 are weight factors for balancing the effects of similarity and difference between the predicted image and the observed image;
and calculating the reconstructed residual of the non-key frame using the following formula:

d̂_{k,j} = Ψ θ̂_r

where d̂_{k,j} is the reconstructed residual of the non-key frame and Ψ is a redundant dictionary.
Further, step S04 includes:
calculating the image reconstruction value of the non-key frame using the following formula:

x̂_{k,j} = x̄_{k,j} + d̂_{k,j}

where x̂_{k,j} is the image reconstruction value of the non-key frame, x̄_{k,j} is its image prediction value, and d̂_{k,j} is its reconstructed residual.
Further, the method further comprises:
and calculating an image reconstruction value of the key frame according to the image observation value of the key frame.
Further, the method further comprises:
generating the non-key frame images and the key frame images from the image reconstruction values of the non-key frames and the image reconstruction values of the key frames, respectively;
and combining the key frame images and the non-key frame images according to the sequence of the frame sequences to generate reconstructed video images.
In this method for reconstructing a video compression signal, an adaptive residual reconstruction algorithm is selected during non-key-frame reconstruction based on the average energy of the residual signal, and the prediction residual is reconstructed accordingly. Because the average residual-signal energy reflects the difference between the prediction information and the observation information, choosing a different reconstruction algorithm according to the degree of that difference effectively improves the accuracy of residual reconstruction, improves how well frame differences in the video image are preserved, and ultimately achieves high-quality reconstruction of the video image.
Drawings
FIG. 1 is a schematic diagram of the overall architecture of the video compression reconstruction algorithm of the present invention;
FIG. 2 is a schematic diagram of a group of pictures in successive frames of a video in accordance with the present invention;
FIG. 3 is a flowchart of the method for reconstructing a video compression signal according to the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
In the following detailed description of the embodiments of the present invention, the structures shown in the drawings are not drawn to scale and are partially enlarged, deformed, or simplified for clarity of illustration; the present invention should not be construed as limited thereto.
For convenience of describing the technical scheme, the following notation is used:

Image observation value: the image signal detected directly by the detector at the acquisition end; in the present invention y_{k,j} denotes the image observation value of a non-key frame.

Image prediction value: the image signal predicted from the image observation values; in the present invention x̄_{k,j} denotes the image prediction value of a non-key frame.

Observation residual of a non-key frame: the difference, in the transform domain, between the image observation signal and the transformed image prediction signal; denoted r_{k,j} in the present invention.

Prediction residual of a non-key frame: the observation residual transformed back into the signal domain; denoted d_{k,j} in the present invention.

Reconstructed residual of a non-key frame: the signal obtained from the observation residual r_{k,j} of the non-key frame through the segmented (adaptive) reconstruction algorithm; denoted d̂_{k,j} in the present invention and calculated using the following formula:

d̂_{k,j} = Ψ θ̂_r

where θ̂_r is the vector of sparse representation coefficients of the prediction residual d_{k,j}.

Image reconstruction value: the video signal finally to be presented, i.e., the image signal after decompression; in the present invention x̂_{k,j} denotes the image reconstruction value of a non-key frame.
Referring to fig. 1, it is a schematic diagram of the overall architecture of the video compression reconstruction algorithm of the present invention. The whole process of video compression and reconstruction can be divided into video compression processing at an acquisition end and video reconstruction processing at a reconstruction end. The acquisition end and the reconstruction end can be distributed on different devices, for example, the acquisition end can be arranged on a video server, the reconstruction end can be arranged on a mobile terminal, video signals are compressed at the acquisition end and then transmitted to the mobile terminal through the internet, and the reconstruction end on the mobile terminal reconstructs video data to generate video frames and plays the video frames.
Referring to fig. 2, it is a schematic diagram of the groups of pictures formed from consecutive video frames according to the present invention. At the acquisition end the video signal is split into consecutive image groups, each consisting of a fixed number of consecutive image frames; the first frame of each image group is a key frame and the other frames are non-key frames. At the acquisition end, a block compressed-sensing algorithm is used to compress both key frames and non-key frames; block compressed sensing has the advantages of requiring less storage space and preserving image features well.
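The group-of-pictures split described above can be sketched as follows; this is an illustrative sketch, not the patent's implementation, and the `gop_size` value and dictionary layout are assumptions:

```python
import numpy as np

def split_into_gops(frames, gop_size=8):
    """Split consecutive frames into groups of pictures (GOPs).

    The first frame of each GOP is the key frame; the remaining frames
    are non-key frames, as described above. gop_size is an assumed
    value -- the patent only requires a fixed group length.
    """
    gops = []
    for start in range(0, len(frames), gop_size):
        group = frames[start:start + gop_size]
        gops.append({"key": group[0], "non_key": group[1:]})
    return gops

# 20 dummy 4x4 frames -> GOPs of 8, 8 and 4 frames
frames = [np.zeros((4, 4)) for _ in range(20)]
gops = split_into_gops(frames, gop_size=8)
```

Each group's key frame is then sampled at the higher rate and its non-key frames at the lower rate, per the block compressed-sensing scheme below.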
For example, assuming the size of the original image to be sampled is N × N, the image block size may be set to N^{1/2} × N^{1/2}, so that each vectorized block has N components. With k denoting the frame index and j the image-block index within the frame, the sampled image signal of one block is denoted x_{k,j}; after compressed-sensing sampling of x_{k,j}, the image observation value is obtained, denoted y_{k,j}.
The sampling process can be expressed by the following formula (1):

y_{k,j} = Φ x_{k,j}    (1)
Φ is an M × N random sampling matrix with M ≪ N, i.e., the number of observations is far smaller than the number of discrete components of the image signal, and the sampling rate is M/N; the random sampling matrix may be a Gaussian random matrix, a Hadamard random matrix, etc. The same processing principle is applied to key frames and non-key frames; only the sampling rate differs. A higher M is usually used for key frames, which serve as reference signals within their image group and provide references for estimating non-key-frame image blocks. A lower M is used for non-key frames, achieving a higher compression rate and reducing the load on the acquisition end. The image observation values generated by sampling are output as the compressed image data of the acquisition end.
According to compressed sensing theory, the image observation value y_{k,j} can be expressed as the following formula (2):

y_{k,j} = Φ x_{k,j} = ΦΨ θ_{k,j}    (2)

ΦΨ is called the observation matrix, Ψ is the redundant dictionary, and θ_{k,j} is the vector of sparse representation coefficients of x_{k,j} under the dictionary Ψ; both the redundant dictionary Ψ and the random sampling matrix Φ are known quantities. The sampling process of formulas (1) and (2) is equivalent to using Φ to perform a domain transformation of x_{k,j}; to distinguish the two domains, the domain in which x_{k,j} lies is called the signal domain and the domain in which y_{k,j} lies is called the transform domain. The redundant dictionary Ψ is used in the image reconstruction process at the reconstruction end.
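As a minimal sketch of the block sampling of formulas (1) and (2) — an illustration, not the patent's implementation — using a Gaussian random matrix; the function name and the sampling rates are assumptions:

```python
import numpy as np

def sample_block(x_block, sampling_rate, rng):
    """Compressed-sensing sampling of one vectorized image block.

    Implements y = Phi * x of formula (1): Phi is an M x N Gaussian
    random sampling matrix with M = round(sampling_rate * N) << N.
    """
    n = x_block.size
    m = max(1, round(sampling_rate * n))
    phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return phi @ x_block, phi

rng = np.random.default_rng(0)
block = rng.normal(size=64)                     # a vectorized 8x8 block
y_key, phi_key = sample_block(block, 0.7, rng)  # key frame: higher rate M
y_non, phi_non = sample_block(block, 0.2, rng)  # non-key frame: lower rate M
```

A key frame thus yields more observations than a non-key frame of the same block size, which is what lets it serve as the reference signal of its image group.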
At the reconstruction end, key frames and non-key frames are reconstructed in different ways. Key frames can be reconstructed with a conventional compressed-sensing algorithm; because of their higher sampling rate, algorithms such as K-SVD (K-singular value decomposition), L1-norm optimization, or MH-BCS-SPL give good results. For non-key frames, the present invention fully considers the influence of residual energy on reconstruction and selects an adaptive residual reconstruction algorithm according to the magnitude of the residual energy; the specific processing is detailed below.
Referring to fig. 3, which is a flowchart of a method for reconstructing a video compression signal according to the present invention, the method may include:
s01: according to the image observation value y of the key frame and the image observation value y of the non-key frame output by the acquisition endk,jPerforming preliminary prediction on the image of the non-key frame to generate a predicted image value of the non-key frame
Figure BDA0002831782380000071
Specifically, the image of the non-key frame is preliminarily predicted to generate the predicted image value of the non-key frame
Figure BDA0002831782380000072
The processing procedure of (2) may include:
calculating an image prediction value of a non-key frame using the following formula (3)
Figure BDA0002831782380000073
Figure BDA0002831782380000074
Wherein Hp,qIs for the image target block xk,jEstimated block of components, wp,qIs the weight coefficient corresponding to the component block, k, p are the frame index of the image block, k corresponds to the non-key frame in the same image group, p corresponds to the key frame and the non-key frame in the same image group, j, q are the image block index in the frame, the weight coefficient w is the weight coefficientp,qCalculated using the following equation (4):
Figure BDA0002831782380000075
wherein, λ is a weighting factor of time prior, and is used for adjusting influence between associated frames. p1 is the frame index of the image block of the associated frame, the associated frame includes the adjacent frame of the non-key frame and the key frame in the same image group, p2 is the frame index of the image block of the non-associated frame, the non-associated frame includes other frames in the same image group except the aforementioned associated frame. The argmin function is the variable value at which the objective function takes a minimum value.
As can be seen from equation (4), the weight coefficient wp,qIs determined by both the energy prior and the time prior of the image block. Determining a weight coefficient w by minimizing an energy difference between an observed value of an image block and an estimated value of the image block using image energy as prior information of image similarityp,qMeanwhile, the time correlation is included in the formula (4), because of the continuity of the video image, the target frame and the associated frame generally have high similarity, in order to enhance the influence of the associated frame, the energy difference of the associated frame is calculated independently, the first term of the formula (4) is formed, the second term of the formula (4) is the energy difference of the non-associated frame, and the influence of the associated frame is adjusted through lambda. In the invention, the core of the method is to distinguish the influence of the image blocks at the same position in the same image group on the target image block through the p1 and the p2, the key frame and the associated frame have more reference information on the corresponding image block of the target frame, and the target frame of each non-key frame in the same image group uses the key frame as a reference, so that the key frame is used as the associated frame, namely the key frame is classified into p1 instead of p 2. The λ is a preset value, and the weighting factor λ can be set to different values according to different video image types, and for still images, such as a shot image for a still or an image with little change between frames, the λ>For a dynamic image, for example, a high-speed moving object exists in the image or the whole frame of image is dynamically changed, λ<0.5。
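The prediction of formulas (3) and (4) can be sketched as a regularized least-squares fit in the observation domain. This is a simplified single-term variant: the associated/non-associated split and λ weighting are folded into one pooled hypothesis set, and a small Tikhonov term `lam` stands in for the full two-term objective; `mh_predict` and `lam` are illustrative names, not the patent's:

```python
import numpy as np

def mh_predict(y, phi, hypotheses, lam=1e-3):
    """Multi-hypothesis prediction of one non-key-frame block.

    y: observation of the target block; phi: random sampling matrix;
    hypotheses: candidate blocks H_{p,q} (signal domain) taken from the
    key frame and neighbouring frames. The weights w minimize
    ||y - phi*H*w||^2 + lam*||w||^2 (closed form), then the block is
    predicted as H*w, cf. formula (3).
    """
    H = np.stack(hypotheses, axis=1)   # N x P matrix of hypothesis blocks
    A = phi @ H                        # hypotheses in the transform domain
    P = A.shape[1]
    w = np.linalg.solve(A.T @ A + lam * np.eye(P), A.T @ y)
    return H @ w

rng = np.random.default_rng(3)
n, m = 16, 8
phi = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = rng.normal(size=n)
hyps = [x_true, rng.normal(size=n), rng.normal(size=n)]
x_pred = mh_predict(phi @ x_true, phi, hyps)
```

When the true block is among the hypotheses, the weight vector concentrates on it and the prediction error stays small, which is the behaviour the energy prior of formula (4) is designed to produce.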
S02: from the image prediction value x̄_{k,j} of the non-key frame, its image observation value y_{k,j}, and the random sampling matrix Φ used by the acquisition end, calculate the observation residual r_{k,j} and the prediction residual d_{k,j} of the non-key frame.

In step S02, calculating the observation residual r_{k,j} may include: performing, based on the random sampling matrix Φ, a domain transformation of the image prediction value x̄_{k,j} to obtain the prediction value in the transform domain. The domain transformation here is a transformation from the signal domain to the transform domain based on Φ, and may use formula (5):

ȳ_{k,j} = Φ x̄_{k,j}    (5)

The transform-domain prediction value ȳ_{k,j} is then subtracted from the image observation value y_{k,j} to obtain the observation residual of the non-key frame:

r_{k,j} = y_{k,j} − ȳ_{k,j}

In addition, in step S02, calculating the prediction residual d_{k,j} may include: performing, based on the random sampling matrix Φ, an inverse domain transformation of the observation residual to obtain the prediction residual of the non-key frame. The inverse domain transformation here is a transformation from the transform domain back to the signal domain based on Φ, and may use formula (6):

d_{k,j} = Φ⁻¹ r_{k,j}    (6)

where Φ⁻¹ is the inverse of the random sampling matrix Φ (in practice a pseudo-inverse, since Φ is an M × N matrix with M < N and is therefore not invertible).
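Formulas (5) and (6) can be sketched as below. Since Φ is M × N with M < N, the "inverse" is implemented here with the Moore-Penrose pseudo-inverse, which is an assumption about the intended operator; the toy prediction stands in for the output of step S01:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 16, 8
phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random sampling matrix

x_true = rng.normal(size=n)
y = phi @ x_true                             # observation, formula (1)
x_pred = x_true + 0.1 * rng.normal(size=n)   # toy prediction from step S01

y_pred = phi @ x_pred                        # formula (5): signal -> transform domain
r_obs = y - y_pred                           # observation residual of the non-key frame

# formula (6): back to the signal domain via the pseudo-inverse of phi
d_pred = np.linalg.pinv(phi) @ r_obs         # prediction residual of the non-key frame
```

Because Φ has full row rank (almost surely, for a Gaussian matrix), applying Φ to the pseudo-inverse result reproduces the observation residual exactly, so the two residuals are consistent views of the same difference.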
S03: calculate the average residual-signal energy of the observation residual r_{k,j} of the non-key frame, select an adaptive residual reconstruction algorithm according to the average residual-signal energy, and reconstruct the prediction residual d_{k,j} to generate the reconstructed residual d̂_{k,j} of the non-key frame.

In step S03, calculating the average residual-signal energy of the observation residual r_{k,j} includes: calculating the average residual-signal energy according to formula (7):

R = (1/L) Σ_{l=1}^{L} | r_{k,j}(l) |²    (7)

where R is the average energy of the residual signal, k is the frame index of the image block, j is the image-block index within the frame, l is the index of a non-zero residual component, and L is the number of non-zero components in the residual signal.
In step S03, an adaptive residual reconstruction algorithm is selected according to the average energy of the residual signal, and the prediction residual is subjected to
Figure BDA0002831782380000096
Reconstructing to generate reconstructed residual error of non-key frame
Figure BDA0002831782380000097
The method comprises the following steps:
according to the relationship between preset judgment threshold values T1 and T2 and the average energy R of the residual error signal, one of the following formulas is selected, and the sparse expression coefficient of the reconstructed residual error is calculated
Figure BDA0002831782380000098
If R is ≦ T1, then it is calculated using equation (8)
Figure BDA0002831782380000099
Figure BDA00028317823800000910
If T1 < R ≦ T2, then calculate using equation (9)
Figure BDA00028317823800000911
Figure BDA00028317823800000912
If R > T2, then calculate using equation (10)
Figure BDA00028317823800000913
Figure BDA00028317823800000914
Wherein, in
Figure BDA00028317823800000915
Prediction residual for non-key frames
Figure BDA00028317823800000916
The sparse representation coefficient of (a) is,
Figure BDA00028317823800000917
for observing the residual error, λ 1 and λ 2 are weight factors for balancing the effects of similarity and difference between the predicted image and the observed image, the decision thresholds T1 and T2 are related to the mean atomic energy of the dictionaries, and the decision thresholds T1 and T2 are proportional to the mean atomic energy of the redundant dictionary Ψ.
R ≤ T1 corresponds to the case where the average energy of the residual signal is low, indicating high similarity between the predicted information and the observed information; that is, the image prediction value \Phi\bar{x}_{k,j} in the transform domain and the image observation value y_{k,j} are highly similar. The 2-norm is used to suppress residual high-frequency information (the difference between observation and prediction), preserving the consistency of the images.

T1 < R ≤ T2 corresponds to the case where the average energy of the residual signal is higher, indicating a more pronounced difference between the predicted and observed information; that is, the transform-domain prediction value and the image observation value y_{k,j} differ noticeably. A combined 2-norm and 1-norm is used to balance the similar and different parts.

R > T2 corresponds to the case where the average energy of the residual signal is very high, indicating a very pronounced difference between the prediction information and the observed value; that is, the transform-domain prediction value and the image observation value y_{k,j} differ significantly. The 1-norm is used to enhance the difference.
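As a numerical illustration of the three-way selection above, the following sketch switches the penalty weights on the energy thresholds so that the 2-norm-only, combined, and 1-norm-only objectives are all handled by one solver. It is an assumption for this write-up: a plain ISTA iteration over a combined measurement matrix A = ΦΨ, with function names and default weights that are illustrative, not from the patent.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reconstruct_residual_coeffs(r, A, T1, T2, lam1=0.1, lam2=0.1, n_iter=200):
    """Pick the objective by the residual's average energy R, then solve
    min_s ||r - A s||_2^2 + l1 ||s||_2^2 + l2 ||s||_1 with ISTA.
    R <= T1: 2-norm only; T1 < R <= T2: both; R > T2: 1-norm only."""
    nz = r[r != 0]
    R = float(np.mean(nz ** 2)) if nz.size else 0.0
    if R <= T1:                  # high similarity, cf. formula (8)
        l1, l2 = lam1, 0.0
    elif R <= T2:                # mixed case, cf. formula (9)
        l1, l2 = lam1, lam2
    else:                        # strong differences, cf. formula (10)
        l1, l2 = 0.0, lam2
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 2.0 * l1)  # 1/Lipschitz
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ s - r) + 2.0 * l1 * s  # gradient of smooth part
        s = soft_threshold(s - step * grad, step * l2)
    return s
```

With A = Φ @ Ψ, the reconstructed residual then follows as Ψ @ s.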
Then, the reconstructed residual \hat{\tilde{x}}_{k,j} of the non-key frame is calculated using the following formula (11):

\hat{\tilde{x}}_{k,j} = \Psi \hat{s}_{k,j}    (11)

wherein Ψ is the redundant dictionary and \hat{s}_{k,j} is the sparse representation coefficient obtained above.
In the residual reconstruction processing of the above step S03, the characteristics of the sparse representation coefficients of the prediction residual are fully taken into account: based on the prior knowledge that the difference between the predicted information and the observed information is positively correlated with the average energy of the residual signal, a different residual reconstruction algorithm is selected for each energy range, achieving a high-quality reconstruction effect.
S04: image prediction values from the non-key frames
Figure BDA0002831782380000105
And reconstructed residual of non-key frames
Figure BDA0002831782380000106
Computing image reconstruction values for non-key frames
Figure BDA0002831782380000107
Specifically, step S04 may calculate an image reconstruction value of a non-key frame using the following formula (12)
Figure BDA0002831782380000108
Figure BDA0002831782380000109
Wherein the content of the first and second substances,
Figure BDA00028317823800001010
is the image prediction value of the non-key frame.
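Formulas (11) and (12) compose in two lines; a minimal numpy sketch (the function name is illustrative, not from the patent):

```python
import numpy as np

def reconstruct_nonkey_block(x_pred, s_hat, Psi):
    """Synthesize the reconstructed residual from the dictionary (formula 11)
    and add it to the prediction to get the reconstruction value (formula 12)."""
    residual_hat = Psi @ s_hat
    return x_pred + residual_hat
```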
Through the above steps S01 to S04, the image reconstruction value of the non-key frame is calculated, and the non-key frame image can be generated from this reconstruction value.
In addition, the video compression and reconstruction algorithm of the present invention may further include an image reconstruction process of a key frame, and specifically, the video compression and reconstruction algorithm of the present invention may further include:
s05: and calculating an image reconstruction value of the key frame according to the image observation value of the key frame. The process of calculating the image reconstruction values of the key frames may be performed in parallel with the calculation of the image reconstruction values of the non-key frames of the previous steps S01-S04, or may be performed sequentially as shown in fig. 3. As described above, the key frame reconstruction process can be implemented using the existing methods of K-SVD, L1-Norm optimization algorithm, MH-BCS-SPL, etc.
After the image reconstruction values of the key frame and the non-key frame are calculated, the complete video image reconstruction can be performed, and specifically, the video compression reconstruction algorithm of the invention can further include:
s06: respectively generating a key frame image and a non-key frame image by using the image reconstruction value of the non-key frame and the image reconstruction value of the key frame;
s07: and combining the key frame images and the non-key frame images according to the sequence of the frame sequences to generate reconstructed video images.
The above description is only a preferred embodiment of the present invention, and the embodiments are not intended to limit the scope of the present invention, so that all equivalent structural changes made by using the contents of the specification and the drawings of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for reconstructing a compressed video signal, comprising:
s01: according to the image observation values of the key frame and the non-key frame output by the acquisition end, carrying out preliminary prediction on the image of the non-key frame to generate an image prediction value of the non-key frame;
s02: calculating an observation residual error of the non-key frame and a prediction residual error of the non-key frame according to the image prediction value of the non-key frame, the image observation value of the non-key frame and a random sampling matrix used by an acquisition end;
s03: calculating the average energy of residual error signals of the observation residual errors of the non-key frames, selecting an adaptive residual error reconstruction algorithm according to the energy of the residual error signals, reconstructing the predicted residual errors and generating reconstructed residual errors;
s04: and calculating the image reconstruction value of the non-key frame according to the image prediction value of the non-key frame and the reconstruction residual error of the non-key frame.
2. The method for reconstructing a compressed video signal as claimed in claim 1, wherein the step S01 of performing preliminary prediction on the image of the non-key frame and generating the image prediction value of the non-key frame comprises:

calculating the image prediction value of the non-key frame using the following formula:

\bar{x}_{k,j} = \sum_{p} \sum_{q} w_{p,q} H_{p,q}

wherein \bar{x}_{k,j} is the image prediction value of the non-key frame, H_{p,q} is a component block used for estimating the image target block, w_{p,q} is the weight coefficient corresponding to the component block, k and p are frame indexes of image blocks, k corresponding to the non-key frame and p corresponding to the key frames and non-key frames in the same image group, and j and q are image block indexes within a frame; the weight coefficients w_{p,q} are calculated using the following formula:

\{w_{p,q}\} = \arg\min_{w} \Big\| y_{k,j} - \Phi \sum_{p,q} w_{p,q} H_{p,q} \Big\|_2^2 + (1-\lambda) \sum_{q} \| w_{p_1,q} \|_2^2 + \lambda \sum_{q} \| w_{p_2,q} \|_2^2

wherein y_{k,j} is the image observation value of the non-key frame, λ is the weight factor of the temporal prior, adjusting the influence between the associated frames, Φ is the random sampling matrix, p1 is the frame index of image blocks from the associated frames (the frames adjacent to the non-key frame and the key frames in the same image group), and p2 is the frame index of image blocks from the non-associated frames.
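A sketch of the multi-hypothesis weighting in this claim. Hedged: the exact form of the λ-weighted penalty is an assumption here, chosen so that a larger λ penalizes non-associated hypotheses more, consistent with claim 3 preferring λ ≥ 0.5 for still images; the function name and inputs are illustrative.

```python
import numpy as np

def mh_predict_block(y, H, Phi, lam, assoc_mask):
    """Fit hypothesis weights w so that Phi @ H @ w matches the block's
    observation y, with a ridge penalty of (1 - lam) on hypotheses from
    associated frames and lam on the rest; return the prediction H @ w.
    assoc_mask[q] is True when hypothesis column q comes from an
    associated frame (adjacent frames / key frames of the group)."""
    A = Phi @ H                               # hypotheses in the observation domain
    d = np.where(assoc_mask, 1.0 - lam, lam)  # per-hypothesis penalty weight
    w = np.linalg.solve(A.T @ A + np.diag(d), A.T @ y)
    return H @ w, w
```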
3. The method for reconstructing a compressed video signal as claimed in claim 2, wherein λ ≥ 0.5 for still-type images and λ < 0.5 for motion-type images.
4. The method for reconstructing a compressed video signal according to any one of claims 1 to 3, wherein in step S02, calculating the observation residuals of the non-key frames comprises:
performing domain transformation processing on the image prediction value of the non-key frame based on the random sampling matrix to obtain the image prediction value of a transformation domain;
and subtracting the image predicted value of the transform domain from the image observed value of the non-key frame to obtain an observation residual error.
5. The method of claim 4, wherein in step S02, the step of calculating the prediction residual of the non-key frame comprises:
and performing domain inverse transformation processing on the observation residual error based on the random sampling matrix to obtain a prediction residual error of the non-key frame.
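Claims 4 and 5 together describe a two-step residual computation. A minimal sketch (using the transpose of the sampling matrix as a stand-in for its inverse transform, which is an assumption for illustration):

```python
import numpy as np

def residuals(y, x_pred, Phi):
    """Claims 4-5 sketch: the observation residual is the observation minus
    the prediction projected by the random sampling matrix (claim 4); the
    prediction residual maps it back to the image domain via the transpose,
    a stand-in here for the inverse transform (claim 5)."""
    obs_res = y - Phi @ x_pred
    pred_res = Phi.T @ obs_res
    return obs_res, pred_res
```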
6. The method for reconstructing a compressed video signal as claimed in claim 5, wherein in step S03, calculating the average energy of the residual signal of the observation residual of the non-key frame comprises:
calculating the residual signal average energy according to the following formula:

R = \frac{1}{L} \sum_{l=1}^{L} \big( \tilde{y}_{k,j}(l) \big)^2

wherein R is the average energy of the residual signal, \tilde{y}_{k,j} is the observation residual of the non-key frame, k is the frame index of the image block, j is the image block index within the frame, l is the index over the non-zero components of the residual signal, and L is the number of non-zero components in the residual signal.
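The claim-6 formula restricted to non-zero components is a one-liner in numpy (the function name is illustrative):

```python
import numpy as np

def residual_avg_energy(obs_res, eps=1e-12):
    """Average energy R over the non-zero components of the observation
    residual, following the formula in claim 6."""
    nz = obs_res[np.abs(obs_res) > eps]
    return float(np.mean(nz ** 2)) if nz.size else 0.0
```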
7. The method for reconstructing a compressed video signal as claimed in claim 6, wherein in step S03, selecting an adaptive residual reconstruction algorithm according to the average energy of the residual signal, reconstructing the prediction residual of the non-key frame, and generating the reconstructed residual of the non-key frame comprises:

according to the relationship between the preset decision thresholds T1 and T2 and the average energy R of the residual signal, selecting one of the following formulas to calculate the sparse representation coefficient of the reconstructed residual:

if R ≤ T1, using

\hat{s}_{k,j} = \arg\min_{s} \| \tilde{y}_{k,j} - \Phi \Psi s \|_2^2 + \lambda_1 \| s \|_2^2

if T1 < R ≤ T2, using

\hat{s}_{k,j} = \arg\min_{s} \| \tilde{y}_{k,j} - \Phi \Psi s \|_2^2 + \lambda_1 \| s \|_2^2 + \lambda_2 \| s \|_1

if R > T2, using

\hat{s}_{k,j} = \arg\min_{s} \| \tilde{y}_{k,j} - \Phi \Psi s \|_2^2 + \lambda_2 \| s \|_1

wherein \hat{s}_{k,j} is the sparse representation coefficient of the reconstructed residual, s is the sparse representation coefficient of the prediction residual of the non-key frame, \tilde{y}_{k,j} is the observation residual of the non-key frame, Φ is the random sampling matrix, and λ1 and λ2 are weight factors for balancing the effects of similarity and difference between the predicted image and the observed image;

calculating the reconstructed residual of the non-key frame using the following formula:

\hat{\tilde{x}}_{k,j} = \Psi \hat{s}_{k,j}

wherein \hat{\tilde{x}}_{k,j} is the reconstructed residual of the non-key frame, and Ψ is the redundant dictionary.
8. The method for reconstructing a compressed video signal as claimed in claim 7, wherein the step S04 comprises:
calculating the image reconstruction value of the non-key frame using the following formula:

\hat{x}_{k,j} = \bar{x}_{k,j} + \hat{\tilde{x}}_{k,j}

wherein \hat{x}_{k,j} is the image reconstruction value of the non-key frame, and \bar{x}_{k,j} is the image prediction value of the non-key frame.
9. The method of reconstructing a compressed video signal according to claim 8, further comprising:
and calculating an image reconstruction value of the key frame according to the image observation value of the key frame.
10. The method of reconstructing a compressed video signal according to claim 9, further comprising:
generating the key frame image and the non-key frame image from the image reconstruction value of the key frame and the image reconstruction value of the non-key frame, respectively;
combining the key frame images and the non-key frame images in frame-sequence order to generate the reconstructed video image.
CN202011461038.0A 2020-12-11 2020-12-11 Method for reconstructing video compression signal Active CN112616052B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011461038.0A CN112616052B (en) 2020-12-11 2020-12-11 Method for reconstructing video compression signal


Publications (2)

Publication Number Publication Date
CN112616052A true CN112616052A (en) 2021-04-06
CN112616052B CN112616052B (en) 2023-03-28

Family

ID=75234409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011461038.0A Active CN112616052B (en) 2020-12-11 2020-12-11 Method for reconstructing video compression signal

Country Status (1)

Country Link
CN (1) CN112616052B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050169371A1 (en) * 2004-01-30 2005-08-04 Samsung Electronics Co., Ltd. Video coding apparatus and method for inserting key frame adaptively
CN101836457A (en) * 2007-10-25 2010-09-15 日本电信电话株式会社 Video scalable encoding method, video scalable decoding method, devices therefor, programs therefor, and recording medium where program is recorded
CN103297782A (en) * 2013-06-08 2013-09-11 河海大学常州校区 Area-partition-based reconstruction method in distributed video compression sensing (CS) system
US20160212448A1 (en) * 2014-05-28 2016-07-21 Peking University Shenzhen Graduate School Method and device for video encoding or decoding based on dictionary database
CN107155112A (en) * 2017-05-24 2017-09-12 湖北工业大学 A kind of compressed sensing method for processing video frequency for assuming prediction more
CN107360426A (en) * 2017-07-13 2017-11-17 福州大学 A kind of video sequence reconstructing method based on compressed sensing
CN108347612A (en) * 2018-01-30 2018-07-31 东华大学 A kind of monitored video compression and reconstructing method of view-based access control model attention mechanism
CN110933429A (en) * 2019-11-13 2020-03-27 南京邮电大学 Video compression sensing and reconstruction method and device based on deep neural network



Similar Documents

Publication Publication Date Title
CN108960333B (en) Hyperspectral image lossless compression method based on deep learning
US10812790B2 (en) Data processing apparatus and data processing method
CN104199627B (en) Gradable video encoding system based on multiple dimensioned online dictionary learning
CN106960420B (en) Image reconstruction method of segmented iterative matching tracking algorithm
Zhou et al. Image compression based on discrete cosine transform and multistage vector quantization
Abd-Alzhra et al. Image compression using deep learning: methods and techniques
CN112616052B (en) Method for reconstructing video compression signal
Zhang et al. Image primitive coding and visual quality assessment
Zheng et al. An improved distributed compressed video sensing scheme in reconstruction algorithm
Li et al. Image compression using fast transformed vector quantization
CN113096019B (en) Image reconstruction method, image reconstruction device, image processing equipment and storage medium
Upadhyaya et al. Quality parameter index estimation for compressive sensing based sparse audio signal reconstruction
Wahidah et al. A comparative study on video coding techniques with compressive sensing
Wang et al. Recovery error analysis of noisy measurement in compressed sensing
Thepade et al. New clustering algorithm for vector quantization using hybrid Haar slant error vector
Kumar et al. Comparative Analysis and Performance Evaluation of Medical Image Compression Method for Telemedicine
Rahman et al. An integer wavelet transform based lossless image compression technique using arithmetic coding
Prabhavathi et al. Compressive Sensing and its Application to Speech Signal Processing
Chatterjee et al. Image compression and resizing using vector quantization and other efficient algorithms
Wang et al. Reduced dimension Vector Quantization encoding method for image compression
CN114998457B (en) Image compression method, image decompression method, related device and readable storage medium
SAHNOUN et al. The Fourier transform for satellite image compression
Zhang et al. Application of Sparse Dictionary Adaptive Compression Algorithm in Transient Signals
Arunapriya et al. Image compression using single layer linear neural networks
Zhang An image reconstruction algorithm based on classified block

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant