CN112616052B

CN112616052B - Method for reconstructing video compression signal

Info

Publication number: CN112616052B
Application number: CN202011461038.0A
Authority: CN
Inventors: 周涛; 李琛
Original assignee: Shanghai IC R&D Center Co Ltd; Shanghai IC Equipment Material Industry Innovation Center Co Ltd
Current assignee: Shanghai IC R&D Center Co Ltd; Shanghai IC Equipment Material Industry Innovation Center Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2023-03-28
Anticipated expiration: 2040-12-11
Also published as: CN112616052A

Abstract

The invention discloses a reconstruction method of a video compression signal, which comprises the following steps: according to the image observation values of the key frame and the non-key frame output by the acquisition end, carrying out preliminary prediction on the image of the non-key frame to generate an image prediction value of the non-key frame; calculating an observation residual error of the non-key frame and a prediction residual error of the non-key frame according to the image prediction value of the non-key frame, the image observation value of the non-key frame and a random sampling matrix used by an acquisition end; calculating the average energy of residual error signals of the observation residual errors of the non-key frames, selecting an adaptive residual error reconstruction algorithm according to the energy of the residual error signals, and reconstructing the predicted residual errors to generate reconstructed residual errors; and calculating an image reconstruction value of the non-key frame according to the image prediction value of the non-key frame and the reconstruction residual error of the non-key frame. The reconstruction method of the video compression signal can effectively improve the accuracy of residual error reconstruction, thereby improving the differentiation retention capacity of the video image and finally realizing the high-quality reconstruction of the video image.

Description

Method for reconstructing video compression signal

Technical Field

The present invention relates to the technical field of video compression processing, and in particular, to a method for reconstructing a video compression signal.

Background

The traditional representative video compression coding technology such as H.26x and MPEG is based on the Nyquist sampling theorem, sampling is carried out at a sampling rate which is at least twice as large as the bandwidth of a sampling signal, then high-complexity compression coding is carried out on the video signal, and finally the signal is transmitted and stored.

The compressed sensing theory proposed by Donoho, candes, tao, et al, 2006 provides a completely new signal sampling technique, which indicates that: for sparse or nearly sparse signals in a transform domain, the signal can be sampled at a frequency well below the nyquist sampling rate and reconstructed with great probability accurately. The compression sensing theory is innovative in that on one hand, the bottleneck of the traditional sampling rate is broken through, on the other hand, the compression sensing theory directly samples signals by designing an observation matrix, and the signals obtained by sampling are compressed, so that the sampling and compression are completed synchronously, the waste of sampling resources is effectively avoided, meanwhile, the signal reconstruction with high complexity is performed at a decoding end, and the equipment performance bottleneck of a sampling section can be effectively solved. The compressive sensing technology has attracted wide attention of academic circles at the beginning of its birth, and is also favored in the fields of wireless sensing, multimedia sensing networks and the like in the aspect of practical application.

With the development of the compressive sensing theory, numerous scholars deeply research the video compressive sensing technology and obtain various remarkable results, wherein a video reconstruction distributed compressive sensing framework based on prediction-residual errors is widely adopted, but the current video reconstruction algorithm of the prediction-residual errors only focuses on the improvement of the precision of the prediction algorithm, and the influence of residual error reconstruction on the final prediction performance is ignored. Since the reconstruction of most image frames is closely related to the residual error, if the accuracy of the residual error reconstruction is not high, it is difficult to improve the overall performance of the video compressed sensing algorithm.

Disclosure of Invention

The present invention is directed to overcoming the above-mentioned drawbacks of the prior art and providing a method for reconstructing a compressed video signal.

In order to realize the purpose, the technical scheme of the invention is as follows:

a method of reconstructing a compressed video signal, comprising:

s01: according to the image observation values of the key frame and the non-key frame output by the acquisition end, carrying out preliminary prediction on the image of the non-key frame to generate an image prediction value of the non-key frame;

s02: calculating an observation residual error of the non-key frame and a prediction residual error of the non-key frame according to the image prediction value of the non-key frame, the image observation value of the non-key frame and a random sampling matrix used by an acquisition end;

s03: calculating the average energy of residual error signals of the observation residual errors of the non-key frames, selecting an adaptive residual error reconstruction algorithm according to the energy of the residual error signals, reconstructing the predicted residual errors and generating reconstructed residual errors;

s04: and calculating the image reconstruction value of the non-key frame according to the image prediction value of the non-key frame and the reconstruction residual error of the non-key frame.

Further, in step S01, the preliminary predicting the image of the non-key frame, and generating the predicted image value of the non-key frame includes:

the image prediction value for the non-key frame is calculated using the following formula:

wherein the content of the first and second substances,

image prediction values for non-key frames, H _p,q Is a component block, w, for image target block estimation _p,q Is the weight coefficient corresponding to the component block, k, p are the frame index of the image block, k corresponds to the non-key frame in the same image group, p corresponds to the key frame and the non-key frame in the same image group, j, q are the image block index in the frame, the weight coefficient w is the weight coefficient _p,q Calculated using the following formula:

wherein, y _k,j Is the image observed value of the non-key frame, lambda is the weighting factor of time prior, is used for adjusting the influence between the associated frames, phi is the random sampling matrix, p1 is the frame index of the image block of the associated frame, and the associated frame comprises the adjacent frame of the non-key frame and the same imageThe key frame in the image group, p2 is the frame index of the image block of the non-associated frame.

Further, λ > =0.5 for static class images, and λ <0.5 for dynamic class images.

Further, in step S02, calculating the observation residual of the non-key frame includes:

performing domain transformation processing on the image prediction value of the non-key frame based on the random sampling matrix to obtain an image prediction value of a transformation domain;

and subtracting the image predicted value of the transform domain from the image observed value of the non-key frame to obtain an observation residual error.

Further, in step S02, calculating the prediction residual of the non-key frame includes:

and performing domain inverse transformation processing on the observation residual error based on the random sampling matrix to obtain a prediction residual error of the non-key frame.

Further, in step S03, calculating the average energy of the residual signal of the observation residual of the non-key frame includes:

calculating the residual signal average energy according to the following formula:

wherein R is the average energy of the residual signal,

observation residuals for non-key frames->

R is the average energy of the residual signal, k is the frame index of the image block, j is the image block index within the frame, L is the index of the non-zero residual signal, and L is the number of non-zero components in the residual signal.

Further, in step S03, selecting an adaptive residual reconstruction algorithm according to the magnitude of the residual signal average energy, and reconstructing the prediction residual of the non-key frame, wherein generating the reconstructed residual of the non-key frame includes:

according to the relation between preset judgment thresholds T1 and T2 and the average energy of the residual error signals, one of the following formulas is selected to calculate the sparse expression coefficient of the reconstructed residual error:

if R is less than or equal to T1, then use

/>

If T1 < R ≦ T2, use

If R > T2, use

Wherein, the first and the second end of the pipe are connected with each other,

for reconstructing sparse representation coefficients of the residual error, <' >>

Sparse representation coefficient for prediction residue of non-key frame, < > or >>

Lambda 1 and lambda 2 are weight factors for balancing the similarity and difference effects between the predicted image and the observed image,

the reconstructed residual of the non-key frame is calculated using the following formula:

Ψ is a redundant dictionary for the reconstructed residual of the non-key frame.

Further, step S04 includes:

the image reconstruction value of the non-key frame is calculated using the following formula:

image reconstruction value for a non-key frame, <' >>

Is the image prediction value of the non-key frame.

Further, the method further comprises:

and calculating an image reconstruction value of the key frame according to the image observation value of the key frame.

Further, the method further comprises:

respectively generating a key frame image and a non-key frame image by using the image reconstruction value of the non-key frame and the image reconstruction value of the key frame;

and combining the key frame image and the non-key frame image according to the sequence of the frame sequence to generate a reconstructed video image.

In the reconstruction method of the video compression signal, an adaptive residual error reconstruction algorithm is selected based on the average energy of residual error signals in the reconstruction process of non-key frames, and prediction residual errors are subjected to

And reconstructing, wherein the difference between the predicted information and the observed information is reflected by the average energy of the residual error signals, and different reconstruction algorithms are selected according to the difference degree, so that the accuracy of residual error reconstruction can be effectively improved, the differentiation retention capacity of the video image is improved, and the high-quality reconstruction of the video image is finally realized.

Drawings

FIG. 1 is a schematic diagram of the overall architecture of the video compression reconstruction algorithm of the present invention;

FIG. 2 is a schematic diagram of a group of pictures in successive frames of a video in accordance with the present invention;

fig. 3 is a flowchart of a method of reconstructing a video compressed signal according to the present invention.

Detailed Description

The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.

In the following detailed description of the embodiments of the present invention, in order to clearly illustrate the structure of the present invention and to facilitate explanation, the structure shown in the drawings is not drawn to a general scale and is partially enlarged, deformed and simplified, so that the present invention should not be construed as limited thereto.

For the convenience of description of the technical scheme, the following part reference numbers are explained as follows:

image observation value: the image signal detected directly by the acquisition-side detector is used as y in the present invention _k,j An image observation representing a non-key frame;

image prediction value: image signals predicted based on image observations, used in the invention

A picture prediction value representing a non-key frame;

observed residual of non-key frames

The difference between the image observation signal and the image prediction signal after the transformation is used in the invention>

Represents the observed residual of the non-key frame;

prediction residual of non-key frames

The difference between the image observation signal and the image prediction signal after transformation is used in the present invention/>

Represents the prediction residual of the non-key frame;

reconstructed residual of non-key frames

Observation residual->

And combines the signal obtained by the segmented reconstruction algorithm>

And then the following formula is used for calculation: />

Wherein it is present>

For predicting residual non-key frames

Sparse representation coefficients of (a);

image reconstruction value: the video signal that finally needs to be presented, i.e. the image signal after decompression, is used in the present invention

Representing image reconstruction values of non-key frames.

Referring to fig. 1, it is a schematic diagram of the overall architecture of the video compression reconstruction algorithm of the present invention. The whole process of video compression and reconstruction can be divided into video compression processing at an acquisition end and video reconstruction processing at a reconstruction end. The acquisition end and the reconstruction end can be distributed on different devices, for example, the acquisition end can be arranged on a video server, the reconstruction end can be arranged on a mobile terminal, video signals are compressed at the acquisition end and then transmitted to the mobile terminal through the internet, and the reconstruction end on the mobile terminal reconstructs video data to generate video frames and plays the video frames.

Referring to fig. 2, it is a schematic diagram of the group of pictures in the video consecutive frames of the present invention. At the acquisition end, the video signal is split into continuous image groups, each image group consists of a fixed number of continuous image frames, the first frame of each image group is a key frame, and other frames are non-key frames. At the acquisition end, a block compression sensing algorithm is adopted to compress the key frames and the non-key frames, and the block compression sensing algorithm has the advantages of being less in storage space requirement and capable of well retaining image characteristics.

Taking a non-key frame as an example, assuming that the size of the original image to be sampled is N × N, the image block size may be set to N ^1/2 ×N ^1/2 If k represents the frame index and j represents the image block index in the frame, the image signal value after sampling the original image is represented as x _k,j . For image signal value x _k,j After compressed sensing sampling is carried out, an image observation value is obtained and is expressed as y _k,j 。

The sampling process can be expressed by the following equation (1):

y _k,j ＝φx _k,j formula (1)

Phi is a random sampling matrix of M multiplied by N, M < < N, namely the observation quantity is far smaller than the quantity of discrete components of the image signal, the sampling rate is M/N, and the random sampling matrix can adopt a Gaussian random matrix, a Hadamard random matrix and the like. The key frames and non-key frames are processed according to the same principle, but at different sampling rates, and a higher M is usually used for the key frames as a reference signal in the corresponding image group to provide a reference for estimation of the image blocks of the non-key frames. The non-key frame adopts lower M, so that higher compression rate is realized, and the load of an acquisition end is reduced. And outputting the image observation value generated after sampling by the sampling end as image compression data of the sampling end.

According to the compressed sensing theory, the image observed value y _k,j Can be expressed as the following equation (2):

y _k,j ＝φx _k,j ＝φΨθ _k,j formula (2)

φΨCalled the observation matrix, Ψ is a redundant dictionary, θ _k,j Is x _k,j Sparse representation coefficients under a dictionary Ψ, wherein the redundant dictionary Ψ and the random sampling matrix Φ are both known quantities, and the sampling process of the above formula (1) and formula (2) is equivalent to using Φ to x _k,j Performing domain transformation, and distinguishing two domains before and after transformation by using x _k,j The domain in which y is located is called the signal domain _k,j The domain in which is referred to as the transform domain. The redundant dictionary Ψ is used in the image reconstruction process at the reconstruction end.

And at a reconstruction end, reconstructing the key frame and the non-key frame in different modes. The key frame can be reconstructed by adopting the conventional compressed sensing algorithm, and because the key frame has a higher sampling rate, the key frame can obtain a better effect by adopting a K-SVD (K-singular value decomposition), L1-Norm optimization algorithm, MH-BCS-SPL (hybrid binary coded decimal) algorithm and the like. In the invention, in the process of reconstructing the key frames, the influence of residual energy on the reconstruction of the non-key frames is fully considered, and an adaptive residual reconstruction algorithm is selected according to the size of the residual energy to reconstruct the key frames, wherein the specific processing process is detailed later.

Referring to fig. 3, which is a flowchart of a method for reconstructing a video compression signal according to the present invention, the method may include:

s01: according to the image observation value y of the key frame and the image observation value y of the non-key frame output by the acquisition end _k,j Performing preliminary prediction on the image of the non-key frame to generate a predicted image value of the non-key frame

Specifically, the image of the non-key frame is preliminarily predicted, and the image prediction value->

The processing procedure of (2) may include:

calculating an image prediction value of a non-key frame using the following formula (3)

Wherein H _p,q Is for the image target block x _k,j Estimated block of components, w _p,q Is the weight coefficient corresponding to the component block, k, p are the frame index of the image block, k corresponds to the non-key frame in the same image group, p corresponds to the key frame and the non-key frame in the same image group, j, q are the image block index in the frame, the weight coefficient w is _p,q Calculated using the following equation (4):

wherein, λ is a weighting factor of time prior, and is used for adjusting influence between associated frames. p1 is the frame index of the image block of the associated frame, the associated frame comprises the adjacent frame of the non-key frame and the key frame in the same image group, p2 is the frame index of the image block of the non-associated frame, and the non-associated frame comprises other frames in the same image group except the associated frame. The argmin function is the variable value at which the objective function takes a minimum value.

As can be seen from equation (4), the weight coefficient w _p,q Is determined by both the energy prior and the time prior of the image block. Determining a weight coefficient w by minimizing an energy difference between an observed value of an image block and an estimated value of the image block using image energy as prior information of image similarity _p,q Meanwhile, the time correlation is included in the formula (4), because of the continuity of the video image, the target frame and the associated frame generally have high similarity, in order to enhance the influence of the associated frame, the energy difference of the associated frame is calculated independently, the first term of the formula (4) is formed, the second term of the formula (4) is the energy difference of the non-associated frame, and the influence of the associated frame is adjusted through lambda. In the present invention, the p1 and p2 cores are used for distinguishing the same position in the same image groupThe image blocks influence the size of the target image blocks, the key frames and the associated frames have more reference information on the corresponding image blocks of the target frames, and the target frames of each non-key frame in the same image group use the key frames as references, so that the key frames are used as the associated frames, namely the key frames are classified into p1 instead of p2. The λ is a preset value, and the weighting factor λ can be set to different values according to different video image types, and for still images, such as a shot image for a still or an image with little change between frames, the λ>=0.5, for dynamic images, e.g. high speed moving objects in the image or the entire frame of image is dynamically changing, λ<0.5。

S02: image prediction values from the non-key frames

And image observations y of said non-key frames _k,j And a random sampling matrix phi used by the acquisition end, calculating the observation residual of the non-key frame->

And prediction residue of non-key frame->

Wherein, in step S02, the observation residual of the non-key frame is calculated

The method can comprise the following steps:

image prediction values for non-key frames based on the random sampling matrix phi

Performing domain transformation to obtain the image prediction value in the transformation domain>

The domain transformation processing referred to herein is a transformation from a signal domain to a transform domain based on a random sampling matrix phi, and may specifically beThe following formula (5) is used.

Then, the image prediction value of the transform domain is calculated

Image observation value y of non-key frame _k,j Subtract to obtain the observed residual ^ of the non-key frame>

Furthermore, in step S02, the prediction residual of the non-key frame is calculated

The method can comprise the following steps:

based on the random sampling matrix phi, observing residual errors

Performing inverse domain transform to obtain prediction residue ^ of non-key frame>

The domain inverse transform processing here refers to transformation from a transform domain to a signal domain based on a random sampling matrix phi, and may specifically adopt the following formula (6):

wherein phi is ^-1 Is the inverse of the random sampling matrix phi.

S03: computing observed residuals for the non-key frames

And selecting the adapted residual according to the magnitude of the average energy of the residual signalDifference reconstruction algorithm for predicting residual->

Performs a reconstruction to generate a reconstructed residual ≥ of non-key frames>

In step S03, the observation residual of the non-key frame is calculated

The average energy of the residual signal of (2) includes: calculating the residual signal average energy according to the following formula (7):

wherein, R is the average energy of the residual signal, k is the frame index of the image block, j is the index of the image block in the frame, L is the index of the non-zero residual signal, and L is the number of the non-zero components in the residual signal.

In step S03, an adaptive residual error reconstruction algorithm is selected according to the average energy of the residual error signal, and the prediction residual error is subjected to

The method comprises the following steps: />

According to the relation between preset judgment threshold values T1 and T2 and the average energy R of the residual error signal, one of the following formulas is selected, and the sparse expression coefficient of the reconstructed residual error is calculated

If R is less than or equal to T1, then calculate using equation (8)

If T1 < R ≦ T2, then use equation (9) to calculate

If R > T2, then calculate using equation (10)

Wherein, in

Prediction residue for non-key frame->

Sparse representation coefficient of (4), based on the expression value of (4)>

For observing residual errors, lambda 1 and lambda 2 are weight factors for balancing the effects of similarity and difference between the predicted image and the observed image, the decision thresholds T1 and T2 are related to the average atomic energy of the dictionaries, and the decision thresholds T1 and T2 are in direct proportion to the average atomic energy of the redundant dictionary Ψ.

R is less than or equal to T1 corresponding to the situation that the average energy of residual signals is lower, which shows that the similarity between the predicted information and the observed information is high, namely the predicted value of the image in the transform domain

And the image observation value y _k,j And the similarity is high, and at the moment, 2 norms are adopted to inhibit residual high-frequency information (difference between observation and prediction), so that the consistency of the images is ensured.

T1 < R < T2 corresponds to the situation that the average energy of residual signals is higher, which shows that the prediction information and the observation information have obvious difference, namely the image prediction value of the transform domain

And the image observed value y _k,j A relatively significant difference occurs, at which point the combined 2-norm and 1-norm is used to balance the similar and different parts.

R > T2 corresponds to the situation that the average energy of residual signals is extremely high, and shows that the difference between the predicted information and the observed value is very obvious, namely the predicted value of the image in the transform domain

And the image observation value y _k,j The difference is very obvious, and at the moment, the difference effect is enhanced by adopting a 1-mode.

Then, the reconstructed residual of the non-key frame is calculated using the following formula (11)

Where Ψ is a redundant dictionary.

In the residual reconstruction processing in step S03 described above, the features of the sparse representation coefficients of the prediction residual are sufficiently considered. And evaluating the characteristics of sparse expression coefficients of the predicted residual based on the prior information of positive correlation between the difference between the predicted information and the observed information and the average energy value of the residual signal, thereby selecting different residual reconstruction algorithms to realize high-quality reconstruction effect.

S04: image prediction values from the non-key frames

And reconstructed residual ≥ of non-key frames>

Calculating an image reconstruction value ≥ of non-key frames>

Specifically, step S04 may calculate the image reconstruction value ≦ for the non-key frame using equation (12) below>

Wherein the content of the first and second substances,

is the image prediction value of the non-key frame.

Through the above steps S01 to S04, the image reconstruction value of the non-key frame is calculated, and the non-key frame image can be generated based on the image reconstruction value

In addition, the video compression and reconstruction algorithm of the present invention may further include an image reconstruction process of a key frame, and specifically, the video compression and reconstruction algorithm of the present invention may further include:

s05: and calculating an image reconstruction value of the key frame according to the image observation value of the key frame. The process of calculating the image reconstruction values of the key frames may be performed in parallel with the calculation of the image reconstruction values of the non-key frames of the previous steps S01 to S04, or may be performed sequentially as shown in fig. 3. As described above, the key frame reconstruction process can be implemented using the existing methods of K-SVD, L1-Norm optimization algorithm, MH-BCS-SPL, etc.

After the image reconstruction values of the key frame and the non-key frame are calculated, the complete video image reconstruction can be performed, and specifically, the video compression reconstruction algorithm of the invention can further include:

s06: respectively generating a key frame image and a non-key frame image by using the image reconstruction value of the non-key frame and the image reconstruction value of the key frame;

s07: and combining the key frame images and the non-key frame images according to the sequence of the frame sequences to generate reconstructed video images.

The above description is only a preferred embodiment of the present invention, and the embodiments are not intended to limit the scope of the present invention, so that all equivalent structural changes made by using the contents of the specification and the drawings of the present invention should be included in the scope of the present invention.

Claims

1. A method for reconstructing a compressed video signal, comprising:

s01: the method comprises the steps that a video signal is split into a plurality of image groups, the first frame of each image group is a key frame, the rest frames are non-key frames, according to image observation values of the key frame and the non-key frames output by an acquisition end, the images of the non-key frames are subjected to preliminary prediction, an image prediction value of the non-key frames is generated, and the sampling rate of the image observation values of the key frames is greater than that of the image observation values of the non-key frames;

2. The method of claim 1, wherein the step S01 of performing preliminary prediction on the image of the non-key frame and generating the predicted image value of the non-key frame comprises:

wherein the content of the first and second substances,

wherein, y _k,j The image observation value of the non-key frame is obtained, lambda is a time prior weight factor used for adjusting the influence between the associated frames, phi is a random sampling matrix, p1 is the frame index of the image block of the associated frame, the associated frame comprises the adjacent frame of the non-key frame and the key frame in the same image group, and p2 is the frame index of the image block of the non-associated frame.

3. The method of claim 2, wherein λ > =0.5 for still-like images and λ <0.5 for moving-like images.

4. The method for reconstructing a compressed video signal according to any of claims 1 to 3, wherein in step S02, calculating the observation residuals of the non-key frames comprises:

performing domain transformation processing on the image prediction value of the non-key frame based on the random sampling matrix to obtain the image prediction value of a transformation domain;

5. The method of claim 4, wherein in step S02, calculating the prediction residual of the non-key frame comprises:

6. The method of claim 5, wherein the step S03 of calculating the average energy of residual signals of the observation residuals of the non-key frames comprises:

wherein R is the average energy of the residual signal,

observed residual->

R is the average energy of the residual signal, k is the frame index of the image block, j is the index of the image block within the frame, L is the index of the non-zero residual signal, and L is the number of non-zero components in the residual signal.

7. The method according to claim 6, wherein in step S03, an adaptive residual reconstruction algorithm is selected according to the average energy of the residual signals, and the prediction residual of the non-key frame is reconstructed, and generating the reconstructed residual of the non-key frame comprises:

if R is less than or equal to T1, then use

If T1 is more than R and less than or equal to T2, the use is carried out

If R > T2, use

Wherein the content of the first and second substances,

Sparse representation coefficients for the prediction residual of non-key frames,

is the observation residual of the non-key frame, lambda 1 and lambda 2 are weight factors for balancing the similarity and difference effects between the predicted image and the observation image,

wherein the content of the first and second substances,

for reconstructed residual of non-key frames, Ψ is redundancyAnd (4) remaining dictionaries.

8. The method of claim 7, wherein step S04 comprises:

calculating an image reconstruction value of a non-key frame using the following formula

Wherein the content of the first and second substances,

image reconstruction values for non-key frames>

Is the image prediction value of the non-key frame.

9. The method of reconstructing a compressed video signal according to claim 8, further comprising:

10. The method for reconstructing a compressed video signal according to claim 9, further comprising:

and combining the key frame images and the non-key frame images according to the sequence of the frame sequences to generate reconstructed video images.