WO2017092072A1

WO2017092072A1 - Distributed video encoding framework

Info

Publication number: WO2017092072A1
Application number: PCT/CN2015/097220
Authority: WO
Inventors: 程德强; 刘海; 张国鹏; 寇旗旗; 张剑英
Original assignee: 中国矿业大学
Priority date: 2015-12-04
Filing date: 2015-12-12
Publication date: 2017-06-08
Also published as: CN105430406A; CN105430406B

Abstract

A distributed video encoding framework, comprising: a basic viewpoint, an enhanced viewpoint, a Wyner-Ziv encoder, a Wyner-Ziv decoder, a first intra-frame encoder, a first intra-frame decoder, a time side information generation module, a second intra-frame encoder, a second intra-frame decoder, a space side information generation module, a fusion module and a reconstruction module. The basic viewpoint and the enhanced viewpoint are collection devices. The Wyner-Ziv encoder and decoder, the first intra-frame encoder and decoder and the second intra-frame encoder and decoder respectively encode and decode a first Wyner-Ziv frame, a first key frame and a second key frame. The time side information generation module and the space side information generation module respectively generate a time side information frame and a space side information frame. After the fusion module fuses the time side information frame and the space side information frame, the reconstruction module reconstructs an image. The method can be adapted to a severe and complex environment, has relatively high fault tolerance and universal applicability, and can be widely applied in the field of mining industry.

Description

A distributed video coding framework

Technical field

The present invention relates to image processing techniques, and more particularly to a distributed video coding framework.

Background technique

In a complex and harsh mine, the wireless sensor network (WSN) uses a large number of energy-constrained micro-nodes to collect, transmit and process mine environmental information, so that management dispatchers can understand the situation in real time. However, in the rescue work after the safe and efficient production of coal or the occurrence of mine disasters, the information obtained by the traditional sensor network can no longer meet the comprehensive needs of management dispatchers for information acquisition. At present, the wireless video sensor network (WVSN) has received a lot of attention from researchers because it can acquire rich multimedia information such as images and videos.

In the wireless video sensor network, because the transmission information mainly involves audio information or video information, and the storage and processing capabilities of a single sensor node are severely limited, the efficient implementation of compression coding of multimedia information has become an important aspect of WVSN research. In wireless video sensor networks for different applications, the coding methods are different due to different node correlation models and working mechanisms; that is, there is no efficient and universal coding method for wireless video sensor networks in various applications. . In particular, in the case of long and narrow mine roadway and heavy-duty electromechanical equipment, the random deployment of coding nodes cannot be realized. Moreover, the mine roadway has inherent characteristics such as severe electromagnetic interference and poor wireless channel quality, which makes the current coding methods not Suitable for use in high noise, unreliable channels.

Thus, in the prior art, there is no widely applicable distributed video coding framework that can be applied to complex and harsh environments with high fault tolerance.

Summary of the invention

In view of this, the main object of the present invention is to provide a widely applicable distributed video coding framework that can be applied to a complex and harsh environment with high fault tolerance.

In order to achieve the above object, the technical solution proposed by the present invention is:

A distributed video coding framework includes: a base view point, an enhanced view point, a Wyner-Ziv encoder, a Wyner-Ziv decoder, a first intra encoder, a first intra decoder, a time side information generating module, and a second An intra encoder, a second intra decoder, a spatial side information generating module, a fusion module, and a reconstruction module; wherein

a base view for collecting the first environment video image, dividing the first environment video image into the first Wyner-Ziv frame and the first key frame according to the sequence number of the first environment video image, and the first Wyner-Ziv frame, the first The key frames are sent to the Wyner-Ziv encoder and the first intra encoder respectively.

And enhancing the viewpoint for collecting the second environment video image, dividing the second environment video image into the second Wyner-Ziv frame and the second key frame according to the sequence number of the second environment video image, and sending the second key frame to the second frame Inner encoder.

The Wyner-Ziv encoder is configured to perform a discrete cosine transform on the first Wyner-Ziv frame transmitted by the base view to remove the inter-pixel correlation, and perform channel coding on the bit plane formed by quantizing the transform coefficients, and obtain the Wyner- Ziv coded frames are sent over the wireless channel to the Wyner-Ziv decoder.

Wyner-Ziv decoder for Wyner-Ziv sent to Wyner-Ziv encoder The encoded frame is decoded and the Wyner-Ziv decoded frame is sent to the reconstruction module.

The first intra-frame encoder is configured to perform H.264 intra-frame coding on the first key frame sent by the base view, and send the obtained first key coded frame to the first intra-frame decoder through the wireless channel.

The first intra-frame decoder is configured to perform H.264 intra-frame decoding on the first key coded frame sent by the first intra-frame encoder, and send the obtained first key-decoded frame to the time-side information generating module.

a time side information generating module, configured to perform preprocessing, block matching, and bidirectional motion interpolation on two consecutive first key decoding frames from the first intra decoder, and send the generated time side information frame to the fusion Module.

And a second intra-frame encoder, configured to perform H.264 intra-frame coding on the second key frame sent by the enhanced view, and send the obtained second key coded frame to the second intra-frame decoder through the wireless channel.

And a second intra-frame decoder, configured to perform H.264 intra-frame decoding on the second key coded frame sent by the second intra-frame encoder, and send the obtained second key-decoded frame to the spatial information generating module.

The spatial side information generating module is configured to perform motion estimation according to the second key decoding frame sent by the decoder in the second frame, and send the obtained initial spatial side information frame to the fusion module.

The fusion module is configured to map the initial spatial side information frame sent by the spatial side information generating module to the basic view point according to the correlation between the basic view point and the enhanced view point, obtain the mapped spatial side information frame, and adopt average interpolation After the information is fused by the time side information frame sent by the time side information generating module and the mapping space side information frame, The resulting fused information frame is sent to the reconstruction module.

The reconstruction module is configured to filter the fusion information frame sent by the fusion module, and perform image reconstruction according to the Wyner-Ziv decoding frame and the filtered fusion information frame sent by the Wyner-Ziv decoder.

In summary, in the distributed video coding framework of the present invention, video images are simultaneously acquired by the base view point and the enhanced view point, and the base view point is used as the main acquisition device, and the enhanced view point is used as the auxiliary acquisition device; and, in the narrow mine roadway The base view is placed in parallel with the enhanced view such that the corresponding core lines between the base view and the enhanced view point are parallel to each other and on the same image horizontal scan line. Thus, the base view and the enhanced view are deployed in the mine roadway like the two eyes of the human. The video image captured by the base view is divided into a Wyner-Ziv frame and a first key frame, and the Wyner-Ziv frame is sent to the monitoring room for decoding by encoding; the first key frame is also sent to the monitoring room for decoding by encoding, and is used for generating Time-side information; the second key frame code is extracted from the video image acquired by the enhanced viewpoint, sent to the monitoring room for decoding, and used to generate initial spatial side information corresponding to the enhanced viewpoint. After the temporal side information and the initial spatial side information are preprocessed in the fusion module, the initial spatial side information is mapped to the mapping spatial side information corresponding to the basic viewpoint according to the correlation between the basic viewpoint and the enhanced viewpoint; thus, the time is After the side information is merged with the mapping space side information, the reconstructed module reconstructs and reproduces the video image in the mine roadway. The distributed video coding framework of the present invention draws on the characteristics of the human visual system, and uses the video image acquired by the enhanced viewpoint adjacent to the basic viewpoint as a reference image, thereby avoiding the reconstruction of the reconstructed image in the monitoring room due to the incompleteness of the collected video information. a problem of poor quality; in addition, due to the distributed video coding framework of the present invention, The video image captured by the base view is divided into Wyner-Ziv frames and the first key frame, and then respectively encoded and decoded, and only the second key frame extracted from the video image acquired by the enhanced view is encoded and decoded. The invention also has high coding efficiency and decoding quality. In summary, the distributed video coding framework of the present invention can adapt to harsh environments and has high fault tolerance and universal applicability.

DRAWINGS

1 is a schematic diagram showing the structure of a distributed video coding framework according to the present invention.

2 is a schematic diagram showing the structure of a time side information generating module according to the present invention.

FIG. 3 is a schematic diagram showing the structure of a space side information generating module according to the present invention.

4 is a schematic view showing the structure of a fusion module according to the present invention.

detailed description

The present invention will be further described in detail below with reference to the drawings and specific embodiments.

1 is a schematic diagram showing the structure of a distributed video coding framework according to the present invention. As shown in FIG. 1, the coding framework of the present invention includes: a base view 1, an enhancement view 2, a Wyner-Ziv encoder 3, a Wyner-Ziv decoder 4, a first intra encoder 5, and a first intra decoder. 6. The time side information generating module 9, the second intra encoder 7, the second intra decoder 8, the spatial side information generating module 10, the fusion module 11, and the reconstruction module 12;

The base view 1 is configured to collect the first environment video image, and divide the first environment video image into the first Wyner-Ziv frame and the first key frame according to the sequence number of the first environment video image, and the first Wyner-Ziv frame, the first A key frame is sent to the Wyner-Ziv encoder 3 and the first intra encoder 5, respectively.

The enhancement view 2 is configured to collect the second environment video image, divide the second environment video image into the second Wyner-Ziv frame and the second key frame according to the sequence number of the second environment video image, and send the second key frame to the second Intra encoder 7.

In practical applications, the base view 1 is the main acquisition device; the enhanced view 2 is the auxiliary acquisition device, such as 1 frame / 1 second or 1 frame / 2 seconds. For the group of pictures acquired by the base view 1 and the enhanced view 2, the video frames constituting the picture group are usually divided into key frames and Wyner-Ziv frames according to the size of the picture group. In general, the number of frames of a video frame constituting a picture group is 2, a video frame numbered as an odd number is used as a key frame, and a video frame numbered as an even number is used as a Wyner-Ziv frame. In practical applications, the odd-numbered video frames can also be used as Wyner-Ziv frames, and the even-numbered video frames can be used as key frames.

The Wyner-Ziv encoder 3 is configured to perform a discrete cosine transform for removing the inter-pixel correlation on the first Wyner-Ziv frame transmitted by the base view 1, and perform channel coding on the bit plane formed by quantizing the transform coefficients, and obtain the obtained The Wyner-Ziv coded frame is sent to the Wyner-Ziv decoder 4 over the wireless channel.

The Wyner-Ziv decoder 4 is configured to decode the Wyner-Ziv encoded frame transmitted by the Wyner-Ziv encoder 3 and transmit the Wyner-Ziv decoded frame to the reconstruction module 12.

The first intra-frame encoder 5 is configured to perform H.264 intra-frame coding on the first key frame sent by the base view 1, and send the obtained first key coded frame to the first intra-frame decoder 6 through the wireless channel.

The first intra-frame decoder 6 is configured to perform H.264 intra-frame decoding on the first key coded frame sent by the first intra-frame encoder 5, and send the obtained first key-decoded frame to the time-side information generating module 9 .

The time side information generating module 9 is configured to sequentially perform preprocessing, block matching, and bidirectional motion interpolation on two consecutive first key decoding frames from the first intra decoder 6 to send the generated time side information frame. To the fusion module 11.

The second intra-frame encoder 7 is configured to perform H.264 intra-frame coding on the second key frame transmitted by the enhanced view point 2, and send the obtained second key coded frame to the second intra-frame decoder 8 through the wireless channel.

The second intra-frame decoder 8 is configured to perform H.264 intra-frame decoding on the second key coded frame sent by the second intra-frame encoder 7, and send the obtained second key-decoded frame to the spatial information generating module 10.

The spatial side information generating module 10 is configured to perform motion estimation according to the second key decoding frame sent by the decoder 8 in the second intraframe, and send the obtained initial spatial side information frame to the fusion module 11.

The fusion module 11 is configured to map the initial spatial side information sent by the spatial side information generating module 10 to the base view 1 according to the correlation between the base view 1 and the enhanced view 2, to obtain the mapped space side information, and adopt The average interpolation method performs information fusion on the time side information frame and the mapping space side information frame sent by the time side information generating module 9, and then sends the obtained fusion information frame to the reconstruction module 12.

The reconstruction module 12 is configured to filter the fusion information frame sent by the fusion module 11, and perform image reconstruction according to the Wyner-Ziv decoding frame and the filtered fusion information frame sent by the Wyner-Ziv decoder 4.

In the present invention, the image reconstruction based on the Wyner-Ziv decoded frame and the filtered fused information frame is prior art, and details are not described herein again.

In summary, in the distributed video coding framework of the present invention, video images are simultaneously acquired by the base view point and the enhanced view point, and the base view point is used as the main acquisition device, and the enhanced view point is used as the auxiliary acquisition device; and, in the narrow mine roadway, the basic The viewpoint is placed in parallel with the enhancement viewpoint such that the corresponding epipolar lines between the video images acquired by the base viewpoint and the enhancement viewpoint are parallel to each other and are located on the same image horizontal scanning line. Thus, the base view and the enhanced view are deployed in the mine roadway like the two eyes of the human. The video image captured by the base view is divided into a Wyner-Ziv frame and a first key frame, and the Wyner-Ziv frame is sent to the monitoring room for decoding by encoding; the first key frame is also sent to the monitoring room for decoding by encoding, and is used for generating Time-side information; the second key frame code is extracted from the video image acquired by the enhanced viewpoint, sent to the monitoring room for decoding, and used to generate initial spatial side information corresponding to the enhanced viewpoint. After the temporal side information and the initial spatial side information are preprocessed in the fusion module, the initial spatial side information is mapped to the mapping spatial side information corresponding to the basic viewpoint according to the correlation between the basic viewpoint and the enhanced viewpoint; thus, the time is After the side information is merged with the mapping space side information, the reconstructed module reconstructs and reproduces the video image in the mine roadway. The distributed video coding framework of the present invention draws on the characteristics of the human visual system, and uses the video image acquired by the enhanced viewpoint adjacent to the basic viewpoint as a reference image, thereby avoiding the reconstruction of the reconstructed image in the monitoring room due to the incompleteness of the collected video information. The problem of poor quality; in addition, in the distributed video coding framework of the present invention, after the video images collected by the base view are divided into Wyner-Ziv frames and first key frames, they are respectively encoded and decoded, and only The second key frame extracted from the video image acquired by the enhanced viewpoint is encoded and decoded, so the present invention also has high coding efficiency and decoding quality.

2 is a schematic diagram showing the structure of a time side information generating module according to the present invention. As shown in FIG. 2, the time information generating module 9 of the present invention includes: a first pre-processing unit 91, a first block matching unit 92, and a time side information generating unit 93;

The first pre-processing unit 91 is configured to perform low-pass filtering processing on the two consecutive first key decoding frames from the first intra decoder 6 to divide the obtained two consecutive first key filtering frames into More than fifty basic macroblocks of size M×N are transmitted, and each basic macroblock is sent to the first block matching unit (92); wherein, M and N both represent the number of pixel points and are natural numbers.

The first block matching unit 92 is configured to perform a search according to MSE(i,j)≤δ in each basic macroblock sent by the first pre-processing unit 91, and send the searched two matching basic macroblocks to each other. To time side information generating unit 93; wherein, the matching function

δ is a set value and is a real number; (i, j) represents a motion vector between two arbitrary basic macroblocks, and (x, y), (x+i, y+j) all represent pixel point coordinates; _k (x, y) represents the pixel value of the current frame in two consecutive first key decoded frames at (x, y); f _k-1 (x + i, y + j) represents two consecutive numbers The pixel value of the previous frame in a key decoded frame at (x+i, y+j).

The time side information generating unit 93 is configured to process the two mutually matching basic macroblocks sent by the first block matching unit 92 by using bidirectional motion interpolation to obtain a time side information frame.

Transmitting the time side information frame Y _2n (p) to the fusion module 11; wherein Y _2n (p) represents a time side information frame, p represents a pixel coordinate in a time side information frame; X _2n-1 represents two mutually matching A basic macroblock of a preamble first critical filter frame belonging to two consecutive first key filter frames in the basic macroblock, X _2n+1 representing two consecutive first keys belonging to two mutually matching basic macroblocks The basic macroblock of the first key filtered frame in the filtered frame; MV _f2n represents the forward motion vector, MV _b2n represents the backward motion vector, and MV _f2n and MV _b2n are known.

FIG. 3 is a schematic diagram showing the structure of a space side information generating module according to the present invention. As shown in FIG. 3, the spatial information generating module 10 of the present invention includes: a second pre-processing unit 101, a second block matching unit 102, and a spatial side information generating unit 103;

a second pre-processing unit 101, configured to perform low-pass filtering processing on two consecutive second key decoding frames from the second intra decoder 8 to divide the obtained two consecutive second key filtering frames into Fifty or more enhanced macroblocks of size M×N are transmitted, and each enhanced macroblock is sent to the second block matching unit 102; wherein, M and N both represent the number of pixel points and are natural numbers.

The second matching unit 102 is configured to perform a search according to MSE(r, s) ≤ γ in each enhanced macroblock sent by the second pre-processing unit 101, and send the searched two matched enhanced macroblocks to each other. To the spatial side information generating unit 103; wherein, the matching function

γ is a set value and is a real number; (r, s) represents a motion vector between two arbitrary enhanced macroblocks, and (x, y), (x+r, y+s) represent pixel point coordinates; _l (x, y) represents the pixel value of the current frame in two consecutive second key decoded frames at (x, y); f _l-1 (x + r, y + s) represents two consecutive numbers The pixel value of the previous frame in a key decoded frame at (x+r, y+s).

The spatial side information generating unit 103 is configured to process the two mutually matching enhanced macroblocks sent by the second block matching unit 102 by using bidirectional motion interpolation to obtain an initial spatial side information frame.

The initial spatial side information frame V _{2m is} sent to the fusion module 11; wherein V _2m (q) represents an initial spatial side information frame, q represents a pixel coordinate in an initial spatial side information frame; U _2m-1 represents two mutually matching a macroblock of a preamble first key filter frame belonging to two consecutive first key filter frames in a macroblock, U _2m+1 representing two consecutive first key filter frames in two mutually matching macroblocks The macroblock of the first key filter frame is _{followed by} MV _f2m for the forward motion vector, MV _b2m for the backward motion vector, and MV _f2m and MV _b2m are known.

4 is a schematic view showing the structure of a fusion module according to the present invention. As shown in FIG. 4, the fusion module 11 of the present invention includes a third pre-processing unit 111, a feature point extraction unit 112, a basic matrix generation unit 113, a mapping unit 114, and an information fusion unit 115;

The third pre-processing unit 111 is configured to filter the time-side information frame sent by the time-side information generating module 9 and the initial spatial side information frame sent by the spatial side information generating module 10, and filter the obtained time-side information to filter frames and initials. The spatial side information filtering frame is sent to the basic matrix generating unit 112, and the temporal side information filtering frame and the initial spatial side information filtering frame are respectively sent to the fusion unit 114 and the mapping unit 113.

The feature point extracting unit 112 is configured to acquire the brightness I (x, y) of each pixel point corresponding to the temporal side information filtering frame and the initial spatial side information filtering frame sent by the third preprocessing unit 111 in the horizontal direction and the vertical direction. ), I'(x, y) gradients are as follows:

among them,

Express convolution

After that, the basic autocorrelation matrix M and the enhanced autocorrelation matrix are constructed according to the above gradient correspondence.

M', respectively:

Smoothing the basic autocorrelation matrix M and the enhanced autocorrelation matrix M' to obtain the corresponding basic smooth autocorrelation matrix

Enhanced smooth autocorrelation matrix

Extracting two feature points λ ₁ , λ ₂ representing the principal curvature of the basic autocorrelation matrix M for the basic autocorrelation matrix M, and extracting two principal curvatures representing the enhanced autocorrelation matrix M′ for the enhanced autocorrelation matrix M′ The feature points λ ₁ ', λ ₂ ' are sent to the basic matrix generating unit 113 for each of the feature points and the pixel coordinates corresponding to each feature point;

σ ² represents the pixel point variance; each of the above feature points satisfies the constraint condition λ ₁ · λ _{2 -} 0.04 · (λ ₁ + λ ₂ ) ² > δ, λ ₁ '· λ ₂ '-0.04 · (λ ₁ ' + λ ₂ ') ² > δ, δ is the set threshold.

a basic matrix generating unit (113), configured to acquire, according to each feature point sent by the feature point extracting unit 112 and pixel coordinates corresponding to each feature point, a self between the base view point (1) and the enhanced view point (2) Correlation coefficient CC:

Where (x ₁ , y ₁ ), (x ₂ , y ₂ ) represent the pixel coordinates of the feature points λ ₁ and λ ₂ , respectively, I ₁ (x ₁ , y ₁ ), I ₂ (x ₂ , y ₂ ) respectively Gray scales representing feature points λ ₁ , λ ₂ ; (x ₁ ', y ₁ '), (x ₂ ', y ₂ ') represent pixel coordinates of feature points λ ₁ ', λ ₂ ', I ₁ '( x ₁ ', y ₁ '), I ₂ '(x ₂ ', y ₂ ') represent the gradation of the feature points λ ₁ , λ ₂ , respectively;

The size is (2m+1)× centered on (x ₁ , y ₁ ), (x ₂ , y ₂ ), (x ₁ ', y ₁ '), (x ₂ ', y ₂ '), respectively. Within the matching window of 2m+1), 6 sets of pre-matching points are extracted as 6 sets of samples; construct a linear equation system:

Where m is a natural number, (a, b), (a', b') respectively represent pixel points in the image acquired by the base view, and pixels in the image acquired by the enhanced view; h ₁ , h ₂ , h ₃ respectively Represents three vectors;

Obtain h ₁ , h ₂ , h ₃ from 4 sets of samples randomly selected from 6 sets of samples; further, obtain a homography matrix H=[h ₁ h ₂ h ₃ ] ^T ; for the remaining 2 sets of samples in 6 sets of samples The pair of poles e' is obtained according to xe' x Hx' = 0; further, the obtained basic matrix F = e' x H is sent to the mapping unit 114.

The mapping unit 114 maps the initial spatial side information filtering frame to the base view 1 by the base matrix F transmitted by the base matrix generating unit 113, and transmits the obtained mapped spatial side information frame to the information fusion unit 15.

The information fusion unit 115 is configured to use the average interpolation method to fuse the time side information frame sent by the third preprocessing unit 111 with the mapping space side information frame sent by the mapping unit 114, and send the obtained fusion information frame to the heavy Construct module 12.

In conclusion, the above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Claims

A distributed video coding framework, characterized in that the coding framework comprises a base view, an enhanced view, a Wyner-Ziv encoder, a Wyner-Ziv decoder, a first intra encoder, a first intra decoder, time An edge information generating module, a second intra encoder, a second intra decoder, a spatial side information generating module, a fusion module, and a reconstruction module; wherein

a base view for collecting the first environment video image, dividing the first environment video image into the first Wyner-Ziv frame and the first key frame according to the sequence number of the first environment video image, and the first Wyner-Ziv frame, the first The key frames are respectively sent to the Wyner-Ziv encoder and the first intra encoder;

And enhancing the viewpoint for collecting the second environment video image, dividing the second environment video image into the second Wyner-Ziv frame and the second key frame according to the sequence number of the second environment video image, and sending the second key frame to the second frame Inner encoder

The Wyner-Ziv encoder is configured to perform a discrete cosine transform on the first Wyner-Ziv frame transmitted by the base view to remove the inter-pixel correlation, and perform channel coding on the bit plane formed by quantizing the transform coefficients, and obtain the Wyner- Ziv coded frames are sent over the wireless channel to the Wyner-Ziv decoder;

a Wyner-Ziv decoder for decoding a Wyner-Ziv encoded frame transmitted by the Wyner-Ziv encoder and transmitting the Wyner-Ziv decoded frame to the reconstruction module;

a first intra-frame encoder, configured to perform H.264 intra-frame coding on the first key frame sent by the base view, and send the obtained first key coded frame to the first intra-frame decoder through a wireless channel;

a first intra-frame decoder, configured to perform H.264 intra-frame decoding on the first key coded frame sent by the first intra-frame encoder, and send the obtained first key-decoded frame to the time-side information generating module;

a time side information generating module, configured to perform preprocessing, block matching, and bidirectional motion interpolation on two consecutive first key decoding frames from the first intra decoder, and send the generated time side information frame to the fusion Module

a second intra-frame encoder, configured to perform H.264 intra-frame coding on the second key frame sent by the enhanced view point, and send the obtained second key coded frame to the second intra-frame decoder through the wireless channel;

a second intra-frame decoder, configured to perform H.264 intra-frame decoding on the second key coded frame sent by the second intra-frame encoder, and send the obtained second key-decoded frame to the spatial information generating module;

a spatial side information generating module, configured to perform motion estimation according to the second key decoding frame sent by the decoder in the second frame, and send the obtained initial spatial side information frame to the fusion module;

The fusion module is configured to map the initial spatial side information frame sent by the spatial side information generating module to the basic view point according to the correlation between the basic view point and the enhanced view point, obtain the mapped spatial side information frame, and adopt average interpolation The method performs information fusion on the time side information frame sent by the time side information generating module and the mapping space side information frame, and then sends the obtained fusion information frame to the reconstruction module;

The reconstruction module is configured to filter the fusion information frame sent by the fusion module, and perform image reconstruction according to the Wyner-Ziv decoding frame and the filtered fusion information frame sent by the Wyner-Ziv decoder.
The distributed video coding framework according to claim 1, wherein the time information generating module comprises: a first preprocessing unit, a first block matching unit, and a time side information generating unit;

a first pre-processing unit, configured to perform low-pass filtering processing on two consecutive first key decoding frames from the first intra decoder, and then divide the obtained two consecutive first key filtering frames into More than fifty basic macroblocks of size M×N are transmitted, and each basic macroblock is sent to the first matching unit; wherein, M and N represent the number of pixel points, and are natural numbers;

a first matching unit, configured to search according to MSE(i,j)≤δ in each basic macroblock sent by the first pre-processing unit, and send the searched two matching basic macroblocks to time Side information generating unit; wherein, the matching function
δ is a set value and is a real number; (i, j) represents a motion vector between two arbitrary basic macroblocks, and (x, y), (x+i, y+j) all represent pixel point coordinates; k (x, y) represents the pixel value of the current frame in two consecutive first key decoded frames at (x, y); f k--1 (x + i, y + j) represents two consecutive a pixel value of the previous frame in the first key decoded frame at (x+i, y+j);

a time side information generating unit, configured to process two mutually matching basic macroblocks sent by the first block matching unit by using bidirectional motion interpolation to obtain a time side information frame
Transmitting the time side information frame Y 2n (p) to the fusion module (11); wherein Y 2n (p) represents a time side information frame, p represents a pixel coordinate in a time side information frame; X 2n - 1 represents Among the two basic macroblocks that match each other, the basic macroblock of the preamble first key filter frame belonging to two consecutive first key filter frames, X 2n+1 means that two of the two basic macroblocks that match each other belong to two The basic macroblock of the first key filter frame in the successive first key filter frame; MV f2n represents the forward motion vector, MV b2n represents the backward motion vector, and MV f2n and MV b2n are all known.
The distributed video coding framework according to claim 1, wherein the spatial information generation module comprises: a second pre-processing unit, a second block matching unit, and a spatial side information generating unit;

a second pre-processing unit, configured to perform low-pass filtering processing on two consecutive second key decoding frames from the second intra decoder, and then divide the obtained two consecutive second key filtering frames into More than fifty enhanced macroblocks of size M×N are transmitted, and each enhanced macroblock is sent to the second matching unit; wherein, M and N represent the number of pixel points, and are natural numbers;

a second matching unit, configured to search according to MSE(r, s)≤γ in each enhanced macroblock sent by the second pre-processing unit, and send the searched two matched enhanced macroblocks to the space Side information generating unit; wherein, the matching function
γ is a set value and is a real number; (r, s) represents a motion vector between two arbitrary enhanced macroblocks, and (x, y), (x+r, y+s) represent pixel point coordinates; l (x, y) represents the pixel value of the current frame in two consecutive second key decoded frames at (x, y); f l--1 (x + r, y + s) represents two consecutive a pixel value of the previous frame in the first key decoded frame at (x+r, y+s);

The spatial side information generating unit is configured to process the two mutually matching enhanced macroblocks sent by the second matching unit by using bidirectional motion interpolation to obtain an initial spatial side information frame.
Transmitting an initial spatial side information frame V 2m to the fusion module; wherein V 2m (q) represents an initial spatial side information frame, q represents a pixel coordinate in an initial spatial side information frame; U 2m−-1 represents two mutual A macroblock of a pre-ordered first key filter frame belonging to two consecutive first key filter frames in the matched macroblock, U 2m+1 representing two consecutive first key filters among two mutually matching macroblocks The macroblock of the first key filter frame in the frame is followed by MV f2m for the forward motion vector, MV b2m for the backward motion vector, and MV f2m and MV b2m are known.
The distributed video coding framework according to claim 1, wherein the fusion module comprises a third pre-processing unit, a feature point extraction unit, a basic matrix generation unit, a mapping unit, and an information fusion unit;

a third pre-processing unit, configured to filter a time-side information frame sent by the time-side information generating module and an initial spatial side information frame sent by the spatial-side information generating module, and filter the obtained time-side information filtering frame, The initial spatial side information filtering frame is sent to the basic matrix generating unit, and the time side information filtering frame and the initial spatial side information filtering frame are respectively sent to the fusion unit and the mapping unit;

a feature point extracting unit, configured to acquire, in a horizontal direction and a vertical direction, a brightness information I(x, y) of each pixel point corresponding to the temporal side information filtering frame and the initial spatial side information filtering frame sent by the third preprocessing unit respectively The gradient of I'(x, y) is as follows:

among them,
Express convolution

After that, the basic autocorrelation matrix M and the enhanced autocorrelation matrix are constructed according to the above gradient correspondence. M', respectively:

Smoothing the basic autocorrelation matrix M and the enhanced autocorrelation matrix M' to obtain a corresponding basic smooth autocorrelation matrix
Enhanced smooth autocorrelation matrix
Extracting two feature points λ 1 , λ 2 representing the principal curvature of the basic autocorrelation matrix M for the basic autocorrelation matrix M, and extracting two principal curvatures representing the enhanced autocorrelation matrix M′ for the enhanced autocorrelation matrix M′ The feature points λ 1 ', λ 2 ', and the pixel coordinates corresponding to the feature points and the feature points are sent to the basic matrix generating unit;
σ 2 represents the pixel point variance; each of the above feature points satisfies the constraint condition λ 1 · λ 2 - 0.04 · (λ 1 + λ 2 ) 2 > δ, λ 1 '· λ 2 '-0.04 · (λ 1 ' + λ 2 ') 2 > δ, δ is the set threshold;

The base matrix generating unit is configured to acquire an autocorrelation coefficient CC between the base view point and the enhanced view point according to each feature point sent by the feature point extracting unit and pixel coordinates corresponding to each feature point:

Where (x 1 , y 1 ), (x 2 , y 2 ) represent the pixel coordinates of the feature points λ 1 and λ 2 , respectively, I 1 (x 1 , y 1 ), I 2 (x 2 , y 2 ) respectively Gray scales representing feature points λ 1 , λ 2 ; (x 1 ', y 1 '), (x 2 ', y 2 ') represent pixel coordinates of feature points λ 1 ', λ 2 ', I 1 '( x 1 ', y 1 '), I 2 '(x 2 ', y 2 ') represent the gradation of the feature points λ 1 , λ 2 , respectively;

The size is (2m+1)× centered on (x 1 , y 1 ), (x 2 , y 2 ), (x 1 ', y 1 '), (x 2 ', y 2 '), respectively. Within the matching window of 2m+1), 6 sets of pre-matching points are extracted as 6 sets of samples; construct a linear equation system:
Where m is a natural number, (a, b), (a', b') respectively represent pixel points in the image acquired by the base view, and pixels in the image acquired by the enhanced view; h 1 , h 2 , h 3 respectively Represents three vectors;

Obtain h 1 , h 2 , h 3 from 4 sets of samples randomly selected from 6 sets of samples; further, obtain a homography matrix H=[h 1 h 2 h 3 ] T ; for the remaining 2 sets of samples in 6 sets of samples Obtaining a pair of poles e' according to xe'×Hx'=0; further, transmitting the obtained basic matrix F=e'×H to the mapping unit;

a mapping unit, the initial spatial side information filtering frame is mapped to the basic view point by the basic matrix F sent by the basic matrix generating unit, and the obtained mapped spatial side information frame is sent to the information fusion unit;

An information fusion unit is configured to combine the time side information frame sent by the third preprocessing unit with the mapping space side information frame sent by the mapping unit by using an average interpolation method, and send the obtained fusion information frame to the reconstruction Module.