CN103856779A

CN103856779A - Packet switching network oriented multi-view video transmission distortion prediction method

Info

Publication number: CN103856779A
Application number: CN201410098310.1A
Authority: CN
Inventors: 周圆; 陈莹; 庞勃; 崔波; 侯春萍
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2014-03-18
Filing date: 2014-03-18
Publication date: 2014-06-11

Abstract

A method for predicting multi-view video transmission distortion in packet-switched networks comprises the following steps: a multi-view video sequence is encoded with H.264/MVC, where s is the view index and t is the frame index; distortion measurements under different packet loss rates are obtained by simulation, and the proportion V of inter-view-predicted macroblocks, the proportion Q of intra-coded macroblocks, and the proportion U of macroblocks concealed by motion-compensated temporal concealment are calculated; D_TEC(s, t) and D_VEC(s, t) are obtained from the pixel values of pixel i of frame (s, t) reconstructed at the encoder and the decoder, and the transmission parameters λ_a, λ_b, μ_a and μ_b are calculated by the least squares method; the distortion value D_{c-test}(s, t) of each view is computed iteratively; and the PSNR and MSE of the multi-view video transmission distortion are predicted. While keeping computational complexity low, the method correctly and effectively estimates the distortion caused by packet loss during multi-view video streaming, predicts how an error in any frame of any view propagates to future frames, and closely models the actual frame-level distortion. The simulated values largely match the measured values.

Description

Multi-view video transmission distortion prediction method for packet-switched networks

Technical field

The present invention relates to a method for predicting the transmission distortion of stereoscopic video, and in particular to a multi-view video transmission distortion prediction method for packet-switched networks.

Background art

In a packet-switched IP network, buffer overflow at a node may cause packet loss, and a packet may also be treated as lost when its delay is too long. Compressed video signals, and encoded stereoscopic video in particular, use low-bit-rate coding schemes that rely on inter-frame coding to improve coding efficiency, and are therefore very fragile in the face of transmission errors. A coding structure that uses motion compensation and disparity compensation creates strong spatio-temporal dependencies through inter prediction. To study the relationship between end-to-end packet loss and video communication quality in IP packet networks, the most critical task is to build a video transmission distortion model suited to the packet loss characteristics of IP networks. The main purpose of transmission distortion modeling is to predict accurately the video decoding distortion caused by packet loss.

Although a number of distortion computation methods have been proposed in the literature, most of them apply only to single-view video encoders under the block-based motion-compensated prediction framework and are not suited to the transmission of multi-view stereoscopic video. Low-complexity transmission distortion models mostly consider intra coding and spatial loop filtering in their distortion analysis; they are mainly used at low error rates and usually cannot meet the model accuracy that rate-distortion (R-D) performance optimization requires. For accurate transmission distortion estimation at moderate complexity, the coding mode of the current macroblock is usually decided from the total distortion accumulated from the distortion estimates of preceding frames. Examples are the recursive optimal per-pixel estimate (ROPE) algorithm given by Yang and Rose and its extensions, which recursively compute the first and second moments of each decoded pixel value to determine the total distortion (mean squared error) of each pixel. He et al. built a distortion prediction model for bit errors under motion compensation and applied it to source rate-distortion adaptive intra-mode selection under time-varying channel conditions and to rate control in a joint source-channel model.

To date, no work on modeling multi-view video transmission distortion has been reported in papers or literature published at home or abroad, so modeling the network transmission distortion of multi-view stereoscopic video is almost an unexplored research field. Directly extending a planar video distortion prediction model to multi-view video requires computing the propagated distortion along every propagation path, which greatly increases complexity. Analyzing the transmission distortion of multi-view video in an IP network with random packet loss and building a mathematical model for distortion estimation is therefore a challenging problem.

Summary of the invention

The technical problem to be solved by the present invention is to provide a multi-view video transmission distortion prediction method for packet-switched networks that accounts for error propagation both within and between views, relates the transmission distortion of the current frame to the distortion of the preceding frame in the same view and of the frame in the adjacent view, and can effectively model, at low complexity, the more complex two-dimensional distortion propagation pattern in multi-view stereoscopic video.

The technical solution adopted by the present invention is a multi-view video transmission distortion prediction method for packet-switched networks, comprising the following steps:

1) Encode the multi-view video sequence with H.264/MVC, where s denotes the view index and t denotes the frame index within a view;

2) First obtain by simulation, under different packet loss rates, the measured value D_c(s, t) of the corresponding distortion; D_c(s, t) is defined as the distortion actually measured in simulation, i.e. the difference in pixel values of the same frame of the same view before and after video transmission. At the encoder, collect the proportion V of inter-view-predicted macroblocks, the proportion Q of intra-coded macroblocks, and the percentage U of macroblocks concealed with motion-compensated temporal concealment;

3) Measure the pixel value F_i(s, t) of pixel i of frame (s, t) and the values of that pixel as reconstructed at the encoder, \hat{F}_i(s, t), and at the decoder;

4) Calculate the mean squared difference D_TEC(s, t) between adjacent frames within a view and the mean squared difference D_VEC(s, t) between the same frames of adjacent views;

5) Calculate the values of λ_a and λ_b by the least squares method using the following formulas:

λ_{a}, λ_{b} = \underset{λ_{a}, λ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Received}} (D_{c}(s, t) - D_{R}(s, t))^{2}

D_{R}(s, t) = (1 - Q) [V \cdot λ_{b} \cdot D_{c}(s, t - 1) + (1 - V) \cdot λ_{a} \cdot D_{c}(s - 1, t)]

where (s, t) ranges over the frames received at the decoder.

In particular, the frames of view 0, excluding the first I frame, all use inter prediction within the view, so V = 1 for them; the macroblocks of the first frame of every view other than view 0 all use inter-view prediction, so V = 0 for them;

6) Calculate the values of μ_a and μ_b by the least squares method:

μ_{a}, μ_{b} = \underset{μ_{a}, μ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Lost}} (D_{c}(s, t) - D_{L}(s, t))^{2}

\begin{aligned}
D_{L}(s, t) &= U \cdot (D_{TEC}(s, t) + μ_{b} \cdot D_{c}(s, t - 1)) + (1 - U) \cdot (D_{VEC}(s, t) + μ_{a} \cdot D_{c}(s - 1, t)) \\
&= [U \cdot D_{TEC}(s, t) + (1 - U) \cdot D_{VEC}(s, t)] + [U \cdot μ_{b} \cdot D_{c}(s, t - 1) + (1 - U) \cdot μ_{a} \cdot D_{c}(s - 1, t)]
\end{aligned}

where (s, t) ranges over the frames lost at the decoder.

In particular, for any lost P frame in the video sequence of view 0, set U = 1; for the first frame of each view other than view 0, if the frame is lost, set U = 0;

7) Iteratively compute the distortion value of each view, i.e. the value of D_{c-test}(s, t). The average transmission distortion D_{c-test}(s, t) of frame (s, t) is thus obtained by iterating over D_c(s, t - 1) and D_c(s - 1, t);

8) Predict the propagated distortion of the multi-view video, i.e. the values of the peak signal-to-noise ratio (PSNR) and the MSE, from the V, Q and U values of the individual frames.
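The recursion described in steps 5) to 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `d_received`, `d_lost` and `predict_distortion`, the boolean `lost` array, the use of sequence-level scalars for Q, V and U, and the treatment of a missing neighbour as zero distortion are all hypothetical choices. A received frame is predicted with D_R, a lost frame with D_L, and the model's own earlier predictions stand in for the measured D_c terms.

```python
import numpy as np

def d_received(Q, V, lam_a, lam_b, d_prev_t, d_prev_s):
    """D_R(s,t) = (1-Q)[V*lam_b*D(s,t-1) + (1-V)*lam_a*D(s-1,t)]."""
    return (1.0 - Q) * (V * lam_b * d_prev_t + (1.0 - V) * lam_a * d_prev_s)

def d_lost(U, mu_a, mu_b, d_tec, d_vec, d_prev_t, d_prev_s):
    """D_L(s,t) = U*(D_TEC + mu_b*D(s,t-1)) + (1-U)*(D_VEC + mu_a*D(s-1,t))."""
    return U * (d_tec + mu_b * d_prev_t) + (1.0 - U) * (d_vec + mu_a * d_prev_s)

def predict_distortion(lost, Q, V, U, lam_a, lam_b, mu_a, mu_b, d_tec, d_vec):
    """Iterate over views s and frames t; lost[s, t] is True for a lost frame."""
    num_views, num_frames = lost.shape
    d = np.zeros((num_views, num_frames))
    for s in range(num_views):
        for t in range(num_frames):
            if s == 0 and t == 0:
                continue  # the single I frame is assumed error-free
            # Assumption: a neighbour that does not exist contributes zero.
            d_prev_t = d[s, t - 1] if t > 0 else 0.0
            d_prev_s = d[s - 1, t] if s > 0 else 0.0
            # Special cases from the text: view 0 keeps only the temporal
            # terms (V = 1, U = 1); the first frame of every other view
            # keeps only the inter-view terms (V = 0, U = 0).
            if s == 0:
                v, u = 1.0, 1.0
            elif t == 0:
                v, u = 0.0, 0.0
            else:
                v, u = V, U
            if lost[s, t]:
                d[s, t] = d_lost(u, mu_a, mu_b, d_tec[s, t], d_vec[s, t],
                                 d_prev_t, d_prev_s)
            else:
                d[s, t] = d_received(Q, v, lam_a, lam_b, d_prev_t, d_prev_s)
    return d
```

Iterating with the view index in the outer loop guarantees that both D(s, t - 1) and D(s - 1, t) are available when frame (s, t) is reached, so a single pass over the frames is enough.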

The mean squared difference D_TEC(s, t) between adjacent frames within a view and the mean squared difference D_VEC(s, t) between the same frames of adjacent views in step 4) are obtained by the following formulas, respectively:

D_{TEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s, t - 1)]^{2}\}

D_{VEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s - 1, t)]^{2}\}
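A short sketch of how these two quantities might be computed from reconstructed frames is given below. It assumes that the expectation E{·} is taken as the mean over all pixels i of a frame and that the reconstructed luma frames are held in a single array `recon[s, t]`; the function names and the NaN convention for frames without a predecessor are hypothetical.

```python
import numpy as np

def concealment_mse(frame, reference):
    """Mean over pixels i of [F_hat_i(frame) - F_hat_i(reference)]^2."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = frame.astype(np.float64) - reference.astype(np.float64)
    return float(np.mean(diff ** 2))

def d_tec_d_vec(recon):
    """recon[s, t] is the reconstructed luma frame of view s, frame t.

    Returns (d_tec, d_vec), each of shape (S, T). Entries without a
    predecessor (t = 0 for D_TEC, s = 0 for D_VEC) are left as NaN.
    """
    num_views, num_frames = recon.shape[:2]
    d_tec = np.full((num_views, num_frames), np.nan)
    d_vec = np.full((num_views, num_frames), np.nan)
    for s in range(num_views):
        for t in range(num_frames):
            if t > 0:  # previous frame of the same view
                d_tec[s, t] = concealment_mse(recon[s, t], recon[s, t - 1])
            if s > 0:  # same frame of the adjacent view
                d_vec[s, t] = concealment_mse(recon[s, t], recon[s - 1, t])
    return d_tec, d_vec
```

Both arrays depend only on reconstructed frames, not on the loss pattern, which is why they can be prepared once before any loss simulation is run.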

In step 8):

The MSE is the mean squared error between the reference image and the reconstructed image and represents the distortion of the reconstructed image:

MSE = \frac{1}{M N} \sum_{x = 1}^{M} \sum_{y = 1}^{N} [f(x, y) - f_{0}(x, y)]^{2}

where f(x, y) is the pixel value of the reconstructed image and f_0(x, y) is the pixel value of the reference image; in transmission distortion prediction, f(x, y) - f_0(x, y) is the value of D_{c-test}(s, t).

The peak signal-to-noise ratio is expressed in decibels and is given by:

PSNR = 10 \log_{10} \frac{(2^{n} - 1)^{2}}{MSE}

where (2^n - 1)^2 is the square of the peak pixel amplitude, n is the number of bits per pixel, and M and N are the numbers of pixels in the horizontal and vertical directions.
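The two quality measures can be illustrated with a few lines of code. This is only a sketch: the function names and the default of n = 8 bits per pixel are assumptions, and returning infinity for a zero MSE is a convention chosen here rather than something stated in the text.

```python
import numpy as np

def mse(reconstructed, reference):
    """Mean squared error over all M x N pixels of one frame."""
    diff = reconstructed.astype(np.float64) - reference.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(mse_value, bits_per_pixel=8):
    """PSNR in decibels: 10 * log10((2^n - 1)^2 / MSE)."""
    if mse_value == 0:
        return float("inf")  # identical images; convention chosen here
    peak_squared = (2 ** bits_per_pixel - 1) ** 2
    return 10.0 * float(np.log10(peak_squared / mse_value))
```

Because the PSNR depends on the frame only through its MSE, a predicted MSE from the distortion model can be converted to a predicted PSNR with the same `psnr` function.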

The multi-view video transmission distortion prediction method for packet-switched networks of the present invention can, at low computational complexity, predict how an error in any frame of any view propagates to future frames and model the actual frame-level distortion; the simulated results largely agree with the measured values. The invention correctly and effectively models, at low complexity, the more complex two-dimensional error propagation pattern in multi-view stereoscopic video, and estimates the distortion caused by packet loss when an encoded multi-view video stream is transmitted.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 shows the MSE of view 0 of the Ballroom sequence;

Fig. 3 shows the MSE of view 1 of the Ballroom sequence;

Fig. 4 shows the MSE of view 2 of the Ballroom sequence;

Fig. 5 shows the MSE of view 3 of the Ballroom sequence;

Fig. 6 shows the MSE of view 4 of the Ballroom sequence;

Fig. 7 shows the MSE of view 5 of the Ballroom sequence;

Fig. 8 shows the MSE of view 6 of the Ballroom sequence;

Fig. 9 shows the MSE of view 7 of the Ballroom sequence.

Detailed description of the embodiments

The multi-view video transmission distortion prediction method for packet-switched networks of the present invention is described in detail below with reference to an embodiment and the accompanying drawings.

As shown in Fig. 1, the multi-view video transmission distortion prediction method for packet-switched networks of the present invention comprises the following steps:


4) Calculate the mean squared difference D_TEC(s, t) between adjacent frames within a view and the mean squared difference D_VEC(s, t) between the same frames of adjacent views by the following formulas, respectively:

D_{TEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s, t - 1)]^{2}\}

D_{VEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s - 1, t)]^{2}\}

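5) Calculate the values of λ_a and λ_b by the least squares method using the following formulas:

λ_{a}, λ_{b} = \underset{λ_{a}, λ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Received}} (D_{c}(s, t) - D_{R}(s, t))^{2}

D_{R}(s, t) = (1 - Q) [V \cdot λ_{b} \cdot D_{c}(s, t - 1) + (1 - V) \cdot λ_{a} \cdot D_{c}(s - 1, t)]

where (s, t) ranges over the frames received at the decoder.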

6) Calculate the values of μ_a and μ_b by the least squares method:

μ_{a}, μ_{b} = \underset{μ_{a}, μ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Lost}} (D_{c}(s, t) - D_{L}(s, t))^{2}

\begin{aligned}
D_{L}(s, t) &= U \cdot (D_{TEC}(s, t) + μ_{b} \cdot D_{c}(s, t - 1)) + (1 - U) \cdot (D_{VEC}(s, t) + μ_{a} \cdot D_{c}(s - 1, t)) \\
&= [U \cdot D_{TEC}(s, t) + (1 - U) \cdot D_{VEC}(s, t)] + [U \cdot μ_{b} \cdot D_{c}(s, t - 1) + (1 - U) \cdot μ_{a} \cdot D_{c}(s - 1, t)]
\end{aligned}

where (s, t) ranges over the frames lost at the decoder.

8) Predict the propagated distortion of the multi-view video, i.e. the values of the peak signal-to-noise ratio (PSNR) and the MSE, from the V, Q and U values of the individual frames:

The peak signal-to-noise ratio is expressed in decibels and is given by:

PSNR = 10 \log_{10} \frac{(2^{n} - 1)^{2}}{MSE}
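Because D_R is linear in λ_a and λ_b, and D_L is linear in μ_a and μ_b once the concealment term is moved to the left-hand side, the two minimizations in steps 5) and 6) reduce to ordinary linear least squares. The sketch below shows one way this might be done with `numpy.linalg.lstsq`; the function names and the flat one-dimensional layout of the per-frame inputs are assumptions made for illustration.

```python
import numpy as np

def fit_lambdas(d_c, d_prev_t, d_prev_s, Q, V):
    """Least-squares estimate of (lam_a, lam_b) over the received frames.

    d_c, d_prev_t, d_prev_s hold the measured D_c(s,t), D_c(s,t-1) and
    D_c(s-1,t) of each received frame; Q and V may be scalars or arrays.
    """
    design = np.column_stack([
        (1 - Q) * (1 - V) * d_prev_s,  # coefficient of lam_a in D_R
        (1 - Q) * V * d_prev_t,        # coefficient of lam_b in D_R
    ])
    (lam_a, lam_b), *_ = np.linalg.lstsq(design, d_c, rcond=None)
    return lam_a, lam_b

def fit_mus(d_c, d_prev_t, d_prev_s, d_tec, d_vec, U):
    """Least-squares estimate of (mu_a, mu_b) over the lost frames."""
    design = np.column_stack([
        (1 - U) * d_prev_s,            # coefficient of mu_a in D_L
        U * d_prev_t,                  # coefficient of mu_b in D_L
    ])
    # Subtract the concealment term, which does not depend on mu_a, mu_b.
    target = d_c - (U * d_tec + (1 - U) * d_vec)
    (mu_a, mu_b), *_ = np.linalg.lstsq(design, target, rcond=None)
    return mu_a, mu_b
```

If every frame in the fitting set has V = 1 (or U = 1), the column multiplying λ_a (or μ_a) is all zeros and that parameter is not identifiable from the data; `lstsq` then returns the minimum-norm solution, which sets it to zero.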

The multi-view video transmission distortion prediction method for packet-switched networks of the present invention is verified below. Simulation experiments compare the algorithm's predictions with the simulated results of transmitting multi-view test sequences over a lossy packet network, and the validity of the algorithm is confirmed by a large number of experimental results. The results are expressed as mean squared error (MSE) and peak signal-to-noise ratio (PSNR).

Four different multi-view video test sequences are used below to evaluate the distortion prediction algorithm: one high-motion sequence, "Ballroom"; two medium-motion sequences, "Vassar" and "Exit"; and one low-motion sequence, "Lotus". Their quantization parameters (QP) in the experiments are 32, 32, 25 and 41, respectively.

In the experiments, each multi-view video sequence contains only one I frame, and this I frame is assumed to be error-free. Each P frame is assumed to correspond to a single slice group (slice), each slice group is carried in one independent packet, and the length of each packet does not exceed the Ethernet maximum transmission unit (MTU). The packet loss rate is set to 5%.
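Under the one-packet-per-frame assumption above, a packet loss pattern is simply a boolean mask over the (view, frame) grid. The sketch below draws such a mask with independent losses at a fixed rate. This is a deliberately simplified stand-in and not the loss model used in the experiments; the function name, the seed parameter and the independence assumption are all hypothetical.

```python
import numpy as np

def random_loss_mask(num_views, num_frames, loss_rate=0.05, seed=0):
    """Return lost[s, t] = True where the packet of frame (s, t) is dropped.

    Assumes one packet per frame and independent losses with probability
    loss_rate; the single I frame at (0, 0) is never marked as lost.
    """
    rng = np.random.default_rng(seed)
    lost = rng.random((num_views, num_frames)) < loss_rate
    lost[0, 0] = False  # the I frame is assumed to arrive intact
    return lost

mask = random_loss_mask(num_views=8, num_frames=250)
print("fraction of frames lost:", mask.mean())
```

Independent losses do not reproduce bursts, so a mask generated this way is best read as a quick way to exercise the distortion recursion rather than as a realistic channel.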

1. First, the sequences are encoded with the H.264/MVC reference software (JMVM 8.0). For each P frame, the proportion V of inter-view-predicted macroblocks and the proportion Q of intra-coded macroblocks can be collected at the encoder. Table 1 gives, for the four test sequences, the percentages of intra-mode macroblocks and inter-view-mode macroblocks. The error concealment mode is "frame copy", so D_TEC(s, t) of every frame can be computed in advance using

D_{TEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s, t - 1)]^{2}\}

and D_VEC(s, t) can likewise be computed in advance using

D_{VEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s - 1, t)]^{2}\}

Table 1. Intra-mode ratio Q, inter-view prediction ratio V and temporal error concealment ratio U of the test sequences

Sequence	Ballroom	Exit	Vassar	Lotus
Q	3.59%	3.19%	0.49%	0.16%
V	89.44%	97.28%	98.22%	90.19%
U	89.81%	97.7%	98.45%	91.43%

2. Next, when the frame-copy error concealment method is used in the experiments, μ_a = 1 and μ_b = 1. Typical algorithm parameter values for each sequence in the experiments are shown in Table 2.

Table 2. Typical values of the algorithm parameters

3. Packet loss is simulated with the JVT SVC/AVC packet loss patterns. These patterns are derived from actual measurements of error conditions in packet networks and reflect real loss behaviour such as burst loss. Figs. 2 to 9 compare the algorithm's predictions with the simulated values and plot the per-frame MSE of each view of the "Ballroom" sequence.

4. The average PSNR of each view of the four multi-view test sequences is measured. Table 3 gives the predicted and measured average distortion of each view for the four test sequences "Ballroom", "Vassar", "Exit" and "Lotus".

Table 3. Comparison of model predictions and measured values for the multi-view video sequences at packet loss rates of 2% to 10%
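As a numeric illustration of items 1 and 2 above, the ratios reported in Table 1 can be substituted into the D_R formula to see how strongly each neighbouring frame is weighted, and the frame-copy setting μ_a = μ_b = 1 can be substituted into D_L. The sketch below does only this arithmetic; the dictionary layout and function names are hypothetical, and no λ values are assumed, so the printed numbers are the coefficients that multiply λ_b·D_c(s, t - 1) and λ_a·D_c(s - 1, t).

```python
# Ratios from Table 1, written as fractions.
TABLE_1 = {
    "Ballroom": {"Q": 0.0359, "V": 0.8944, "U": 0.8981},
    "Exit":     {"Q": 0.0319, "V": 0.9728, "U": 0.9770},
    "Vassar":   {"Q": 0.0049, "V": 0.9822, "U": 0.9845},
    "Lotus":    {"Q": 0.0016, "V": 0.9019, "U": 0.9143},
}

def received_weights(Q, V):
    """Coefficients of lam_b*D_c(s,t-1) and lam_a*D_c(s-1,t) in D_R."""
    return (1 - Q) * V, (1 - Q) * (1 - V)

def lost_frame_copy(U, d_tec, d_vec, d_prev_t, d_prev_s):
    """D_L with mu_a = mu_b = 1, the frame-copy case stated in the text."""
    return U * (d_tec + d_prev_t) + (1 - U) * (d_vec + d_prev_s)

for name, ratios in TABLE_1.items():
    w_t, w_s = received_weights(ratios["Q"], ratios["V"])
    print(f"{name}: weight on (s, t-1) = {w_t:.4f}, weight on (s-1, t) = {w_s:.4f}")
```

The two weights always sum to 1 - Q, so the intra-coded fraction Q is the only part of a received frame that stops inherited distortion in this model.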

Claims

1. A multi-view video transmission distortion prediction method for packet-switched networks, characterized in that it comprises the following steps:


5) Calculate the values of λ_a and λ_b by the least squares method using the following formulas:

λ_{a}, λ_{b} = \underset{λ_{a}, λ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Received}} (D_{c}(s, t) - D_{R}(s, t))^{2}

D_{R}(s, t) = (1 - Q) [V \cdot λ_{b} \cdot D_{c}(s, t - 1) + (1 - V) \cdot λ_{a} \cdot D_{c}(s - 1, t)]

where (s, t) ranges over the frames received at the decoder.

6) Calculate the values of μ_a and μ_b by the least squares method:

μ_{a}, μ_{b} = \underset{μ_{a}, μ_{b}}{\arg\min} \sum_{(s, t) \in \mathrm{Lost}} (D_{c}(s, t) - D_{L}(s, t))^{2}

\begin{aligned}
D_{L}(s, t) &= U \cdot (D_{TEC}(s, t) + μ_{b} \cdot D_{c}(s, t - 1)) + (1 - U) \cdot (D_{VEC}(s, t) + μ_{a} \cdot D_{c}(s - 1, t)) \\
&= [U \cdot D_{TEC}(s, t) + (1 - U) \cdot D_{VEC}(s, t)] + [U \cdot μ_{b} \cdot D_{c}(s, t - 1) + (1 - U) \cdot μ_{a} \cdot D_{c}(s - 1, t)]
\end{aligned}

where (s, t) ranges over the frames lost at the decoder.

2. The multi-view video transmission distortion prediction method for packet-switched networks according to claim 1, characterized in that the mean squared difference D_TEC(s, t) between adjacent frames within a view and the mean squared difference D_VEC(s, t) between the same frames of adjacent views in step 4) are obtained by the following formulas, respectively:

D_{TEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s, t - 1)]^{2}\}

D_{VEC}(s, t) = E\{[\hat{F}_{i}(s, t) - \hat{F}_{i}(s - 1, t)]^{2}\}

3. The multi-view video transmission distortion prediction method for packet-switched networks according to claim 1, characterized in that, in step 8):

The peak signal-to-noise ratio is expressed in decibels and is given by:

PSNR = 10 \log_{10} \frac{(2^{n} - 1)^{2}}{MSE}