CN105306954A - Method for perceptual stereoscopic video coding based on a disparity just-noticeable-difference model - Google Patents

Info

Publication number
CN105306954A
Authority
CN
China
Prior art keywords
jnd
parallax
stereo
Prior art date
Legal status
Granted
Application number
CN201410240167.5A
Other languages
Chinese (zh)
Other versions
CN105306954B (en)
Inventor
郑喆坤
焦李成
薛飞
孙天
乔伊果
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN201410240167.5A
Publication of CN105306954A
Application granted
Publication of CN105306954B
Legal status: Active

Landscapes

  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the field of video processing technology and specifically discloses a method for perceptual stereoscopic video coding based on a disparity just-noticeable-difference (JND) model. The method comprises the following steps: (1) estimating the disparity; (2) estimating a disparity-based JND model; (3) calculating the luminance, texture and temporally weighted JND models; (4) combining the disparity-based JND model with the spatio-temporal JND model through a nonlinear additivity model to obtain a disparity-based binocular stereoscopic JND model; and (5) applying the disparity-based binocular stereoscopic JND model in a stereoscopic residual preprocessor to reset the residuals. The method effectively removes the temporal, spatial and inter-view redundancy of binocular stereoscopic video, while luminance, texture regions and object-edge information keep a very natural visual appearance. It can therefore greatly reduce the stereoscopic video bit rate without affecting the perceived stereoscopic quality.

Description

A method for perceptual stereoscopic video coding based on a disparity just-noticeable-difference model
Technical field
The invention belongs to the technical field of video processing and relates specifically to a perceptual stereoscopic video coding method, in particular to a perceptual stereoscopic video coding method based on a disparity just-noticeable-difference (JND) model.
Background technology
With the growing demand for lifelike visual experiences, 3D television technology has developed rapidly in recent years. Multi-view video, produced by capturing the same scene from different viewpoints with multiple cameras, brings users a more vivid visual experience. However, as the number of cameras increases, the storage space and bandwidth required to store and transmit 3D stereoscopic video must grow several-fold to preserve the quality of the video images. Effective stereoscopic video coding methods are therefore essential.
The goal of stereoscopic video coding is to remove the spatial, temporal and inter-view redundancy of a video sequence and thereby reduce the bit rate without losing video quality. Since the final recipient of a video signal is usually the human visual system (HVS), integrating human visual perception factors into video coding can better maintain the perceived quality of the video. A large number of perceptual video coding methods have been proposed; among them, coding methods built on the just-noticeable-difference (JND) model of the HVS masking functions play an important role. A JND model derives a sensitivity threshold by simulating the perceptual redundancy of human vision: coding distortion below this threshold cannot be perceived by the naked eye. This distortion threshold exists objectively for every coded signal and can be used to redistribute the bit rate, thereby reducing the overall bit rate.
Existing JND models generally fall into pixel-domain and transform-domain models. Transform-domain JND models account for inter-channel interaction and combine the frequency response of human vision with spatial and temporal contrast-sensitivity effects. They incorporate the visual characteristics of the human eye into the model through the contrast sensitivity function (CSF) of each frequency band, but their algorithms are more complex than pixel-domain approaches.
Shang X, Wang Y, Luo L et al. proposed a foveated JND model in the DCT domain in the paper "Perceptual multiview video coding based on foveated just noticeable distortion profile in DCT domain" (IEEE International Conference on Image Processing, ICIP). It derives the JND model from contrast thresholds related to horizontal and vertical frequency and to eccentricity, and combines it with a spatio-temporal JND model. It can be applied effectively to multi-view video coding, but the computational cost in the transform domain is high.
Pixel-domain JND algorithms are simple. The most widely used are the spatio-temporal JND model, the foveal JND model and the depth-map-based JND model. The spatio-temporal JND model effectively captures luminance, texture and temporal masking. The foveal JND model integrates traditional visual sensitivity characteristics with foveal characteristics to express the foveal masking effect. These two models describe temporal and spatial redundancy well but cannot express inter-view redundancy, so they are unsuitable for stereoscopic video coding. The depth-map-based JND model considers the depth masking effect and can be used to remove the perceptual redundancy of stereoscopic video images, but it requires that the depth-map sequence of the video is known in advance.
The 2013 patent No. CN103414889A, "A stereoscopic video rate control scheme based on binocular just-noticeable distortion", proposes a binocular just-noticeable-distortion model built on a luminance JND model. The method performs rate control by separately computing target bit rates for the view layer, the group-of-pictures layer, the frame layer and the macroblock layer. Its JND model is obtained from the contrast between the pixel luminance of the left view and the left-view luminance at the position offset by the disparity vector. Although it achieves rate control, it depends too heavily on the left view: the threshold derived from the left view is applied to the right view, so the binocular disparity redundancy cannot be fully exploited.
The 2012 patent No. CN102724525A, "A depth video coding method based on a foveal just-noticeable-distortion model", proposes a foveal just-noticeable-distortion model. It derives a global per-pixel just-noticeable distortion and an intermediate virtual-viewpoint video from the left and right views and their depth sequences, and then obtains the quantization parameters for coding the depth sequences from the maximum tolerable distortion of the video, achieving good stereoscopic coding results. However, the method must first compute the depth-map sequence of the stereoscopic video, or be restricted to videos whose depth maps are already known, which reduces the coding efficiency of the encoding software.
Summary of the invention
The object of the invention is to overcome the shortcomings of the existing methods described above by proposing a perceptual stereoscopic video coding method based on a disparity JND model. An independent JND threshold is obtained from the disparity information of the left and right views; it does not depend on the luminance of either view and is related only to the disparity. It can effectively remove the inter-view perceptual redundancy of binocular stereoscopic video with essentially no loss of perceived stereoscopic quality.
The technical solution that achieves this object is a perceptual stereoscopic video coding method based on a disparity JND model, which targets the inter-view perceptual redundancy between the left and right views and comprises the following steps:
(1) Disparity estimation:
1a) read in each frame pair $I_{iL}$ and $I_{iR}$ corresponding to the left and right views of the binocular stereoscopic video, and pre-segment them with a mean-shift color segmentation method to obtain the images $I'_{iL}$ and $I'_{iR}$;
1b) stereo-match $I'_{iL}$ and $I'_{iR}$ to obtain the disparity d(x, y) between the left and right views;
(2) Estimation of the disparity-based JND model:
From the relation between the disparity information extracted from the input video sequence and human visual sensitivity, compute the disparity JND threshold $JND_{DIS}(x, y)$ at each pixel of each input frame:
$$JND_{DIS}(x,y) = \psi \cdot e^{-d(x,y)} + \varphi, \qquad \psi = 17,\ \varphi = 3.$$
(3) Computation of the luminance, texture and temporal JND models:
3a) compute the luminance JND threshold $JND_L(x, y)$ of the left and right views at each pixel; it is determined by the luminance masking effect and the background luminance contrast;
3b) compute the texture JND threshold $JND_T(x, y)$ of the left and right views at each pixel; it is closely related to the texture masking effect and the edge structure of the image. Combine it with $JND_L(x, y)$ into the spatial JND model $JND_S(x, y)$ using the nonlinear additivity model:
$$JND_S = JND_L + JND_T - \varphi \cdot \min\{JND_L, JND_T\};$$
3c) compute the temporal JND threshold $JND_{TEM}(x, y)$ of the left and right views at each pixel; it is determined by the temporal masking effect derived from the inter-frame luminance difference of the video sequence. Combine it with $JND_S(x, y)$ to obtain the spatio-temporal JND model $JND_{ST}(x, y)$:
$$JND_{ST} = JND_S \cdot JND_{TEM};$$
(4) Combine the disparity-based JND model with the spatio-temporal JND model $JND_{ST}(x, y)$ through the nonlinear additivity model to obtain the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$:
$$JND_{STEREO}(x,y) = JND_{ST} + JND_{DIS} - \theta \cdot \min\{JND_{ST}, JND_{DIS}\}.$$
(5) Application of the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$ in the stereoscopic residual preprocessor:
5a) compute the mean squared JND threshold $P_{JND_{STEREO}}^2$ of each B × B block of the left and right views, where B takes different values for different block sizes:
$$P_{JND_{STEREO}}^2 = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1}\big(JND_{STEREO}(x,y)\big)^2;$$
5b) compute the mean residual $\overline{R_b}$ of each B × B block of the left and right views:
$$\overline{R_b} = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1} R_{ori}(x,y);$$
5c) compute the variance $\sigma_R^2$ of the residual signal of each pixel of the left and right views:
$$\sigma_R^2 = \frac{1}{B^2}\sum_{(x,y)}\big[R_{ori}(x,y) - M\big]^2,\qquad M = \frac{1}{B^2}\sum_{(x,y)} R_{ori}(x,y);$$
5d) compute the optimization parameter $v$ from the rate-distortion-optimization (RDO) principle $\min(J) = \min(D + \lambda R)$, which balances the bit rate against the distortion:
$$\lambda = \frac{4 \cdot QP^2}{\beta \cdot \sigma^2},$$
$$D = \omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1),$$
$$\frac{\partial J}{\partial v} = \frac{\partial\big(\omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1) + \lambda \cdot R\big)}{\partial v} = 0,$$
$$v = \frac{1}{P_{JND_{STEREO}}^2} \cdot \frac{4 \cdot QP^2}{\alpha \cdot \beta \cdot \varepsilon^2 \cdot \omega \cdot \ln(\sigma_R^2 + 1)} - 1;$$
5e) use the JND threshold $JND_{STEREO}(x, y)$ obtained in (4) and the parameter values from 5a)-5d) to reset the original residual $R_{ori}(x, y)$ of the video sequence, obtaining the new residual $R_N(x, y)$ and thereby saving bit rate:
$$R_N(x,y) = \begin{cases} R_{ori} + v \cdot JND_{STEREO}(x,y), & \text{if } R_{ori} - \overline{R_b} < -v \cdot JND_{STEREO}(x,y),\\ \overline{R_b}, & \text{if } |R_{ori} - \overline{R_b}| \le v \cdot JND_{STEREO}(x,y),\\ R_{ori} - v \cdot JND_{STEREO}(x,y), & \text{otherwise}. \end{cases}$$
The reading of the frame pairs $I_{iL}$, $I_{iR}$ of the left and right views and the mean-shift color pre-segmentation yielding $I'_{iL}$ and $I'_{iR}$ in step (1a) are carried out as follows:
(1a1) apply mean-shift filtering to each frame of the left and right views to obtain the convergence point of every subspace;
(1a2) merge the clusters of pixels that describe the same region after mean-shift filtering to obtain the segmented regions.
The stereo matching of $I'_{iL}$ and $I'_{iR}$ in step (1b), which yields the disparity d(x, y) between the left and right views, is carried out as follows:
(1b1) segment-based stereo matching models the disparity as
$$d(x,y) = a \cdot x + b \cdot y + c,$$
where a, b and c are the three parameters of the disparity plane; they determine the disparity d(x, y) of every reference pixel (x, y);
(1b2) compute the sum of absolute differences over the window of each pixel:
$$DIS_{SAD}(x,y,d) = \sum_{(i,j) \in W(x,y)} |P_1(i,j) - P_2(i+d,\,j)|;$$
(1b3) minimize the sum of absolute differences over all pixels of a segmented region to obtain the disparity plane and the values of its three parameters:
$$\min\big(DIS_{SEG}(R,D)\big) = \min\Big(\sum_{(x,y) \in R} DIS_{SAD}(x,y,d)\Big).$$
The computation of the luminance JND threshold $JND_L(x, y)$ of the left and right views at each pixel in step (3a) is carried out as follows:
(3a1) compute the average background luminance in the 5 × 5 pixel neighborhood centered on (x, y):
$$\overline{I(x,y)} = \frac{1}{32}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot B(i,j),$$
where I(x, y) is the luminance of the pixel and B(i, j) is a weighted low-pass filter;
(3a2) obtain the luminance JND threshold from the luminance masking effect and $\overline{I(x,y)}$:
$$JND_L(x,y) = \begin{cases} 17\Big(1 - \sqrt{\dfrac{\overline{I(x,y)}}{127}}\Big) + 3, & \text{if } \overline{I(x,y)} \le 127,\\[6pt] \dfrac{3}{128}\big(\overline{I(x,y)} - 127\big) + 3, & \text{otherwise}. \end{cases}$$
The computation of the texture JND threshold $JND_T(x, y)$ of the left and right views at each pixel in step (3b) is carried out as follows:
(3b1) compute the gradients around pixel (x, y):
$$grad_n(x,y) = \frac{1}{16}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot g_n(i,j),$$
and take the maximum weighted average gradient
$$G(x,y) = \max_{n=1,2,3,4}\big\{|grad_n(x,y)|\big\};$$
(3b2) obtain the texture JND threshold $JND_T(x, y)$ using Canny edge detection:
$$JND_T = \theta \cdot G(x,y) \cdot W(x,y),$$
where W(x, y) is an edge-related weight model.
The computation of the temporal JND threshold $JND_{TEM}(x, y)$ of the left and right views at each pixel in step (3c) is carried out as follows:
(3c1) compute the average luminance difference between consecutive frames of the same view:
$$\bar{\eta} = \frac{1}{M \cdot N}\sum_x\sum_y \big(I_t(x,y) - I_{t-1}(x,y)\big),$$
where M and N are the width and height of the image;
(3c2) obtain the temporal JND threshold $JND_{TEM}(x, y)$ from the temporal masking effect:
$$JND_{TEM} = \begin{cases} \max\Big(\zeta,\ \dfrac{8}{2}\exp\big(-\dfrac{0.15}{2\pi}(\bar{\eta} + 255)\big) + \zeta\Big), & \bar{\eta} \le 0,\\[6pt] \max\Big(\zeta,\ \dfrac{3.2}{2}\exp\big(-\dfrac{0.15}{2\pi}(255 - \bar{\eta})\big) + \zeta\Big), & \bar{\eta} > 0, \end{cases}$$
where ζ = 0.8.
Compared with the prior art, the present invention has the following advantages:
1. By pre-processing each frame of the left and right views with mean-shift color segmentation, the invention makes the disparity estimation in edge regions more accurate. Because edge structures attract more of the human visual system's attention, distortion at object boundaries is noticed more easily; improving the disparity estimation accuracy in these regions makes the proposed stereoscopic JND model more complete and yields a better visual experience.
2. The disparity-based binocular stereoscopic JND model obtained by the invention better captures the relation between the left and right views of a stereoscopic video. Using the inter-view disparity information, it effectively removes the inter-view disparity perceptual redundancy, so that the stereoscopic video bit rate is reduced significantly with essentially no loss of perceived stereoscopic quality.
3. The disparity masking effect proposed and modeled by the invention combines the masking mechanisms of the human visual system with human vision's sensitivity to stereoscopic object boundaries; the algorithm is simpler than existing stereoscopic JND models.
Simulation results show that by combining mean-shift color segmentation with the disparity information of the stereoscopic video, the invention preserves stereoscopic edge quality better and effectively removes unnecessary disparity, spatial and temporal perceptual redundancy; the video bit rate drops significantly while the perceived stereoscopic quality is maintained, and the algorithm is simple. It is a perceptual stereoscopic video coding method of good performance.
Brief description of the drawings
Fig. 1 is the overall framework of the invention integrated into JMVC;
Fig. 2 shows two of the test video images used in the simulation experiments of the invention;
Fig. 3 shows reconstructed frames obtained with the method of the invention;
Fig. 4 compares reconstructed frames produced by the JMVC method and by the method of the invention.
Embodiment
As shown in Fig. 1, the implementation steps of the invention are as follows:
Step 1: disparity estimation
1a) Read in each frame pair $I_{iL}$ and $I_{iR}$ corresponding to the left and right views of the binocular stereoscopic video, and pre-segment them with a mean-shift color segmentation method to obtain the images $I'_{iL}$ and $I'_{iR}$:
(1a1) apply mean-shift filtering to each frame of the left and right views to obtain the convergence point of every subspace;
(1a2) merge the clusters of pixels that describe the same region after mean-shift filtering to obtain the segmented regions.
1b) Stereo-match $I'_{iL}$ and $I'_{iR}$ to obtain the disparity d(x, y) between the left and right views; the concrete steps are as follows:
(1b1) segment-based stereo matching models the disparity as
$$d(x,y) = a \cdot x + b \cdot y + c,$$
where a, b and c are the three parameters of the disparity plane; they determine the disparity d(x, y) of every reference pixel (x, y);
(1b2) compute the sum of absolute differences over the window of each pixel:
$$DIS_{SAD}(x,y,d) = \sum_{(i,j) \in W(x,y)} |P_1(i,j) - P_2(i+d,\,j)|;$$
(1b3) minimize the sum of absolute differences over all pixels of a segmented region to obtain the disparity plane and the values of its three parameters:
$$\min\big(DIS_{SEG}(R,D)\big) = \min\Big(\sum_{(x,y) \in R} DIS_{SAD}(x,y,d)\Big).$$
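For illustration, step 1 could be sketched as follows. This is a minimal sketch, not the exact algorithm of the invention: it uses OpenCV's pyrMeanShiftFiltering for the mean-shift pre-segmentation and a plain windowed SAD search for the disparity; the per-segment disparity-plane fit d = a·x + b·y + c of step (1b1) is omitted, and the window size, disparity search range and mean-shift radii are assumed values.

```python
import cv2
import numpy as np

def mean_shift_presegment(img_bgr, sp=10, sr=20):
    """Step 1a (sketch): mean-shift color filtering; pixels that converge
    to the same mode form one segmented region (sp, sr are assumed radii)."""
    return cv2.pyrMeanShiftFiltering(img_bgr, sp, sr)

def sad_disparity(left_gray, right_gray, max_disp=64, win=7):
    """Step 1b (simplified): winner-take-all windowed SAD search for
    d(x, y); the invention refines this with a per-segment plane fit."""
    L = left_gray.astype(np.float32)
    R = right_gray.astype(np.float32)
    kernel = np.ones((win, win), np.float32)
    best = np.zeros(L.shape, np.int32)
    best_cost = np.full(L.shape, np.inf, np.float32)
    for d in range(max_disp):
        shifted = np.roll(R, -d, axis=1)                      # P2(i + d, j)
        cost = cv2.filter2D(np.abs(L - shifted), -1, kernel)  # windowed SAD
        better = cost < best_cost
        best[better] = d
        best_cost[better] = cost[better]
    return best
```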
Step 2: estimation of the disparity-based JND model
From the relation between the disparity information extracted from the input video sequence and human visual sensitivity, compute the disparity JND threshold $JND_{DIS}(x, y)$ at each pixel of each input frame:
$$JND_{DIS}(x,y) = \psi \cdot e^{-d(x,y)} + \varphi,$$
where ψ and φ are two regulating parameters that bring the JND threshold closer to the characteristics of the human visual system; ψ = 17 and φ = 3.
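In code, the disparity JND map of step 2 is a direct per-pixel expression (a sketch; `d` is assumed to be the disparity array produced in step 1, measured in pixels):

```python
import numpy as np

PSI, PHI = 17.0, 3.0  # regulating parameters psi and phi from step 2

def disparity_jnd(d):
    """Step 2: JND_DIS(x, y) = psi * exp(-d(x, y)) + phi. Small
    disparities yield thresholds near psi + phi; large disparities
    decay towards the floor phi."""
    return PSI * np.exp(-d.astype(np.float32)) + PHI
```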
Step 3: computation of the luminance, texture and temporal JND models
3a) Compute the luminance JND threshold $JND_L(x, y)$ of the left and right views at each pixel:
(3a1) compute the average background luminance in the 5 × 5 pixel neighborhood centered on (x, y):
$$\overline{I(x,y)} = \frac{1}{32}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot B(i,j),$$
where I(x, y) is the luminance of the pixel and B(i, j) is a weighted low-pass filter;
(3a2) obtain the luminance JND threshold from the luminance masking effect and $\overline{I(x,y)}$:
$$JND_L(x,y) = \begin{cases} 17\Big(1 - \sqrt{\dfrac{\overline{I(x,y)}}{127}}\Big) + 3, & \text{if } \overline{I(x,y)} \le 127,\\[6pt] \dfrac{3}{128}\big(\overline{I(x,y)} - 127\big) + 3, & \text{otherwise}. \end{cases}$$
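A sketch of step 3a follows. The text does not reproduce the 5 × 5 weighted low-pass filter B(i, j); the kernel below is the classical background-luminance operator from the pixel-domain JND literature, whose weights sum to 32 and therefore match the 1/32 normalization, which is an assumption on our part:

```python
import numpy as np
from scipy.ndimage import correlate

# Assumed 5x5 background-luminance kernel (weights sum to 32).
B = np.array([[1, 1, 1, 1, 1],
              [1, 2, 2, 2, 1],
              [1, 2, 0, 2, 1],
              [1, 2, 2, 2, 1],
              [1, 1, 1, 1, 1]], np.float32)

def luminance_jnd(img):
    """Step 3a: average background luminance and luminance JND threshold."""
    bg = correlate(img.astype(np.float32), B, mode='nearest') / 32.0
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # bg <= 127 branch
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0         # bg > 127 branch
    return np.where(bg <= 127.0, dark, bright)
```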
3b) Compute the texture JND threshold $JND_T(x, y)$ of the left and right views at each pixel:
(3b1) compute the gradients around pixel (x, y):
$$grad_n(x,y) = \frac{1}{16}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot g_n(i,j),$$
and take the maximum weighted average gradient
$$G(x,y) = \max_{n=1,2,3,4}\big\{|grad_n(x,y)|\big\};$$
(3b2) obtain the texture JND threshold $JND_T(x, y)$ using Canny edge detection:
$$JND_T = \theta \cdot G(x,y) \cdot W(x,y),$$
where W(x, y) is an edge-related weight model;
(3b3) combine $JND_T(x, y)$ and $JND_L(x, y)$ into the spatial JND model $JND_S(x, y)$ using the nonlinear additivity model:
$$JND_S = JND_L + JND_T - \varphi \cdot \min\{JND_L, JND_T\}.$$
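A sketch of steps (3b1)-(3b3) follows. The four directional gradient operators g_n and the edge weight W(x, y) are not printed in the text; the kernels below are the standard directional high-pass operators of the pixel-domain JND literature, and θ, the Canny thresholds, the edge/non-edge weights and the overlap factor are all assumed values:

```python
import cv2
import numpy as np

# Assumed directional 5x5 gradient operators g_n (standard in
# pixel-domain JND models; the text does not reproduce them).
G_KERNELS = [np.array(k, np.float32) for k in (
    [[0, 0, 0, 0, 0], [1, 3, 8, 3, 1], [0, 0, 0, 0, 0],
     [-1, -3, -8, -3, -1], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 8, 3, 0, 0], [1, 3, 0, -3, -1],
     [0, 0, -3, -8, 0], [0, 0, -1, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 3, 8, 0], [-1, -3, 0, 3, 1],
     [0, -8, -3, 0, 0], [0, 0, -1, 0, 0]],
    [[0, 1, 0, -1, 0], [0, 3, 0, -3, 0], [0, 8, 0, -8, 0],
     [0, 3, 0, -3, 0], [0, 1, 0, -1, 0]],
)]

def texture_jnd(img, theta=0.12, canny_lo=50, canny_hi=150):
    """Steps 3b1-3b2: JND_T = theta * G(x, y) * W(x, y).
    theta, the Canny thresholds and the weights in W are assumptions."""
    f = img.astype(np.float32)
    grads = [np.abs(cv2.filter2D(f, -1, k)) / 16.0 for k in G_KERNELS]
    G = np.max(np.stack(grads), axis=0)      # maximum weighted gradient
    edges = cv2.Canny(img.astype(np.uint8), canny_lo, canny_hi)
    W = np.where(edges > 0, 0.1, 1.0)        # protect edges with a low weight
    return theta * G * W

def spatial_jnd(jnd_l, jnd_t, phi=0.3):
    """Step 3b3, nonlinear additivity model:
    JND_S = JND_L + JND_T - phi * min(JND_L, JND_T).
    phi = 0.3 is the overlap factor commonly used in the literature;
    the text reuses the symbol phi here without fixing its value."""
    return jnd_l + jnd_t - phi * np.minimum(jnd_l, jnd_t)
```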
3c) Compute the temporal JND threshold $JND_{TEM}(x, y)$ of the left and right views at each pixel:
(3c1) compute the average luminance difference between consecutive frames of the same view:
$$\bar{\eta} = \frac{1}{M \cdot N}\sum_x\sum_y \big(I_t(x,y) - I_{t-1}(x,y)\big),$$
where M and N are the width and height of the image;
(3c2) obtain the temporal JND threshold $JND_{TEM}(x, y)$ from the temporal masking effect:
$$JND_{TEM} = \begin{cases} \max\Big(\zeta,\ \dfrac{8}{2}\exp\big(-\dfrac{0.15}{2\pi}(\bar{\eta} + 255)\big) + \zeta\Big), & \bar{\eta} \le 0,\\[6pt] \max\Big(\zeta,\ \dfrac{3.2}{2}\exp\big(-\dfrac{0.15}{2\pi}(255 - \bar{\eta})\big) + \zeta\Big), & \bar{\eta} > 0, \end{cases}$$
where ζ = 0.8;
(3c3) merge $JND_{TEM}(x, y)$ with $JND_S(x, y)$ to obtain the spatio-temporal JND model $JND_{ST}(x, y)$:
$$JND_{ST} = JND_S \cdot JND_{TEM}.$$
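A sketch of step 3c. Note that the frame-average luminance difference is a single number per frame, so in this model $JND_{TEM}$ is a scalar weight applied to the whole spatial JND map:

```python
import numpy as np

def temporal_jnd(frame_t, frame_prev, zeta=0.8):
    """Steps 3c1-3c2: temporal JND threshold from the mean inter-frame
    luminance difference of one view."""
    eta = np.mean(frame_t.astype(np.float32) -
                  frame_prev.astype(np.float32))   # average over M*N pixels
    if eta <= 0:
        val = (8.0 / 2.0) * np.exp(-0.15 / (2.0 * np.pi) * (eta + 255.0)) + zeta
    else:
        val = (3.2 / 2.0) * np.exp(-0.15 / (2.0 * np.pi) * (255.0 - eta)) + zeta
    return max(zeta, val)

def spatiotemporal_jnd(jnd_s, jnd_tem):
    """Step 3c3, weighted model: JND_ST = JND_S * JND_TEM."""
    return jnd_s * jnd_tem
```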
Step 4: combine the disparity-based JND model with the spatio-temporal JND model $JND_{ST}(x, y)$ through the nonlinear additivity model to obtain the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$:
$$JND_{STEREO}(x,y) = JND_{ST} + JND_{DIS} - \theta \cdot \min\{JND_{ST}, JND_{DIS}\}.$$
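Step 4 is the same nonlinear additivity combination applied across the two model families (a sketch; the text does not fix θ at this point, so the value below is an assumption):

```python
import numpy as np

def stereo_jnd(jnd_st, jnd_dis, theta=0.3):
    """Step 4: JND_STEREO = JND_ST + JND_DIS
    - theta * min(JND_ST, JND_DIS); theta = 0.3 is an assumed
    overlap factor."""
    return jnd_st + jnd_dis - theta * np.minimum(jnd_st, jnd_dis)
```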
Step 5: apply the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$ in the stereoscopic residual preprocessor
5a) Compute the mean squared JND threshold $P_{JND_{STEREO}}^2$ of each B × B block of the left and right views, where B takes different values for different block sizes:
$$P_{JND_{STEREO}}^2 = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1}\big(JND_{STEREO}(x,y)\big)^2;$$
5b) Compute the mean residual $\overline{R_b}$ of each B × B block of the left and right views:
$$\overline{R_b} = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1} R_{ori}(x,y);$$
5c) Compute the variance $\sigma_R^2$ of the residual signal of each pixel of the left and right views:
$$\sigma_R^2 = \frac{1}{B^2}\sum_{(x,y)}\big[R_{ori}(x,y) - M\big]^2,\qquad M = \frac{1}{B^2}\sum_{(x,y)} R_{ori}(x,y);$$
5d) Compute the optimization parameter $v$ from the rate-distortion-optimization (RDO) principle $\min(J) = \min(D + \lambda R)$, which balances the bit rate against the distortion:
$$\lambda = \frac{4 \cdot QP^2}{\beta \cdot \sigma^2},$$
$$D = \omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1),$$
$$\frac{\partial J}{\partial v} = \frac{\partial\big(\omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1) + \lambda \cdot R\big)}{\partial v} = 0,$$
$$v = \frac{1}{P_{JND_{STEREO}}^2} \cdot \frac{4 \cdot QP^2}{\alpha \cdot \beta \cdot \varepsilon^2 \cdot \omega \cdot \ln(\sigma_R^2 + 1)} - 1;$$
5e) Use the JND threshold $JND_{STEREO}(x, y)$ obtained in step 4 and the parameter values from 5a)-5d) to reset the original residual $R_{ori}(x, y)$ of the video sequence, obtaining the new residual $R_N(x, y)$ and thereby saving bit rate:
$$R_N(x,y) = \begin{cases} R_{ori} + v \cdot JND_{STEREO}(x,y), & \text{if } R_{ori} - \overline{R_b} < -v \cdot JND_{STEREO}(x,y),\\ \overline{R_b}, & \text{if } |R_{ori} - \overline{R_b}| \le v \cdot JND_{STEREO}(x,y),\\ R_{ori} - v \cdot JND_{STEREO}(x,y), & \text{otherwise}. \end{cases}$$
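A sketch of the residual preprocessor of step 5 for one B × B block. The model constants α, β, ε and ω are not assigned values in the text, so the defaults below are placeholders; the piecewise reset of 5e) acts as a dead zone of width v·JND_STEREO around the block-mean residual:

```python
import numpy as np

def reset_residual_block(R_ori, jnd_stereo, qp,
                         alpha=1.0, beta=1.0, eps=1.0, omega=1.0):
    """Step 5: reset one B x B residual block against the stereoscopic
    JND threshold (alpha, beta, eps, omega are placeholder constants)."""
    p_jnd2 = np.mean(jnd_stereo ** 2)            # 5a: mean squared JND
    r_mean = R_ori.mean()                        # 5b: block mean residual
    sigma_r2 = np.mean((R_ori - r_mean) ** 2)    # 5c: residual variance
    if sigma_r2 == 0.0:                          # flat block: nothing to shape
        return np.full_like(R_ori, r_mean)
    # 5d: optimization parameter v from the RDO-derived closed form
    v = (1.0 / p_jnd2) * (4.0 * qp ** 2) / (
        alpha * beta * eps ** 2 * omega * np.log(sigma_r2 + 1.0)) - 1.0
    v = max(v, 0.0)                              # keep the dead zone non-negative
    # 5e: piecewise reset with a dead zone around the block mean
    t = v * jnd_stereo
    diff = R_ori - r_mean
    return np.where(diff < -t, R_ori + t,
                    np.where(np.abs(diff) <= t, r_mean, R_ori - t))
```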
The effect of the invention is further illustrated by the following simulation experiments.
1. Simulation conditions:
CPU: Intel(R) Core(TM)2 Duo processor T6670, clock frequency 2.2 GHz; memory: 2 GB; operating system: Windows 7; simulation platform: JMVC.
The simulation uses two right-view sequences of the test stereoscopic videos, shown in Fig. 2:
Fig. 2(a) is the 35th frame of the right view of the first test stereoscopic video;
Fig. 2(b) is the 45th frame of the right view of the first test stereoscopic video;
Fig. 2(c) is the 35th frame of the right view of the second test stereoscopic video;
Fig. 2(d) is the 45th frame of the right view of the second test stereoscopic video.
2. Simulation contents:
In the simulation experiments, the method of the invention and the existing multi-view video coding (MVC) method are used to encode the test video sequences Trepeze, Tunnel, Puppy and Soccer, with QP set to 25, 30, 35 and 40.
Simulation 1: the method of the invention and the MVC method encode the four test stereoscopic videos at the different QP values; the average bit rate, PSNR and SSIM over 50 reconstructed frames of each sequence are listed in Table 1.
Table 1. Comparison of the bit rate, PSNR and SSIM obtained with the JMVC method and with the method of the invention.
As Table 1 shows, the method of the invention reduces the stereoscopic video bit rate significantly: on the Soccer sequence at QP = 25 the bit-rate reduction reaches a maximum of 16.7%, while SSIM is essentially unchanged. Although PSNR decreases slightly, the perceived stereoscopic quality of the video is unaffected, because the invention removes only the stereoscopic perceptual redundancy of regions to which human vision is insensitive, as shown in Fig. 3.
Simulation 2: the method of the invention encodes the first and second test stereoscopic videos shown in Fig. 2 with QP = 25, producing the reconstructed frame sequences of the two test videos, from which two frames each are extracted, as shown in Fig. 3:
Fig. 3(a) is the 35th reconstructed frame extracted after coding the first test stereoscopic video with the method of the invention;
Fig. 3(b) is the 45th reconstructed frame extracted after coding the first test stereoscopic video with the method of the invention;
Fig. 3(c) is the 35th reconstructed frame extracted after coding the second test stereoscopic video with the method of the invention;
Fig. 3(d) is the 45th reconstructed frame extracted after coding the second test stereoscopic video with the method of the invention.
Comparing Fig. 3(a)-3(d) with Fig. 2(a)-2(d), the reconstructed frames obtained with the method of the invention show essentially no loss of perceived quality relative to the corresponding original frames: luminance, texture regions and object-edge information all retain a very natural visual appearance, while the spatial, temporal and inter-view redundancy is effectively removed and the video bit rate drops significantly, as shown in Table 1.
Simulation 3: the method of the invention and the existing MVC method encode the second test stereoscopic video shown in Fig. 2(c) and Fig. 2(d), producing the reconstructed frame sequences of the two test videos, from which the 35th and 45th frames are extracted, as shown in Fig. 4:
Fig. 4(a) is the reconstructed image of Fig. 2(c) and a local detail view after coding the test sequence with the MVC method;
Fig. 4(b) is the reconstructed image of Fig. 2(d) and a local detail view after coding the test sequence with the MVC method;
Fig. 4(c) is the reconstructed image of Fig. 2(c) and a local detail view after coding the test sequence with the method of the invention;
Fig. 4(d) is the reconstructed image of Fig. 2(d) and a local detail view after coding the test sequence with the method of the invention.
Comparing Fig. 4(a) with Fig. 4(c) and Fig. 4(b) with Fig. 4(d), the reconstructed frames of the method of the invention look perceptually more natural: the boundary information of the athlete's legs and trousers is preserved intact, ringing artifacts are clearly reduced, and the visual appearance is sharper; in the flat regions of the back and the trousers, noise is also considerably reduced and the stimulus to the eye is softer. The invention therefore not only reduces the stereoscopic video bit rate significantly but also improves the perceived quality of the video.
The above examples are only illustrations of the invention and do not limit its scope of protection; every design identical or similar to the invention falls within its scope of protection.

Claims (6)

1. A perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model, characterized in that it targets the inter-view perceptual redundancy between the left and right views and comprises the following steps:
(1) disparity estimation:
1a) reading in each frame pair $I_{iL}$ and $I_{iR}$ corresponding to the left and right views of the binocular stereoscopic video, and pre-segmenting them with a mean-shift color segmentation method to obtain the left-view image $I'_{iL}$ and the right-view image $I'_{iR}$;
1b) stereo-matching $I'_{iL}$ and $I'_{iR}$ to obtain the disparity d(x, y) between the left and right views;
(2) estimation of the disparity-based JND model:
from the relation between the disparity information extracted from the input video sequence and human visual sensitivity, computing the disparity JND threshold $JND_{DIS}(x, y)$ at each pixel of each input frame:
$$JND_{DIS}(x,y) = \psi \cdot e^{-d(x,y)} + \varphi, \qquad \psi = 17,\ \varphi = 3;$$
(3) computation of the luminance, texture and temporal JND models:
3a) computing the luminance JND threshold $JND_L(x, y)$ of the left and right views at each pixel, which is determined by the luminance masking effect and the background luminance contrast;
3b) computing the texture JND threshold $JND_T(x, y)$ of the left and right views at each pixel, which is closely related to the texture masking effect and the edge structure of the image, and combining it with $JND_L(x, y)$ into the spatial JND model $JND_S(x, y)$ using the nonlinear additivity model:
$$JND_S = JND_L + JND_T - \varphi \cdot \min\{JND_L, JND_T\};$$
3c) computing the temporal JND threshold $JND_{TEM}(x, y)$ of the left and right views at each pixel, which is determined by the temporal masking effect derived from the inter-frame luminance difference of the video sequence, and merging it with $JND_S(x, y)$ to obtain the spatio-temporal JND model $JND_{ST}(x, y)$:
$$JND_{ST} = JND_S \cdot JND_{TEM};$$
(4) combining the disparity-based JND model with the spatio-temporal JND model $JND_{ST}(x, y)$ through the nonlinear additivity model to obtain the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$:
$$JND_{STEREO}(x,y) = JND_{ST} + JND_{DIS} - \theta \cdot \min\{JND_{ST}, JND_{DIS}\};$$
(5) applying the disparity-based binocular stereoscopic JND model $JND_{STEREO}(x, y)$ in the stereoscopic residual preprocessor:
5a) computing the mean squared JND threshold $P_{JND_{STEREO}}^2$ of each B × B block of the left and right views, where B takes different values for different block sizes:
$$P_{JND_{STEREO}}^2 = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1}\big(JND_{STEREO}(x,y)\big)^2;$$
5b) computing the mean residual $\overline{R_b}$ of each B × B block of the left and right views:
$$\overline{R_b} = \frac{1}{B^2}\sum_{y=0}^{B-1}\sum_{x=0}^{B-1} R_{ori}(x,y);$$
5c) computing the variance $\sigma_R^2$ of the residual signal of each pixel of the left and right views:
$$\sigma_R^2 = \frac{1}{B^2}\sum_{(x,y)}\big[R_{ori}(x,y) - M\big]^2,\qquad M = \frac{1}{B^2}\sum_{(x,y)} R_{ori}(x,y);$$
5d) computing the optimization parameter $v$ from the rate-distortion-optimization (RDO) principle $\min(J) = \min(D + \lambda R)$, which balances the bit rate against the distortion:
$$\lambda = \frac{4 \cdot QP^2}{\beta \cdot \sigma^2},$$
$$D = \omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1),$$
$$\frac{\partial J}{\partial v} = \frac{\partial\big(\omega \cdot v^2 \cdot P_{JND_{STEREO}}^2 \cdot \ln(\sigma_R^2 + 1) + \lambda \cdot R\big)}{\partial v} = 0,$$
$$v = \frac{1}{P_{JND_{STEREO}}^2} \cdot \frac{4 \cdot QP^2}{\alpha \cdot \beta \cdot \varepsilon^2 \cdot \omega \cdot \ln(\sigma_R^2 + 1)} - 1;$$
5e) using the JND threshold $JND_{STEREO}(x, y)$ obtained in (4) and the parameter values from 5a)-5d) to reset the original residual $R_{ori}(x, y)$ of the video sequence, obtaining the new residual $R_N(x, y)$:
$$R_N(x,y) = \begin{cases} R_{ori} + v \cdot JND_{STEREO}(x,y), & \text{if } R_{ori} - \overline{R_b} < -v \cdot JND_{STEREO}(x,y),\\ \overline{R_b}, & \text{if } |R_{ori} - \overline{R_b}| \le v \cdot JND_{STEREO}(x,y),\\ R_{ori} - v \cdot JND_{STEREO}(x,y), & \text{otherwise}. \end{cases}$$
2. The perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model according to claim 1, characterized in that the reading of the frame pairs $I_{iL}$, $I_{iR}$ of the left and right views and the mean-shift color pre-segmentation yielding $I'_{iL}$ and $I'_{iR}$ in step 1a) are carried out as follows:
1a1) applying mean-shift filtering to each frame of the left and right views to obtain the convergence point of every subspace;
1a2) merging the clusters of pixels that describe the same region after mean-shift filtering to obtain the segmented regions.
3. The perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model according to claim 1, characterized in that the stereo matching of $I'_{iL}$ and $I'_{iR}$ in step 1b), which yields the disparity d(x, y) between the left and right views, is carried out as follows:
1b1) segment-based stereo matching models the disparity as
$$d(x,y) = a \cdot x + b \cdot y + c,$$
where a, b and c are the three parameters of the disparity plane; they determine the disparity d(x, y) of every reference pixel (x, y);
1b2) computing the sum of absolute differences over the window of each pixel:
$$DIS_{SAD}(x,y,d) = \sum_{(i,j) \in W(x,y)} |P_1(i,j) - P_2(i+d,\,j)|;$$
1b3) minimizing the sum of absolute differences over all pixels of a segmented region to obtain the disparity plane and the values of its three parameters:
$$\min\big(DIS_{SEG}(R,D)\big) = \min\Big(\sum_{(x,y) \in R} DIS_{SAD}(x,y,d)\Big).$$
4. The perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model according to claim 1, characterized in that the computation of the luminance JND threshold $JND_L(x, y)$ of the left and right views at each pixel in step 3a) is carried out as follows:
3a1) computing the average background luminance in the 5 × 5 pixel neighborhood centered on (x, y):
$$\overline{I(x,y)} = \frac{1}{32}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot B(i,j),$$
where I(x, y) is the luminance of the pixel and B(i, j) is a weighted low-pass filter;
3a2) obtaining the luminance JND threshold from the luminance masking effect and $\overline{I(x,y)}$:
$$JND_L(x,y) = \begin{cases} 17\Big(1 - \sqrt{\dfrac{\overline{I(x,y)}}{127}}\Big) + 3, & \text{if } \overline{I(x,y)} \le 127,\\[6pt] \dfrac{3}{128}\big(\overline{I(x,y)} - 127\big) + 3, & \text{otherwise}. \end{cases}$$
5. The perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model according to claim 1, characterized in that the computation of the texture JND threshold $JND_T(x, y)$ of the left and right views at each pixel in step 3b) is carried out as follows:
3b1) computing the gradients around pixel (x, y):
$$grad_n(x,y) = \frac{1}{16}\sum_{i=1}^{5}\sum_{j=1}^{5} I(x-3+i,\ y-3+j) \cdot g_n(i,j),$$
and taking the maximum weighted average gradient
$$G(x,y) = \max_{n=1,2,3,4}\big\{|grad_n(x,y)|\big\};$$
3b2) obtaining the texture JND threshold $JND_T(x, y)$ using Canny edge detection:
$$JND_T = \theta \cdot G(x,y) \cdot W(x,y),$$
where W(x, y) is an edge-related weight model.
6. The perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model according to claim 1, characterized in that the computation of the temporal JND threshold $JND_{TEM}(x, y)$ of the left and right views at each pixel in step 3c) is carried out as follows:
3c1) computing the average luminance difference between consecutive frames of the same view:
$$\bar{\eta} = \frac{1}{M \cdot N}\sum_x\sum_y \big(I_t(x,y) - I_{t-1}(x,y)\big),$$
where M and N are the width and height of the image;
3c2) obtaining the temporal JND threshold $JND_{TEM}(x, y)$ from the temporal masking effect:
$$JND_{TEM} = \begin{cases} \max\Big(\zeta,\ \dfrac{8}{2}\exp\big(-\dfrac{0.15}{2\pi}(\bar{\eta} + 255)\big) + \zeta\Big), & \bar{\eta} \le 0,\\[6pt] \max\Big(\zeta,\ \dfrac{3.2}{2}\exp\big(-\dfrac{0.15}{2\pi}(255 - \bar{\eta})\big) + \zeta\Big), & \bar{\eta} > 0, \end{cases}$$
where ζ = 0.8.
CN201410240167.5A 2014-05-30 2014-05-30 Active CN105306954B (en) A perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410240167.5A 2014-05-30 2014-05-30 A perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model

Publications (2)

Publication Number Publication Date
CN105306954A 2016-02-03
CN105306954B 2018-05-22

Family

ID=55203629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410240167.5A Active 2014-05-30 2014-05-30 A perceptual stereoscopic video coding method based on a disparity just-noticeable-difference model

Country Status (1)

Country Link
CN (1) CN105306954B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101841723A (en) * 2010-05-25 2010-09-22 东南大学 Perceptual video compression method based on JND and AR model
CN102137258A (en) * 2011-03-22 2011-07-27 宁波大学 Method for controlling three-dimensional video code rates
CN102710949A (en) * 2012-05-11 2012-10-03 宁波大学 Visual sensation-based stereo video coding method
CN102724525A (en) * 2012-06-01 2012-10-10 宁波大学 Depth video coding method on basis of foveal JND (just noticeable distortion) model
US20140092208A1 (en) * 2012-09-28 2014-04-03 Mitsubishi Electric Research Laboratories, Inc. Method and System for Backward 3D-View Synthesis Prediction using Neighboring Blocks

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107948649A * 2016-10-12 2018-04-20 北京金山云网络技术有限公司 Video coding method and device based on a subjective quality model
CN107948649B * 2016-10-12 2020-07-03 北京金山云网络技术有限公司 Video coding method and device based on a subjective quality model
CN107071385A * 2017-04-18 2017-08-18 杭州派尼澳电子科技有限公司 Stereoscopic video coding method based on H265 with disparity compensation
CN107071385B * 2017-04-18 2019-01-25 杭州派尼澳电子科技有限公司 Stereoscopic video coding method based on H265 with disparity compensation
CN107241607B * 2017-07-18 2020-06-16 厦门大学 Visual perception coding method based on a multi-domain JND model
CN107241607A * 2017-07-18 2017-10-10 厦门大学 Visual perception coding method based on a multi-domain JND model
CN108521572B * 2018-03-22 2021-07-16 四川大学 Residual filtering method based on a pixel-domain JND model
CN108521572A * 2018-03-22 2018-09-11 四川大学 Residual filtering method based on a pixel-domain JND model
CN111178118A * 2018-11-13 2020-05-19 浙江宇视科技有限公司 Image acquisition and processing method, device and computer-readable storage medium
CN111178118B * 2018-11-13 2023-07-21 浙江宇视科技有限公司 Image acquisition and processing method, device and computer-readable storage medium
CN110505480A * 2019-08-02 2019-11-26 浙江大学宁波理工学院 Fast perceptual video coding method for surveillance scenes
CN114697632A * 2022-03-28 2022-07-01 天津大学 End-to-end stereoscopic image compression method and device based on bidirectional conditional coding
CN114697632B * 2022-03-28 2023-12-26 天津大学 End-to-end stereoscopic image compression method and device based on bidirectional conditional coding

Also Published As

Publication number Publication date
CN105306954B (en) 2018-05-22

Similar Documents

Publication Publication Date Title
CN105306954B (en) Perceptual stereoscopic video coding based on a disparity just-noticeable-difference model
CN104469386B (en) A perceptual stereoscopic video coding method based on a depth-of-field (DOF) just-noticeable-error model
Ryu et al. No-reference quality assessment for stereoscopic images based on binocular quality perception
KR101492876B1 (en) 3D video control system to adjust 3D video rendering based on user preferences
CN102158712B (en) Multi-viewpoint video signal coding method based on vision
Cheng et al. A novel 2D-to-3D conversion system using edge information
CN103581648B (en) Hole-filling method for rendering new viewpoints
RU2423018C2 (en) Method and system to convert stereo content
CN102307304B (en) Image segmentation based error concealment method for entire right frame loss in stereoscopic video
Po et al. A new multidirectional extrapolation hole-filling method for depth-image-based rendering
US9235920B2 (en) Method and processor for 3D scene representation
CN102801997A (en) Stereoscopic image compression method based on interest depth
Han et al. Stereoscopic video quality assessment model based on spatial-temporal structural information
CN102710949B (en) Visual sensation-based stereo video coding method
Qi et al. Stereoscopic video quality assessment based on stereo just-noticeable difference model
Jin et al. Validation of a new full reference metric for quality assessment of mobile 3DTV content
Smirnov et al. Methods for depth-map filtering in view-plus-depth 3D video representation
Lee et al. A new framework for measuring 2D and 3D visual information in terms of entropy
CN108668135A (en) Stereoscopic video B-frame error concealment method based on human-eye perception
Han et al. View synthesis using foreground object extraction for disparity control and image inpainting
Hasan et al. No-reference quality assessment of 3D videos based on human visual perception
CN103826135B (en) Three-dimensional video depth map coding method based on just-distinguishable parallax error estimation
CN103702120B (en) Subjective distortion estimation method for synthesized viewpoints
CN103997653A (en) Edge-based depth video coding method oriented to virtual view rendering
CN105915886B (en) A depth map inference algorithm based on the video compression domain

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant