CN101610411B - Video sequence mixed encoding and decoding method and system - Google Patents

Video sequence mixed encoding and decoding method and system

Info

Publication number: CN101610411B
Application number: CN2009100888055A
Other versions: CN101610411A (Chinese)
Inventors: 郭立, 袁红星, 郁理
Assignee: University of Science and Technology of China (USTC)
Legal status: Expired - Fee Related

Abstract

The invention relates to a method of hybrid encoding and decoding of video sequences. For a static-scene video sequence shot from multiple angles, image segmentation divides the input video into two parts, a video object and a background. Image-based modeling reconstructs the camera parameters, a three-dimensional wire-frame model of the object, and a texture image from the input video; these extracted parameters are compressed to realize compressed encoding of the video object, while the background is compressed with an existing standard. The invention also provides a video sequence hybrid encoding and decoding system. Besides a coding efficiency superior to H.264 at low bit rates, the method and system can perform content-based hybrid encoding and decoding on the video sequence of a whole scene, because image-based modeling obtains a complete wire-frame model, and the decoding end can generate a virtual image under an arbitrary viewing angle. They can therefore be widely applied in fields such as heritage conservation, virtual commodity display, and immersive interactive experiences.

Description

Method and system for hybrid encoding and decoding of video sequences
Technical field
The present invention relates to the field of video compression, and in particular to a method and system for hybrid encoding and decoding of video sequences.
Background art
At present, although video is ubiquitous on networks and wireless devices, low-bit-rate video coding remains a technical problem worth studying.
Starting in the late 1980s, the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU-T) formulated a series of international standards for different application scenarios, transmission bit rates, image formats, and image-quality requirements. The H.264 standard was formally adopted and promulgated worldwide by ITU-T in March 2003; compared with H.263 or MPEG-4 it roughly halves the bit rate at the same quality, and it has therefore received wide attention and adoption internationally.
The functional blocks of an H.264 encoder do not differ much from those of earlier standards (H.261, H.263, MPEG-1, MPEG-4): all are built on the foundation of Shannon information theory, and the methods they use, such as predictive coding, transform coding, and vector quantization, compress the signal by exploiting its stochastic properties. Their compression bit rate is bounded by rate-distortion theory, so at a given image quality the bit rate cannot be arbitrarily low. Moreover, these methods process the compressed image in blocks, which inevitably produces blocking artifacts at low bit rates. Several low-bit-rate coding approaches have therefore emerged, such as model-based coding and semantics-based coding; model-based coding is a branch of content-based coding.
Current video coding standards are all waveform-based, and the H.264 standard has nearly reached the efficiency limit of waveform coding. Such traditional video coding describes objects with a source model of translating moving blocks plus color parameters, yet this source model does not match the actual behavior of objects. Content-based coding recognizes this problem: it attempts to divide a video frame into regions corresponding to different objects and to encode each object separately. For each object, shape information must be transmitted in addition to motion and texture information. For video with suitable content, content-based coding can reach very high efficiency, requiring far fewer bits than waveform-based coding. In particular, when the kind of object in the video sequence is known, a predefined three-dimensional wire-frame model can be used to encode the object; because such a model adapts to the object's shape, coding efficiency increases significantly. This technique is also called model-based coding. It has been very successful for face coding in video telephony, for which MPEG-4 specially designed a three-dimensional face model.
Model-based coding is an efficient compression method that integrates image analysis and computer graphics: it describes the image by considering the 3D characteristics of objects. It comes in 3D-model and 2D-model variants. The 3D variant must establish the 3D parameters of the object. The 2D variant is more general: it needs no 3D parameters and typically splits the image with deformable triangles and compresses it with an affine-transform motion model; the 2D variant also includes coding based on region segmentation and motion compensation. Because the 3D variant needs the object's three-dimensional parameters, which are very difficult to obtain from images, it has been limited to coding specific image classes. Model-based coding differs fundamentally from traditional coding: it treats the image signal not as a statistical model but as a structural one, describing how the image is formed in a structured way. The restored image therefore keeps high visual quality, effectively avoiding problems such as the severe blocking artifacts traditional coding produces at low bit rates; it is a high-quality compression method for low-bit-rate conditions.
For low-bit-rate applications such as video conferencing and video telephony, MPEG-4 introduced a face model and uses model-based coding to achieve low-bit-rate compression. MPEG-4 establishes a three-dimensional face model together with a set of expression and action parameters in advance; encoding transmits the facial expressions and actions, and the decoding end deforms the face model according to these parameters. The MPEG-4 model-based face codec structure is shown in Fig. 14.
This MPEG-4 coding model has been very successful in video telephony, but it requires the video content to be known in advance, i.e., that the video contains a face when encoding, and it additionally requires the three-dimensional model, that of the face, to be built beforehand.
In short, the compression bit rate of the H.264 video coding standard, built on Shannon information theory, is bounded by rate-distortion theory, so at a given image quality the bit rate cannot be very low; the model-based coding scheme in MPEG-4 can meet low-bit-rate requirements, but it needs the video content to be known and an object model built in advance. For a whole large scene, especially a static-scene video sequence of unknown content shot from multiple angles, image-based modeling usually cannot obtain a complete wire-frame model. In other words, the difficulty of modeling a whole scene for content-based coding remains an unsolved problem of the prior art, and exploring new methods for low-bit-rate video coding is an important current topic.
Summary of the invention
The purpose of the present invention is to solve the above problems of the prior art by providing a method and system that can perform content-based hybrid encoding and decoding on the video sequence of a whole scene.
To achieve this purpose, the invention provides a method of hybrid encoding and decoding of video sequences that carries out the following steps:
One. Steps for hybrid encoding of the video sequence:
1) Video image segmentation: a video image segmentation technique divides the input video image into two parts, a video object and a background; for the video object part go to step 2), for the background part go to step 4);
2) Video object parameter extraction: reconstruct the three-dimensional parameters of the video object by image-based modeling and proceed to the next step;
3) Compress the three-dimensional parameters of the video object by parameter coding and go to step 5);
4) Compress the background with an existing video coding standard and go to step 5);
5) Output the compressed bit stream.
Two. Steps for decoding the compressed bit stream of the hybrid-coded video sequence:
a) Decompress the three-dimensional parameters of the video object from the compressed bit stream and proceed to step c);
b) Decompress the background from the compressed bit stream and proceed to step c);
c) Merge the video object and the background to reconstruct the original video.
In the method of the present invention, step 2), extracting the video object parameters, carries out the following steps:
21) Extract the camera parameters from the input video object, establishing the correspondence between the three-dimensional position of the video object and its two-dimensional position on the image plane;
22) Reconstruct the wire-frame model of the three-dimensional shape from the video object by an image-based reconstruction method;
23) According to the visibility of the three-dimensional wire-frame model, assign the best color information to each triangular facet and combine the facets into one texture image; compute the positions of the facet vertices on the texture image, establishing a one-to-one mapping between the facet vertices and the texture image.
In the method of the present invention, step 21), extracting the camera parameters, carries out the following steps:
211) Detect SIFT feature points under different angles in the input video object and match these feature points;
212) Compute the fundamental matrix between images according to the RANSAC algorithm to obtain the relative motion between cameras, i.e., the rotation R and translation t of the camera relative to the world coordinate system;
213) Optimize the rotation R and translation t with bundle adjustment.
In the method of the present invention, step 22) carries out the following steps:
221) Find a three-dimensional point on or near the surface of the video object by stereo matching, and construct a voxel centered on this point as the seed;
222) Construct new voxels on the plane where the seed voxel meets the video object surface, and with each new voxel as a seed continue to find points intersecting the video object surface;
223) Repeat this process until all voxels intersecting the video object surface are obtained;
224) Convert the voxels surrounding the video object surface into a wire-frame model with the marching cubes algorithm;
225) Output the three-dimensional wire-frame model after triangular-facet optimization.
In the method of the present invention, step 23) carries out the following steps:
231) Establish the correspondence between the wire-frame model vertices and all shooting viewpoints with a Markov random field model;
232) Map each triangular facet of the model onto an isosceles right triangle in two-dimensional space;
233) Combine these isosceles right triangles into quadrilaterals (the gray areas of Fig. 11 show the space wasted in the splicing);
234) Fill the spliced triangles with color to form the final texture image.
In the method of the present invention, step a), decompressing the three-dimensional parameters of the video object, carries out the following steps:
a1) Recover the camera parameters, three-dimensional wire-frame model, and texture image from the compressed bit stream;
a2) Map the texture image onto the wire-frame model, completing the description of the object's three-dimensional shape and color;
a3) Project the texture-mapped wire-frame model under the original shooting viewpoints according to the camera parameters, recovering the video object.
To achieve the above purpose, the invention also provides a video sequence hybrid encoding and decoding system comprising a video object encoder and a video object decoder, wherein the video object encoder comprises:
an object segmentation module, for segmenting the video object from the video sequence;
a parameter extraction module, for reconstructing the camera parameters, three-dimensional wire-frame model, and texture from the video object so as to represent the object to be encoded in the video sequence;
a parameter coding module, for compression-encoding the extracted parameters to form a compressed bit stream for transmission; and
a background coding module, for compression-encoding the background with the existing H.264 standard;
The object segmentation module, parameter extraction module, and parameter coding module are connected in sequence, and the object segmentation module is also connected to the background coding module; the input video sequence feeds the object segmentation module, and the parameter coding module and background coding module jointly output the encoded bit stream.
The video object decoder comprises a parameter decoding module and a background decoding module; the input bit stream is decompressed, and the parameter decoding module and background decoding module respectively output the decoded video sequence.
In the system of the present invention, the parameter extraction module comprises:
a camera parameter extraction module, for extracting the camera parameters: according to the camera imaging model, it extracts the camera's projection matrix from three-dimensional space to the two-dimensional image plane by multi-view stereo matching;
a three-dimensional wire-frame model reconstruction module, for reconstructing the wire-frame model of the object's three-dimensional shape by voxel growing;
a texture image construction module, for recovering the color information of the object to form the final texture image;
The parameter coding module comprises:
a camera parameter coding module, for compressing the extracted camera parameters with lossless entropy coding;
an MPEG three-dimensional mesh coding module, for compressing the reconstructed three-dimensional wire-frame model with the 3DMC tool of MPEG-4;
a texture image coding module, for compressing the extracted texture image with JPEG coding;
The camera parameter extraction module is connected to the camera parameter coding module, the three-dimensional wire-frame model reconstruction module to the MPEG three-dimensional mesh coding module, and the texture image construction module to the texture image coding module. The video sequence feeds the camera parameter extraction module, the three-dimensional wire-frame model reconstruction module, and the texture image construction module; the camera parameter coding module, MPEG three-dimensional mesh coding module, and texture image coding module output the bit stream through the compression module.
In the system of the present invention, the parameter decoding module comprises a camera parameter decoding module, a mesh decoding module, a texture image decoding module, and a texture mapping module, wherein the mesh decoding module and the texture image decoding module are each connected to the texture mapping module; the coded bit stream passes through the decompression module to the camera parameter decoding module, mesh decoding module, and texture image decoding module, and the camera parameter decoding module and texture mapping module output the video object through a 3D-space-to-2D-space projection module.
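The module wiring just described can be summarized in code. Below is a minimal Python sketch of the encoder-side data flow of Fig. 1; every class and function name here is a hypothetical stand-in for the corresponding module, not part of the patent.

class HybridEncoder:
    """Schematic wiring of the video object encoder (Fig. 1)."""

    def __init__(self, segment, extract, encode_params, encode_background):
        self.segment = segment                       # object segmentation module
        self.extract = extract                       # camera / mesh / texture extraction
        self.encode_params = encode_params           # entropy + 3DMC + JPEG coding
        self.encode_background = encode_background   # e.g. an H.264 encoder

    def encode(self, video_sequence):
        # split the input into video object and background
        video_object, background = self.segment(video_sequence)
        # reconstruct camera parameters, wire-frame model and texture image
        camera, mesh, texture = self.extract(video_object)
        # compress the extracted parameters, and the background separately
        object_stream = self.encode_params(camera, mesh, texture)
        background_stream = self.encode_background(background)
        # the two parts jointly form the output compressed bit stream
        return object_stream + background_stream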
The advantage of the hybrid encoding and decoding method and system of the present invention is that, for video of unknown content, it adopts a model-based coding scheme: the three-dimensional shape of the object is represented by a three-dimensional wire-frame model and its color information by a texture, and the three-dimensional parameters of the video object, including the camera projection parameters and the three-dimensional shape and color information of the object, are extracted from the video, yielding a low-bit-rate stream. At the same time, because the three-dimensional information of the object is available, the decoding end can form an image of the object under an arbitrary viewing angle, and various video effects, for example relighting and deformation, can be generated from this three-dimensional information. The invention thus solves the problem of performing content-based hybrid encoding and decoding on the video sequence of a whole scene, and can be widely applied in fields such as heritage conservation, virtual commodity display, and immersive interactive experiences.
Description of drawings
Fig. 1 is a block diagram of the video object encoder in the hybrid encoding and decoding system of the present invention;
Fig. 2 is a block diagram of the video object decoder in the hybrid encoding and decoding system of the present invention;
Fig. 3 is a block diagram of the parameter extraction module of Fig. 1;
Fig. 4 is a block diagram of the parameter coding module of Fig. 1;
Fig. 5 is the encoding flowchart of the hybrid encoding and decoding of the present invention;
Fig. 6 is the flowchart for extracting the three-dimensional parameters of the video object;
Fig. 7 is the flowchart for extracting the camera parameters;
Fig. 8 is the flowchart for reconstructing the three-dimensional wire-frame model;
Fig. 9 is the flowchart for extracting the texture image;
Fig. 10 shows the mapping between the triangular facets of the three-dimensional model and isosceles right triangles in the two-dimensional plane;
Fig. 11 shows the effect of triangle splicing on the two-dimensional plane;
Fig. 12 is the decoding flowchart of the hybrid encoding and decoding of the present invention;
Fig. 13 compares the decoded image quality of the present invention and of H.264 on one frame of a test sequence at a bit rate of 160 kb/s;
Fig. 14 shows the MPEG-4 model-based face codec structure;
Fig. 15 is the flowchart of visual hull construction.
The present invention is described below in conjunction with the accompanying drawings.
Embodiment
For video of unknown content, adopting a model-based coding scheme hinges on extracting the three-dimensional parameters of the video object from the video, including the camera projection parameters and the three-dimensional shape and color information of the object. The present invention represents the three-dimensional shape of the object with a three-dimensional wire-frame model and its color information with a texture; once this information is obtained, model-based coding can be applied. In view of the aforementioned problems of the prior art, the invention proposes a parameter-based hybrid encoding and decoding method for static-scene video sequences of unknown content shot from multiple angles. Considering that a large scene is hard to model while an object is comparatively easy to reconstruct, the input video is divided by image segmentation into two parts, a video object and a background. Image-based modeling reconstructs from the input video the camera parameters and the three-dimensional wire-frame model and texture image of the object, which together represent the video object, and these extracted parameters are compressed to realize the coding of the video object; the background is compressed with an existing standard, such as H.264.
The method of hybrid encoding and decoding of video sequences of the present invention carries out the following steps:
One. As shown in Fig. 5, the steps for hybrid encoding of the video sequence are:
1) Video image segmentation: a video image segmentation technique divides the input video image into two parts, a video object and a background; for the video object part go to step 2), for the background part go to step 4). Here, object segmentation can adopt a common watershed algorithm, a graph-cut-based algorithm, etc.
2) Video object parameter extraction: reconstruct the three-dimensional parameters of the video object by image-based modeling and proceed to the next step;
3) Compress the three-dimensional parameters of the video object by parameter coding and go to step 5);
4) Compress the background with an existing video coding standard and go to step 5);
5) Output the compressed bit stream.
Two. As shown in Fig. 12, the steps for decoding the compressed bit stream of the hybrid-coded video sequence are:
a) Decompress the three-dimensional parameters of the video object from the compressed bit stream and proceed to step c);
b) Decompress the background from the compressed bit stream and proceed to step c);
c) Merge the video object and the background to reconstruct the original video.
The key steps of the method are explained in depth below.
In step 2), extracting the video object parameters, the following steps are carried out, as shown in Fig. 6:
21) Extract the camera parameters from the input video object, establishing the correspondence between the three-dimensional position of the video object and its two-dimensional position on the image plane;
22) Reconstruct the wire-frame model of the three-dimensional shape from the video object by an image-based reconstruction method;
23) According to the visibility of the three-dimensional wire-frame model, assign the best color information to each triangular facet and combine the facets into one texture image; compute the positions of the facet vertices on the texture image, establishing a one-to-one mapping between the facet vertices and the texture image.
In the method for video sequence mixed encoding and decoding of the present invention, camera parameters represents that the projection on the plane of delineation of object dimensional position obtains the relation of two-dimensional position, can simulate the imaging process of real camera with pinhole camera modeling, suppose that the homogeneous coordinates of three dimensions point P are [XYZ1] T, the homogeneous coordinates of its subpoint p on the plane of delineation are [xy1] T, then there is following relation between them:
λ[xy1] T=K[Rt][XYZ1] T
λ is non-zero constant arbitrarily in the following formula; R is 3 * 3 spin matrix, and t is a translation vector, and R and t are called the external parameter of video camera, the relative position relation between expression camera coordinate system and world coordinate system; K is called the inner parameter of video camera, is individual 3 * 3 upper triangular matrix, is defined as follows:
K = f x s p x 0 f y p y 0 0 1
F in the formula xAnd f yBe focal length; (p x, p y) represent the coordinate of basic point (principle point) on the plane of delineation, be respectively half of picture traverse and height; S is the deflection factor, is commonly considered as 0.
The present invention assumes the focal length is constant during shooting, so the internal parameters of the camera are known and only the external parameters need to be extracted. Here, extracting the camera parameters means recovering K, R, and t, usually by multi-view stereo matching; a code sketch of the projection model follows.
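The projection relation above is easy to exercise numerically. Below is a minimal Python/NumPy sketch of the pinhole model; the numeric values are illustrative placeholders, not data from the patent.

import numpy as np

def project(K, R, t, X):
    """Project a 3-D point X (shape (3,)) onto the image plane."""
    p = K @ (R @ X + t)      # homogeneous image coordinates, scaled by lambda
    return p[:2] / p[2]      # divide out lambda to obtain (x, y)

fx = fy = 884.25             # focal length in pixels
px, py = 320.0, 240.0        # principal point: half of image width and height
K = np.array([[fx, 0.0, px],
              [0.0, fy, py],
              [0.0, 0.0, 1.0]])   # skew s taken as 0
R, t = np.eye(3), np.zeros(3)     # camera coincides with the world frame
print(project(K, R, t, np.array([0.0, 0.0, 178.0])))   # -> [320. 240.]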
In step 21), extracting the camera parameters, multi-view stereo matching carries out the following steps, as shown in Fig. 7 (a code sketch follows the list):
211) Detect SIFT feature points under different angles in the input video object and match these feature points;
212) Compute the fundamental matrix between images according to the RANSAC algorithm to obtain the relative motion between cameras, i.e., the rotation R and translation t of the camera relative to the world coordinate system;
213) Optimize the rotation R and translation t with bundle adjustment.
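As an illustration of steps 211) and 212), the following Python sketch relies on OpenCV's SIFT and RANSAC implementations, assumed here as substitutes for the unspecified ones in the text; with known intrinsics K the essential matrix plays the role of the fundamental matrix, and bundle adjustment (step 213) is omitted.

import cv2
import numpy as np

def relative_motion(img1, img2, K):
    """Estimate R, t between two grayscale views of the video object."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)   # step 211: SIFT features
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().match(des1, des2)     # match the feature points
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # step 212: robust epipolar geometry estimation with RANSAC
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)  # rotation R, translation t
    return R, t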
When recovering the three-dimensional wire-frame model from the video object, the three-dimensional surface of the video object is regarded as an iso-surface: the function value is less than zero at every point inside the region the surface encloses, and greater than zero at every point outside it.
In step 22), reconstructing the three-dimensional wire-frame model, the following steps are carried out, as shown in Fig. 8 (a sketch of the voxel-growing loop follows these steps):
221) Find a three-dimensional point on or near the surface of the video object by stereo matching, and construct a voxel centered on this point as the seed.
The visual hull is an upper-bound estimate of the object's three-dimensional shape. Mathematically, the visual hull of an object is equivalent to an iso-surface enclosing its surface, expressed as f(X) = 0, where X = (x, y, z) is a spatial point. Every point inside the region enclosed by the visual hull satisfies f(X) < 0, and every point outside it satisfies f(X) > 0. Constructing the visual hull as a volume amounts to cutting away from the bounding box all voxels outside the region enclosed by the iso-surface f(X) = 0. Multi-view stereo matching of feature points yields the 3D spatial points corresponding to the features, which lie on or near the object surface. A voxel constructed around one of these points as its center therefore intersects the iso-surface f(X) = 0.
222) With this voxel as the seed, growing along the iso-surface until the whole surface has been traversed yields all voxels intersecting it. To this end, construct new voxels on the plane where the seed voxel meets the video object surface, and with each new voxel as a seed continue to find points intersecting the video object surface.
223) Repeat this process until all voxels intersecting the video object surface are obtained. Thus, by constructing a seed voxel through stereo matching and then growing voxels, the visual hull can be constructed.
The concrete flow, shown in Fig. 15, mainly comprises initialization, multi-view stereo matching, voxel growing, triangulation, and adaptive sampling. Initialization sets the initial seed size and builds the voxel list and the triangular-facet list, which respectively store the generated voxels and the facets they are converted into. Multi-view stereo matching obtains the center of the initial seed voxel through feature detection and matching. Voxel growing traverses the iso-surface to find all voxels intersecting it. Triangulation converts the voxel representation into a triangular-facet representation via a lookup table, according to how each voxel intersects the iso-surface. Adaptive sampling adjusts the size of the voxels generated from a seed according to the local curvature at the seed: voxels are smaller where the curvature varies strongly and larger where it varies little.
224) Convert the voxels surrounding the video object surface into a wire-frame model with the marching cubes algorithm.
225) Output the three-dimensional wire-frame model after triangular-facet optimization.
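The voxel-growing loop of steps 221)-223) can be sketched as follows. This is a schematic Python sketch assuming a caller-supplied implicit surface function f (negative inside the object, positive outside) and a fixed voxel size, whereas the actual method seeds the growth from stereo-matched points and sizes voxels adaptively by local curvature.

def grow_voxels(seed_center, f, size):
    """Collect the centers of all voxels straddling the iso-surface f(X) = 0."""
    def straddles(c):
        # a voxel intersects the surface iff f changes sign over its 8 corners
        vals = [f((c[0] + i * size / 2, c[1] + j * size / 2, c[2] + k * size / 2))
                for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)]
        return min(vals) < 0.0 < max(vals)

    surface, stack = set(), [seed_center]
    while stack:                          # depth-first growth along the surface
        c = stack.pop()
        if c in surface or not straddles(c):
            continue
        surface.add(c)
        for dx, dy, dz in ((size, 0, 0), (-size, 0, 0), (0, size, 0),
                           (0, -size, 0), (0, 0, size), (0, 0, -size)):
            stack.append((c[0] + dx, c[1] + dy, c[2] + dz))
    return surface                        # passed on to marching cubes, step 224)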
The method of the present invention approximates the three-dimensional shape of the object with the visual hull and reconstructs the shape by voxel growing; in further embodiments, other reconstruction methods can be adopted, for example multi-view stereo reconstruction.
Besides reconstructing the three-dimensional shape of the video object, completely representing it also requires extracting its color information; the method of the present invention represents the color information of the object with a texture image. In further embodiments, other reconstruction methods can likewise be adopted.
In step 23), extracting the texture image, the following steps are carried out:
231) As shown in Fig. 9, establish the best correspondence between the wire-frame model vertices and all shooting viewpoints with a Markov random field model.
Establishing the correspondence between mesh vertices and viewpoints is essentially a labeling process. For a mesh M, establishing the correspondence between the vertex sequence V = {v_1, ..., v_m} and the viewpoints {1, ..., n} means solving for a label vector L = {l_1, ..., l_m} ∈ {1, ..., n}^m, where l_j (j = 1, ..., m) denotes the best viewpoint for the j-th vertex. The quality of the final texture image depends on two factors. The first is the angle between the normal vector of a vertex and the directed segment from the vertex to its assigned viewpoint: the smaller the angle, the more directly the camera at that viewpoint faces the vertex. The second is the seam artifacts produced when adjacent vertices are assigned different viewpoints. The goal of the algorithm is to strike the best compromise between these two factors, assigning each vertex its best viewpoint as far as possible while adjacent vertices share the same viewpoint as far as possible. Based on this goal, the cost function is set up as follows:

$$E(L) = E_{\mathrm{data}}(L) + w\,E_{\mathrm{smooth}}(L)$$

The first term is the cost of assigning viewpoints to vertices, called the data cost; the second is the cost incurred when different viewpoints are assigned to adjacent vertices, called the smoothness cost, and w is the weight coefficient of the smoothness cost. Minimizing this cost function yields an optimal label vector L = {l_1, ..., l_m}, thereby determining the best viewpoint for each vertex.
The algorithm defines the data cost by the angle between the vertex normal and the directed segment from the vertex to its assigned viewpoint, as follows:
$$E_{\mathrm{data}}(L) = \sum_{j=1}^{m}\left(1 - \cos\varphi_j^{l_j}\right)$$

where φ_j^{l_j} denotes the angle between the normal vector of vertex v_j and the directed segment from v_j to viewpoint l_j. The smoothness cost is defined as follows:
$$E_{\mathrm{smooth}}(L) = \sum_{j=1}^{m}\sum_{k \in N(j)} e\!\left(\mathrm{Pr}_{l_j}(v_j),\, \mathrm{Pr}_{l_k}(v_k)\right)$$

where N(j) is the set of all vertices adjacent to vertex v_j, Pr_l(·) denotes projection into the image of viewpoint l, and the function e(·, ·) measures the color difference.
The cost function above is a typical discrete multi-label Markov random field. The problem is solved with the multiway graph cuts algorithm, which has two variants, α-expansion and α-β swap. The α-expansion method performs better, but it requires the smoothness cost function e(·, ·) to satisfy the following regularity condition:
$$e(\mathrm{Pr}_i(v_k), \mathrm{Pr}_i(v_l)) + e(\mathrm{Pr}_j(v_k), \mathrm{Pr}_k(v_l)) \le e(\mathrm{Pr}_i(v_k), \mathrm{Pr}_k(v_l)) + e(\mathrm{Pr}_j(v_k), \mathrm{Pr}_i(v_l))$$
Because the color-difference metric function defined in color space satisfies this condition, α-expansion can be used to solve the problem.
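For concreteness, the following Python sketch evaluates E(L) = E_data(L) + w·E_smooth(L) for a candidate labeling; the inputs (unit vertex normals, unit view directions, vertex adjacency, and a color-difference function) are assumed, and a real implementation would minimize E with α-expansion graph cuts rather than evaluate it directly as done here.

import numpy as np

def labeling_cost(L, normals, view_dirs, neighbors, color_diff, w=1.0):
    """Cost of assigning viewpoint L[j] to vertex j, per the formulas above."""
    # data term: 1 - cos(angle between vertex normal and its view direction)
    e_data = sum(1.0 - float(np.dot(normals[j], view_dirs[j][L[j]]))
                 for j in range(len(L)))
    # smoothness term: color difference between the projections of
    # adjacent vertices under their assigned viewpoints
    e_smooth = sum(color_diff(j, L[j], k, L[k])
                   for j in range(len(L)) for k in neighbors[j])
    return e_data + w * e_smooth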
After the correspondence between vertices and viewpoints is established, each triangular facet of the mesh must be mapped to a triangle on the two-dimensional plane. The present invention creates an isosceles right texture triangle for each facet of the mesh, then splices these right triangles into quadrilaterals according to certain rules, and finally fills the quadrilaterals with color to form the texture image.
232) Map each triangular facet of the model onto an isosceles right triangle in two-dimensional space, as shown in Fig. 10.
First determine the size of the right triangle corresponding to each facet: project the facet onto the image planes of all viewpoints from which it is visible and find the maximum projected triangle area; then choose the side length of the right triangle so that its area equals or just exceeds this maximum projected area. Once the size of each facet's right triangle is obtained, the one-to-one correspondence between facets and right triangles on the two-dimensional plane can be established; Fig. 10 gives a schematic of this correspondence.
In Fig. 10, the triangle ΔT_0T_1T_2 is a facet of the mesh and Δt_0t_1t_2 is its corresponding right triangle in two-dimensional space. For a pixel x in Δt_0t_1t_2, the coordinates of its corresponding 3D point X on ΔT_0T_1T_2 are computed as follows:

$$X = w_0 T_0 + w_1 T_1 + w_2 T_2,\qquad w_1 = \frac{u}{N-1},\quad w_2 = \frac{v}{N-1},\quad w_0 = 1 - w_1 - w_2$$

where (u, v) are the coordinates of pixel x in the coordinate system with origin t_0, horizontal axis t_1 − t_0, and vertical axis t_2 − t_0, and N is the number of pixels on a leg of Δt_0t_1t_2. Through these formulas, the point on the facet corresponding to each pixel of the right triangle can be obtained (a code sketch follows).
233) As shown in Fig. 11, combine these isosceles right triangles into quadrilaterals; the gray areas in the figure show the space wasted in the splicing.
To improve storage efficiency, after the right triangle of each facet is obtained, these triangles must be combined into squares. The facets of the mesh are divided into four classes: the first class comprises adjacent facets whose right triangles are of identical size; the second, adjacent facets whose right triangles differ in size; the third, non-adjacent facets whose right triangles are of identical size; the fourth, all remaining facets. The right triangles are combined into squares in priority order from the first class to the third. For the second class, the side length of the smaller triangle is expanded to that of the larger one. For the third class, to avoid color damage along the diagonal between non-adjacent facets, the diagonal zone of the quadrilateral is doubled during splicing. The spliced quadrilaterals and the remaining right triangles are then sorted in descending order of side length and arranged in a two-dimensional region from large to small, left to right, starting a new row when the boundary of the region is reached. The splicing result is sketched in Fig. 11, where the gray areas show the space wasted in the splicing.
234) As shown in Fig. 10, fill the spliced triangles with color to form the final texture image. The color of the pixel at a triangle vertex equals the color obtained by projecting the corresponding facet vertex of the wire-frame model onto the image plane of its assigned viewpoint; the color of a pixel on a triangle edge or inside the triangle is obtained, for its corresponding point on the facet of the wire-frame model, by barycentric-weight interpolation of the colors of that point's projections onto the viewpoint image planes assigned to the three vertices.
To obtain the final texture image, the spliced two-dimensional region must be filled with color. A color interpolation method based on barycentric-coordinate weights is adopted. As shown in Fig. 10, to determine the color of pixel x, first find its point X on the corresponding facet ΔT_0T_1T_2; then project X onto the viewpoint images I_i, I_j, and I_k corresponding to vertices T_0, T_1, and T_2, respectively; finally take the weighted average of the colors of these three projections as the color at pixel x. Denoting color by color(·), the area of ΔT_0T_1T_2 by A, and the areas of ΔXT_1T_2, ΔXT_2T_0, and ΔXT_0T_1 by a, b, and c respectively, the color of pixel x is computed as follows:

$$\mathrm{color}(x) = \frac{a}{A}\,\mathrm{color}(\mathrm{Pr}_i(X)) + \frac{b}{A}\,\mathrm{color}(\mathrm{Pr}_j(X)) + \frac{c}{A}\,\mathrm{color}(\mathrm{Pr}_k(X))$$
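A minimal Python sketch of this area-weighted interpolation; project_color (a lookup of color(Pr_i(X))) and tri_area are assumed helpers:

def fill_color(X, T0, T1, T2, views, project_color, tri_area):
    """Area-weighted average of the three viewpoint projections of X."""
    i, j, k = views                  # viewpoints assigned to T0, T1, T2
    A = tri_area(T0, T1, T2)
    a = tri_area(X, T1, T2)          # sub-triangle opposite T0
    b = tri_area(X, T2, T0)          # sub-triangle opposite T1
    c = tri_area(X, T0, T1)          # sub-triangle opposite T2
    return (a * project_color(i, X) +
            b * project_color(j, X) +
            c * project_color(k, X)) / A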
In step a), decompressing the three-dimensional parameters of the video object, the following steps are carried out, as shown in Fig. 12:
a1) Recover the camera parameters, three-dimensional wire-frame model, and texture image from the compressed bit stream;
a2) Map the texture image onto the wire-frame model, completing the description of the object's three-dimensional shape and color;
a3) Project the texture-mapped wire-frame model under the original shooting viewpoints according to the camera parameters, recovering the video object.
In a hybrid encoding and decoding process taking Fig. 13 as an embodiment, each frame of the video corresponds to one camera shooting viewpoint. The extracted camera parameters comprise the internal parameters (focal length and principal point) and the external parameters (rotation matrix and translation vector), reflecting the projection from 3-dimensional space onto the 2-dimensional image plane. The extracted camera parameter data are as follows, where "//" marks a comment.
300                              // number of video frames
hat_texture.jpg                  // texture image file name
640 480                          // video image resolution
884.253662                       // camera focal length
318.299835 244.023193            // principal point coordinates
-0.081671 -0.995944  0.037743    // row 1 of the 3x3 rotation matrix of the camera for frame 1
-0.634454  0.022747 -0.772625    // row 2 of the 3x3 rotation matrix of the camera for frame 1
 0.768634 -0.087047 -0.633739    // row 3 of the 3x3 rotation matrix of the camera for frame 1
-5.136273  5.179986 178.007614   // translation vector of the camera for frame 1
……
 0.880977  0.140073 -0.451950    // row 1 of the 3x3 rotation matrix of the camera for frame 300
-0.038059 -0.931104 -0.362764    // row 2 of the 3x3 rotation matrix of the camera for frame 300
-0.471626  0.336788 -0.814950    // row 3 of the 3x3 rotation matrix of the camera for frame 300
 3.691115  1.803580 156.440475   // translation vector of the camera for frame 300
The extracted three-dimensional shape is stored in VRML2.0, a 3D file format common on the network. The three-dimensional shape is composed of vertices and triangular facets; the three-dimensional shape data are given below.
#VRML V2.0 utf8                    // VRML file format identifier
geometry IndexedFaceSet {
  coord Coordinate {
    point [
      -2.000000 -30.098499 2.000000,   // spatial coordinates x, y, z of vertex 1
      -4.000000 -30.324100 0.000000,   // spatial coordinates of vertex 2
      -2.000000 -30.759001 0.000000,   // spatial coordinates of vertex 3
      ……
      4.000000 4.000000 34.160400
    ]
  }
  coordIndex [
    0, 1, 2, -1,     // facet 1: 0, 1, 2 denote the 1st, 2nd and 3rd vertices; -1 is the terminator
    3, 4, 5, -1,     // facet 2
    6, 7, 8, -1,     // facet 3
    ……
    4080, 4081, 4082, -1,
  ]
}
The camera parameters are compressed with classical entropy coding; the encoded data are as follows.
52 61 72 21 1A 07 00 CF 90 73 00 00 0D 00 00 00
00 00 00 00 D3 7A 74 20 90 2F 00 0C 19 00 00 75
4B 00 00 02 64 B5 D9 96 2F 7E B4 3A 1D 33 0A 00
……
49 CD 2B 70 C6 A5 69 9F 1C CB 6E B8 BF FF 20 C4
3D 7B 00 40 07 00
The three-dimensional shape data are compressed with the three-dimensional mesh coding tool provided by MPEG-4; the encoded data are as follows.
00 20 E0 3C 1D FF 24 7C 1F 8E 1E 50 00 80 00 81
0A 0D AD E1 56 40 02 00 81 80 00 80 00 87 A0 FF
FB C7 EF F6 89 CA 50 00 30 00 80 00 31 00 80 71
……
19 40 26 08 D4 68 A2 D3 5B 82 89 16 A9 76 84 B8
31 E9 46 F5 6E 56 60 5D A7 FF FF
The extracted color parameters, i.e., the texture image, are encoded with the classical JPEG compression technique; the output data are as follows.
FF D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01
00 01 00 00 FF DB 00 43 00 02 01 01 01 01 01 02
01 01 01 02 02 02 02 02 04 03 02 02 02 02 05 04
……
A0 02 8A 28 A0 02 8A 28 A0 02 8A 28 A0 02 8A 28
A0 02 8A 28 A0 0F FF D9
AA AA AA A0 marks the beginning of the camera coded data, AA AA AA A1 the beginning of the three-dimensional shape coded data, and AA AA AA A2 the beginning of the texture image coded data; the multiplexed data are as follows.
AA AA AA A0 52 61 72 21 1A 07 00 CF 90 73 00 00
0D 00 00 00 00 00 00 00 D3 7A 74 20 90 2F 00 0C
19 00 00 75 4B 00 00 02 64 B5 D9 96 2F 7E B4 3A
……
49 CD 2B 70 C6 A5 69 9F 1C CB 6E B8 BF FF 20 C4
3D 7B 00 40 07 00 AA AA AA A1 00 20 E0 3C 1D FF
24 7C 1F 8E 1E 50 00 80 00 81 0A 0D AD E1 56 40 02
00 81 80 00 80 00 87 A0 FF FB C7 EF F6 89 CA 50 00
……
19 40 26 08 D4 68 A2 D3 5B 82 89 16 A9 76 84 B8
31 E9 46 F5 6E 56 60 5D A7 FF FF AA AA AA A2 FF
D8 FF E0 00 10 4A 46 49 46 00 01 01 00 00 01 00 01
00 00 FF DB 00 43 00 02 01 01 01 01 01 02 01 01 01
02 02 02 02 02 04 03 02 02 02 02 05 04
……
A0 02 8A 28 A0 02 8A 28 A0 02 8A 28 A0 02 8A 28
A0 02 8A 28 A0 0F FF D9
The color parameters actually fill each triangular facet of the three-dimensional shape with color; for ease of storage and compression, these facets are combined into quadrilaterals, so that the colors form a texture image.
According to the identifiers AA AA AA A0, AA AA AA A1, and AA AA AA A2, the camera coded data, the three-dimensional shape coded data, and the texture image coded data are obtained respectively from the bit stream (a demultiplexing sketch follows).
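A minimal Python sketch of this demultiplexing, assuming the marker byte patterns do not occur inside the payloads:

MARKERS = {b"\xAA\xAA\xAA\xA0": "camera",
           b"\xAA\xAA\xAA\xA1": "shape",
           b"\xAA\xAA\xAA\xA2": "texture"}

def demux(bitstream):
    """Split the mixed stream into camera, shape and texture coded data."""
    cuts = sorted((bitstream.find(m), m) for m in MARKERS)  # marker positions
    parts = {}
    for n, (pos, m) in enumerate(cuts):
        end = cuts[n + 1][0] if n + 1 < len(cuts) else len(bitstream)
        parts[MARKERS[m]] = bitstream[pos + 4:end]          # skip the 4-byte marker
    return parts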
Decoding of the camera parameter data
Because the camera coding uses lossless entropy coding, the decoded data are identical to the original data, as follows.
300                              // number of video frames
hat_texture.jpg                  // texture image file name
640 480                          // video image resolution
884.253662                       // camera focal length
318.299835 244.023193            // principal point coordinates
-0.081671 -0.995944  0.037743    // row 1 of the 3x3 rotation matrix of the camera for frame 1
-0.634454  0.022747 -0.772625    // row 2 of the 3x3 rotation matrix of the camera for frame 1
 0.768634 -0.087047 -0.633739    // row 3 of the 3x3 rotation matrix of the camera for frame 1
-5.136273  5.179986 178.007614   // translation vector of the camera for frame 1
……
 0.880977  0.140073 -0.451950    // row 1 of the 3x3 rotation matrix of the camera for frame 300
-0.038059 -0.931104 -0.362764    // row 2 of the 3x3 rotation matrix of the camera for frame 300
-0.471626  0.336788 -0.814950    // row 3 of the 3x3 rotation matrix of the camera for frame 300
 3.691115  1.803580 156.440475   // translation vector of the camera for frame 300
The decoded mesh data are as follows. Compared with the original three-dimensional shape data above, the spatial coordinates of the points differ somewhat while the triangular facets are identical: the point coordinates were encoded with a lossy method, whereas the facets were encoded losslessly.
#VRML V2.0 utf8
geometry IndexedFaceSet {
  coord Coordinate {
    point [
      -2.0755 -30.1789  1.9591
      -4.0667 -30.3716 -0.0321
      -2.0755 -30.8212 -0.0321
      -0.0843 -31.0782 -0.0321
      ……
      3.9624 3.9286 34.0754
    ]
  }
  coordIndex [
    0  1  2 -1
    3  4  5 -1
    6  7  8 -1
    9 10 11 -1
    ……
    4080 4081 4082 -1
  ]
}
After decoding, the texture image is a standard JPEG image.
Texture mapping actually establishes a one-to-one mapping between each triangular facet of the above three-dimensional shape and the spliced texture image of Fig. 11, so that OpenGL or D3D can draw the color of the three-dimensional shape by rasterization. The one-to-one mapping between the three-dimensional shape and the texture image is realized through texture coordinates; the data format is as follows.
#VRML V2.0 utf8
geometry IndexedFaceSet {
  coord Coordinate {
    point [
      -2.0755 -30.1789  1.9591
      -4.0667 -30.3716 -0.0321
      -2.0755 -30.8212 -0.0321
      -0.0843 -31.0782 -0.0321
      ……
      3.9624 3.9286 34.0754
    ]
  }
  coordIndex [
    0  1  2 -1
    3  4  5 -1
    6  7  8 -1
    9 10 11 -1
    ……
    4080 4081 4082 -1
  ]
  texCoord TextureCoordinate {     // texture coordinates
    point [
      0.0503 0.5170                // texture coordinate of vertex 1
      0.0386 0.5053                // texture coordinate of vertex 2
      0.0474 0.4916                // texture coordinate of vertex 3
      ……
      0.0581 0.4799
    ]
  }
}
The texture coordinates above are all normalized to the interval [0, 1]. To compute a vertex's position on the texture image, multiply the abscissa and ordinate of the texture coordinate by the width and height of the texture image respectively, then round to the nearest pixel (a code sketch follows).
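A one-line Python sketch of this conversion; the texture image size used in the example is assumed:

def tex_coord_to_pixel(u, v, width, height):
    """Map a normalized texture coordinate to a pixel position."""
    return round(u * width), round(v * height)

print(tex_coord_to_pixel(0.0503, 0.5170, 1024, 1024))   # vertex 1, e.g. (52, 529)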
The hybrid encoding and decoding system of the present invention is described below through an embodiment.
The equipment running the system can be a personal computer, a server, a mobile device, etc. As shown in Fig. 1, the invention separates the video object from the background with object segmentation, then obtains the camera parameters, the three-dimensional wire-frame model, and the texture image respectively by image-based modeling, and finally compression-encodes these extracted parameters to form the bit stream; the corresponding implementation is shown in Fig. 3. The decoding end recovers the camera parameters, three-dimensional wire-frame model, and texture image respectively from the bit stream, maps the texture image onto the three-dimensional model, and finally uses the camera parameters to project the texture-mapped wire-frame model under the original shooting viewpoints, thus recovering the original video object; the corresponding implementation is shown in Fig. 4. Compared with the model-based coding in MPEG-4, the system of the present invention does not need a three-dimensional model of the object built in advance; it reconstructs the three-dimensional model of the object from the video with image-based modeling, so model-based coding can be applied to video of unknown content.
The system of the present invention comprises a video object encoder 100 and a video object decoder 200, wherein, as shown in Fig. 1, the video object encoder 100 comprises:
an object segmentation module 101, for segmenting the video object from the video sequence;
a parameter extraction module 102, for reconstructing the camera parameters, three-dimensional wire-frame model, and texture from the video object so as to represent the object to be encoded in the video sequence;
a parameter coding module 103, for compression-encoding the extracted parameters to form a compressed bit stream for transmission; and
a background coding module 104, for compression-encoding the background with an existing standard such as H.264.
The object segmentation module 101, parameter extraction module 102, and parameter coding module 103 are connected in sequence, and the object segmentation module 101 is also connected to the background coding module 104; the input video sequence feeds the object segmentation module 101, and the parameter coding module 103 and background coding module 104 jointly output the encoded bit stream.
As shown in Fig. 2, the video object decoder 200 comprises a parameter decoding module 201 and a background decoding module 202; the input bit stream is decompressed, and the parameter decoding module 201 and background decoding module 202 respectively output the decoded video sequence.
As shown in Fig. 3, the parameter extraction module 102 of the video object comprises:
a camera parameter extraction module 121, for extracting the camera parameters: according to the camera imaging model, it extracts the camera's projection matrix from three-dimensional space to the two-dimensional image plane by multi-view stereo matching;
a three-dimensional wire-frame model reconstruction module 122, for reconstructing the wire-frame model of the object's three-dimensional shape by voxel growing;
a texture image construction module 123, for recovering the color information of the object to form the final texture image.
As shown in Fig. 3, the parameter coding module 103 comprises:
a camera parameter coding module 131, for compressing the extracted camera parameters with lossless entropy coding;
an MPEG three-dimensional mesh coding module 132, for compressing the reconstructed three-dimensional wire-frame model with the 3DMC tool of MPEG-4;
a texture image coding module 133, for compressing the extracted texture image with JPEG coding.
The camera parameter extraction module 121 is connected to the camera parameter coding module 131, the three-dimensional wire-frame model reconstruction module 122 to the MPEG three-dimensional mesh coding module 132, and the texture image construction module 123 to the texture image coding module 133. The video sequence feeds the camera parameter extraction module 121, the three-dimensional wire-frame model reconstruction module 122, and the texture image construction module 123; the camera parameter coding module 131, MPEG three-dimensional mesh coding module 132, and texture image coding module 133 output the bit stream through the compression module 105.
As shown in Fig. 4, the parameter decoding module 201 of the video object comprises a camera parameter decoding module 2011, a mesh decoding module 2012, a texture image decoding module 2013, and a texture mapping module 2014, wherein the mesh decoding module 2012 and texture image decoding module 2013 are each connected to the texture mapping module 2014; the coded bit stream passes through the decompression module 2001 to the camera parameter decoding module 2011, mesh decoding module 2012, and texture image decoding module 2013, and the camera parameter decoding module 2011 and texture mapping module 2014 output the video object through the 3D-space-to-2D-space projection module 2015.
At a bit rate of 160 kb/s, the decoded image quality of the present invention and of H.264 was compared on one frame of a test sequence. As shown in Fig. 13, 13-1 is the original image, 13-2 the H.264 decoded image, and 13-3 the decoded image of the present invention. The H.264 method has lost most of the detail and its decoded image is blurred, while the image decoded by the present invention is almost identical to the original.
The embodiments described above illustrate preferred implementations of the present invention and do not limit its scope; various variations and improvements that ordinary engineers and technicians in this field make to the technical scheme of the invention, without departing from its spirit, shall all fall within the protection scope determined by the claims of the present invention.

Claims (9)

1. A method of hybrid encoding and decoding of video sequences, characterized in that the method carries out the following steps:
One. Steps for hybrid encoding of the video sequence:
1) Video image segmentation: a video image segmentation technique divides the input video image into two parts, a video object and a background; for the video object part go to step 2), for the background part go to step 4);
2) Video object parameter extraction: reconstruct the three-dimensional parameters of the video object by image-based modeling and proceed to the next step;
3) Compress the three-dimensional parameters of the video object by parameter coding and go to step 5);
4) Compress the background with an existing video coding standard and go to step 5);
5) Output the compressed bit stream;
Two. Steps for decoding the compressed bit stream of the hybrid-coded video sequence:
a) Decompress the three-dimensional parameters of the video object from the compressed bit stream and proceed to step c);
b) Decompress the background from the compressed bit stream and proceed to step c);
c) Merge the video object and the background to reconstruct the original video.
2. The method according to claim 1, characterized in that step 2), extracting the video object parameters, carries out the following steps:
21) Extract the camera parameters from the input video object, establishing the correspondence between the three-dimensional position of the video object and its two-dimensional position on the image plane;
22) Reconstruct the wire-frame model of the three-dimensional shape from the video object by an image-based reconstruction method;
23) According to the visibility of the three-dimensional wire-frame model, assign the best color information to each triangular facet and combine the facets into one texture image; compute the positions of the facet vertices on the texture image, establishing a one-to-one mapping between the facet vertices and the texture image.
3. The method according to claim 2, characterized in that in step 21), the camera parameter extraction, the following steps are performed:
211) detect SIFT feature points under the different viewing angles of the input video object and match these feature points;
212) using the RANSAC algorithm, compute the fundamental matrix between images and obtain the relative motion between cameras, i.e. the rotation R and translation t of the camera with respect to the world coordinate system;
213) optimize the above rotation R and translation t by bundle adjustment; a sketch of steps 211) and 212) follows this claim.
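A sketch of steps 211) and 212), assuming OpenCV is available (the patent does not prescribe it). It detects and matches SIFT features, estimates the fundamental matrix with RANSAC, and recovers the relative rotation R and translation t; the intrinsic matrix K is assumed known. Step 213), bundle adjustment, would refine R and t over all views and is omitted here.

```python
# Two-view pose sketch: SIFT matching -> RANSAC fundamental matrix -> R, t.
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """img1, img2: 8-bit grayscale images; K: 3x3 intrinsic matrix."""
    sift = cv2.SIFT_create()                        # step 211): SIFT features
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])
    # Step 212): fundamental matrix by RANSAC, then relative motion.
    F, inliers = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 1.0, 0.999)
    E = K.T @ F @ K                                 # essential matrix from F
    _, R, t, _ = cv2.recoverPose(E, p1[inliers.ravel() == 1],
                                 p2[inliers.ravel() == 1], K)
    return R, t   # step 213) would refine these by bundle adjustment
```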
4. The method according to claim 2 or 3, characterized in that in step 22) the following steps are performed:
221) using stereo matching, find a three-dimensional point on or near the surface of the video object and construct a voxel centered at that point as a seed;
222) construct new voxels on the plane where the seed voxel meets the video object surface, take each new voxel as a new seed, and continue obtaining points where the surface of the video object is intersected;
223) repeat this process until all voxels intersecting the surface of the video object are obtained;
224) using the marching cubes algorithm, convert the voxels enclosing the video object surface into a wireframe model;
225) output the three-dimensional wireframe model after triangle optimization; a marching-cubes sketch follows this claim.
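A sketch of step 224), assuming scikit-image is available; the patent names only the marching cubes algorithm, not a library. The occupancy volume below plays the role of the grown voxels enclosing the object surface.

```python
# Voxel volume -> triangle wireframe via marching cubes (scikit-image).
import numpy as np
from skimage import measure

def voxels_to_wireframe(occupancy):
    """occupancy: 3D float array, >0.5 inside the object, <=0.5 outside."""
    verts, faces, normals, _ = measure.marching_cubes(occupancy, level=0.5)
    return verts, faces   # vertex coordinates and triangle index list

# Example: a sphere rasterized into a 64^3 voxel grid.
z, y, x = np.mgrid[:64, :64, :64]
sphere = ((x - 32)**2 + (y - 32)**2 + (z - 32)**2 < 20**2).astype(float)
verts, faces = voxels_to_wireframe(sphere)
```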
5. The method according to claim 4, characterized in that in step 23) the following steps are performed:
231) using a Markov random field model, establish the correspondence between the wireframe model vertices and all shooting viewpoints;
232) map each triangular facet of the above model onto an isosceles right triangle in two-dimensional space;
233) combine these isosceles right triangles into quadrilaterals, the gray areas in the figure indicating the space wasted in the packing;
234) color-fill the packed triangles to form the final texture image; a packing sketch follows this claim.
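A sketch of the packing of steps 232) and 233): every facet is assigned an isosceles right triangle in the atlas, and two such triangles share one square cell so that only the diagonal seam is wasted. The cell size and row-major layout are hypothetical choices for illustration, not the patent's exact rules.

```python
# Assign each mesh triangle an isosceles right triangle in a texture atlas;
# two triangles are paired into one square cell of the atlas.
import numpy as np

def atlas_uv(num_triangles, cell=32, cols=16):
    """Return per-triangle UV corners (3 corners each) in pixel units."""
    uvs = []
    for i in range(num_triangles):
        square = i // 2                     # two triangles share a square
        r, c = divmod(square, cols)
        x0, y0 = c * cell, r * cell
        if i % 2 == 0:                      # lower-left right triangle
            tri = [(x0, y0), (x0 + cell, y0), (x0, y0 + cell)]
        else:                               # upper-right right triangle
            tri = [(x0 + cell, y0 + cell), (x0, y0 + cell), (x0 + cell, y0)]
        uvs.append(tri)
    return np.array(uvs, dtype=np.float32)
```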
6. The method according to claim 5, characterized in that in step a), decompressing the three-dimensional parameters of the video object, the following steps are performed:
a1) recover the camera parameters, the three-dimensional wireframe model and the texture image from the compressed bit stream;
a2) map the texture image onto the wireframe model, completing the description of the three-dimensional shape and color of the object;
a3) project the texture-mapped wireframe model under each of the original shooting viewpoints according to the camera parameters, recovering the video object; a projection sketch follows this claim.
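Step a3) is the standard pinhole projection x ~ K[R|t]X. A minimal numpy sketch:

```python
# Project 3D model vertices into an image with decoded camera parameters.
import numpy as np

def project(points3d, K, R, t):
    """points3d: (N,3) world coordinates -> (N,2) pixel coordinates."""
    cam = points3d @ R.T + t.reshape(1, 3)   # world -> camera frame
    pix = cam @ K.T                          # apply intrinsics
    return pix[:, :2] / pix[:, 2:3]          # perspective division
```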
7. A video sequence hybrid encoding and decoding system, characterized in that it comprises an object-based video encoder (100) and a video object decoder (200), wherein the object-based video encoder (100) comprises:
an object segmentation module (101), for segmenting the video object from the video sequence;
a parameter extraction module (102), for reconstructing the camera parameters, three-dimensional wireframe model and texture from the video object, so as to represent the object to be encoded in the video sequence;
a parameter coding module (103), for compression-encoding the extracted parameters to form a compressed bit stream for transmission; and
a background coding module (104), for compression-encoding the background with the existing H.264 standard;
the object segmentation module (101), the parameter extraction module (102) and the parameter coding module (103) are connected in sequence; the object segmentation module (101) is also connected to the background coding module (104); the input video sequence is fed to the object segmentation module (101), and the parameter coding module (103) and the background coding module (104) jointly output the encoded code stream;
the video object decoder (200) comprises a parameter decoding module (201) and a background decoding module (202); the input code stream is decompressed, and the parameter decoding module (201) and the background decoding module (202) respectively output the parts that together form the decoded video sequence.
8. The system according to claim 7, characterized in that the parameter extraction module (102) comprises:
a camera parameter extraction module (121), for camera parameter extraction: according to the imaging model of the camera, extract the projection matrix of the camera from three-dimensional space to the two-dimensional image plane by multi-view stereo matching;
a three-dimensional wireframe model reconstruction module (122), for reconstructing the wireframe model of the three-dimensional shape of the object by voxel growing;
a texture image construction module (123), for recovering the color information of the object and forming the final texture image;
and that the parameter coding module (103) comprises:
a camera parameter coding module (131), for compressing the extracted camera parameters by lossless entropy coding;
an MPEG three-dimensional mesh coding module (132), for compressing the reconstructed three-dimensional wireframe model with the 3DMC tool of MPEG-4;
a texture image coding module (133), for compressing the extracted texture image by JPEG coding;
wherein the camera parameter extraction module (121) is connected to the camera parameter coding module (131), the three-dimensional wireframe model reconstruction module (122) is connected to the MPEG three-dimensional mesh coding module (132), and the texture image construction module (123) is connected to the texture image coding module (133); the video sequence is input to the camera parameter extraction module (121), the three-dimensional wireframe model reconstruction module (122) and the texture image construction module (123), and the camera parameter coding module (131), the MPEG three-dimensional mesh coding module (132) and the texture image coding module (133) output the code stream through the coding compression module (105); a parameter-coding sketch follows this claim.
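An illustrative sketch of the three coders of claim 8, with stand-ins where needed: zlib plays the lossless entropy coder of module (131), Pillow's JPEG writer plays module (133), and the MPEG-4 3DMC coder of module (132) is left as a labelled placeholder since it has no common Python binding.

```python
# Parameter-coding sketch for modules (131)-(133); stand-ins, not 3DMC.
import io
import zlib
import numpy as np
from PIL import Image

def encode_cameras(K, Rs, ts):                 # module (131): lossless
    raw = np.concatenate([K.ravel()] +
                         [np.concatenate([R.ravel(), t.ravel()])
                          for R, t in zip(Rs, ts)]).astype(np.float32)
    return zlib.compress(raw.tobytes())

def encode_texture(texture_rgb):               # module (133): JPEG
    """texture_rgb: (H,W,3) uint8 texture image."""
    buf = io.BytesIO()
    Image.fromarray(texture_rgb).save(buf, format="JPEG", quality=85)
    return buf.getvalue()

def encode_mesh(verts, faces):                 # module (132): placeholder
    raise NotImplementedError("MPEG-4 3DMC mesh coder goes here")
```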
9. The system according to claim 7 or 8, characterized in that the parameter decoding module (201) comprises a camera parameter decoding module (2011), a mesh decoding module (2012), a texture image decoding module (2013) and a texture mapping module (2014), wherein the mesh decoding module (2012) and the texture image decoding module (2013) are each connected to the texture mapping module (2014); the code stream is fed through the decompression module (2001) to the camera parameter decoding module (2011), the mesh decoding module (2012) and the texture image decoding module (2013); and the camera parameter decoding module (2011) and the texture mapping module (2014) output the video object through the 3D-to-2D space projection module (2015).
CN2009100888055A 2009-07-16 2009-07-16 Video sequence mixed encoding and decoding method and system Expired - Fee Related CN101610411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100888055A CN101610411B (en) 2009-07-16 2009-07-16 Video sequence mixed encoding and decoding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100888055A CN101610411B (en) 2009-07-16 2009-07-16 Video sequence mixed encoding and decoding method and system

Publications (2)

Publication Number Publication Date
CN101610411A CN101610411A (en) 2009-12-23
CN101610411B true CN101610411B (en) 2010-12-08

Family

ID=41483953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100888055A Expired - Fee Related CN101610411B (en) 2009-07-16 2009-07-16 Video sequence mixed encoding and decoding method and system

Country Status (1)

Country Link
CN (1) CN101610411B (en)


Also Published As

Publication number Publication date
CN101610411A (en) 2009-12-23


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101208

Termination date: 20150716

EXPY Termination of patent right or utility model