CN1134175C - Video image communication system and implementation method for multi-camera video target extraction - Google Patents

Video image communication system and implementation method for multi-camera video target extraction

Info

Publication number: CN1134175C
Application number: CNB001214411A
Authority: CN (China)
Prior art keywords: sub-line segment, video, matching, segment
Legal status: Expired - Fee Related
Other languages: Chinese (zh)
Other versions: CN1275871A
Inventors: He Yun (何芸), Zhang Yuecheng (张越成)
Original and current assignee: Tsinghua University

Events: application filed by Tsinghua University; priority to CNB001214411A; publication of CN1275871A (application); publication of CN1134175C (grant); application granted; anticipated expiration; current status Expired - Fee Related.

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of content-based video image communication. The system comprises a transmitting end, composed of a multi-view, multi-feature video object extraction unit connected to multiple cameras and a video object coding unit, and a receiving end, composed of a video object decoding unit and a video object display unit; the transmitting end and the receiving end are connected through a communication channel. The invention obtains the three-dimensional spatial information of physical targets and solves the key technical problem of extracting their depth information from multiple video streams in real time, so that video object extraction can be performed quickly.

Description

Video image communication system and implementation method for multi-camera video object extraction

The invention belongs to the technical field of content-based video image communication, in particular to video object extraction and content-based video image coding methods.

Video image communication based on video objects is a major part of ISO/IEC MPEG-4. In that standard a video object is delimited by a binary image sequence, but how this binary image sequence is obtained is left outside the standard; that is, the extraction of video objects is open. Video object extraction remains an unsolved question in image analysis and understanding: it depends on image capture, representation and processing techniques, and also on the properties of human vision and on which targets different viewers find interesting. In existing image communication systems, video object extraction methods fall into the following classes:

(1) Texture-based methods, which segment the image at texture discontinuities. A representative document is M. Kunt, A. Ikonomopoulos and M. Kocher, "Second generation image coding techniques", Proceedings of the IEEE, Vol. 73, No. 4, pp. 549-575, 1985.

(2) Motion-based methods, which segment video targets by fitting motion models. A representative document is Hans George Musmann, Michael Hotter and Jorn Ostermann, "Object-oriented analysis-synthesis coding for moving images", Image Communication, Vol. 1, pp. 117-138, 1989.

(3) Color-based video target extraction methods, which segment video targets at color discontinuities. A representative document is Li H. B. and Forchheimer R., "Location of Face Using Color Cue", Picture Coding Symposium, P.2.4, 1993.

(4) Multi-feature video target extraction methods, for which the literature is larger. Examples include segmentation with motion and edge features: Reinders, M., van Beek, P., Sankur, B., and van der Lubbe, J., "Facial feature localisation and adaptation of a generic face model for model-based coding", Signal Processing: Image Communication, Vol. 7, No. 1, pp. 57-74, 1995; and segmentation of face targets with motion and color features: T. Xie, Y. He, and C. Weng, "A layered video coding scheme for very low bit rate videophone", Picture Coding Symposium, pp. 343-347, Berlin, 1997.

Video communication systems based on the above methods all use a single camera to capture the video image, and are called content-based single-camera video communication systems. Such a system extracts video objects using features such as motion, texture and color together with some prior knowledge, encodes the image with the video object as the coding unit, and sends the result to the communication channel. After the receiving end receives the signal, it decodes the codewords to reconstruct the video objects, which are shown on a video display. The structure of this communication system is shown in Fig. 1: at the transmitting end it consists of two units, a "single-view video object extraction unit" and an "object-based video coding unit"; the receiving end likewise consists of two units, a "video object decoding unit" and a "video object display unit".
Another class is the multi-view video communication system, hereinafter simply "multi-view system". Existing multi-view systems comprise the "multi-view to multi-view" type and the "multi-view to single-view" type.

Multi-view to multi-view systems include automatic surveillance systems, multi-position live broadcasting systems and camera array systems. As shown in Fig. 2, the structure mainly comprises two or more (1, ..., n) "single-view signal encoding units" at the transmitting end, each connected to one camera. The n single-view code streams enter a "multi-channel video stream multiplexing unit", where the signals are mixed and then sent to the communication channel. At the receiving end, a "multi-channel video stream demultiplexer" separates the composite stream into n independent streams, n "single-view signal decoding units" restore the n code streams into n video images, and n displays show them separately. The characteristic of this type of system is that there is no intrinsic relation between the views; several single-view communication systems are merely combined at the system level into a whole with a certain function. A multi-position live broadcasting system carries multiple videos of the same scene, with no special provisions on the capture parameters of each image, whereas a camera array system not only targets the same scene but also imposes fairly strict requirements on the relative positions of the cameras and on the camera parameters of each unit. Concrete applications include stereoscopic video communication.

Multi-view to single-view systems include view-selection systems and scene reconstruction systems.

The view-selection system, shown in Fig. 3, mainly comprises a position judging module, several image acquisition modules, a multiplexer, a single-channel video encoding module and a single-channel video decoding module. Its general workflow is: the position judging module first determines the observer's current position and sends the position information back to the multiplexer control; according to the returned position, the multiplexer selects a suitable view (or generates an intermediate view by simple interpolation) and passes the resulting image to the video encoder; the encoder codes the input image and the code stream reaches the decoding end over the channel; the decoder decodes the stream, produces the decoded image and delivers it to the end user.

The scene reconstruction system, shown in Fig. 4, mainly comprises several image acquisition modules, a scene reconstruction module, a virtual scene projection module, a position judging module and the corresponding encoder/decoder modules. Its general workflow is: the multi-channel video input modules feed the acquired videos to the scene reconstruction module; the scene reconstruction module reconstructs a virtual 2D or 3D scene from the input views; the position judging module determines the observer's position within the virtual scene and passes it to the virtual scene projection module; the projection module generates a virtual view from the observer's position in the virtual scene and the generated virtual scene, and sends the generated virtual view to the video encoder; the encoder codes it and the stream reaches the decoder over the channel; the decoder decodes the stream, produces the decoded image and delivers it to the end user. This type of system does not analyze the content of the images. Unlike the system of Fig. 3, the projected view is not simply chosen among several existing views but is synthesized into the corresponding view.
The above methods and systems have the following shortcomings. A single-camera video communication system loses the three-dimensional information of the physical target during image capture; taking the projected two-dimensional image as the source for video image analysis and coding leaves the result with great uncertainty, because the purpose of video object segmentation is to separate the foreground of the video image from the background, and segmenting from two-dimensional information alone is the main cause of that uncertainty. Multi-view systems, for their part, have not yet been applied to content-based communication, because the cost of matching information between the view streams, as in depth matching algorithms, is very large; the key obstacle is real-time extraction of depth information.

The objective of the invention is to overcome these shortcomings of the prior art by proposing a video image communication system with multi-camera video object extraction. Using multiple cameras for video input, the three-dimensional spatial information of the physical target, i.e. its depth information, can be obtained, providing an important basis for separating foreground from background. The proposed implementation method also solves the key algorithmic problem of extracting the depth information of physical targets from multiple video streams in real time, so that depth extraction can be performed quickly.
The invention proposes a video image communication system with multi-camera video object extraction, comprising a transmitting end composed of a video object extraction unit and an object-based video coding unit, and a receiving end composed of a video object decoding unit and a video object display unit, the transmitting end and the receiving end being connected through a communication channel. It is characterized in that the video object extraction unit is a multi-view video object extraction unit which is connected to multiple cameras, performs matching operations on several video streams simultaneously to extract the depth information of the video object, and, on the basis of the depth information, segments the video target by combining the motion, color and shape features of the video object.

The system of the invention is a two-way communication system: each communication end contains both a transmitting unit and a receiving unit, operating simultaneously.

The invention further proposes a method of realizing the above system, comprising the following steps:

(1) at the transmitting end, video images are input by multiple cameras, one video stream being the target image and the remaining streams being auxiliary images;

(2) with the help of the auxiliary images, the depth information of the target image is analyzed and extracted; a comprehensive judgement combining depth-based motion, color and shape features is made for video target extraction; 3D object segmentation is then performed on the matching result obtained by analyzing the pixel position correspondences between the video streams and computing the depth information of the photographed objects, so that the video target is extracted, the result being expressed as a binary image sequence of the video target;

(3) according to the binary image sequence of the video target, the video object coding unit performs object-based coding of the source target image, forming an object-based code stream that is sent to the communication channel;

(4) at the receiving end, the video object decoding unit restores the object-based code stream into object-based images;

(5) the video object display unit displays each video object independently.
The definitions used in the method of the invention are as follows:

Target image: a frame of the video to be segmented.

Reference image: the corresponding frame of the reference video.

Target line segment: the intersection of an epipolar plane with the target image; if the line through the optical centers of the two optical systems is horizontal and parallel to the scan lines, it is part (or all) of a scan line.

Reference line segment: the intersection of the reference image with the same epipolar plane. For the reason described below, under this specific assumption the point-wise pixel matching problem between the reference image and the target image can be converted into a matching problem between reference line segments and target line segments.

Line segment matching: line segment A matches line segment B when target line segment A agrees with reference line segment B at both start point and end point.

Sub-line segment: a continuous subset of points on a target or reference line segment; a line segment is divided into non-overlapping subintervals, each of which is a sub-line segment.

Matching degree: determined by the value of the matching-measure function.

Histogram of a line segment: the brightness statistics of the pixels on the line segment, i.e. the correspondence between each brightness value and the number of pixels having that brightness.

Histogram operation: an image transformation that maps pixels of a given brightness to one or more other brightness values.
The fast analysis and extraction method for depth information between the multiple video streams obtained by the multiple cameras uses a multiply iterated, layer-by-layer refinement algorithm; each layer comprises the following steps (a sketch of the loop follows this list):

(1) input the target line segment and each reference line segment;

(2) perform histogram adjustment on the target line segment and the reference line segment separately;

(3) establish feature thresholds for the adjusted line segments;

(4) coarsely segment the line segments with those thresholds to obtain sub-line segments, then extract features of the sub-line segments from the histogram;

(5) match the features of the target sub-line segments against the reference sub-line segments;

(6) judge whether the matching result must be segmented again;

(7) if the condition is not satisfied, enter the next layer and repeat steps (1) to (7).

Finally the matching results of all layers are input together into the segmentation module, completing segmentation and matching at the specified accuracy.
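A minimal Python sketch of this layered loop, not part of the patent text; all helper names are illustrative stand-ins for steps (2)-(6) with deliberately trivial placeholder behavior (fuller sketches of the individual steps follow in the sections below):

```python
import numpy as np

# Placeholder helpers for steps (2)-(6); fuller sketches follow below.
def adjust_histogram(seg):
    return seg

def establish_threshold(t_seg, r_seg):
    return 0.5 * (t_seg.mean() + r_seg.mean())

def coarse_segment(seg, th):
    above = (seg >= th).astype(int)
    cuts = np.flatnonzero(np.diff(above)) + 1   # boundaries between classes
    return np.split(seg, cuts)

def match_sub_segments(t_subs, r_subs):
    return list(zip(t_subs, r_subs))            # placeholder: pair in order

def needs_resegmentation(t_sub, r_sub):
    return False                                # placeholder: always settled

def match_layer(target_seg, ref_seg, layer=0, max_layers=3, results=None):
    """One layer of the refinement: adjust, threshold, coarse-segment,
    match, then recurse into pairs that still need splitting."""
    if results is None:
        results = []
    if layer >= max_layers or min(len(target_seg), len(ref_seg)) < 4:
        results.append((target_seg, ref_seg))
        return results
    t = adjust_histogram(target_seg)                    # step (2)
    r = adjust_histogram(ref_seg)
    th = establish_threshold(t, r)                      # step (3)
    pairs = match_sub_segments(coarse_segment(t, th),   # steps (4)-(5)
                               coarse_segment(r, th))
    for t_sub, r_sub in pairs:                          # steps (6)-(7)
        if needs_resegmentation(t_sub, r_sub):
            match_layer(t_sub, r_sub, layer + 1, max_layers, results)
        else:
            results.append((t_sub, r_sub))
    return results

# Example: one scan line from each view (synthetic data).
target = np.array([10, 12, 11, 200, 205, 198, 15, 14], dtype=float)
ref    = np.array([11, 13, 12, 201, 204, 197, 16, 15], dtype=float)
print(len(match_layer(target, ref)), "matched sub-segment pairs")
```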
The histogram adjustment above is performed separately on the target line segment and the reference line segment from the two fields of view. The concrete method is (a sketch follows):

(1) find the maximum brightness Max and minimum brightness Min of the whole target line segment;

(2) if the difference between Max and Min is smaller than a threshold Th1, set the brightness of every point on the line segment to the mean brightness; otherwise apply the following brightness transform to every point on the segment:

g(x) = (f(x) - Min) / (Max - Min) × VMax

where f(x) is the value to be transformed, g(x) is the result, and VMax is the brightness range of the system.
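A minimal Python sketch of this adjustment; Th1 = 10 and VMax = 255 are assumed values (the text fixes neither):

```python
import numpy as np

def adjust_histogram(seg, th1=10.0, vmax=255.0):
    """Brightness adjustment of one line segment: flatten low-contrast
    segments to their mean, otherwise stretch to the full range [0, vmax]."""
    seg = np.asarray(seg, dtype=float)
    mn, mx = seg.min(), seg.max()
    if mx - mn < th1:                       # step (2), low-contrast case
        return np.full_like(seg, seg.mean())
    return (seg - mn) / (mx - mn) * vmax    # g(x) = (f(x)-Min)/(Max-Min) * VMax
```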
The method of establishing the feature thresholds proceeds as follows. After histogram adjustment, the whole line segment is divided into different regions according to thresholds, so that correspondences between sub-line segments can be sought for the matching of the line segments (a sketch follows this list):

(1) set a threshold Th2 to a value slightly below 50%;

(2) if Th2 < 30%, apply histogram equalization to the histogram-adjusted line segment;

(3) find the brightness DU such that the proportion of pixels in the two segments brighter than DU just exceeds Th2;

(4) find the brightness DD such that the proportion of pixels in the two segments darker than DD just exceeds Th2;

(5) collect the pixels with brightness between DD and DU and look for a local valley in their counts;

(6) if no local valley appears, decrease Th2 and repeat (2)-(5);

(7) if several valleys appear, increase Th2 and repeat (2)-(5);

(8) take the valley as the threshold.
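A minimal Python sketch of this valley search; the histogram equalization of step (2) is omitted, and the Th2 step size and bin count are assumptions:

```python
import numpy as np

def establish_threshold(t_seg, r_seg, th2=0.45, step=0.05, max_iter=8):
    """Search for a single local valley of the brightness histogram
    between the DD and DU percentiles, adjusting Th2 as prescribed."""
    both = np.concatenate([np.asarray(t_seg), np.asarray(r_seg)]).astype(float)
    for _ in range(max_iter):
        du = np.percentile(both, 100.0 * (1.0 - th2))  # ~Th2 of pixels above DU
        dd = np.percentile(both, 100.0 * th2)          # ~Th2 of pixels below DD
        mid = both[(both > dd) & (both < du)]
        if mid.size >= 3:
            hist, edges = np.histogram(mid, bins=16)
            valleys = [i for i in range(1, len(hist) - 1)
                       if hist[i] < hist[i - 1] and hist[i] < hist[i + 1]]
            if len(valleys) == 1:                      # unique valley: done
                v = valleys[0]
                return 0.5 * (edges[v] + edges[v + 1])
            th2 += step if len(valleys) > 1 else -step # narrow / widen the band
        else:
            th2 -= step                                # band too narrow: widen
        th2 = min(max(th2, 0.05), 0.49)
    return float(np.median(both))                      # fallback if no clean valley
```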
The sub-line segment feature extraction can take the following concrete steps (a sketch follows): (1) segment the target line segment and the reference line segment with the thresholds above; (2) join consecutive points of the same class into sub-line segments; (3) extract the feature values of each sub-line segment: its maximum Mmax, its minimum Mmin, its length Mlength, and the average brightness Maverage of its pixels.
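A minimal Python sketch of the coarse segmentation and the four feature values:

```python
import numpy as np

def coarse_segment(seg, threshold):
    """Split a line segment into maximal runs of pixels lying on the same
    side of the threshold (step (2): joining same-class points into runs)."""
    seg = np.asarray(seg, dtype=float)
    above = (seg >= threshold).astype(int)
    cuts = np.flatnonzero(np.diff(above)) + 1
    return np.split(seg, cuts)

def sub_segment_features(sub):
    """Step (3): the four feature values of one sub-line segment."""
    return {"Mmax": float(sub.max()), "Mmin": float(sub.min()),
            "Mlength": int(sub.size), "Maverage": float(sub.mean())}
```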
The sub-line segment feature matching can take the following concrete steps (a sketch of the measures follows this list):

(1) assume the target line segment is split into m non-overlapping sub-line segments, denoted C[1] ... C[m], and the reference line segment into n non-overlapping sub-line segments, denoted R[1] ... R[n]; the feature value of each is the pixel mean of the corresponding sub-line segment;

(2) let the weight of each sub-line segment, KC[i] or KR[j], equal its length;

(3) take a part (i ... i+4, j ... j+4) of the m × n space;

(4) determine the matching measure. For a one-to-one sub-line segment pair, where target sub-line segment C[i] corresponds to reference sub-line segment R[j], the measure contributed by this correspondence is

FV[i, j] = (KC[i] + KR[j]) / 2 × (C[i] - R[j]);

for a one-to-many match, where target sub-line segments C[i] and C[i+1] both correspond to reference sub-line segment R[j], the measure of this part is

FV[i, j] + FV[i+1, j] = (KC[i] + KC[i+1] + KR[j]) / 2 × ((C[i] × KC[i] + C[i+1] × KC[i+1]) / (KC[i] + KC[i+1]) - R[j]);

for an unmatched sub-line segment C[i] or R[j] the measure is defined separately as

FV[i, 0] = KC[i] × OcP

FV[0, j] = KR[j] × OcP

where OcP is the occlusion penalty factor;

(5) for each candidate matching path, compute FV[·,·] on each of its sub-segments; the final matching measure SFV of the whole matching path is the sum of all FV[·,·] on the path;

(6) compute the candidate path with the smallest matching measure.
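A minimal Python sketch of the three measures; the brightness differences are taken as absolute values (the printed formulas do not show the sign handling), and the OcP value is an assumed constant:

```python
def fv_one_to_one(c_i, r_j, kc_i, kr_j):
    """FV[i,j] for a one-to-one pair; C/R are sub-segment brightness
    means, KC/KR the sub-segment lengths used as weights."""
    return (kc_i + kr_j) / 2.0 * abs(c_i - r_j)

def fv_one_to_many(c_vals, kc_vals, r_j, kr_j):
    """One-to-many case: the target sub-segments are collapsed into a
    length-weighted mean before comparison with R[j]."""
    kc_sum = float(sum(kc_vals))
    merged = sum(c * k for c, k in zip(c_vals, kc_vals)) / kc_sum
    return (kc_sum + kr_j) / 2.0 * abs(merged - r_j)

def fv_unmatched(k, ocp=8.0):
    """FV[i,0] or FV[0,j]: occlusion penalty for an unmatched sub-segment."""
    return k * ocp
```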
The judgement whether matched sub-line segments must continue to be segmented can take the following concrete steps (a sketch follows this list):

(1) since the purpose of the whole algorithm is 3D object segmentation, a sub-line segment whose whole line segment is assigned to the object or to the background need not be matched further;

(2) sub-line segments without brightness fluctuation, i.e. those with Mmax - Mmin < a threshold Th3, need no further matching;

(3) likewise sub-line segments that are too short, i.e. those with Mlength < a threshold Th4;

(4) likewise sub-line segments whose entire corresponding sub-line segment satisfies the three conditions above;

(5) matched sub-line segments are made equal in length by interpolation and the SAD of the whole segment is computed; pairs whose value is below a threshold Th5 need no further matching;

(6) sub-line segments without a match are treated as occluded areas and are not matched further.
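A minimal Python sketch of this judgement; Th3-Th5 are assumed values, and criterion (5) is computed as a per-pixel SAD after linear interpolation:

```python
import numpy as np

def pair_is_settled(t_sub, r_sub, th3=8.0, th4=3, th5=6.0):
    """True if the matched pair needs no further segmentation."""
    t_sub, r_sub = np.asarray(t_sub, float), np.asarray(r_sub, float)
    if (t_sub.max() - t_sub.min() < th3 and
            r_sub.max() - r_sub.min() < th3):
        return True                            # (2) no brightness fluctuation
    if len(t_sub) < th4 or len(r_sub) < th4:
        return True                            # (3) too short to split further
    n = max(len(t_sub), len(r_sub))            # (5) equalize lengths, then SAD
    grid = np.linspace(0.0, 1.0, n)
    t = np.interp(grid, np.linspace(0.0, 1.0, len(t_sub)), t_sub)
    r = np.interp(grid, np.linspace(0.0, 1.0, len(r_sub)), r_sub)
    return float(np.abs(t - r).mean()) < th5
```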
On the basis of the depth information, the invention extracts the video target using multiple features, comprising the following steps: (1) supplement the judgement of the depth analysis result with color information; (2) supplement the depth information with motion information; (3) optionally extend further with other information; (4) segment the video target with a split-merge method.

Supplementing the depth analysis result with color information can take the following concrete steps: (1) divide the target image into spatial subregions based on color, using threshold division of a directional neighborhood minimum-difference map; (2) merge the color subregions with a region-flooding algorithm; (3) combine with the depth information by performing subregion depth thresholding according to the maximum-likelihood mean depth of each color subregion.

Supplementing the depth information with motion information can take the following concrete steps: (1) use different motion patterns as the criterion for dividing subregion areas; (2) use identical motion patterns of different subregions as the basis for merging; (3) use motion vectors for interframe inheritance of the object segmentation.

Further extensions based on other information can include edge information, higher-level processing information, and so on.
The split-merge segmentation of the video target can comprise the following.

First, splitting; its concrete steps are: (1) define a split decision function F_seg(A|I), where I is the target image to be segmented and A is a connected subregion of it; (2) when the split decision function exceeds a set splitting threshold on subregion A, i.e. F_seg(A|I) > Th_seg, subregion A is further divided into m subregions; (3) the division is chosen to minimize the sum of a metric function over A, that is

(m, A_1, ..., A_m) = Para(min(Σ_{i=1}^{m} D(A_i)))

where D(·) is the adopted subregion division metric function.

Then, merging; its concrete steps are: (1) define a merge decision function F_merge(A_1, A_2, ..., A_n | I), where A_i (i = 1, 2, ..., n) are any n connected subregions of I; (2) when the merge decision function is below a set threshold, the n subregions are merged into one subregion A. The splitting and merging above are carried out in alternating iterations (a sketch follows).
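A minimal Python sketch of the alternating iteration; F_seg, the splitter, F_merge and the thresholds are caller-supplied functions, and adjacency is simplified to list order:

```python
import numpy as np

def split_merge(regions, f_seg, split_region, f_merge,
                th_seg, th_merge, max_rounds=10):
    """Alternate a split pass (F_seg > Th_seg) with a merge pass
    (F_merge < Th_merge) until no region changes."""
    for _ in range(max_rounds):
        changed = False
        split_out = []                                  # split pass
        for a in regions:
            if f_seg(a) > th_seg:
                split_out.extend(split_region(a))
                changed = True
            else:
                split_out.append(a)
        merged, i = [], 0                               # merge pass
        while i < len(split_out):
            if (i + 1 < len(split_out)
                    and f_merge(split_out[i], split_out[i + 1]) < th_merge):
                merged.append(np.concatenate([split_out[i], split_out[i + 1]]))
                i += 2
                changed = True
            else:
                merged.append(split_out[i])
                i += 1
        regions = merged
        if not changed:
            break
    return regions
```

With f_seg measuring, for example, region brightness variance and f_merge the difference of region means, this reproduces the alternating behavior described above.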
Applying the split-merge method to video target segmentation with multiple information features can take the following concrete steps: (1) take N features (F_1, F_2, ..., F_N)^T and first divide them into two non-exclusive groups:

U_seg = (F_i1, F_i2, ..., F_iK)^T

U_merge = (F_j1, F_j2, ..., F_jL)^T

(2) where U_seg is the feature set used for splitting and U_merge the feature set used for merging; (3) design F_seg(A|I) and F_merge(A_1, A_2, ..., A_n|I), together with the division metric function D(·), from U_seg and U_merge respectively; (4) substituting the obtained F_seg(A|I), F_merge(A_1, A_2, ..., A_n|I) and D(·) into the split-merge formulas above yields a split-merge algorithm combining multiple features; (5) use the subregion maximum-likelihood depth as the merge decision of this multi-feature split-merge algorithm.
The maximum-likelihood depth decision can take the following concrete steps: (1) define the maximum-likelihood depth of subregion A as the x that maximizes the posterior probability P(d(z) = x | z ∈ A, I, Dd(I)), where d(z) is the depth of pixel z, A is the subregion under decision, I is the target image to be segmented, and Dd(I) is the disparity field; (2) reduce the subregion maximum-likelihood depth to a binary criterion,

F_dis = P(d(z) < Th_d | z ∈ A, I, Dd(I)),

the proportion of points in the subregion whose depth is below a certain threshold; (3) incorporate the depth information into the steps of the split-merge algorithm.
The 3D object segmentation based on the matching result can take the following steps (a sketch follows): (1) according to the matching results of the sub-line segments, assign to the object those sub-line segments whose disparities at both the matched start point and the matched end point exceed a threshold Th6; (2) assign to the background those whose disparities at both matched endpoints do not exceed Th6; (3) continue the segmentation-matching iteration for the remaining regions; (4) until the whole segmentation result satisfies the required accuracy.
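A minimal Python sketch of this disparity-threshold classification; each match is represented here as a (start disparity, end disparity, sub-segment) triple:

```python
def classify_sub_segments(matches, th6):
    """Split matched sub-segments into object / background / undecided
    according to the disparity at the two matched endpoints."""
    objects, background, undecided = [], [], []
    for d_start, d_end, sub in matches:
        if abs(d_start) > th6 and abs(d_end) > th6:
            objects.append(sub)        # (1) large disparity: near object
        elif abs(d_start) <= th6 and abs(d_end) <= th6:
            background.append(sub)     # (2) small disparity: background
        else:
            undecided.append(sub)      # (3) mixed: segment and match again
    return objects, background, undecided
```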
Features and effects of the invention:

The video image communication system with multi-camera video object extraction proposed by the invention takes multi-camera input as its image source, forming the concept and system realization of communication based on coding of video information content. The video object extraction unit performs matching operations on several video streams and segments the video object information according to depth, color, motion and other information related to the physical video object. The object-based video coding unit encodes the segmented video objects and sends them to the transmission channel. At the receiving end, the video decoding unit decodes the code stream into video units, and the video display finally shows the different video objects independently.

Because the invention uses a fast algorithm for extracting depth information between the multiple video streams obtained by the cameras, depth extraction can be performed quickly, making the multi-camera video-object-extraction communication system realizable.

Because of the invention's depth-based multi-feature video target extraction algorithm, target extraction achieves better results, providing a better target source for communication based on video information content. Segmenting video objects with multiple features is both efficient and accurate.
Brief description of the drawings: Fig. 1 is a block diagram of an existing content-based single-camera video communication system. Fig. 2 is a block diagram of an existing multi-view to multi-view video communication system. Fig. 3 is a block diagram of an existing view-selection multi-view to single-view video communication system. Fig. 4 is a block diagram of an existing scene-reconstruction multi-view to single-view video communication system. Fig. 5 is a block diagram of the multi-camera video-object-extraction video image communication system of the invention. Fig. 6 illustrates the parallel optical axis condition and the reduction of the search to one dimension. Fig. 7 illustrates the geometric projection on the epipolar plane. Fig. 8 is a flow chart of the fast depth extraction method of the invention. Fig. 9 illustrates the optimal matching of the segmented sub-segments. Fig. 10 illustrates the candidate paths of the minimum matching measure. Fig. 11 shows simulation results of the fast depth extraction of the invention, where:

Fig. 11(a) is the left-frame video input of the ball_letter sequence (500 × 500);

Fig. 11(b) is the right-frame video input of the ball_letter sequence (500 × 500);

Fig. 11(c) is the ball_letter segmentation result;

Fig. 11(d) is the left-frame video input of the man sequence (384 × 384);

Fig. 11(e) is the right-frame video input of the man sequence (384 × 384);

Fig. 11(f) is the man sequence segmentation result.
The operating principle and embodiments of the invention are described in detail with reference to the drawings as follows:

The architecture of the multi-camera video-object-extraction video image communication system of the invention is shown in Fig. 5. It comprises a transmitting end composed of a multi-view, multi-feature video object extraction unit and an object-based video coding unit, and a receiving end composed of a video object decoding unit and a video object display unit; the transmitting end and the receiving end are connected through a communication channel. The video object extraction unit is connected to several cameras and simultaneously performs matching operations on the depth information between the video streams formed by the target image and several auxiliary images, segments the video object information, and expresses the result as a binary image sequence of the video target. The video object coding unit performs object-based coding of the source target image according to that binary image sequence, forming an object-based code stream that is sent to the communication channel. At the receiving end, the video object decoding unit restores the object-based code stream into object-based images, and the video object display unit shows the video objects independently.
Analysis of the principle of the fast extraction of depth information between the multiple video streams obtained by the cameras:

Take two cameras as an example. If the geometric positions of the two cameras satisfy the parallel optical axis condition, the matching problem between the two video images reduces to a linear search, as shown in Fig. 6. Suppose the stereo projection system satisfies the epipolar condition, i.e. the optical axes of projection systems O1 and O2 are parallel (assume, without loss of generality, the Z direction). The projections of a spatial point P in the two fields of view must then lie in the plane determined by P and the two projection centers; this plane is the epipolar plane, and P1 and P2 lie on the epipolar plane PO1O2. The projections in the two projection systems of a spatial point on some epipolar plane X must also lie on the intersections of the epipolar plane with the corresponding projection planes; that is, if F1 is the intersection of X with the image plane S1 of the O1 system and F2 the intersection of X with the image plane S2 of the O2 system, then the projection in the O2 system of the spatial point corresponding to any point on F1 must fall on F2, and vice versa. The search for spatial correspondences therefore reduces to a matching problem between points on two straight lines, which greatly lowers the complexity of the problem. If O1O2 is parallel to the horizontal scan lines, then every epipolar plane is parallel to the scan lines, so the data on each scan line of the two final images must come from the same epipolar plane; the search for matching point pairs between the two fields of view reduces to a search for matching point pairs on corresponding scan lines.

The relation between the position of a spatial point in the stereo projection system and its depth is shown in Fig. 7. Without loss of generality, suppose the distance between the image plane and the lens center of each camera is l (in most cases l is approximately equal to the focal length f), and the distance between the optical centers of the two camera lenses is 2d.

From the relative positions py1 and py2 of the projections P1 and P2 of the object point P on the two image planes, the spatial coordinates of P can be obtained. P lies on the line O1P1, so xp and yp satisfy

yp = (py1 / l) × xp - d,

and P also lies on the line O2P2, so xp and yp simultaneously satisfy

yp = (py2 / l) × xp + d.

Solving the two equations together gives

xp = 2 d l / (py1 - py2), yp = d (py1 + py2) / (py1 - py2).

The depth xp of the object point therefore depends only on the difference py1 - py2 of the relative positions of its projections on the two image planes, not on the individual values of py1 and py2; it suffices to obtain the disparity of the object point in the stereo image pair (a sketch follows).
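A minimal Python sketch of the resulting depth computation:

```python
def depth_from_disparity(py1, py2, d, l):
    """xp = 2*d*l / (py1 - py2): depth from the projection positions on
    the two image planes, half-baseline d and image-plane distance l."""
    disparity = py1 - py2
    if disparity == 0:
        return float("inf")            # zero disparity: point at infinity
    return 2.0 * d * l / disparity
```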
The fast analysis and extraction of depth information between the multiple video streams obtained by the cameras can use a multiply iterated, layer-by-layer refinement algorithm, shown in Fig. 8; each layer comprises the following steps: (1) input the target line segment and the reference line segments; (2) perform histogram adjustment on the target and reference line segments separately; (3) establish feature thresholds for the adjusted line segments; (4) coarsely segment the line segments with those thresholds to obtain sub-line segments, then extract sub-line segment features from the histogram; (5) match the features of the target sub-line segments against the reference sub-line segments; (6) judge whether the matching result must be segmented again; (7) if the condition is not satisfied, enter the next layer and repeat steps (1) to (7). Finally the matching results of all layers are input together into the segmentation module, completing segmentation and matching at the specified accuracy.

The histogram adjustment above is performed separately on the target and reference line segments from the two fields of view, and can comprise the following steps: (1) find the maximum brightness Max and minimum brightness Min of the whole target line segment; (2) if the difference between Max and Min is below a threshold Th1, set the brightness of all points on the segment to the mean brightness; otherwise apply to every point on the segment the brightness transform

g(x) = (f(x) - Min) / (Max - Min) × VMax

where f(x) is the value to be transformed, g(x) the result, and VMax the brightness range of the system.

Establishing the feature thresholds can take the following concrete steps. After histogram adjustment, the whole line segment is divided into different regions according to thresholds, so that correspondences between sub-line segments can be sought for the matching of the line segments: (1) set a threshold Th2 slightly below 50%; (2) if Th2 < 30%, apply histogram equalization to the adjusted segment; (3) find the brightness DU such that the proportion of pixels in the two segments brighter than DU just exceeds Th2; (4) find the brightness DD such that the proportion darker than DD just exceeds Th2; (5) collect the pixels with brightness between DD and DU and look for a local valley in their counts; (6) if no local valley appears, decrease Th2 and repeat (2)-(5); (7) if several valleys appear, increase Th2 and repeat (2)-(5); (8) take the valley as the threshold.

The sub-line segment feature extraction can take the following concrete steps: (1) segment the target and reference line segments with the thresholds above; (2) join consecutive points of the same class into sub-line segments; (3) extract the feature values of each sub-line segment: its maximum Mmax, its minimum Mmin, its length Mlength, and the average brightness Maverage of its pixels.

The principle of the sub-line segment feature matching is as follows. With the mean of each sub-line segment as the weight of its points, the matching correspondence between target sub-line segments and reference sub-line segments can be obtained. A sub-line segment can have several kinds of correspondence: one-to-one, one-to-many, or none. If a one-to-many case occurs, the sub-line segments are merged so that it is converted to one-to-one. Matching the sub-line segments can be cast as searching an m × n space for the optimal path that minimizes the matching measure FV, as shown in Fig. 9. Quantifying matching accuracy as the matching measure FV is the difficult point of this algorithm. The matching of the whole line segment is the combined effect of the matchings of its sub-line segments, so the total matching measure FV of each alternative matching path is the sum of the measures of the sub-line segment matchings on that path. The matching measure of each sub-line segment should have the following properties: (1) it is roughly proportional to the length of the sub-line segment; (2) the more similar the corresponding sub-line segments, the smaller the value.
The sub-line segment feature matching can take the following concrete steps: (1) assume the target line segment is split into m non-overlapping sub-line segments, denoted C[1] ... C[m], and the reference line segment into n non-overlapping sub-line segments, denoted R[1] ... R[n]; the feature value of each is the pixel mean of the corresponding sub-line segment; (2) let the weight of each sub-line segment, KC[i] or KR[j], equal its length; (3) take a part (i ... i+4, j ... j+4) of the m × n space; (4) determine the matching measure: for a one-to-one pair where C[i] corresponds to R[j],

FV[i, j] = (KC[i] + KR[j]) / 2 × (C[i] - R[j]);

for a one-to-many match where C[i] and C[i+1] both correspond to R[j],

FV[i, j] + FV[i+1, j] = (KC[i] + KC[i+1] + KR[j]) / 2 × ((C[i] × KC[i] + C[i+1] × KC[i+1]) / (KC[i] + KC[i+1]) - R[j]);

for an unmatched sub-line segment C[i] or R[j],

FV[i, 0] = KC[i] × OcP

FV[0, j] = KR[j] × OcP

where OcP is the occlusion penalty factor; (5) for each candidate matching path, compute FV[·,·] on each of its sub-segments; the final matching measure SFV of the whole path is the sum of all FV[·,·] on the path; (6) compute the candidate path with the smallest matching measure.
The concrete steps of the candidate-path method for the minimum matching measure are as follows, as shown in Fig. 10 (a simplified sketch follows):

Proceed line by line in the order of j from 1 to n, and within each line point by point in the order of i from 1 to m, computing the minimum FV among all matching paths from (0, 0) to the current point. Only three search directions into the current point are allowed: for current point (i, j), the matching path may enter only from the three directions labeled 1, 2, 3 in the figure. When computing the minimum total matching measure SFV over all matching paths to (i, j), paths 1, 2, 3 (thick dashed lines) are the three basic candidates. Note also that since one-to-many matching is allowed during the process, some candidate paths must be added according to previous matching results. For example, for a path entering (i, j) along direction 3: if the optimal matching path selected at the decision for (i, j-1) was direction 1 (thick solid line), then, considered together, that path is (i-1, j-2)-(i, j), so path 4 (thin dashed line) should also be a candidate. Likewise, for a path entering (i, j) along direction 1: if the optimal path selected at (i-1, j) was direction 1 (thick solid line), then that path is (i-2, j-1)-(i, j), so path 5 (thin dashed line) should also be a candidate. The total number of candidate paths can therefore exceed six. If a direction coincides with the entry direction of the optimal path at its corresponding starting point, no candidate search path is added. For each candidate matching path, the total matching measure on advancing to (i, j) equals the total measure at the starting point of this match plus the measure of the sub-line segment matched this time; for example, for candidate path 3 the measure on advancing to (i, j) is

SFV(i, j) = SFV(i, j-1) + FV(i, j-1).

Among all candidate paths entering (i, j), the one with the smallest SFV is selected as the optimal path into (i, j). The process then continues point by point until (m, n). At that moment one only needs to step backwards from (m, n) along the entry direction of each point until (0, 0) to recover the whole optimal matching path. Analyzing each sub-segment of the optimal matching path by the method mentioned earlier yields the correspondences between the sub-line segments. The final step merges the several line segments that correspond to the same sub-line segment, giving the final matching result.
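A minimal Python sketch of this search, reduced to the three basic entry directions (the extra one-to-many candidate paths 4 and 5 described above are omitted); fv, occ_t and occ_r correspond to FV[i,j], FV[i,0] and FV[0,j]:

```python
import numpy as np

def best_match_path(fv, occ_t, occ_r):
    """Dynamic programming over the (m+1) x (n+1) grid for the matching
    path of minimum total measure SFV, with backtracking from (m, n)."""
    m, n = len(occ_t), len(occ_r)
    sfv = np.full((m + 1, n + 1), np.inf)
    sfv[0, 0] = 0.0
    back = {}
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:              # match target i with reference j
                cands.append((sfv[i - 1, j - 1] + fv[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:                        # target i unmatched (occluded)
                cands.append((sfv[i - 1, j] + occ_t[i - 1], (i - 1, j)))
            if j > 0:                        # reference j unmatched (occluded)
                cands.append((sfv[i, j - 1] + occ_r[j - 1], (i, j - 1)))
            sfv[i, j], back[(i, j)] = min(cands)
    path, node = [], (m, n)                  # trace back to (0, 0)
    while node != (0, 0):
        path.append(node)
        node = back[node]
    return float(sfv[m, n]), path[::-1]
```

This is the standard alignment-style recurrence; the patent's full method additionally re-examines earlier decisions to admit the one-to-many candidates.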
The judgement whether matched sub-line segments must continue to be segmented can take the following concrete steps: (1) since the purpose of the whole algorithm is 3D object segmentation, a sub-line segment whose whole line segment is assigned to the object or to the background need not be matched further; (2) sub-line segments without brightness fluctuation, i.e. those with Mmax - Mmin < a threshold Th3, need no further matching; (3) likewise sub-line segments that are too short, i.e. those with Mlength < a threshold Th4; (4) likewise sub-line segments whose entire corresponding sub-line segment satisfies the three conditions above; (5) matched sub-line segments are made equal in length by interpolation and the SAD of the whole segment is computed; pairs whose value is below a threshold Th5 need no further matching; (6) sub-line segments without a match are treated as occluded areas and are not matched further.

On the basis of the depth information, the invention extracts the video target using multiple features, comprising the following steps: (1) supplement the judgement of the depth analysis result with color information; (2) supplement the depth information with motion information; (3) optionally extend further with other information; (4) segment the video target with a split-merge method.

Supplementing the depth analysis result with color information can take the following concrete steps: (1) divide the target image into spatial subregions based on color, using threshold division of a directional neighborhood minimum-difference map; (2) merge the color subregions with a region-flooding algorithm; (3) combine with the depth information by performing subregion depth thresholding according to the maximum-likelihood mean depth of each color subregion.

Supplementing the depth information with motion information can take the following concrete steps: (1) use different motion patterns as the criterion for dividing subregion areas; (2) use identical motion patterns of different subregions as the basis for merging; (3) use motion vectors for interframe inheritance of the object segmentation.

Further extensions based on other information include edge information, higher-level processing information, and so on.
The split-merge segmentation of the video target can take the following concrete steps. First, splitting: (1) define a split decision function F_seg(A|I), where I is the target image to be segmented and A is a connected subregion of it; (2) when the split decision function exceeds a set splitting threshold on subregion A, i.e. F_seg(A|I) > Th_seg, subregion A is further divided into m subregions; (3) the division minimizes the sum of a metric function over A, that is

(m, A_1, ..., A_m) = Para(min(Σ_{i=1}^{m} D(A_i)))

where D(·) is the adopted subregion division metric function.

Then, merging: (1) define a merge decision function F_merge(A_1, A_2, ..., A_n|I), where A_i (i = 1, 2, ..., n) are any n connected subregions of I; (2) when the merge decision function is below a set threshold, the n subregions are merged into one subregion A. The splitting and merging alternate iteratively.

Applying the above split-merge method to video target segmentation with multiple information features can take the following concrete steps: (1) take N features (F_1, F_2, ..., F_N)^T and first divide them into two non-exclusive groups:

U_seg = (F_i1, F_i2, ..., F_iK)^T

U_merge = (F_j1, F_j2, ..., F_jL)^T

(2) where U_seg is the feature set used for splitting and U_merge the feature set used for merging; (3) design F_seg(A|I) and F_merge(A_1, A_2, ..., A_n|I), together with the division metric function D(·), from U_seg and U_merge respectively; (4) substituting the obtained F_seg(A|I), F_merge(A_1, A_2, ..., A_n|I) and D(·) into the split-merge formulas above yields a split-merge algorithm combining multiple features; (5) use the subregion maximum-likelihood depth as the merge decision of this multi-feature split-merge algorithm.

The maximum-likelihood depth decision can take the following concrete steps: (1) define the maximum-likelihood depth of subregion A as the x maximizing the posterior probability P(d(z) = x | z ∈ A, I, Dd(I)), where d(z) is the depth of pixel z, A the subregion under decision, I the target image to be segmented, and Dd(I) the disparity field; (2) reduce the subregion maximum-likelihood depth to the binary criterion

F_dis = P(d(z) < Th_d | z ∈ A, I, Dd(I)),

the proportion of points in the subregion whose depth is below a certain threshold; (3) incorporate the depth information into the steps of the split-merge algorithm.

The 3D object segmentation based on the matching result can comprise the steps: (1) according to the matching results of the sub-line segments, assign to the object those sub-line segments whose disparities at both the matched start point and the matched end point exceed a threshold Th6; (2) assign to the background those whose disparities at both matched endpoints do not exceed Th6; (3) continue the segmentation-matching iteration for the remaining regions; (4) until the whole segmentation result satisfies the required accuracy.

Simulation results of the fast depth extraction of the invention are shown in Fig. 11. For the ball_letter sequence, Fig. 11(a) is the left-frame video input (500 × 500), Fig. 11(b) is the right-frame video input (500 × 500), and Fig. 11(c) is the left-frame segmentation result with 1 iteration layer; the running time was 31 ms on a PII-400 PC, implemented in C.

For the man sequence, Fig. 11(d) is the left-frame video input (384 × 384), Fig. 11(e) is the right-frame video input (384 × 384), and Fig. 11(f) is the segmentation result with 3 iteration layers; the running time was 8.74 s for 50 frames on a PII-400 PC, implemented in C.
Embodiments of the multi-camera video-object-extraction video image communication system of the invention are described as follows:

Embodiment 1:

A P-II 400 PC is equipped with two or more USB CMOS OV6620 cameras, and a multi-USB add-in card inputs the video signals to the PC. Under the epipolar condition, the invention's fast method for extracting depth information between the multiple video streams obtained by the cameras and its fast depth-based multi-feature video target extraction algorithm analyze the multiple video streams. The depth is used to divide the scene into different foregrounds and backgrounds, yielding binary time sequences for the different video targets, so that the video targets can be encoded with an object-based coding method (such as MPEG-4). Network transmission can use the IP protocol with a hardware add-in card.

Embodiment 2:

A hardware-accelerated PC add-in card scheme: the multiple video streams are input to the card, which completes the fast extraction of depth information between the streams and the depth-based multi-feature computation, while the PC completes the multi-feature video object extraction in parallel; the remaining computation is as in Embodiment 1. The add-in card mainly consists of a multi-stream video input unit and a programmable video arithmetic unit, for example using a programmable Trimedia chip as the hardware core device.

Embodiment 3:

A hardware embodiment fully detached from the computer. The hardware system consists of a multi-stream video input unit, a programmable video arithmetic unit and a network transmission interface unit, for example using a programmable Trimedia chip as the hardware core device.

Claims (14)

1.一种多摄像头视频目标提取的视频图象通信系统,包括由视频对象提取单元和视频对象编码单元组成的发射端,由视频对象解码单元和视频对象显示单元组成的接收端,所说的发射端与接收端通过通信信道相连;其特征在于,所说的视频对象提取单元为与多个摄像头相连同时对多个视频流进行匹配运算,提取视频对象的深度信息,在深度信息的基础上,结合视频对象的运动特征,颜色特征,形状特征对视频目标分割的基于多视的视频对象提取单元。1. The video image communication system that a kind of multi-camera video object extracts, comprises the transmitting end that is made up of video object extracting unit and video object coding unit, the receiving end that is made up of video object decoding unit and video object display unit, said The transmitting end is connected to the receiving end through a communication channel; it is characterized in that the said video object extraction unit is connected with multiple cameras and simultaneously performs matching operations on multiple video streams to extract the depth information of the video object, based on the depth information , a video object extraction unit based on multi-view for segmenting video objects by combining motion features, color features, and shape features of video objects. 2.一种实现如权利要求1所述系统的方法,包括以下步骤:2. A method for realizing the system as claimed in claim 1, comprising the steps of: (1)在发射端,由多个摄像头输入视频图象,其中一个视频流为目标图象,其余视频流为辅助图象;(1) At the transmitting end, a plurality of cameras input video images, wherein one video stream is the target image, and the remaining video streams are auxiliary images; (2)在辅助图象的帮助下,对目标图象进行所说深度信息的分析和提取,及进行基于深度信息的运动特征,颜色特征,形状特征视频目标提取综合判断,再进行基于分析多个视频流之间像素的位置对应关系,从而计算被拍摄物体的深度信息的匹配结果的3D物体分割,从而提取出视频目标,其结果表示为视频目标的二值图象序列;(2) With the help of the auxiliary image, analyze and extract the depth information of the target image, and carry out motion features based on the depth information, color features, shape feature video target extraction comprehensive judgment, and then carry out multiple judgments based on analysis The positional relationship of pixels between two video streams, thereby calculating the 3D object segmentation of the matching result of the depth information of the object to be photographed, thereby extracting the video target, and the result is expressed as a binary image sequence of the video target; (3)视频对象编码单元根据视频目标的二值图象序列,对源目标图象进行基于视频对象的编码,从而形成基于视频对象的码流,发送至通信信道;(3) The video object coding unit carries out coding based on the video object to the source object image according to the binary image sequence of the video object, thereby forming a code stream based on the video object, and sending to the communication channel; (4)在接收端,视频对象解码单元将基于视频对象的码流还原成基于视频对象的图象;(4) At the receiving end, the code stream based on the video object is restored to an image based on the video object by the video object decoding unit; (5)视频对象显示单元对各个视频对象进行独立的显示。(5) The video object display unit independently displays each video object. 3、如权利要求2所述的实现方法,其特征在于,所说的多摄像头获取的多个视频流之间深度信息的快速分析和提取方法,采用多重迭代,逐层细化的算法,每一层包括以下步骤:3. 
4. The method of claim 3, characterized in that the histogram adjustment is carried out separately for the target and reference line segments of the two views, comprising the following steps:
(1) find the maximum brightness Max and the minimum brightness Min of the whole target segment;
(2) if the difference between Max and Min is smaller than a threshold Th1, set the brightness of every point on the segment to the segment's mean brightness; otherwise apply to every point the brightness transform
g(x) = (f(x) − Min) / (Max − Min) × VMax,
where f(x) is the value being transformed, g(x) is the result, and VMax is the brightness range of the system.

5. The method of claim 3, characterized in that the feature threshold is established by the following steps:
(1) set a value Th2 slightly below 50%;
(2) if Th2 < 30%, apply histogram equalization to the histogram-adjusted segments;
(3) find the brightness value DU such that the overall proportion, in the two segments, of pixels brighter than DU just exceeds Th2;
(4) find the brightness value DD such that the overall proportion, in the two segments, of pixels darker than DD just exceeds Th2;
(5) count the pixels with brightness between DD and DU and look for a local valley in their counts;
(6) if no local valley appears, decrease Th2 and repeat steps (2) to (5);
(7) if several valleys appear, increase Th2 and repeat steps (2) to (5);
(8) take the valley as the threshold.
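Claim 5's valley-seeking search can be sketched as follows. np.quantile stands in for the "just greater than Th2" scans of steps (3) and (4), the equalization branch for Th2 < 30% is omitted, and the step size, bin count, iteration cap, and median fallback are illustrative assumptions rather than anything the claim specifies.

```python
import numpy as np

def find_feature_threshold(seg_a, seg_b, th2=0.45, step=0.05, bins=16):
    """Simplified sketch of the feature-threshold search of claim 5."""
    pix = np.concatenate([seg_a, seg_b]).astype(float)
    for _ in range(16):                        # guard against oscillation
        if not 0.05 < th2 < 0.5:
            break
        du = np.quantile(pix, 1.0 - th2)       # just over Th2 of pixels above DU
        dd = np.quantile(pix, th2)             # just over Th2 of pixels below DD
        inner = pix[(pix > dd) & (pix < du)]
        if inner.size >= 3:
            hist, edges = np.histogram(inner, bins=bins)
            valleys = [i for i in range(1, bins - 1)
                       if hist[i] < hist[i - 1] and hist[i] < hist[i + 1]]
            if len(valleys) == 1:              # unique valley: done, step (8)
                i = valleys[0]
                return 0.5 * (edges[i] + edges[i + 1])
            th2 += step if len(valleys) > 1 else -step   # steps (6)-(7)
        else:
            th2 -= step                        # window too narrow: widen it
    return float(np.median(pix))               # fallback, not in the claim

rng = np.random.default_rng(0)
line = np.concatenate([rng.normal(60, 5, 200), rng.normal(180, 5, 200)])
print(find_feature_threshold(line, line + 3.0))
```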
6. The method of claim 3, characterized in that sub-segment feature extraction comprises the following steps:
(1) divide the target and reference segments at the above threshold;
(2) join consecutive points of the same class into sub-segments;
(3) extract as the feature values of each sub-segment its maximum Mmax, its minimum Mmin, its length Mlength, and the mean brightness Maverage of its pixels.

7. The method of claim 3, characterized in that sub-segment feature matching comprises the following steps:
(1) assume the target segment is divided into m non-overlapping sub-segments, written C[1]…C[m], and the reference segment into n non-overlapping sub-segments, written R[1]…R[n]; the feature value of each is the mean pixel value of the corresponding sub-segment;
(2) set the weight of each sub-segment, KC[i] or KR[j], equal to the length of the corresponding sub-segment;
(3) take a part (i…i+4, j…j+4) of the m×n space;
(4) determine the matching cost as follows:
(5) for a one-to-one matched pair, where target sub-segment C[i] corresponds to reference sub-segment R[j], the cost contributed by the pair is
FV[i, j] = (KC[i] + KR[j]) / 2 × (C[i] − R[j]);
for a one-to-many match, where target sub-segments C[i] and C[i+1] both correspond to R[j] in the reference segment, the cost of this part is
FV[i, j] + FV[i+1, j] = (KC[i] + KC[i+1] + KR[j]) / 2 × ((C[i] × KC[i] + C[i+1] × KC[i+1]) / (KC[i] + KC[i+1]) − R[j]);
for an unmatched sub-segment C[i] or R[j], the cost is defined as
FV[i, 0] = KC[i] × OcP and FV[0, j] = KR[j] × OcP respectively,
where OcP is the occlusion penalty factor;
(6) for every candidate matching path, compute FV[·,·] on each of its sections; the final matching metric SFV of the whole path is the sum of all FV[·,·] along the path;
(7) select the candidate path with the minimum matching metric.
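A minimal reading of claim 7's matching metric follows. Casting the minimum-SFV path search as dynamic programming over match and skip moves, taking the absolute difference inside FV so that mismatches cannot cancel along a path, and the OcP value are all assumptions; the one-to-many merging case is omitted for brevity, since the claim only requires selecting the candidate path of minimum total cost.

```python
import numpy as np

def best_match_sfv(C, KC, R, KR, ocp=10.0):
    """Minimum-SFV alignment of target sub-segments (mean values C, length
    weights KC) against reference sub-segments (R, KR), per claim 7."""
    m, n = len(C), len(R)
    D = np.full((m + 1, n + 1), np.inf)    # D[i, j]: best cost of matching
    D[0, 0] = 0.0                          # the first i and j sub-segments
    for i in range(m + 1):
        for j in range(n + 1):
            if np.isinf(D[i, j]):
                continue
            if i < m and j < n:   # match C[i] with R[j]:
                # FV[i,j] = (KC[i] + KR[j]) / 2 * |C[i] - R[j]|
                fv = (KC[i] + KR[j]) / 2.0 * abs(C[i] - R[j])
                D[i + 1, j + 1] = min(D[i + 1, j + 1], D[i, j] + fv)
            if i < m:             # C[i] occluded: FV[i,0] = KC[i] * OcP
                D[i + 1, j] = min(D[i + 1, j], D[i, j] + KC[i] * ocp)
            if j < n:             # R[j] occluded: FV[0,j] = KR[j] * OcP
                D[i, j + 1] = min(D[i, j + 1], D[i, j] + KR[j] * ocp)
    return D[m, n]                # SFV of the best matching path

# A bright/dark/bright target against a reference with one extra segment.
print(best_match_sfv([200, 40, 190], [6, 5, 7], [198, 42, 90, 188], [6, 5, 3, 8]))
```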
8. The method of claim 3, characterized in that the decision whether a matched sub-segment should be divided further proceeds as follows:
(1) since the purpose of the whole algorithm is 3D object segmentation, sub-segments whose whole extent has already been assigned to the object or the background need no further matching; nor do
(2) sub-segments without brightness relief, i.e. those with Mmax − Mmin < a threshold Th3;
(3) sub-segments that are too short, i.e. those with Mlength < a threshold Th4;
(4) sub-segments all of whose corresponding sub-segments satisfy the three conditions above;
(5) matched sub-segment pairs whose SAD over the whole segment, computed after interpolating the pair to equal length, is below a threshold Th5;
(6) unmatched sub-segments, which are treated as occlusion regions and are not matched further.
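The stop/continue tests of claim 8 reduce to a small predicate, sketched below. The threshold values are illustrative, and the mean absolute difference is used in place of the claim's raw SAD so that Th5 stays independent of segment length, a simplification worth noting.

```python
import numpy as np

TH3, TH4, TH5 = 10.0, 4, 6.0   # illustrative values for the claim's
                               # Th3 (flatness), Th4 (length), Th5 (match quality)

def needs_further_matching(tgt, ref):
    """Stop/continue test of claim 8 for one matched sub-segment pair.
    Returns False (no further split) when either segment is flat or too
    short, or when the pair already matches well."""
    tgt = np.asarray(tgt, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if tgt.max() - tgt.min() < TH3:            # no brightness relief, test (2)
        return False
    if len(tgt) < TH4 or len(ref) < TH4:       # too short to split, test (3)
        return False
    n = max(len(tgt), len(ref))                # interpolate to equal length, test (5)
    t = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(tgt)), tgt)
    r = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(ref)), ref)
    return np.abs(t - r).mean() >= TH5         # already a good match: stop

print(needs_further_matching([30, 30, 31, 30], [30, 31, 30, 30]))       # flat: False
print(needs_further_matching([20, 80, 20, 80, 20], [25, 20, 85, 25, 80]))  # True
```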
9. The method of claim 2, characterized in that the multi-feature video object extraction comprises the following steps:
(1) use color information to supplement the judgment obtained from the depth analysis;
(2) use motion information to supplement the depth judgment;
(3) optionally extend further with other information;
(4) segment the video object with a divide-merge method;
the color-information supplement of step (1) proceeding as follows:
(5) divide the target image into color-based spatial sub-regions by thresholding a directional-neighborhood minimum-difference map;
(6) merge the color sub-regions with a region flooding algorithm;
(7) combine with the depth information by thresholding each sub-region's depth according to the sub-region's maximum-likelihood mean depth.

10. The method of claim 9, characterized in that the motion-information supplement to the depth judgment comprises the following steps:
(1) use different motion patterns as the criterion for dividing sub-regions;
(2) use identical motion patterns of different sub-regions as the basis for merging;
(3) use motion vectors for inter-frame inheritance of the object segmentation.

11. The method of claim 9, characterized in that the divide-merge segmentation of the video object proceeds as follows:
first, division:
(1) define a division decision function Fseg(A|I), where I is the target image to be segmented and A is a connected sub-region of it;
(2) when the division decision function on sub-region A exceeds a set division threshold, i.e. Fseg(A|I) > Thseg, divide A further into m sub-regions;
(3) the division minimizes the sum of a metric function over A, i.e.
(m, A1, …, Am) = Para(min(Σ_{i=1}^{m} D(Ai))),
where D(·) is the adopted sub-region division metric;
then, merging:
(1) define a merge decision function Fmerge(A1, A2, …, An|I), where Ai (i = 1, 2, …, n) are any n connected sub-regions of I;
(2) when the merge decision function falls below a set threshold, merge the n sub-regions into a single sub-region A;
the division and merging steps above are iterated alternately.
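Claim 11's alternating loop can be sketched on a one-dimensional signal, as below. The concrete choices (Fseg as region brightness variance, the split as thresholding at the region mean, Fmerge as the difference of region means, pairwise rather than n-way merging, and all thresholds) are stand-ins, since the patent leaves Fseg, Fmerge, and D to be designed from the chosen feature sets.

```python
import numpy as np

def split_merge(image, regions, th_seg=100.0, th_merge=8.0, iters=5):
    """Alternating divide-merge loop of claim 11 on a 1-D signal.
    Regions are arrays of pixel indices into `image`."""
    image = np.asarray(image, dtype=float)
    for _ in range(iters):
        changed = False
        divided = []                                  # divide pass
        for a in regions:
            if image[a].var() > th_seg and len(a) > 1:
                mask = image[a] >= image[a].mean()    # stand-in split rule
                divided += [a[mask], a[~mask]]
                changed = True
            else:
                divided.append(a)
        divided.sort(key=lambda a: a.min())
        merged, i = [], 0                             # merge pass (pairwise)
        while i < len(divided):
            if (i + 1 < len(divided) and
                    abs(image[divided[i]].mean()
                        - image[divided[i + 1]].mean()) < th_merge):
                merged.append(np.concatenate([divided[i], divided[i + 1]]))
                i += 2
                changed = True
            else:
                merged.append(divided[i])
                i += 1
        regions = merged
        if not changed:                               # converged
            break
    return regions

img = np.array([10, 12, 11, 200, 205, 198, 9, 11], dtype=float)
parts = split_merge(img, [np.arange(len(img))])
print([sorted(p.tolist()) for p in parts])   # bright band vs. dark background
```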
12. The method of claim 9 or 11, characterized in that the divide-merge method is applied to video object segmentation with multiple information features as follows:
(1) take N features (F1, F2, …, FN)^T and first divide them into two groups that are not mutually exclusive,
Useg = (Fi1, Fi2, …, FiK)^T and Umerge = (Fi1, Fi2, …, FiK)^T,
where Useg is the feature set to be used for division and Umerge the feature set to be used for merging;
(2) design Fseg(A|I), Fmerge(A1, A2, …, An|I), and the division metric D(·) from Useg and Umerge respectively;
(3) substitute the obtained Fseg(A|I), Fmerge(A1, A2, …, An|I), and D(·) into the divide-merge formulas above, i.e. into
(m, A1, …, Am) = Para(min(Σ_{i=1}^{m} D(Ai))) and Fmerge(A1, A2, …, An|I),
which yields a divide-merge algorithm combining several features;
(4) in particular, a divide-merge algorithm combining multiple features in which the sub-region maximum-likelihood depth serves as the merge criterion.

13. The method of claim 12, characterized in that the maximum-likelihood depth decision comprises the following steps:
(1) define the maximum-likelihood depth of sub-region A as the value x that maximizes the posterior probability
P(d(z) = x | z ∈ A, I, Dd(I)),
where d(z) is the depth of pixel z, A is the sub-region under decision, I is the target image to be segmented, and Dd(I) is the disparity field;
(2) simplify the sub-region maximum-likelihood depth into the binary criterion
Fdis = P(d(z) < Thd | z ∈ A, I, Dd(I)),
i.e. the proportion of points in the sub-region whose depth is below a given threshold;
(3) incorporate the depth information into the steps of the divide-merge algorithm.
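A sketch of claim 13's binary criterion used as a merge test, in the spirit of claim 12's feature combination: Fdis is estimated as the fraction of a region's pixels whose depth lies below Thd, following the claim, while letting two regions merge when their Fdis values agree within a tolerance is an illustrative decision rule of our own.

```python
import numpy as np

def merge_by_depth(depth, region_a, region_b, th_d=8.0, th_same=0.2):
    """Depth-based merge test: estimate Fdis = P(d(z) < Th_d | z in A) per
    region as a pixel fraction (claim 13), and merge when the two regions'
    Fdis values agree to within th_same (an assumed tolerance)."""
    depth = np.asarray(depth, dtype=float)
    f_a = np.mean(depth[region_a] < th_d)   # binary ML-depth criterion
    f_b = np.mean(depth[region_b] < th_d)
    return abs(f_a - f_b) < th_same

depth = np.array([3, 4, 3, 30, 28, 31, 4, 3], dtype=float)
print(merge_by_depth(depth, np.array([0, 1, 2]), np.array([6, 7])))  # True: both near
print(merge_by_depth(depth, np.array([0, 1, 2]), np.array([3, 4])))  # False: near vs far
```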
14. The method of claim 2, characterized in that the 3D object segmentation based on the matching result comprises the following steps:
(1) according to the sub-segment matching results, classify as object those sub-segments whose disparities at both the matching start point and the matching end point exceed a threshold Th6;
(2) classify as background those sub-segments whose disparities at both the matching start point and the matching end point do not exceed Th6;
(3) continue the division and matching iteration for the remaining regions;
(4) until the whole segmentation result meets the accuracy requirement.
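Claim 14's final labeling rule is just a pair of disparity threshold tests, sketched below; the Th6 value is illustrative. Large disparity between the two views means the surface is near the cameras, hence the foreground object, which is why both end points must clear the threshold.

```python
TH6 = 12.0   # disparity threshold Th6; the value is an illustrative assumption

def label_subsegment(disp_start, disp_end):
    """Object/background decision of claim 14 for one matched sub-segment,
    from the disparities at its matching start and end points."""
    if disp_start > TH6 and disp_end > TH6:
        return "object"        # both end points near the cameras
    if disp_start <= TH6 and disp_end <= TH6:
        return "background"    # both end points far away
    return "refine"            # mixed: hand back for another iteration

print(label_subsegment(20.0, 18.5))   # object
print(label_subsegment(5.0, 3.2))     # background
print(label_subsegment(20.0, 3.2))    # refine
```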
CNB001214411A 2000-07-21 2000-07-21 Video image communication system and implementation method for multi-camera video target extraction Expired - Fee Related CN1134175C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB001214411A CN1134175C (en) 2000-07-21 2000-07-21 Video image communication system and implementation method for multi-camera video target extraction

Publications (2)

Publication Number Publication Date
CN1275871A CN1275871A (en) 2000-12-06
CN1134175C true CN1134175C (en) 2004-01-07

Family

ID=4588797

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB001214411A Expired - Fee Related CN1134175C (en) 2000-07-21 2000-07-21 Video image communication system and implementation method for multi-camera video target extraction

Country Status (1)

Country Link
CN (1) CN1134175C (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607582A (en) * 2005-07-27 2014-02-26 赛达克雷斯特合伙公司 System, apparatus, and method for capturing and screening visual images for multi-dimensional display
WO2007100303A1 (en) * 2006-03-01 2007-09-07 Agency For Science, Technology & Research A method and system for obtaining multiple views of an object for real-time video output
CN101453662B (en) * 2007-12-03 2012-04-04 华为技术有限公司 Stereo video communication terminal, system and method
CN101540916B (en) * 2008-03-20 2010-12-08 华为技术有限公司 Method and device for coding/decoding
ES2389401T3 (en) * 2008-06-17 2012-10-25 Huawei Device Co., Ltd. Method, device and communication system through video
CN101662694B (en) * 2008-08-29 2013-01-30 华为终端有限公司 Method and device for presenting, sending and receiving video and communication system
JP2011029905A (en) * 2009-07-24 2011-02-10 Fujifilm Corp Imaging device, method and program
CN102195894B (en) * 2010-03-12 2015-11-25 腾讯科技(深圳)有限公司 The system and method for three-dimensional video-frequency communication is realized in instant messaging
KR20140011481A (en) * 2011-06-15 2014-01-28 미디어텍 인크. Method and apparatus of motion and disparity vector prediction and compensation for 3d video coding
WO2013089662A1 (en) * 2011-12-12 2013-06-20 Intel Corporation Scene segmentation using pre-capture image motion
CN102722080B (en) * 2012-06-27 2015-11-18 杭州南湾科技有限公司 A kind of multi purpose spatial image capture method based on many lens shootings
CN107547889B (en) * 2017-09-06 2019-08-27 新疆讯达中天信息科技有限公司 A kind of method and device carrying out three-dimensional video-frequency based on instant messaging
CN111915740A (en) * 2020-08-13 2020-11-10 广东申义实业投资有限公司 Rapid three-dimensional image acquisition method

Legal Events

C10 / SE01: Entry into substantive examination (entry into force of the request for substantive examination)
C06 / PB01: Publication
C14 / GR01: Grant of patent or utility model (granted publication date: 20040107)
C17 / CF01: Cessation of patent right (termination due to non-payment of the annual fee; termination date: 20110721)