CN100407798C - Three-dimensional geometric model building system and method - Google Patents

Three-dimensional geometric model building system and method

Info

Publication number
CN100407798C
CN100407798C CN2005100122739A CN200510012273A
Authority
CN
China
Prior art keywords
unit
model
video
geometric
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2005100122739A
Other languages
Chinese (zh)
Other versions
CN1747559A (en)
Inventor
汪国平
王宇宙
张凯
葛文兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN2005100122739A priority Critical patent/CN100407798C/en
Publication of CN1747559A publication Critical patent/CN1747559A/en
Application granted granted Critical
Publication of CN100407798C publication Critical patent/CN100407798C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present invention discloses a three-dimensional geometric model building system comprising a plurality of video input devices, a plurality of single-video-stream visual analysis units, a multiple-video-stream visual analysis unit, a real-time interactive semantic recognition unit, a three-dimensional geometric model building unit, a three-dimensional model rendering unit, and a video output device. The video input devices collect video streams of the designer's design actions. Each single-video-stream visual analysis unit detects the moving and non-moving regions in its video stream, estimates the direction and speed of object motion, predicts the next position, computes the edge contour of the moving object, and estimates contour features. The multiple-video-stream visual analysis unit performs binocular stereo matching, carries out three-dimensional reconstruction and fitting of the object's motion trajectory, and computes the cross section of the moving object. The real-time interactive semantic recognition unit processes the output of the multiple-video-stream visual analysis unit to obtain the human-computer interaction semantics. The three-dimensional geometric model building unit derives the three-dimensional geometric design model, and the three-dimensional model rendering unit renders the geometric model on the video output device, which displays the geometric shape of the object and the three-dimensional geometric model created by the designer.

Description

Three-dimensional geometric model building system and method
Technical field
The present invention relates to a three-dimensional geometric model building system and method, and in particular to a real-time interactive three-dimensional geometric model building system and method based on computer stereo vision, applied to computer-aided conceptual design.
Background art
At present, in the field of industrial product design, the computer-aided design (CAD) technology applied in the later detailed design phase is quite mature. In automobile manufacturing, for example, the entire vehicle styling pipeline is almost completely computer-aided. Yet the conceptual design of a product is still carried out through freehand sketching. The designer sketches according to his or her creative intent; the customer selects from dozens of sketches those that meet the requirements and feeds personalized demands back to the designer, who then refines the sketches accordingly. After repeated rounds of submission and feedback, the conceptual design of a new model is finally determined. Obviously, a planar freehand sketch lacks visual immediacy, is inconvenient for remote collaborative design, and cannot provide direct data for the accurate modeling of the subsequent detailed design phase. Judging from the development of the automobile industry at home and abroad, the automobile market is becoming saturated, competition is increasingly fierce, and the vehicle development cycle has shortened to about one year. These new challenges drive automobile manufacturers to seek more efficient and effective design methods to accelerate product development. With today's ever-rising level of informatization, designers hope to express design ideas freely through automated computer-aided conceptual design, and to connect automatically with the other stages of the process, thereby fully informatizing the automobile design workflow.
It is therefore worthwhile to study geometric modeling methods and devices for computer-aided conceptual design that input a person's geometric design concepts into a computer system through natural human-computer interaction and build intuitive three-dimensional geometric models. Such an apparatus and method would have significant application value for automobile manufacturing.
Such a natural real-time interactive three-dimensional geometric design system involves three technical fields: three-dimensional geometric modeling technology, natural real-time human-computer interaction technology, and the vision computing technology that supports natural real-time interactive geometric design.
In three-dimensional geometric modeling, curved shapes can be designed and represented either with the commonly used control-vertex methods, such as NURBS, or by generating the surface (or solid, or curve) through motion. The motion-generation approach has a wide range of applications, for example the design and representation of elongated strip-shaped object surfaces; an aircraft shape can likewise be decomposed into a union of strip surfaces (solids). Motion generation is intuitive and simple, simplifies much surface modeling work, and is therefore well liked by designers. In many situations this kind of surface, known as a swept surface, is more satisfactory in efficiency and quality than other modeling methods, rather than being confined to the static push-and-pull of control vertices.
Research on three-dimensional geometric modeling by real-time interactive motion-generated surfaces involves human-computer interaction theory and its implementation. Human-computer interaction models the person's control of the moving object to achieve the modeling goal, and is one of the key factors limiting applications; better interaction technology makes computers easier to use and improves productivity. The HMGR (Hand Motion Gesture Recognition System) project of the human-computer interaction laboratory at the University of Washington studies gesture recognition using hidden Markov models. With this system, an interactive user interface designer can build a multi-channel input system that converts hand motion in three-dimensional space into gesture symbols, which can then be combined with other inputs such as speech and static sign language. However, the device relies on a data glove sensor and is confined to simple applications.
The exploratory 3D drawing interface developed at Brown University aims to improve the usability of gesture interfaces and of command-based modeling. In this system, by highlighting the relevant geometric parts of the scene, the user conveys to the system a cue as to which kind of operation is required. The system infers the possible user operations from this cue and displays them as thumbnails; the user completes the editing operation by clicking a thumbnail. The operation-cue mechanism lets the user specify geometric relations between graphical components in the scene, and when the recognition rate of the operation model is low, multiple thumbnail suggestions can be used to resolve the ambiguity.
Japanese patent application No. 00118340 provides an apparatus, recognition method and program recording medium for hand shape and gesture recognition from images of complex hand shapes.
The Nara Institute of Science and Technology (Japan) has built a hybrid three-dimensional object modeling system, NIME (Immersive Modeling Environment). Inheriting the advantages of the conventional two-dimensional GUI and of three-dimensional immersive modeling environments, the system uses a tilted rear-projection display device to combine the two-dimensional and three-dimensional modeling environments into a whole. Two-dimensional GUI modeling interaction takes place on the display surface, while stereo imaging and a pen-style input device with six degrees of freedom realize a seamless transition between the two-dimensional and three-dimensional modeling environments.
The three-dimensional geometric modeling method of real-time interactive motion-generated surfaces also involves vision computing theory and technology. The research goal of vision computing is to give computers the ability to perceive three-dimensional environment information from two-dimensional images, that is, to reconstruct three-dimensional shape, object motion and geometric position in space; it is the most important theoretical tool for capturing and studying the sweeping motion of real-world objects and their motion envelopes, and for building geometric models of those envelopes. Over the past decade, real-time dense stereo disparity matching has become a reality. Until recently, however, systems capable of true real-time processing all required special-purpose hardware support such as digital signal processors (DSP) or field-programmable gate arrays (FPGA). For example, the stereo matching system of J. Woodfill and Von Herzen uses 16 Xilinx 4025 FPGAs to process images of 320*240 pixels at 42 frames per second, while P. Corke and Dunn, using a similar algorithm implemented on FPGA hardware, process 256*256-pixel images at 30 frames per second.
Chinese patent application No. 03153504 uses phase measurement and stereo vision, projecting a grating onto the object surface to measure the three-dimensional surface profile of the object.
As for conceptual design sketch recognition tools, the freehand sketching tool of the computer graphics group at Brown University combines characteristics of paper sketching with those of computer CAD systems and provides rough 3D polyhedral modeling based on gesture interaction. Adopting the traditional early 2D interface concept, it gives the user, through freehand sketching, the ability to draw various three-dimensional primitives according to simple placement rules.
The University of Tokyo has designed an interactive free-form surface design tool based on freehand sketching, whose goal is a simple, fast modeling system for unstructured models, such as rounded, bulging toys or similar objects. The user interactively draws two-dimensional strokes on the screen, and a three-dimensional polygonal surface is constructed from the two-dimensional silhouettes.
The ARCADE research project of the Fraunhofer Institute for Computer Graphics (Germany) studies free-form surface modeling using a virtual design desk. Standing in front of the virtual design desk, the user models free-form surfaces with data-glove gestures. The ARCADE system uses 3D input devices to achieve efficient and accurate modeling. Its interaction techniques include free-space object creation, creation based on other objects, implicit Boolean operations, 3D picking and fast moving, layout-based contextual modification, discrete operations, and two-handed input.
Patent application No. 00103458 discloses an apparatus with which, during word processing, the user performs character writing and manuscript editing with a pen and editing gestures.
From the systems described above it can be seen that the technologies involved in natural-interaction three-dimensional geometric modeling tools have been studied extensively, driven by the application demands of product conceptual design; yet these systems still have many problems to be solved. Summing up, the main shortcomings include the following aspects:
Human-computer interaction lacks sufficient naturalness. Whether virtual design desk, data glove, 3D mouse, touch sensor, or online sketch recognition device, all require the user to wear or directly touch a three-dimensional input tool. This mode of physical man-machine connection through dedicated tools inconveniences the user, and unnatural design tools hinder the immediate capture of the designer's creative inspiration. The most direct evidence of this problem is that hand-drawn sketching, despite requiring repeated attempts, is still the dominant tool in actual design practice.
Geometric modeling interaction is insufficiently direct. Freehand sketching requires converting the concept into a two-dimensional model, which a recognition tool then converts into a three-dimensional model. Online sketching systems, by binding these two processes together through the tool, actually increase the constraints on the designer. Data-glove-based systems operate on the model in the virtual scene through hand actions, so the designer builds and modifies the model indirectly: to change the shape of the model, the designer must manipulate control points in order to feed back the conceptual model being designed. Design thinking is thus frequently constrained and interrupted by tools such as the data glove, making the design process discontinuous and impairing the design result.
The range and mode of application lack flexibility. Sketching systems, pen input, data gloves, force-feedback devices, virtual design desks and the like realize only single-modality external input. For example, a designer may wish to obtain a silhouette of some concrete physical object during the design process and build a geometric model from that contour in a simple way. Existing systems use special equipment, and introducing the user directly into the conceptual design process remains rather difficult.
The implementation technology itself is not mature enough. Freehand sketching must convert a two-dimensional model into a three-dimensional model and therefore suffers from recognition error. Because sketches are arbitrary and have great freedom, sample collection is very difficult; the semantics of a sketch in particular are ambiguous and uncertain, so recognition targets can neither be fully enumerated by templates nor supported by a predefined dictionary for semantic interpretation. Online sketching improves recognition accuracy, but in actual use the geometric design process is frequently interrupted by system interaction, so efficiency is low. Data gloves and the like also have problems such as the limited spatial range and positioning resolution of their sensors.
The systems lack generality. They use special equipment and are therefore expensive, raising product design costs as well as training and usage costs. This undoubtedly raises the entry threshold for users and limits the range of application. The popularization of the computer, by contrast, is owed to falling cost and increasing versatility.
Summary of the invention
The present invention has been made to overcome the problems of existing systems. Its object is to use general-purpose, convenient and economical physical devices and a natural, real-time interaction method to create, modify and edit three-dimensional geometric bodies according to the surface shape and spatial motion trajectory of the hands, or of a hand-held object, in three-dimensional space, using motion-based geometric modeling, thereby realizing geometric modeling for the conceptual design of three-dimensional object shapes.
To achieve this object, the invention provides a three-dimensional geometric modeling method, comprising:
1. A video input step of collecting video streams from a plurality of video input devices (101) distributed around the designer.
2. A single-video-stream visual analysis step in which each of the video streams collected in the video input step is processed by its own single-video-stream visual analysis, to detect the moving and non-moving regions in the video stream, estimate the direction and speed of the moving object, predict its next position, compute the edge contour of the moving object, and estimate contour features.
3. A multiple-video-stream visual analysis step of receiving the results of the single-video-stream visual analyses, performing binocular stereo matching, carrying out three-dimensional reconstruction and fitting of the object motion trajectory based on the obtained contours and features, computing the cross section of the moving object, and providing the obtained object model, object motion trajectory and object cross-section contour to the real-time interactive semantic recognition step.
4. A real-time interactive semantic recognition step of processing the output of the multiple-video-stream visual analysis step to obtain the human-computer interaction semantics, and interpreting the obtained semantics using semantic definitions stored in advance in a semantic model storage unit.
5. A three-dimensional geometric modeling step of processing the object model, object motion trajectory and object cross-section contour output by the multiple-video-stream visual analysis step together with the output of the real-time interactive semantic recognition step, thereby obtaining the three-dimensional geometric design model, and storing the result in a three-dimensional geometric model storage unit.
6. A three-dimensional model rendering step of rendering the three-dimensional geometric model stored in real time in the three-dimensional geometric model storage unit on the video output device.
7. A video output step of displaying the designer's three-dimensional geometric model on the video output device.
In addition, the present invention also provides a three-dimensional geometric model building system, comprising:
1. A plurality of video input devices distributed around the designer for collecting video streams of the designer's design actions.
2. A single-video-stream visual analysis unit corresponding to each video input device, so that each of the video streams collected by the video input devices is processed by its own single-video-stream visual analysis unit, to detect the moving and non-moving regions in the video stream, estimate the direction and speed of the moving object, predict its next position, compute the edge contour of the moving object, and estimate contour features.
3. A multiple-video-stream visual analysis unit for receiving the results of the single-video-stream visual analysis units, performing binocular stereo matching, carrying out three-dimensional reconstruction and motion-trajectory fitting based on the obtained contours and features, computing the cross section of the moving object, and providing the obtained object model, object motion trajectory and object cross-section contour to the real-time interactive semantic recognition unit.
4. A real-time interactive semantic recognition unit for processing the output of the multiple-video-stream visual analysis unit to obtain the human-computer interaction semantics, and for interpreting the obtained semantics using semantic definitions stored in advance in a semantic model storage unit.
5. A three-dimensional geometric modeling unit for comprehensively processing the output of the multiple-video-stream visual analysis unit and of the real-time interactive semantic recognition unit, thereby obtaining the three-dimensional geometric design model, and storing the result in a three-dimensional geometric model storage unit.
6. A three-dimensional model rendering unit that takes the three-dimensional geometric model stored in real time in the three-dimensional geometric model storage unit as input and renders the geometric model on the video output device.
7. A video output device for displaying the designer's three-dimensional geometric model.
According to a further aspect of the invention, a single-video-stream visual analysis processing method is provided, comprising: an image analysis method that processes the video signal collected from the video input device to obtain feature video streams of different resolution scales and different feature elements, providing input for motion detection and stereo matching; a real-time motion detection method that detects the moving regions and the non-moving background regions in the video stream; a motion estimation and prediction method that estimates the direction and speed of motion and predicts the next position; and a contour computation method for computing the edge contour of the moving object and estimating contour features.
According to another aspect of the invention, a multiple-video-stream visual analysis processing method is provided, comprising: a stereo matching algorithm that subjects the data output by two video motion detections to stereo matching and disparity computation, obtaining the region segmentation and depth information of the moving object; a three-dimensional model building method that builds a stereo model from the depth data output by stereo matching; a shape-from-contour method that, from the contour descriptions obtained by contour computation, builds the three-dimensional model of the moving body and its principal projection features and recovers the three-dimensional contour of the moving object; a cross-section computation method that builds the cross-section contour relative to the motion trajectory from the contour data and the trajectory; and a trajectory fitting method that smooths the data obtained from motion estimation and prediction to obtain a smooth motion trajectory.
According to a fifth aspect of the invention, a real-time interactive semantic recognition method is provided, comprising: a collision detection method that determines from the motion trajectory and outline of the moving object that the semantic type is an operational semantic, then performs collision detection against the already built three-dimensional geometric model and determines the position and mode of the collision; an operational-semantics analysis method that determines, from the collision detection result, operational semantics such as the operand and operation type; an interaction-semantics analysis method that processes the input determined by the motion trajectory and the obtained cross-section contour, producing the interaction-semantics analysis result; and a speech semantic analysis that obtains the semantics of voice interaction from speech analysis.
According to a sixth aspect of the invention, a three-dimensional geometric modeling method is provided, comprising: a repetition processing method for eliminating repeated, overlapping motion of the moving object in the video images during its movement; a motion processing method for eliminating the trembling and jitter produced by the moving object during its movement; an envelope computation method for computing, from the motion trajectory and the cross-section contour after jitter and repetition are removed, the envelope surface generated by the object's motion; and a shape editing method for modifying the three-dimensional geometric model that has been built.
According to a seventh aspect of the invention, a picture output device is provided, comprising: a display unit for showing the geometric model, spatial position and attitude of the moving object, and for showing the three-dimensional geometric models already built.
According to an eighth aspect of the invention, a rendering method is provided that renders the three-dimensional geometric design model on the display unit, plotting the geometric model, relative position and attitude of the moving object on the display unit.
Using this system or method for product conceptual design produces beneficial effects in the following respects.
1. By using inexpensive, non-special-purpose physical equipment, it reduces product design costs and widens the range of application.
2. The natural real-time interaction mode makes it easy to bring ordinary users into an open, iterative conceptual design process, helps eliminate passivity and one-sidedness in product shape design, and directly incorporates many factors into the product development, design and manufacturing process.
3. A design environment based on vision and common equipment makes it easy to introduce environmental features. Environment is a key factor affecting product design, including the physical features of the environment at the micro level and socio-cultural features at the macro level. The device of this invention allows the designer to walk out of the design office and carry out conceptual design in the product's environment of use, offering a practical way to introduce environmental features well.
4. It supports three-dimensional visual design. The main task of conceptual design is giving the designed product its shape. Conceptual design modeling methods range from rule definitions to high-level visual expression. The model representations currently in common use include language models, geometric models, graphical models, object models, knowledge models and image models, of which the image model is the one closest to human thinking and reasoning. This invention is an important means of applying visual thinking models to design practice.
5. It yields directly digital three-dimensional products. This invention provides direct three-dimensional digital products as the system output, enabling efficient design and evaluation feedback.
Description of drawings
Fig. 1 is a block diagram of the structure of the three-dimensional geometric model building system 100 according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the concrete camera layout of the first embodiment;
Fig. 3 is a block diagram of the structure of the single/multiple-video-stream visual analysis units of the first embodiment;
Fig. 4 is an operational flowchart of the multiple-video-stream visual analysis unit of the first embodiment;
Fig. 5 shows the coordinate system of the stereo reconstruction operation of the present invention;
Fig. 6 is a block diagram of the semantic recognition process of the first embodiment;
Fig. 7 is the three-dimensional geometric modeling flowchart of the first embodiment;
Fig. 8 is a schematic diagram of the computer system of the first embodiment;
Fig. 9 is a block diagram of the three-dimensional geometric model building system 200 according to the second embodiment;
Fig. 10 is a block diagram of the semantic recognition process of the second embodiment;
Fig. 11 is a block diagram of the three-dimensional geometric model building system 300 according to the third embodiment;
Fig. 12 shows the concrete camera layout of the third embodiment;
Fig. 13 is an operational flowchart of the multiple-video-stream visual analysis unit of the third embodiment;
Fig. 14 is a schematic diagram of the computer system of the third embodiment;
Fig. 15 shows examples of the interaction gesture modes of the first embodiment.
Embodiment
First specific embodiment
Fig. 1 is a block diagram of the structure of the three-dimensional geometric model building system 100 according to the first embodiment of the present invention. As shown in Fig. 1, the video input device 101 may be a digital camera and is used to capture images of the geometric design motions of the three-dimensional modeling designer. In the present embodiment the video input device 101 consists of four digital cameras C01, C02, C03 and C04, whose concrete layout in one embodiment of the invention is shown in Fig. 2: they are placed respectively to the right, right front, left front and left of the designer, with their height above the ground and their attitude set to suit the designer's gestures, that is, the cameras should record the designer's design actions completely without hindering the designer's operation and other activities. A single-video-stream visual analysis unit 104 is provided for each digital camera. Each camera is directly connected, through a general-purpose interface and in a known manner, to its corresponding single-video-stream visual analysis unit 104. Each unit 104 processes the continuous video stream collected from its corresponding video input device 101: it detects the moving and non-moving regions in the video stream, estimates the direction and speed of object motion and predicts the next position, computes the edge contour of the object and estimates contour features, and provides the results to the multiple-video-stream visual analysis unit 105. The multiple-video-stream visual analysis unit 105 receives the processed output of the four single-video-stream visual analysis units 104, including the motion detection results, the direction and speed of the moving object, and the object contours and contour features. The unit 105 then processes the input data, performing binocular stereo matching, carrying out three-dimensional reconstruction and motion-trajectory fitting based on the contours and features, computing the cross section of the moving object, and so on. The output of this processing consists of the object model, the object motion trajectory and the object cross-section contour, which are provided as input to the real-time interactive semantic recognition unit 106. The unit 106 processes the output of the multiple-video-stream visual analysis unit 105 to obtain the human-computer interaction semantics, and interprets them using the semantic definitions stored in advance in the semantic model storage unit 110. The three-dimensional geometric modeling unit 107 comprehensively processes the output of the multiple-video-stream visual analysis unit 105 and of the real-time interactive semantic recognition unit 106 to obtain the three-dimensional geometric design model, and stores the result in the three-dimensional geometric model storage unit 111. The three-dimensional model rendering unit 108 renders the geometric model on the video output device 109 based on the three-dimensional geometric model stored in real time in the storage unit 111. The video output device 109 displays the geometric shape of the object and the three-dimensional geometric model created by the designer.
The operation and system configuration of the three-dimensional geometric model building system of the present embodiment are described in detail below with reference to the accompanying drawings. The system runs through several main working states: system initialization, camera calibration, object model building, and motion geometric modeling design.
<System initialization>
The initial working state of the system is described first. After the three-dimensional modeling designer starts the three-dimensional geometric model building system 100, the system initialization process begins. System initialization includes loading and establishing the initial system parameters and building the statistical model of the background video.
The system initialization process first loads the initial working environment parameters according to the predetermined settings and the current system configuration; the initialization environment parameters in this example are shown in Table 1.
Table 1: Initialization parameter table of the embodiment

System operational parameters: user ID, user list, initial model ID, model list, camera parameter table, camera layout parameter table, image resolution, scale factor, background reference image table, working file directory, data file content.

User table: user ID, initial geometric model ID, most recent working model ID.

Model table: model ID, model type, model data structure pointer, model file pointer.

Camera parameter table: number of cameras, camera list pointer, camera layout parameter pointer.

Camera table: camera 1 ID, camera 1 parameter table, camera 2 ID, camera 2 parameter table, ..., camera i parameter table.

Camera layout parameter table: camera 1 layout parameter head pointer, camera 2 layout parameter head pointer, ..., camera i layout parameter table (baseline length to camera 1, baseline direction to camera 1, baseline length to camera 2, baseline direction to camera 2, ...).
After the initial working parameters are loaded, the video input devices 101 (C01, C02, C03 and C04) continuously acquire multiple image frames of the working background and provide these images to the single-video-stream visual analysis unit 104 corresponding to each device, which processes them to build the statistical model of the initial background. In the present embodiment, as shown in Fig. 3, the single-video-stream visual analysis unit 104 includes an image analysis unit 1041 that processes the video signal collected by the video input device 101 to obtain the statistical model of the background, namely feature video streams of different resolution scales and different feature elements.
In a specific embodiment of the present invention, the image analysis unit 1041 builds the background statistical model by the following method. For the background B_i corresponding to each camera C_i in the system, an initial background model M_i is built. For each pixel p in B_i, define \mu_p as the expectation of its color value and \sigma_p^2 as the variance of the color value distribution; then:

$$\mu_p = \frac{1}{n}\sum_{t=1}^{n} h_p^t \qquad (1)$$

$$\sigma_p^2 = \frac{1}{n}\sum_{t=1}^{n} \left(h_p^t - \mu_p\right)^2 \qquad (2)$$

where h_p^t is the color value of point p in the t-th image frame. The pair (\mu_p, \sigma_p^2) of each point p then constitutes the background model of B_i:

$$M_i = \left\{(\mu_p, \sigma_p^2) \mid p \in B_i\right\} \qquad (3)$$
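As an illustration, the per-pixel statistics of equations (1) to (3) can be accumulated directly over a stack of background frames. The following is a minimal sketch, assuming grayscale frames stacked in a NumPy array (the function name and data layout are ours, not the patent's):

```python
import numpy as np

def build_background_model(frames: np.ndarray):
    """Background model of eqs. (1)-(3).

    frames: array of shape (n, H, W) holding n background images
            from one camera (grayscale for simplicity).
    Returns the per-pixel mean mu and variance sigma2, i.e. M_i.
    """
    frames = frames.astype(float)
    mu = frames.mean(axis=0)                    # eq. (1)
    sigma2 = ((frames - mu) ** 2).mean(axis=0)  # eq. (2)
    return mu, sigma2                           # eq. (3)
```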
In addition, the system generates, according to predetermined settings, initial object models with simple, regular geometric surfaces, for example cuboids and spheres. By changing the system settings, one can select which predefined object to generate, or generate no initial object at all.
<Camera calibration>
When the system is used for the first time, or when the layout, position or attitude of a camera has changed, or a camera has been replaced, camera calibration must be performed, that is, a calibration working process that establishes the camera parameters. In this working state, each camera in the system acquires images, and the intrinsic and extrinsic parameters of each camera are computed according to camera calibration methods well known to those skilled in the art. If the system is not being used for the first time and the layout, position and attitude of the cameras have not changed, nor have the cameras been replaced, then the calibration process of establishing camera parameters is not needed.
<Object model building>
After the camera calibration process is finished, the designer can use the system of the present embodiment to carry out three-dimensional geometric modeling with a hand or a hand-held object at a suitable position in front of the camera group. The geometric model built from the shape of the contour of the hand or hand-held object is called the object three-dimensional model, or simply the object model. The object model is a dynamic model that changes instantly with the spatial position, attitude and shape of the hand or hand-held object. The present invention uses the object model as the design tool for the three-dimensional geometric design; at the same time, a simple hand-held object model itself can serve as the initial model of the three-dimensional geometric design.
<Motion geometric modeling design>
After the system has gone through the three stages of initialization, camera calibration and object model building, it enters the motion geometric modeling design state. The designer, at a suitable position in front of the camera group, inputs his or her three-dimensional geometric design into the three-dimensional geometric model building system 100 through the contour and motion of a hand or hand-held object, and obtains the three-dimensional geometric modeling result. The three-dimensional geometric model that is built is stored in real time in the three-dimensional geometric model storage unit 111 in a prescribed file format and output in real time to the video output device 109, such as a CRT or LCD display. It can be stored, for example, on a storage medium such as a magnetic disk, or transmitted and stored through a computer network or a mobile storage device. In a specific embodiment, a three-dimensional geometric model can be stored in the format shown in Table 2.
Table 2: Three-dimensional geometric model data structure

Model table: model code, model ID, model type code, model type ID, model attribute table, model parameter table, version number, object count, object list pointer.

Object list: object type, object ID, parent object pointer, child object pointer, point count, edge count, face count, point data structure pointer, point data length, edge data structure pointer, edge data length, face data structure pointer, face data length.

Point data structure: point number (8 bytes), X (4 bytes), Y (4 bytes), Z (4 bytes).

Edge data structure: edge number (8 bytes), point count (8 bytes), head pointer of the point-number data chain.

Face data structure: face number (8 bytes), edge count (8 bytes), head pointer of the edge-number data chain.
As shown in Table 2, the three-dimensional geometric model data structure comprises the model code, model ID, model type code, model type ID, model attribute table, model parameter table, version number, object count, object list pointer and other data items. The model ID uniquely identifies the model. The model type code indicates the type of the model: in the present embodiment, the three-dimensional geometric models generated as design results and the object models generated as design tools are stored with the same data structure, so the model type is used to distinguish three-dimensional geometric models from object models. For design convenience, the system also classifies the predefined object models and assigns each a unique model type ID. The model attribute table defines the attributes the geometric model possesses, for example scale and position attributes. The version number indicates the version of the geometric model data structure. A model may consist of several objects; the object list describes the attributes of each object, including object type, object ID, parent object pointer, child object pointer, geometric data storage structure, and so on.
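For illustration only, the boundary-representation layout of Table 2 could be declared as follows; the class and field names paraphrase the table and are not taken from the patent's actual file format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Point:
    number: int               # point number (8 bytes in the stored format)
    x: float
    y: float
    z: float

@dataclass
class Edge:
    number: int               # edge number
    point_numbers: List[int]  # chain of point numbers

@dataclass
class Face:
    number: int               # face number
    edge_numbers: List[int]   # chain of edge numbers

@dataclass
class ModelObject:
    object_type: int
    object_id: int
    points: List[Point] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)
    faces: List[Face] = field(default_factory=list)

@dataclass
class Model:
    model_id: int
    model_type: int           # distinguishes design results from object models
    version: int
    objects: List[ModelObject] = field(default_factory=list)
```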
The designer can express his or her conceptual design to the computer through geometric design in several ways.

The first way is three-dimensional geometric design directly with the designer's hands: the designer expresses the three-dimensional geometric design through the motion envelope and gestures of the hands.

The second way is three-dimensional body design with a hand-held object. Specifically: (1) the designer expresses the three-dimensional geometric design to be built purely through the contour of the hand-held object; for example, to design a ball, the designer places a ball in front of the cameras, and the system automatically builds the three-dimensional contour of the ball as the object model; the designer then issues a command to copy the object model out as the three-dimensional geometric model; (2) the designer expresses a swept three-dimensional geometric model jointly through the contour of the hand-held object and its motion; for example, the designer holds a ball in front of the cameras, the system automatically builds the three-dimensional contour of the ball as the object model, the designer moves the ball along a circular arc in space, and the system, from the object model of the ball's three-dimensional contour and the ball's motion trajectory, builds and outputs a three-dimensional geometric design model formed by the motion envelope of the ball, namely a circular-arc pipe in space.

The third way is editing and modifying an already built three-dimensional geometric model. The designer uses a hand and/or a hand-held object, according to the predetermined interaction semantic model, to perform three-dimensional geometric editing on the existing three-dimensional geometric model, for example stretching or deforming it, generating a new three-dimensional geometric model.

Of course, as those skilled in the art will readily understand, the designer can combine the above three ways to express a design concept.

Whichever of the above ways or combinations of ways is used, the three-dimensional geometric model building system of the present embodiment subjects the video data obtained by the multiple cameras to the following processing:
● Single-video-stream visual analysis
In this stage, each single-video-stream visual analysis unit 104 processes the continuous video stream collected from its corresponding video input device 101: it detects the moving and non-moving regions in the video stream, estimates the direction and speed of object motion, predicts the next position, computes the edge contour of the object, estimates contour features, and provides the results to the multiple-video-stream visual analysis unit 105. In particular, this comprises the following operations:
I. Image analysis
As shown in Fig. 3, each single-video-stream visual analysis unit 104 includes an image analysis unit 1041. For the input of each video input device 101, the image analysis unit 1041 performs the following steps: first it obtains each image frame from the video input device 101, then it builds a hierarchy according to image resolution and outputs the layered image sequence.
In one embodiment of the invention, the image analysis unit 1041 builds the layered image data as follows. A three-level pyramid structure is adopted: for the raw image M_L, an image sequence {M_L, M_{L-1}, M_{L-2}} is built, where M_{i-1} is the image obtained by halving the resolution of M_i. M_L is called the pyramid bottom, or full-resolution layer; M_{L-1} the pyramid middle, or intermediate-resolution layer; and M_{L-2} the pyramid top, or low-resolution layer. The image pyramid data structure is shown in Table 3.
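A minimal sketch of this three-level pyramid, assuming OpenCV is available and that halving the resolution is done by Gaussian downsampling:

```python
import cv2

def build_pyramid(raw):
    """Three-level pyramid {M_L, M_L-1, M_L-2}: each higher level
    halves the resolution of the level below it."""
    mid = cv2.pyrDown(raw)   # intermediate-resolution layer
    low = cv2.pyrDown(mid)   # low-resolution layer
    return [raw, mid, low]
```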
Table 3: Image data buffer queue description

Queue header: maximum table length, queue head pointer, queue tail pointer, frame data pointer 1, frame data pointer 2, frame data pointer 3, frame data pointer 4, frame data pointer 5, frame data pointer 6, frame data pointer 7.

Frame data structure: frame buffer sequence mark, frame type mark, raw image data pointer, intermediate-resolution image data pointer, low-resolution image data pointer, successor pointer, predecessor pointer.
During processing, the time-series image data are stored in turn in the image processing buffer queue, whose length is 7 image frames. The queue is implemented as a static circular list.
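As an illustrative sketch, such a fixed-length circular frame queue might look as follows; the 7-frame capacity follows the embodiment, while the class and method names are ours:

```python
class FrameQueue:
    """Static circular list holding the most recent `capacity` frames."""

    def __init__(self, capacity: int = 7):
        self.slots = [None] * capacity
        self.head = 0        # index of the oldest frame
        self.count = 0

    def push(self, frame):
        tail = (self.head + self.count) % len(self.slots)
        if self.count == len(self.slots):
            # queue full: overwrite the oldest frame
            self.head = (self.head + 1) % len(self.slots)
        else:
            self.count += 1
        self.slots[tail] = frame
```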
II. Real-time motion detection
As shown in Fig. 3, each single-video-stream visual analysis unit 104 also includes a motion detection unit 1042 for performing real-time motion detection on the results of each image analysis unit 1041. The target of motion detection is to detect the moving regions in the image and the direction of motion, so as to obtain an accurate segmentation of the moving regions and of their edges. The real-time motion detection unit detects the moving regions with an image difference algorithm on the middle layer of the multi-resolution image pyramid and obtains the direction of motion by an optical flow method. In the present embodiment, the motion detection unit 1042 performs the following operations on the layered data from the image analysis unit 1041:
a) Background elimination
For the image I_i^t acquired by camera C_i at time t, the foreground region is extracted as follows: if the color value of point p in image I_i^t is h_p, the image is binarized by

$$d_p = \begin{cases} 1, & |h_p - \mu_p| \le 3\sigma_p \\ 0, & |h_p - \mu_p| > 3\sigma_p \end{cases} \qquad (4)$$

All points p in I_i^t with d_p equal to zero constitute the foreground region F_i.
b) Image difference computation
The difference image I_d(i, j) is the binary image d(i, j):

$$d(i,j) = \begin{cases} 0, & |f_1(i,j) - f_2(i,j)| \le \varepsilon \\ 1, & \text{otherwise} \end{cases} \qquad (5)$$

where f_1(i, j) and f_2(i, j) are two adjacent frames in the time-series images and \varepsilon is a very small positive number. In the difference image, pixel positions with value 1 indicate where motion occurs; the moving region is thus obtained from this formula.
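A sketch of the two binarizations, assuming grayscale frames and the background model (mu, sigma) of equations (1) and (2); the difference threshold is an illustrative value:

```python
import numpy as np

def foreground_region(frame, mu, sigma):
    """Eq. (4): d_p = 1 when within 3*sigma of the background mean;
    the foreground F_i is the set of points where d_p = 0."""
    d = np.abs(frame.astype(float) - mu) <= 3.0 * sigma
    return ~d

def difference_image(f1, f2, eps=5.0):
    """Eq. (5): binary difference image; value 1 marks motion."""
    diff = np.abs(f1.astype(float) - f2.astype(float))
    return (diff > eps).astype(np.uint8)
```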
c) Planar motion parameter computation
The computation of the motion velocity c(u, v) on the projection plane by the optical flow method proceeds as follows:
1. For all pixels (i, j) of the image, set the optical flow initial value c(i, j) = 0;
2. Let k denote the iteration number; for all pixels (i, j), evaluate formulas (6) and (7):

$$u^{k}(i,j) = \bar{u}^{k-1}(i,j) - f_x(i,j)\,\frac{P(i,j)}{D(i,j)} \qquad (6)$$

$$v^{k}(i,j) = \bar{v}^{k-1}(i,j) - f_y(i,j)\,\frac{P(i,j)}{D(i,j)} \qquad (7)$$

where

$$P(i,j) = f_x(i,j)\,\bar{u} + f_y(i,j)\,\bar{v} + f_t(i,j) \qquad (8)$$

$$D(i,j) = \lambda^2 + f_x^2(i,j) + f_y^2(i,j) \qquad (9)$$

Here \bar{u} and \bar{v} denote the means over the neighborhoods of u and v, which can be computed with a local image smoothing operator. The value of \lambda is chosen according to the amount of noise in the image: when the noise is large, a smaller value is taken; when the noise is small, a larger value is taken.
3. When

$$\sum_i \sum_j E^2(i,j) < \varepsilon \qquad (10)$$

the iteration stops, where

$$E^2(x,y) = \left(f_x u + f_y v + f_t\right)^2 + \lambda\left(u_x^2 + u_y^2 + v_x^2 + v_y^2\right) \qquad (11)$$
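The iteration of formulas (6) to (11) is essentially Horn-Schunck optical flow; the following NumPy sketch follows that reading, with derivative and averaging kernels that are common choices rather than taken from the patent:

```python
import numpy as np
from scipy.ndimage import convolve

# neighborhood-averaging kernel for u-bar and v-bar
AVG = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], dtype=float) / 12.0
DX = np.array([[-1, 1], [-1, 1]], dtype=float) * 0.25
DY = np.array([[-1, -1], [1, 1]], dtype=float) * 0.25

def optical_flow(f1, f2, lam=10.0, max_iter=100, eps=1e-3):
    """Iterate eqs. (6)-(9) until the residual criterion (10) is met."""
    f1 = f1.astype(float)
    f2 = f2.astype(float)
    fx = convolve(f1, DX) + convolve(f2, DX)
    fy = convolve(f1, DY) + convolve(f2, DY)
    ft = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    D = lam ** 2 + fx ** 2 + fy ** 2              # eq. (9)
    for _ in range(max_iter):
        u_bar = convolve(u, AVG)
        v_bar = convolve(v, AVG)
        P = fx * u_bar + fy * v_bar + ft          # eq. (8)
        u_new = u_bar - fx * P / D                # eq. (6)
        v_new = v_bar - fy * P / D                # eq. (7)
        converged = np.sum((u_new - u) ** 2 + (v_new - v) ** 2) < eps  # cf. (10)
        u, v = u_new, v_new
        if converged:
            break
    return u, v
```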
III. Contour computation
As shown in Fig. 3, the single-video-stream visual analysis unit 104 also includes a contour computation unit 1043, which obtains the hand region within the moving object region through a skin-color-based detection algorithm. Removing the hand region yields the region of the hand-held object, and the edge contour of the region is computed on this basis. In the present embodiment, the contour computation unit 1043 performs hand edge extraction and fine contour detection on the results from the motion detection unit 1042:
a) Hand edge extraction
In the present embodiment, the segmentation and contour detection of the hand region uses a strategy that fuses motion information with skin color information.
Region extraction based on motion information is described first:
In this process the camera remains static, and the color image sequence it captures consists of R, G and B components. In the contour computation unit 1043, define s = (x, y) as the image plane coordinates, t as the time coordinate, and i as any one component of the RGB space; I_t^i then denotes the luminance image of component i at time t. Using the three consecutive frames at times t-\Delta t, t and t+\Delta t, the motion image d_t^i of component i at time t is computed as

$$d_t^i(s) = \min\left(\left|I_t^i(s) - I_{t-\Delta t}^i(s)\right|,\ \left|I_t^i(s) - I_{t+\Delta t}^i(s)\right|\right) \qquad (12)$$

$$i = r, g, b \qquad (13)$$

Combining the r, g and b components gives the color motion image d_t at time t:

$$d_t(s) = \max\left(d_t^r(s),\ d_t^g(s),\ d_t^b(s)\right) \qquad (14)$$

Finally, the motion image is smoothed and binarized to obtain the moving-region image.
Hand region recognition based on skin color detection is described next:
It is known that the brightness of a given color differs under differently distributed illumination, while the perception of the color remains basically constant. The contour computation unit 1043 exploits precisely this property of the distribution and brightness of human skin color in the Luv space, and performs skin color detection in the Luv color space. Its operating steps are as follows:
1. Color space conversion: the RGB color space is converted to the Luv color space;
2. With the skin detection result of the previous frame (or the initial skin color features) as the initial value, a mean shift algorithm segments the moving region. The density estimate of each color bin of the current frame is treated as a probability density function; the difference between the mean of the probability function over the region and its central value is defined as the mean shift vector. Since the mean shift vector always points in the direction of maximum probability density, the actual location of maximum density can be found by search. The concrete computation is as follows:
Let the color feature vector x_i of pixel p_i in the image be defined as

$$x_i = (L, u, v) \qquad (15)$$

where L is the relative brightness of the image and u, v are the u*, v* chromaticity coordinates. Let x_0 denote the color feature vector of point p_0 and x_i the feature vector of point p_i within the window; the window size in the present embodiment is 7. The point of zero density gradient is obtained by iterating the following two steps.

Compute the mean shift vector m_{h,G}(x):

$$m_{h,G}(x) = \frac{\displaystyle\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)}{\displaystyle\sum_{i=1}^{n} g\!\left(\left\|\frac{x - x_i}{h}\right\|^2\right)} - x \qquad (16)$$

where h is the color resolution and g(x) is the multivariate normal function

$$g(x) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}\|x\|^2\right) \qquad (17)$$

Then translate the kernel function G(x) by m_{h,G}(x), where x is the feature vector at the current window center and m_{h,G}(x) is the difference between the G-weighted mean over the window and the window center. This iterative process necessarily converges, along a smooth trajectory, to a point where the density gradient is zero.
3. After the local maximum points are determined, the feature classes determined by the local structure of the feature space are associated with the maximum points, giving the actual positions of skin color in the space. Combining the skin-color detection result with the motion-based detection result yields the edge contour of the hand.
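A sketch of the mean shift iteration of equations (16) and (17) over Luv feature vectors; the data layout, bandwidth and stopping threshold are illustrative assumptions:

```python
import numpy as np

def mean_shift_mode(x0, X, h=8.0, tol=1e-3, max_iter=50):
    """Climb to a density mode of the Luv samples X (shape n x 3),
    starting from feature vector x0, per eqs. (16)-(17)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))  # kernel g, eq. (17)
        m = (w[:, None] * X).sum(axis=0) / w.sum() - x         # mean shift vector, eq. (16)
        x = x + m                                              # translate the kernel
        if np.linalg.norm(m) < tol:  # density gradient approximately zero
            break
    return x
```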
b) Fine contour detection
As described above, motion detection on the layered images yields a low-resolution region segmentation of the multi-resolution pyramid. The contour computation unit 1043 of the present embodiment then, following the region segmentation result and the contour detection steps below, performs contour computation on the raw image to obtain a fairly accurate object contour. In the present embodiment the edge detection method of S.M. Smith and J.M. Brady is applied, which uses a 5*5 circular window template. The concrete steps are:
1. An edge detection zone is established from the region segmentation result; this zone lies within a few pixels' width of the region edge;
2. The window center is placed at each image point position in the edge detection zone, and the number n(r_0) of points r in the window whose brightness is close to that of the window center r_0 is computed to determine whether the pixel is an image edge point. n(r_0) is computed with formula (18):

$$n(r_0) = \sum_{r} c(r, r_0) \qquad (18)$$

where c(r, r_0) expresses the degree of similarity between the brightness I(r) of a point r in the window and the brightness I(r_0) of the window center r_0:

$$c(r, r_0) = e^{-\left(\frac{I(r) - I(r_0)}{t}\right)^{6}} \qquad (19)$$

where t denotes the brightness threshold. Clearly, when the brightness difference between the two points is small relative to t, c(r, r_0) is close to 1. From n(r_0) the center and direction of the edge can be computed, and the edge is thinned by non-maximum suppression.
The contour computation unit 1043 thus obtains the edge contours of the hand and of the hand-held object.
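The cited method of S.M. Smith and J.M. Brady is the SUSAN detector; the sketch below computes the response of equations (18) and (19) over a 5*5 circular mask, with the brightness threshold as an illustrative value:

```python
import numpy as np

def susan_response(img, t=20.0, radius=2):
    """n(r0) of eqs. (18)-(19) over a circular window template;
    edge points are found where n(r0) is small, followed by
    non-maximum suppression to thin the edge."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    mask = (ys ** 2 + xs ** 2) <= radius ** 2    # 5x5 circular template
    img = img.astype(float)
    n = np.zeros_like(img)
    for dy, dx in zip(ys[mask], xs[mask]):
        if dy == 0 and dx == 0:
            continue                              # skip the window center itself
        shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
        n += np.exp(-(((shifted - img) / t) ** 6))  # eq. (19), summed per eq. (18)
    return n
```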
IV. Motion estimation and prediction
As shown in Fig. 3, the single-video-stream visual analysis unit 104 also includes a motion estimation and prediction unit 1044 for tracking the motion trajectory of the contour computed by the contour computation unit 1043. One embodiment of the invention uses the center of the target contour as the focus point, tracks the motion, and obtains a time-discrete motion trajectory. The result of motion tracking is a series of time-stamped plane coordinates of the contour center, that is, four-tuples (x, y, t, i), where i identifies the camera. To improve the time efficiency of the system, the present embodiment uses Kalman filtering for motion prediction; the prediction result serves as input to the motion detection device, providing a prior estimate for the motion detection of the next frame.
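As a sketch, a constant-velocity Kalman predictor for the contour center could look as follows; the state layout and noise levels are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

class CenterPredictor:
    """Kalman filter over the state (x, y, vx, vy) of the contour center."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)
        self.R = r * np.eye(2)
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        """Prior estimate fed to the next frame's motion detection."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct with the measured contour center (x, y)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```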
Each component and structure of the single-video-stream visual analysis unit 104, and their concrete operation, have thus been described. The output of the single-video-stream visual analysis unit comprises: the resolution-layered image sequence, the contour feature sequence of the image sequence, and the spatial point sequence. The output data structure of the single-video-stream visual analysis unit 104 is shown in Table 4.
Table 4 Description of the single-video-stream analysis output data structure
Camera identifier
Frame sequence identifier
Time stamp
Frame type identifier
Image data attribute list
Point feature attribute list
Region feature data attribute list
Edge feature data attribute list
Raw image data pointer
Intermediate-resolution image data pointer
Low-resolution image data pointer
Feature point data list pointer
Feature structure data table pointer
Moving region data table pointer
Background region data table pointer
Moving region edge data table
The single-video analysis output sequence comprises the camera identifier, frame sequence identifier, time stamp, frame type identifier, image data attribute list, point feature attribute list, region feature data attribute list, edge feature data attribute list, raw image data pointer, intermediate-resolution image data pointer, low-resolution image data pointer, feature point data list pointer, feature structure data table pointer, moving region data table pointer, background region data table pointer, moving region edge data table, etc. The camera identifier distinguishes the different cameras of the system. The frame sequence identifier is the frame number within the camera's stream. The time stamp records the acquisition time of the image frame. The attribute lists of the various features describe information such as the length of the corresponding feature data. The data pointers give the addresses of the image data and the feature storage structures.
● Multi-video-stream visual analysis
The multi-video-stream visual analysis unit 105 of the present embodiment is described below with reference to Fig. 3 and Fig. 4. This unit accepts the output of all four single-video-stream visual analysis units 104 and carries out the following processing:
I. Stereo matching
The four video input devices can be combined pairwise into three input pairs. The multi-video-stream visual analysis unit 105 of the present embodiment comprises a stereo matching unit 1051, which uses two of the three dual-view input pairs to construct dual-view stereo matching and thereby computes the 3D space coordinates of the object: from the plane coordinates (x, y) on the photographic plane, stereo matching yields the depth coordinate (the z coordinate relative to the camera), and the object's 3D space coordinates are then determined from the camera parameters. Depth reconstruction is thus realized by stereo matching of the dual-view images. First, the unit 1051 determines a target point in the non-background region of the image obtained above and defines a template window of size m × n centered on that point. To find the matching point of this target point on the other image, a gray-level matrix of size (m+c) × (n+d) covering the possible matching area of the other image is defined as the search window, and image matching is realized with a block matching algorithm: the template window is moved within the search window, and a similarity matrix of size (c+1) × (d+1) is computed with the match measure. The image block of the search window corresponding to the maximum (or minimum) value in the similarity matrix is the best match of the template window.
The stereo matching unit 1051 of the present embodiment computes the similarity measure with the sum-of-absolute-differences formula (20):

$$\rho(i, j) = \sum_{u=1}^{m} \sum_{v=1}^{n} \left| I(u, v) - I'(u+i, v+j) \right|, \qquad 0 \le i \le c, \quad 0 \le j \le d \qquad (20)$$

where $I(u, v)$ and $I'(u, v)$ are the two view images, m and n are the width and height of the match window, and c+m and d+n are the width and height of the search region. In a concrete implementation, an even faster method can be realized by selecting different thresholds one by one, exploiting the simplicity of the sum-of-absolute-differences measure.
In possible object boundary regions the stereo matching unit 1051 of the present embodiment uses a shifted window: the window position is moved so as to obtain more coverage of the input image, and the best matching position is selected from among the shifted windows. In other regions a normal (centered) window is used.
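A compact sketch of the block matching loop with the SAD measure of formula (20) follows; anchoring the search window at the template position and searching only non-negative offsets (i, j) are simplifying assumptions.

```python
import numpy as np

def sad_block_match(img_a, img_b, top, left, m, n, c, d):
    """Slide the m-wide, n-high template of img_a at (top, left) over the
    (m+c) x (n+d) search window of img_b; return the offset minimizing
    rho(i, j) of formula (20) and the (c+1) x (d+1) similarity matrix."""
    tpl = img_a[top:top + n, left:left + m].astype(np.int32)
    rho = np.empty((c + 1, d + 1), dtype=np.int64)
    for i in range(c + 1):
        for j in range(d + 1):
            patch = img_b[top + j:top + j + n,
                          left + i:left + i + m].astype(np.int32)
            rho[i, j] = np.abs(tpl - patch).sum()
    i, j = np.unravel_index(np.argmin(rho), rho.shape)
    return (i, j), rho
```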
II. SFS & FBR (shape recovery from silhouettes and feature-based 3D reconstruction)
As shown in Fig. 3 and Fig. 4, the multi-video-stream visual analysis unit 105 also comprises an SFS & FBR unit 1052 and an object model memory cell 1056. The SFS & FBR unit 1052 performs shape-from-silhouette recovery and feature-based 3D reconstruction; the object model memory cell 1056 stores the reconstructed object models. The SFS & FBR unit 1052 recovers the object surface shape from the contour calculation results of the single-video-stream visual analysis units 104 and the object models already stored in the object model memory cell 1056, using the shape-from-silhouette method and a feature-based target recognition algorithm. The unit carries out the shape-from-silhouette operation over the spatial volume; the algorithm is as follows:
1. For a spatial point P, compute the corresponding image point $P' = (P'_x, P'_y)$ according to formulas (24), (25):

$$P'_x = \frac{(P - C) \cdot \vec{h}}{(P - C) \cdot \vec{a}} \qquad (24)$$

$$P'_y = \frac{(P - C) \cdot \vec{v}}{(P - C) \cdot \vec{a}} \qquad (25)$$

where

$$\vec{h} = f \cdot \vec{h}' + \vec{a} \cdot x_h \qquad (26)$$

$$\vec{v} = f \cdot \vec{v}' + \vec{a} \cdot y_h \qquad (27)$$

Here P′ is the projection onto the photo coordinate system of the point P in the world coordinate system (X, Y, Z); C is the vector from the world coordinate origin to the projection center; $\vec{a}$ is the unit vector along the camera optical axis; $\vec{h}'$ is the unit vector of the horizontal direction of the photo coordinate system; $\vec{v}'$ is the unit vector of the vertical direction of the photo coordinate system; f is the focal length; and $(x_h, y_h)$ are the image coordinates of the principal point. The coordinate systems, namely the W-XYZ world coordinate system and the c-xy image plane coordinate system, are detailed in Fig. 5.
2. If P′ falls in the background region of the image, the spatial point P is removed; otherwise P is retained;
3. A simple spatial octree algorithm is used to simplify the computation;
4. Carving with the silhouettes of multiple views yields reconstruction results from several different angles.
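Under stated assumptions (pixel-unit image coordinates, a plain voxel list instead of the octree of step 3, `cameras` as (C, a, h, v) vector tuples), the following sketch shows the carving test of steps 1, 2 and 4:

```python
import numpy as np

def carve(voxels, cameras, masks):
    """Keep a spatial point only if its projection per formulas (24)-(25)
    falls inside the foreground silhouette of every view."""
    kept = []
    for P in voxels:
        ok = True
        for (C, a, h, v), mask in zip(cameras, masks):
            denom = np.dot(P - C, a)
            x = np.dot(P - C, h) / denom          # P'_x, formula (24)
            y = np.dot(P - C, v) / denom          # P'_y, formula (25)
            col, row = int(round(x)), int(round(y))
            inside = (0 <= row < mask.shape[0] and
                      0 <= col < mask.shape[1] and mask[row, col])
            if not inside:                        # P' in background: drop P
                ok = False
                break
        if ok:
            kept.append(P)
    return kept
```

An octree implementation would apply the same test to whole cubes, splitting only those whose projections straddle a silhouette boundary, which is what makes step 3 a worthwhile simplification.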
III. Object model building
As shown in Fig. 3 and Fig. 4, the multi-video-stream visual analysis unit 105 also comprises a modelling unit 1053, which, once stereo matching has been obtained, computes the 3D coordinates of targets in space by bundle adjustment. The concrete operation of this unit is as follows:
Suppose a group of points $X_j$ in 3D space is photographed by a group of cameras with matrices $P^i$, and let $x_j^i$ denote the image-plane coordinates of the j-th spatial point in the i-th camera. Given the set of image coordinates $x_j^i$, find camera matrices $P^i$ and spatial points $X_j$ such that

$$P^i X_j = x_j^i \qquad (21)$$

If no further constraints are imposed on $X_j$ or $P^i$, the above reconstruction is a projective reconstruction; that is, $X_j$ differs from the true reconstruction by an arbitrary 3D projective transformation.
Owing to noise, matching errors and other factors, the equations $x_j^i = P^i X_j$ cannot be satisfied exactly. Such errors are usually assumed to follow a Gaussian distribution, and a maximum likelihood solution is then sought. Here one estimates projection matrices $\hat{P}^i$ and spatial points $\hat{X}_j$ that project exactly to image points $\hat{x}_j^i$, i.e.

$$\hat{x}_j^i = \hat{P}^i \hat{X}_j \qquad (22)$$

and minimizes, in every image frame, the image distance between the reprojected points and the measured image points:

$$\min_{\hat{P}^i, \hat{X}_j} \sum_{i,j} d\!\left(\hat{P}^i \hat{X}_j, \; x_j^i\right) \qquad (23)$$

where d(x, y) is the geometric image distance between homogeneous points x and y. The maximum likelihood estimates of $X_j$ and $P^i$ are obtained by adjusting the bundles of rays between each camera center and the spatial points.
The initial values used by the above method are the projection matrix parameters obtained in the camera calibration process, the estimates from the previous frame, and the initial 3D reconstruction estimate. A Euclidean 3D reconstruction is then obtained through the camera parameter constraints.
The 3D model obtained by the above method is represented as a point cloud. The point cloud model is then converted into a geometric model represented by subdivision surfaces. Since this conversion process is well known to those of ordinary skill in the art, it is not repeated here.
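The reprojection-error objective of formula (23) can be sketched as follows; the flat parameterization of the camera matrices and the use of a generic least-squares solver are simplifying assumptions (a production bundle adjuster would parameterize rotations and exploit the sparsity of the Jacobian).

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, n_cams, n_pts, obs):
    """Stack the distances d(P_i X_j, x_ij) of formula (23); obs is a
    list of (camera index i, point index j, x, y) observations."""
    P = params[:12 * n_cams].reshape(n_cams, 3, 4)
    X = params[12 * n_cams:].reshape(n_pts, 3)
    out = []
    for i, j, x, y in obs:
        u = P[i] @ np.append(X[j], 1.0)       # project homogeneous X_j
        out += [u[0] / u[2] - x, u[1] / u[2] - y]
    return np.asarray(out)

# x0 packs the calibration-derived camera matrices and the initial
# reconstruction, as the text describes:
# sol = least_squares(residuals, x0, args=(n_cams, n_pts, obs))
```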
The modelling unit 1053 builds the 3D models of objects, including the 3D model of the hand and the 3D models of hand-held objects with simple geometric profiles. A hand-held object with a simple geometric profile can be an elastic steel strip of definite shape; it can also be a sphere, a cuboid, etc. The cross-sectional shapes of these objects serve as the section curves from which motion generates surfaces.
According to operation control commands, the system can also reconstruct the 3D surface shapes of stationary objects in the environment; the reconstruction results are object models that can be edited interactively, and they are stored in the object model memory cell 1056. Table 5 describes the data file format of the 3-D geometric model.
Table 5 Data file format of the 3-D geometric model
Point data structure:
Point number (8 bytes) | X (4 bytes) | Y (4 bytes) | Z (4 bytes)
Edge data structure:
Edge number (8 bytes) | Point count (8 bytes) | Point number (4 bytes) | …
Face data structure:
Face number (8 bytes) | Edge count (8 bytes) | Edge number (8 bytes) | …
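A hypothetical serialization of these records is sketched below; little-endian byte order and the exact field order are assumptions, with the field widths taken from Table 5.

```python
import struct

def pack_point(num, x, y, z):
    # point number (8 bytes) + X, Y, Z (4-byte floats each)
    return struct.pack("<qfff", num, x, y, z)

def pack_edge(num, point_ids):
    # edge number (8) + point count (8) + 4-byte point numbers
    return struct.pack("<qq", num, len(point_ids)) + \
           b"".join(struct.pack("<i", p) for p in point_ids)

def pack_face(num, edge_ids):
    # face number (8) + edge count (8) + 8-byte edge numbers
    return struct.pack("<qq", num, len(edge_ids)) + \
           b"".join(struct.pack("<q", e) for e in edge_ids)
```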
IV. Trajectory fitting
The object velocity and contour obtained by the single-video-stream visual analysis unit 104 are projections of the object's spatial motion onto the camera photographic planes, i.e. trajectory coordinate sequences on the camera image planes, so these plane coordinates are discrete in space and time. As shown in Fig. 3 and Fig. 4, the multi-video-stream visual analysis unit 105 also comprises a trajectory fitting unit 1055 and a trajectory memory cell 1058. The trajectory fitting unit 1055 uses the spatially distributed camera photographic planes and the cameras' relative orientation parameters to estimate a continuous description of the spatial trajectory by space coordinate intersection and curve fitting. The resulting sequence of spatial point coordinates is output and stored in the trajectory memory cell 1058. This unit operates as follows:
A) Spatial point computation based on the projection matrix
Consider a group of cameras $C_k$, k = 1, …, n, where n is the total number of cameras. The absolute position parameters of each camera are $P_i(x, y, z)$ and the absolute orientation parameters are $R_i(\alpha, \beta, \gamma)$. The acquired image-plane coordinates at time t in each camera's image-plane trajectory coordinate sequence are s(x, y, t, i), where x, y are image-plane coordinates, t is the time, and i is the camera number. The camera's external and internal parameters uniquely determine the projection matrix $M_i$:
$$M_i = \begin{bmatrix} m_{11}^i & m_{12}^i & m_{13}^i & m_{14}^i \\ m_{21}^i & m_{22}^i & m_{23}^i & m_{24}^i \\ m_{31}^i & m_{32}^i & m_{33}^i & m_{34}^i \end{bmatrix} \qquad (28)$$
At the same instant, the projection coordinates s(x, y, t, i) of a spatial point P(X, Y, Z) on each image plane satisfy equations (29), (30):

$$(x_i m_{31}^i - m_{11}^i) X + (x_i m_{32}^i - m_{12}^i) Y + (x_i m_{33}^i - m_{13}^i) Z = m_{14}^i - x_i m_{34}^i \qquad (29)$$

$$(y_i m_{31}^i - m_{21}^i) X + (y_i m_{32}^i - m_{22}^i) Y + (y_i m_{33}^i - m_{23}^i) Z = m_{24}^i - y_i m_{34}^i \qquad (30)$$
From the projection matrices and respective image-plane coordinates of the n cameras, 2n such equations can be constructed, and the spatial point coordinates (X, Y, Z) are solved by the least squares method.
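For illustration, the 2n equations (29)-(30) assemble into a linear system solved directly by least squares; the sketch assumes one (x, y) observation per camera at the same instant.

```python
import numpy as np

def triangulate(Ms, pts):
    """Ms: list of 3x4 projection matrices M_i (formula (28));
    pts: matching (x, y) image coordinates, one pair per camera.
    Returns the least-squares spatial point (X, Y, Z)."""
    A, b = [], []
    for M, (x, y) in zip(Ms, pts):
        A.append([x * M[2, 0] - M[0, 0],
                  x * M[2, 1] - M[0, 1],
                  x * M[2, 2] - M[0, 2]])       # row from formula (29)
        b.append(M[0, 3] - x * M[2, 3])
        A.append([y * M[2, 0] - M[1, 0],
                  y * M[2, 1] - M[1, 1],
                  y * M[2, 2] - M[1, 2]])       # row from formula (30)
        b.append(M[1, 3] - y * M[2, 3])
    X, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return X
```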
B) Spatial point computation based on triangulation
Given the positioning and orientation parameters of the multiple cameras, the spatial position of the hand can be determined by the triangulation principle and the least squares method. A concrete mode of operation is: project the positions and attitudes of the cameras in 3D space onto three orthogonal coordinate planes, and compute each coordinate component of the spatial point separately on each plane. This method does not require calibration of the cameras' internal parameters.
C) Trajectory fitting
Cubic spline fitting is adopted for the trajectory, and fairing conditions are used to determine the boundary conditions of the spline fit.
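A cubic-spline fit in this spirit can be sketched with SciPy; the smoothing factor standing in for the fairing condition is an illustrative assumption.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_trajectory(points, smooth=1.0, samples_per_point=10):
    """Fit a smoothing cubic spline (k=3) through a time-discrete 3D
    track and resample it densely as the continuous trajectory."""
    pts = np.asarray(points, dtype=float)            # shape (n, 3)
    tck, _ = splprep([pts[:, 0], pts[:, 1], pts[:, 2]], k=3, s=smooth)
    u = np.linspace(0.0, 1.0, samples_per_point * len(pts))
    return np.stack(splev(u, tck), axis=1)
```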
V. Cross-section computation
As shown in Fig. 3 and Fig. 4, the multi-video-stream visual analysis unit 105 also comprises a cross section computing unit 1054 and a cross section profile memory cell 1057. The cross section computing unit 1054 determines, from the continuous trajectory obtained by the trajectory fitting unit 1055, the normal plane of the trajectory at each image frame's position. The projection of the object onto this normal plane is the moving object's cross-section contour at that frame position.
The functions of the key data processing units 1051, 1052, 1053, 1054 and 1055 of the multi-video-stream visual analysis unit 105 have thus been described. The unit 105 also comprises three data memory cells: the object model memory cell 1056, the cross section profile memory cell 1057, and the trajectory memory cell 1058.
The object model memory cell 1056 is a permanent storage unit of the system, storing the 3-D geometric models of moving objects. Here, a permanent storage unit means that its stored content is preserved after a system restart. In this example, the data structure of an object model is as shown in Table 2. All object models built in the system are stored in the object model memory cell 1056, each with a model table data structure. Each model table stores the model number, model identifier, model type number, model type identifier, model attribute table, model parameter table, the number of objects in the model, and the storage table structure of each object. For a rigid object, each object corresponds to one model table. For the hand and for constrained deformable bodies, the unit 1056 stores several model tables covering each such object and its basic deformations. For a simple object, the model table contains only one geometric object; for a complex object, the model table stores each constituent of the complex object as a separate geometric object. The model attribute table describes the basic and extended attributes of the model. The model parameter table stores the model's basic parameters.
The cross section profile memory cell 1057 is a working storage unit of the system, storing the time-discrete cross-section profiles of moving objects along their direction of motion. In the present embodiment, a circular queue of limited size stores the long-time sequence of object contour data tables; for example, the object contour data tables of the several hundred frames preceding the current working time can be kept.
The trajectory memory cell 1058 is a working storage unit of the system, storing the time-discrete spatial trajectories of moving objects. In the present embodiment, a circular queue of limited size stores the long-time sequence of object trajectories; the trajectory data comprise the space coordinates and attitude of the object's geometric center. For example, object trajectory data tables corresponding to the stored object cross-section profiles can be kept.
● Real-time interaction semantic recognition
Real-time interaction semantic recognition is performed by the real-time interaction semantic recognition unit 106, as shown in Fig. 1 and Fig. 6. The real-time interaction semantic recognition unit 106 of the three-dimensional geometric modeling system 100 accepts the output of the multi-video-stream visual analysis unit 105, reads semantic models from the predefined semantic model memory cell 110, analyzes and interprets the motion semantics, and outputs three-dimensional modeling commands. The unit 106 comprises a collision detection unit 1061, which reads the trajectories in the trajectory memory cell 1058 and detects, by a collision detection method, collisions between the three-dimensional geometric design model in the 3-D geometric model memory cell 111 and the object models in the object model memory cell 1056. The collision detection result serves as input to the operational semantics analysis unit 1062 and the interaction semantics analysis unit 1063. The unit 106 also comprises the interaction semantics analysis unit 1063 and the operational semantics analysis unit 1062, which obtain the operational semantics of the moving object with respect to the 3-D geometric model from the collision detection result of unit 1061 and the data stored in the object model memory cell 1056, the trajectory memory cell 1058, and the semantic model memory cell 110. The semantic analysis results are output as three-dimensional modeling commands and stored in the three-dimensional modeling command storage unit 1065.
The real-time interaction semantic recognition unit 106 carries out the following basic operations:
I. Collision detection
The collision detection unit 1061 detects collisions between the three-dimensional geometric design model and the object models by a collision detection method, based on the object models, cross-section profiles and trajectories built by the multi-video-stream visual analysis unit 105. As mentioned above, these are stored respectively in the object model memory cell 1056, the cross section profile memory cell 1057, and the trajectory memory cell 1058. The collision detection unit 1061 provides the context environment for interaction semantics analysis, and at the same time provides visual feedback for the designer's operations on and interactions with the geometric design model.
The collision detection result is provided directly to the semantic analysis units; the concrete analysis process is described in detail below. Real-time collision detection between two objects is realized with the AABB tree algorithm known to those skilled in the art, and is not repeated here.
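The leaf test at the heart of an AABB-tree query is simply per-axis interval overlap, as in the sketch below; the tree construction and recursive descent are omitted.

```python
import numpy as np

def aabb_overlap(min_a, max_a, min_b, max_b):
    """Two axis-aligned boxes intersect iff their extents overlap on
    every axis; min_*/max_* are length-3 corner coordinates."""
    return bool(np.all(np.asarray(min_a) <= np.asarray(max_b)) and
                np.all(np.asarray(min_b) <= np.asarray(max_a)))
```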
II. Operational semantics analysis unit
The operational semantics analysis unit 1062 interprets the semantics of operations, for example the "select" operation semantic and the "deselect" operation semantic. In one embodiment of the invention, when the hand model collides with the three-dimensional design model and the contact lasts a certain time, a "select" operation semantic is recognized; when the hand model is already in contact with the three-dimensional design model and the contact continues while remaining static for a period of time, a "deselect" operation semantic is recognized. The semantic analysis result depends on the current attitude of the object and on the predetermined semantic model. To guarantee operational flexibility, the operation modes comprise several types: operations on the generated model, and operations on the interface menus and toolbars.
A) Operations on interface menus and toolbars
This class of interface operation has two modes: the mouse-keyboard mode and the virtual hand mode. The mouse-keyboard mode is the traditional planar graphical interface mode. The virtual hand mode is similar to prior-art touch-screen operation: the interface menus and toolbars are operated by the motion and clicking of a virtual hand. When the virtual hand moves into the system's graphical interface operating area, the system automatically switches to the interface command operating mode, and the mouse-keyboard and virtual hand modes switch automatically in response to the current input.
B) Operations on the generated model
Commonly used interaction semantics are configured as floating balls in the virtual 3D space. These floating balls are called operation balls, and each ball is defined as one operation. When the virtual hand grasps an operation ball, the virtual hand will begin that operation. According to the operation context, operation balls are automatically hidden or surfaced and their depth in the virtual 3D space is changed, so that the operation ball most likely to be selected sits at the position most easily grasped by the virtual hand.
The virtual space operations, GUI interface operations and voice command operations are switched automatically by the context environment. When the virtual hand moves into the menu command area of the system interface, the system switches to the virtual mouse mode; when the virtual hand moves into the drawing area, it switches to the virtual 3D modeling operation mode.
III. Interaction semantics analysis unit
The interaction semantics analysis unit 1063 determines interaction semantics from the collision detection result, the object model memory cell 1056, the trajectory memory cell 1058 and the semantic model memory cell 110. In the present embodiment, interaction semantics are expressed by static interactive gestures. The unit 1063 analyzes static gestures with an appearance-based recognition method, i.e. gesture semantic analysis by template matching against predefined gesture templates. In the present embodiment, the template-matching semantic analysis is carried out as follows:
First a distance transform is applied to the edge image: the binary image is transformed into a distance map of the same size as the original edge image, in which the new value of each "pixel" is a distance value. The distance transform is defined as:

$$D(p) = \min_{q \in O} d_e(p, q) \qquad (31)$$

where $d_e(p, q)$ denotes the Euclidean distance between pixels p and q, and O is the set of pixels of the target object. The Euclidean distance $d_e(p, q)$ is defined as:
$$d_e(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \qquad (32)$$
To reduce the amount of computation, the square-root operation is omitted, i.e. formula (33) is used in place of formula (32):

$$d_e(p, q) = (p_x - q_x)^2 + (p_y - q_y)^2 \qquad (33)$$

After the distance transform described above, the new value of each point in the resulting distance map is the distance from that point to the nearest object pixel in the original image.
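The distance map of formulas (31)-(32) can be illustrated with SciPy's exact Euclidean distance transform (note this keeps the square root that the speed-up of formula (33) drops):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_map(edge_image):
    """Each output pixel holds the distance to the nearest edge pixel.
    distance_transform_edt measures distance to the nearest zero, so
    the binary edge image is inverted first."""
    edges = np.asarray(edge_image, dtype=bool)
    return distance_transform_edt(~edges)
```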
The present embodiment uses the one-directional Hausdorff distance h(M, I) for model matching, where M is the set of edge pixels of the chosen gesture template and I is the set of image edge pixels after edge extraction. During match recognition, the Euclidean distance transform is first applied to the edge-extracted image to be recognized to obtain the distance map; the template is then translation-matched over the distance map in the metric space. Correspondingly, $h_j(M, I)$ (the subscript j is the translation index) is taken as the maximum of the distance-map values at the positions corresponding to the template's edge pixels; it measures the maximum degree of mismatch between the template at the current translation position and the corresponding pixels of the edge image. The decision rule of basic Hausdorff template matching is: take the minimum of the $h_j(M, I)$ values over all translation matches as the measure of similarity between the template and the corresponding object possibly present in the image. If several templates yield very similar similarities for the current image during translation matching, edge direction information is added to the matching decision. Suppose the template is at some translation point q (with the lower-left corner pixel of the template as the reference point); the conditions for deciding whether the template matches the corresponding position of the edge image are:
1. At the j-th translation match, let $R_j$ be the ratio of the number of template points meeting the matching requirement to the total number of template pixels:

$$R_j = \max_{q \in Q} \left\{ \, n\big(\{\, m \in M \mid \exists\, i \in I,\ \|(m+q) - i\| < \tau,\ |Ang(m) - Ang(i)| < \theta \,\}\big) \,/\, n(M) \right\} \qquad (34)$$

In the above formula, Q is the set of points traced by the template's lower-left corner pixel during the translations over the edge image; n(·) is the operation of taking the number of elements of a set; Ang(x) is the direction angle of edge pixel x; τ is a given distance difference threshold; and θ is a given direction (radian) difference threshold.
2. Take P(k) = max_j(R_j), where k = 1, 2, … indexes the templates and j = 1, 2, …, J indexes the translations applied to each template.
3. The gesture indicated by the template with the largest P(k) is the final recognition result.
The present embodiment adopts a modified Hausdorff distance for template matching: instead of the maximum, the similarity of the template with respect to the image to be recognized is obtained by averaging over the template's edge pixels,

$$h(M, I) = \frac{1}{N} \sum_{m \in M} d(m, I) \qquad (35)$$

where d(m, I) is the distance-map value at the position corresponding to template pixel m, and N is the number of edge pixels in the template.
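A sketch of the translation matching with this averaged measure follows, under the assumptions that every shifted template stays inside the distance map and that direction checking is left out:

```python
import numpy as np

def match_template(dist_map, template_pts, positions):
    """For each candidate translation q, average the distance-map values
    under the template's N edge pixels (formula (35)) and return the
    best position. template_pts: (N, 2) row/col offsets; positions:
    candidate reference points q."""
    pts = np.asarray(template_pts)
    best_q, best_h = None, np.inf
    for q in positions:
        rc = pts + np.asarray(q)                  # shift template by q
        h = dist_map[rc[:, 0], rc[:, 1]].mean()   # modified Hausdorff
        if h < best_h:
            best_q, best_h = q, h
    return best_q, best_h
```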
By the method described above, the basic semantics of several interactive gestures can be determined. Figure 15 shows several examples of interaction semantic gestures.
● 3D geometric modeling
As shown in Fig. 7, the three-dimensional geometric modeling system 100 also comprises a 3D geometric modeling unit 107. Based on the motion and gesture recognition obtained from the multi-video-stream visual analysis unit 105, and on the semantic analysis results, i.e. the three-dimensional modeling commands, obtained from the real-time interaction semantic recognition unit 106, this unit builds new three-dimensional geometric design bodies and edits and replaces existing ones by means of a repetition processing unit 1071, a jitter processing unit 1072, an envelope computing unit 1073 and a shape editing unit 1074, reading from and storing to the 3-D geometric model memory cell 111 in real time. The 3D geometric modeling unit 107 comprises the following processing steps:
I. Repetition processing
The repetition processing unit 1071 handles the repetitive operations produced during object motion, eliminating repeated and overlapping motions of the object.
II. Jitter processing
The jitter processing unit 1072 smooths and fairs the motion of the object, eliminating small jitters in the object's trajectory and attitude.
III. Envelope computation
The basic function of the envelope computing unit 1073 is to solve the envelope differential equation from the trajectory and cross-section profile already freed of jitter and repetition, and then to compute, with the Runge-Kutta algorithm, the envelope surface generated by the object's motion. The unit outputs this envelope surface.
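The patent does not give the envelope equation itself, so the sketch below shows only the named integrator: one classical fourth-order Runge-Kutta step for a generic system y′ = f(t, y), with f assumed to encode the envelope differential equation.

```python
def rk4_step(f, t, y, dt):
    """Advance y' = f(t, y) by one RK4 step of size dt; works with
    scalars or numpy arrays returned by f."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```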
IV. Shape editing
The shape editing unit 1074 replaces the existing 3D geometric design model with the motion envelope surface produced by the envelope computing unit 1073 and smoothly joins the new and old surface patches. The replacement process handles the joining points and joining surfaces, and their smooth connection, according to constraint and modification rules, thereby modifying the established 3-D geometric model.
● Model drawing and display
A modification of the 3-D geometric model activates the model drawing process, and the drawing result is output to the display device 109.
The technical scheme of the first specific embodiment of the present invention has thus been described. In addition, as shown in Fig. 8, the three-dimensional geometric modeling system 100 of the present embodiment can be built from three general-purpose digital computers: two front-end computers 1201, 1202 and one back-end computer 1203. Every two digital cameras are connected to one front-end computer (1201, 1202); the two front-end computers 1201 and 1202 are connected to the back-end computer 1203; and the back-end computer 1203 is connected to the video output device 109. Each front-end computer 1201, 1202 provides the single-video-stream visual analysis units 104 corresponding to the video input devices 101 connected to it, together with the stereo matching unit 1051 of the multi-video-stream visual analysis unit 105. The back-end computer 1203 provides the remaining components of the multi-video-stream visual analysis unit 105 (other than the stereo matching unit 1051), the real-time interaction semantic recognition unit 106, the 3D geometric modeling unit 107, and the 3D model drawing unit 108. The storage system of computer 1203 provides the semantic model memory cell 110 and the 3-D geometric model memory cell 111. The above components of the present invention can be realized by software, firmware, integrated circuits, and the like.
Second specific embodiment
As shown in Fig. 9, the three-dimensional geometric modeling system 200 according to the second embodiment of the invention also comprises a voice input device 202 and a voice recognition unit 203. The voice input device 202 can be a general microphone and sound card that convert natural speech into digital audio signals. In the present embodiment, the voice input device serves as an auxiliary input device, providing an auxiliary interaction input for the multichannel user interaction mode. After the audio input is obtained, the voice recognition unit 203 performs speech recognition, i.e. converts the audio input into a restricted language. The voice recognition unit 203 recognizes only predefined speech patterns; undefined speech input is discarded. For example, the Microsoft Speech 5.X speech recognition engine can be used to recognize the basic speech. By configuring a restricted grammar in the speech recognition engine from an XML file, speech input that might lead to abnormal system behavior can be restricted, so that only the voice commands provided in the grammar are recognized, effectively improving the recognition rate.
The speech recognized by the voice recognition unit 203 is provided to the real-time interaction semantic recognition unit 206. Besides performing the collision detection, operational semantics analysis and interaction semantics analysis of the first embodiment, the unit 206 also comprises a voice semantic analysis unit 2064 that performs semantic recognition of the speech identified by the voice recognition unit 203. When a recognized speech input is obtained from the voice recognition unit 203, the voice semantic analysis unit 2064 interprets it semantically according to the system's current context environment. An example of a semantic resolution file is shown below, in which the analysis grammar marked with <O> … </O> is an optional grammar. The user can modify the content of this part according to concrete needs, embodying the idea of personalized human-computer interaction.
<GRAMMAR LANGID="804">
  <RULE NAME="WithPara" TOPLEVEL="ACTIVE">
    <P>
      <O>
        <L>
          <P>please</P>
          <P>I want to</P>
        </L>
      </O>
      <L>
        <P PROPNAME="TYPE_RULEREF" VALSTR="CrePoint">create point</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="CreLine">create line</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="CreSurface">create surface</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Delete">delete</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Cancel">cancel</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Done">done</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Edit">edit</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="SelePoint">select point</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="SeleLine">select line</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="SeleSurface">select surface</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="zin">zoom out</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="zout">zoom in</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Translation">translate</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="X">X coordinate value</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Y">Y coordinate value</P>
        <P PROPNAME="TYPE_RULEREF" VALSTR="Z">Z coordinate value</P>
      </L>
      <L>
        <P VALSTR="value">Value</P>
      </L>
    </P>
  </RULE>
</GRAMMAR>
Meanwhile, the operations carried out in the real-time interaction semantic recognition unit 206 are shown in Fig. 10. The unit 206 of the present embodiment comprises a collision detection unit 2061, an operational semantics analysis unit 2062, an interaction semantics analysis unit 2063, a voice semantic analysis unit 2064, and a three-dimensional modeling command storage unit 2065. Its main working steps are as follows:
The collision detection unit 2061 reads the trajectories in the trajectory memory cell 2058 and detects, by a collision detection method, collisions between the three-dimensional geometric design model of the three-dimensional geometric design memory cell 211 and the object models of the object model memory cell 2056. The collision detection result serves as input to the operational semantics analysis unit 2062 and the interaction semantics analysis unit 2063. These two units obtain the operational semantics of the moving object with respect to the 3-D geometric model from the collision detection result of unit 2061 and the data stored in the object model memory cell 2056, the trajectory memory cell 2058 and the semantic model memory cell 210, generate three-dimensional modeling commands, and store them in the three-dimensional modeling command storage unit 2065. The voice semantic analysis unit 2064 obtains the restricted-language recognition result from the voice recognition unit 203, resolves it against the semantic definitions of the semantic model memory cell 210 to obtain the interpretation of the speech semantics, generates three-dimensional modeling commands, and stores them in the three-dimensional modeling command storage unit 2065.
Here, voice operation serves as an auxiliary interaction channel in the system's human-computer interaction process and can be used for command operations and for control operations during drawing.
Except as described above, the parts of the second embodiment of the present invention that are similar to the first embodiment are implemented identically.
Third specific embodiment
Figure 11 shows the structure of a third specific embodiment of the three-dimensional geometric modeling system 300 of the present invention. The video input apparatus 301 shown in Fig. 11 consists of digital cameras; in the present embodiment, six digital cameras. Corresponding to each digital camera there is one single-video-stream visual analysis unit 304, and each camera is directly connected by a cable to a general-purpose computer interface in the common manner and processed by its single-video-stream visual analysis unit 304. The voice input device 302 is a general microphone and sound card that convert natural speech into digital audio signals. Different from the first specific embodiment, this embodiment not only adopts more cameras but also connects them differently. Specifically, cameras C01, C02 and C03 are positioned level with the designer's hand motion, while cameras C04, C05 and C06 are positioned above the designer. Because this configuration increases the number of cameras, it better avoids the occlusion problems caused by the motion process. Another difference is the change in camera connection: as shown in Fig. 13, the six cameras are divided into two groups, and the three cameras of each group are connected into two image input pairs usable for dual-view matching, which are provided to the multi-video-stream visual analysis unit 305 for depth acquisition and stereo matching.
The six cameras are placed respectively at six different positions: in front of the concept designer, to the left front, right front, above, upper left and upper right. Their heights above the ground and their attitudes are arranged to suit the expression of the designer's gestures; that is, the designer's design actions can be fully captured by the cameras without affecting the designer's operation and other activities. A concrete digital camera layout is illustrated by Fig. 12.
In the present embodiment the computer system consists of three general-purpose digital computers, as shown in Fig. 14. Every two cameras are connected to a front-end computer 1901, 1902; the two front-end computers are connected to the back-end computer 1903; and the back-end computer is connected to the audio input device and the display device.
The operation of the present embodiment is essentially the same as that of the first specific embodiment. After the three-dimensional modeling designer switches on the computer system of this device, the system initialization process begins first, and the 3-D geometric models of simple objects are then generated. The difference is that, relative to the multi-video-stream visual analysis of the first specific embodiment, the multi-video-stream visual analysis of the present embodiment accepts the output of all six single-video-stream visual analyses, and its processing handles the video inputs according to the combination shown in Fig. 14.
The implementation of the three-dimensional geometric modeling system and method of the present invention has been described above. Although the description refers to specific embodiments of the present invention, it should be appreciated that various modifications can be made without departing from its spirit. The disclosed embodiments are therefore exemplary rather than restrictive in all respects, and the scope of the present invention is indicated by the appended claims rather than by the foregoing description; all changes falling within the meaning and range of equivalency of the claims are intended to be embraced therein.

Claims (14)

1. A three-dimensional geometric modeling method, comprising:
a video input step: collecting human-hand video streams from a plurality of video input devices (101) distributed around a designer;
a single-video-stream visual analysis step: processing each of the video streams collected in the above video collecting step through its own single-video-stream visual analysis, to detect the moving and non-moving regions in the video stream, estimate the direction and speed of the moving object, predict its next position, calculate the edge contour of the moving object, and estimate contour features;
a multi-video-stream visual analysis step: receiving the results of the above single-video-stream visual analyses, performing binocular stereo matching, performing 3D reconstruction and object trajectory fitting based on the obtained contours and features, calculating the moving object's cross section, and providing the obtained object models, object trajectories and object cross-section profiles to the real-time interaction semantic recognition step;
a real-time interaction semantic recognition step: detecting, from the object models, cross-section profiles and trajectories built by the multi-video-stream visual analysis step, collisions between the 3-D geometric model and the object models, and determining the position and manner of each collision; determining operational semantics, such as the semantic operand and operation type, from the collision detection result according to a predetermined semantic model; and determining interaction semantics from the collision detection result, the stored object models and trajectories, and the semantic models stored in advance in the semantic model memory cell;
a 3D geometric modeling step: processing the object models, object trajectories and object cross-section profiles output by the multi-video-stream visual analysis step together with the output of the real-time interaction semantic recognition step, thereby obtaining the three-dimensional geometric design shape, and storing the result in the 3-D geometric model memory cell;
a 3D model drawing step: drawing the 3-D geometric model stored in real time in the 3-D geometric model memory cell onto the video output device;
a video output step: displaying on the video output device the 3D geometric shape designed by the designer.
2. The three-dimensional geometric modeling method according to claim 1, characterized in that said single-video-stream visual analysis step further comprises:
an image analysis step: processing the video signal collected by the corresponding video input device to obtain characteristic video streams with different resolution scales and different feature elements;
a real-time motion detection step: detecting, from the characteristic video streams obtained by said image analysis step, the moving regions in the video stream and the direction of motion, to obtain an accurate segmentation of the moving regions and the moving region edges;
a contour calculation step: obtaining the hand region by skin-color detection on the moving object region, removing the hand region to obtain the region of the hand-held object, and thereby calculating the edge contours of the regions;
a motion estimation and prediction step: estimating the direction and speed of the moving object and predicting its next position.
3. The three-dimensional geometric modeling method according to claim 1, characterized in that said multi-video-stream visual analysis step comprises:
a stereo matching step: receiving the motion detection results output by the motion detection steps of two of said single-video-stream visual analysis steps, and obtaining the region segmentation and depth information of the moving object through stereo matching and parallax calculation;
an SFS & FBR step: recovering the object surface shape from the contours output by the contour calculation step of said single-video-stream visual analysis step;
a trajectory fitting step: using the spatially distributed camera photographic planes and the cameras' relative orientation parameters to estimate the continuous spatial trajectory by space coordinate intersection and curve fitting;
a cross-section calculation step: calculating the cross-section profile of the moving object from the contour data output by the contour calculation step of said single-video-stream visual analysis step and the continuous trajectory obtained by the above trajectory fitting step;
an object model building step: building the object model from the depth information data indicated by the result of the above stereo matching step.
4. The three-dimensional geometric modeling method according to claim 1, characterized in that said 3D geometric modeling step comprises:
a repetition processing step: handling the repetitive operations produced during object motion, eliminating repeated and overlapping motions of the object;
a jitter processing step: smoothing and fairing the motion of the object, eliminating small jitters in the object's trajectory and attitude;
an envelope calculation step: calculating the envelope surface generated by the object's motion from the trajectory and cross-section profile freed of jitter and repetition;
a shape editing step: modifying the established 3-D geometric model.
5. The three-dimensional geometric modeling method according to claim 1, characterized in that said method further comprises:
a step of inputting voice commands from a speech input device;
a speech recognition step: recognizing the voice commands input from the speech input device according to a predetermined speech model; wherein
said real-time interaction semantic recognition step further comprises a voice semantic analysis step: processing the output of the multi-video-stream visual analysis step and the output of the speech recognition step to obtain human-computer interaction semantics, and providing the obtained human-computer interaction semantic results to the 3D geometric modeling step.
6. The three-dimensional geometric modeling method according to claim 1, characterized in that the number of said video input devices is 4, distributed respectively to the designer's right, right front, left front and left, and placed at positions suited to the expression of the designer's gestures.
7. The three-dimensional geometric modeling method according to claim 1, characterized in that the number of said video input devices is 6, distributed respectively to the designer's front, left front, right front, above, upper left and upper right, and placed at positions suited to the expression of the designer's gestures.
8. A three-dimensional geometric modeling system, comprising:
a plurality of video input devices: distributed around a designer, for collecting the human-hand video streams of the designer's design actions;
a plurality of single-video-stream visual analysis units: one corresponding to each video input device, so that each of the video streams collected by the above video input devices is processed by its own single-video-stream visual analysis unit, to detect the moving and non-moving regions in the video stream, estimate the direction and speed of the moving object, predict its next position, calculate the edge contour of the moving object, and estimate contour features;
a multi-video-stream visual analysis unit: for receiving the results of the above plurality of single-video-stream visual analysis units, performing binocular stereo matching, performing 3D reconstruction and object trajectory fitting based on the obtained contours and features, calculating the moving object's cross section, and providing the obtained object models, object trajectories and object cross-section profiles to the real-time interaction semantic recognition unit;
a real-time interaction semantic recognition unit: for detecting, from the object models, cross-section profiles and trajectories built by the multi-video-stream visual analysis unit, collisions between the 3-D geometric model and the object models, determining the position and manner of each collision; determining operational semantics, such as the semantic operand and operation type, from the collision detection result according to a predetermined semantic model; and determining interaction semantics from the collision detection result, the stored object models and trajectories, and the semantic models stored in advance in the semantic model memory cell;
a 3D geometric modeling unit: for comprehensively processing the output results of the multi-video-stream visual analysis unit and the output of the real-time interaction semantic recognition unit, thereby obtaining the three-dimensional geometric design shape, and storing the result in the 3-D geometric model memory cell;
a 3D model drawing unit: for drawing the 3-D geometric model stored in real time in the 3-D geometric model memory cell onto the video output device;
a video output device: for displaying the 3D geometric shape designed by the designer.
9. The three-dimensional geometric modeling system according to claim 8, characterized in that said single-video-stream visual analysis unit further comprises:
an image analysis unit: for processing the video signal collected by its corresponding video input device to obtain characteristic video streams with different resolution scales and different feature elements;
a real-time motion detection unit: for detecting, from the characteristic video streams obtained by said image analysis unit, the moving regions in the video stream and the direction of motion, to obtain an accurate segmentation of the moving regions and the moving region edges;
a contour computing unit: for obtaining the hand region by skin-color detection on the moving object region, removing the hand region to obtain the region of the hand-held object, and thereby calculating the edge contours of the regions;
a motion estimation and prediction unit: for estimating the direction and speed of the moving object and predicting its next position.
10. The three-dimensional geometric modeling system according to claim 8, characterized in that said multi-video-stream visual analysis unit comprises:
a stereo matching unit: for receiving the motion detection results output by the motion detection units of two of said single-video-stream visual analysis units, and obtaining the region segmentation and depth information of the moving object through stereo matching and parallax calculation;
an SFS & FBR unit: for recovering the object surface shape from the contour calculation results output by the contour computing units of said single-video-stream visual analysis units;
a trajectory fitting unit: for using the spatially distributed camera photographic planes and the cameras' relative orientation parameters to estimate the continuous spatial trajectory by space coordinate intersection and curve fitting;
a cross section computing unit: for calculating the cross-section profile of the moving object from the contour data output by the contour computing units of said single-video-stream visual analysis units and the continuous trajectory obtained by the above trajectory fitting unit;
an object model building unit: for building the object model from the depth information data indicated by the result of the above stereo matching unit.
11. The three-dimensional geometric modeling system according to claim 8, characterized in that said 3D geometric modeling unit comprises:
a repetition processing unit: for handling the repetitive operations produced during object motion, eliminating repeated and overlapping motions of the object;
a jitter processing unit: for smoothing and fairing the motion of the object, eliminating small jitters in the object's trajectory and attitude;
an envelope computing unit: for calculating the envelope surface generated by the object's motion from the trajectory and cross-section profile freed of jitter and repetition;
a shape editing unit: for modifying the established 3-D geometric model.
12. The three-dimensional geometric modeling system according to claim 8, characterized in that said system further comprises:
a speech input device: for inputting voice commands;
a voice recognition unit: for recognizing the voice commands input from the speech input device according to a predetermined speech model; wherein
said real-time interaction semantic recognition unit is configured to process the output of the multi-video-stream visual analysis unit and the output of the voice recognition unit, obtain the human-computer interaction semantics, and provide the obtained human-computer interaction semantic results to the 3D geometric modeling unit.
13. The three-dimensional geometric modeling system according to claim 8, characterized in that the number of said video input devices is 4, distributed respectively to the designer's right, right front, left front and left, and placed at positions suited to the expression of the designer's gestures.
14. The three-dimensional geometric modeling system according to claim 8, characterized in that the number of said video input devices is 6, distributed respectively to the designer's front, left front, right front, above, upper left and upper right, and placed at positions suited to the expression of the designer's gestures.
CN2005100122739A 2005-07-29 2005-07-29 Three-dimensional geometric mode building system and method Expired - Fee Related CN100407798C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2005100122739A CN100407798C (en) 2005-07-29 2005-07-29 Three-dimensional geometric mode building system and method


Publications (2)

Publication Number Publication Date
CN1747559A CN1747559A (en) 2006-03-15
CN100407798C true CN100407798C (en) 2008-07-30

Family

ID=36166855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2005100122739A Expired - Fee Related CN100407798C (en) 2005-07-29 2005-07-29 Three-dimensional geometric mode building system and method

Country Status (1)

Country Link
CN (1) CN100407798C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI720513B (en) * 2019-06-14 2021-03-01 元智大學 Image enlargement method

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4499693B2 (en) * 2006-05-08 2010-07-07 ソニー株式会社 Image processing apparatus, image processing method, and program
CN101276370B (en) * 2008-01-14 2010-10-13 浙江大学 Three-dimensional human body movement data retrieval method based on key frame
CN101965589A (en) * 2008-03-03 2011-02-02 霍尼韦尔国际公司 Model driven 3d geometric modeling system
CN101795348A (en) * 2010-03-11 2010-08-04 合肥金诺数码科技股份有限公司 Object motion detection method based on image motion
CN101908230B (en) * 2010-07-23 2011-11-23 东南大学 Regional depth edge detection and binocular stereo matching-based three-dimensional reconstruction method
CN101923729B (en) * 2010-08-25 2012-01-25 中国人民解放军信息工程大学 Reconstruction method of three-dimensional shape of lunar surface based on single gray level image
CN102354345A (en) * 2011-10-21 2012-02-15 北京理工大学 Medical image browse device with somatosensory interaction mode
US9396292B2 (en) * 2013-04-30 2016-07-19 Siemens Product Lifecycle Management Software Inc. Curves in a variational system
CN103927784B (en) * 2014-04-17 2017-07-18 中国科学院深圳先进技术研究院 A kind of active 3-D scanning method
CN105323572A (en) * 2014-07-10 2016-02-10 坦亿有限公司 Stereoscopic image processing system, device and method
CN104571510B (en) 2014-12-30 2018-05-04 青岛歌尔声学科技有限公司 A kind of system and method that gesture is inputted in 3D scenes
US10482670B2 (en) 2014-12-30 2019-11-19 Qingdao Goertek Technology Co., Ltd. Method for reproducing object in 3D scene and virtual reality head-mounted device
CN104571511B (en) 2014-12-30 2018-04-27 青岛歌尔声学科技有限公司 The system and method for object are reappeared in a kind of 3D scenes
US9792692B2 (en) * 2015-05-29 2017-10-17 Ncr Corporation Depth-based image element removal
CN105160673A (en) * 2015-08-28 2015-12-16 山东中金融仕文化科技股份有限公司 Object positioning method
CN105404511B (en) * 2015-11-19 2019-03-12 福建天晴数码有限公司 Physical impacts prediction technique and device based on ideal geometry
CN105844692B (en) * 2016-04-27 2019-03-01 北京博瑞空间科技发展有限公司 Three-dimensional reconstruction apparatus, method, system and unmanned plane based on binocular stereo vision
CN107452037B (en) * 2017-08-02 2021-05-14 北京航空航天大学青岛研究院 GPS auxiliary information acceleration-based structure recovery method from movement
CN111448568B (en) * 2017-09-29 2023-11-14 苹果公司 Environment-based application presentation
CN109754457A (en) * 2017-11-02 2019-05-14 韩锋 Reconstruct system, method and the electronic equipment of object threedimensional model
CN107907110B (en) * 2017-11-09 2020-09-01 长江三峡勘测研究院有限公司(武汉) Multi-angle identification method for structural plane occurrence and properties based on unmanned aerial vehicle
CN109993976A (en) * 2017-12-29 2019-07-09 技嘉科技股份有限公司 Traffic accident monitors system and method
CN109783922A (en) * 2018-01-08 2019-05-21 北京航空航天大学 A kind of local product design method, system and its application based on function and environmental factor
CN108777770A (en) * 2018-06-08 2018-11-09 南京思百易信息科技有限公司 A kind of three-dimensional modeling shared system and harvester
CN111010590B (en) * 2018-10-08 2022-05-17 阿里巴巴(中国)有限公司 Video clipping method and device
CN110246212B (en) * 2019-05-05 2023-02-07 上海工程技术大学 Target three-dimensional reconstruction method based on self-supervision learning
CN110660132A (en) * 2019-10-11 2020-01-07 杨再毅 Three-dimensional model construction method and device
CN112215933B (en) * 2020-10-19 2024-04-30 南京大学 Three-dimensional solid geometry drawing system based on pen type interaction and voice interaction
CN114137880B (en) * 2021-11-30 2024-02-02 深蓝汽车科技有限公司 Moving part attitude test system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09180003A (en) * 1995-12-26 1997-07-11 Nec Corp Method and device for modeling three-dimensional shape
CN1263302A (en) * 2000-03-13 2000-08-16 中国科学院软件研究所 Pen and signal based manuscript editing technique
US20030025788A1 (en) * 2001-08-06 2003-02-06 Mitsubishi Electric Research Laboratories, Inc. Hand-held 3D vision system
CN1404016A (en) * 2002-10-18 2003-03-19 清华大学 Establishing method of human face 3D model by fusing multiple-visual angle and multiple-thread 2D information

Also Published As

Publication number Publication date
CN1747559A (en) 2006-03-15

Similar Documents

Publication Publication Date Title
CN100407798C (en) Three-dimensional geometric mode building system and method
US11703951B1 (en) Gesture recognition systems
US11107272B2 (en) Scalable volumetric 3D reconstruction
CN110458939B (en) Indoor scene modeling method based on visual angle generation
CN103778635B (en) For the method and apparatus processing data
CN107292234B (en) Indoor scene layout estimation method based on information edge and multi-modal features
Häne et al. Dense semantic 3d reconstruction
Kim et al. Pedx: Benchmark dataset for metric 3-d pose estimation of pedestrians in complex urban intersections
CN109003325A (en) A kind of method of three-dimensional reconstruction, medium, device and calculate equipment
CN107357427A (en) A kind of gesture identification control method for virtual reality device
CN104637090B (en) A kind of indoor scene modeling method based on single picture
CN107688391A (en) A kind of gesture identification method and device based on monocular vision
CN1509456A (en) Method and system using data-driven model for monocular face tracking
CN114424250A (en) Structural modeling
Bhattacharjee et al. A survey on sketch based content creation: from the desktop to virtual and augmented reality
Jordt et al. Direct model-based tracking of 3d object deformations in depth and color video
CN110751097A (en) Semi-supervised three-dimensional point cloud gesture key point detection method
CN116097316A (en) Object recognition neural network for modeless central prediction
Tao et al. Indoor 3D semantic robot VSLAM based on mask regional convolutional neural network
Xu et al. Robust hand gesture recognition based on RGB-D Data for natural human–computer interaction
Huang et al. Network algorithm real-time depth image 3D human recognition for augmented reality
CN115578460A (en) Robot grabbing method and system based on multi-modal feature extraction and dense prediction
Fadzli et al. VoxAR: 3D modelling editor using real hands gesture for augmented reality
Bhakar et al. A review on classifications of tracking systems in augmented reality
Xu et al. A novel multimedia human-computer interaction (HCI) system based on kinect and depth image understanding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080730

Termination date: 20160729