CN105338370A - Method and apparatus for synthesizing animations in videos in real time - Google Patents


Info

Publication number: CN105338370A
Application number: CN201510716312.7A
Authority: CN (China)
Prior art keywords: video, data, animation, action, coordinate system
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 殷元江
Current Assignee: Beijing 7d Vision Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing 7d Vision Technology Co Ltd
Application filed by Beijing 7d Vision Technology Co Ltd
Priority to CN201510716312.7A
Publication of CN105338370A

Classifications

    • H04N 21/234 — Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • G06T 13/40 — 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • H04N 21/21805 — Source of audio or video content enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/23412 — Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the invention disclose a method and apparatus for synthesizing animations into videos in real time, applied to video that is acquired in real time. The method includes the following steps: face data and facial coordinates acquired in real time by face-capture equipment are obtained; a face model is generated in a target area of a virtual area according to region position information, the face data, and the facial coordinates; and a virtual camera is kept synchronized with the main camera, so that composite video data is obtained by synthesizing the animation data of the facial animation into the video data in real time according to a 3D coordinate system, and the composite video data is output in real time. As a result, when the composite video data is played, the facial animation appears at the position corresponding to the target area in the displayed composite video. Through the cooperation of the render engine's processor with the server's processor, and the synchronization of the virtual camera with the main camera, facial animation can be synthesized into video in real time, satisfying existing demands for visual communication through video.

Description

Method and apparatus for synthesizing animation into video in real time
Technical field
The present invention relates to the field of real-time video synthesis, and in particular to a method and apparatus for synthesizing animation into video in real time.
Background art
Video is a common media format; for example, live video data can be obtained in real time by capturing a live TV programme with a camera. During a live broadcast, in order to improve the broadcast effect or increase its artistic expressiveness, animation effects can be synthesized into the live video data by means of data synthesis. This is an emerging form of visual communication, for example animations that emphasize a person's expression, artistic captions for a voice-over, and background animation effects.
However, animation synthesis as currently applied in TV or Internet video broadcasting still mainly relies on video post-processing: in a non-live setting, animation is synthesized into recorded video data, and the composited video is then broadcast on TV or over the network. For live video data, the shooting angle and position of the capturing camera cannot be anticipated at all. If animation is to be synthesized into a live video during the broadcast, then, because changes of the video picture cannot be predicted, only short 2D animations or pictures can be synthesized at most in order to keep the composite from looking abrupt.
It can be seen that the current ability to synthesize animation into video in real time, especially into live video, is essentially absent and cannot effectively meet existing demands for visual communication.
Summary of the invention
To solve the above technical problems, the invention provides a method and apparatus for synthesizing animation into video in real time, realizing the function of synthesizing facial animation into video in real time and effectively meeting existing demands for visual communication through video.
The embodiments of the invention disclose the following technical solutions:
A method for synthesizing animation into video in real time, applied to video acquired in real time. The fixed area in which the video is captured contains at least one camera, and the video is captured by a main camera among the at least one camera. A server establishes a 3D coordinate system for the fixed area and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera. The server uses a render engine to build a virtual area from the fixed area and the 3D coordinate system; the position information of the fixed area and of the virtual area in the 3D coordinate system have a proportional relationship. The server uses the render engine to set up a virtual camera in the virtual area and synchronizes it with the main camera, so that the position information and video-acquisition parameters of the virtual camera remain consistent with those of the main camera in real time. The method comprises:
The server obtains, according to a target area determined for synthesizing animation, the region position information of the target area in the 3D coordinate system;
The server obtains face data and facial coordinates acquired in real time by face-capture equipment, the facial coordinates having a correspondence with the 3D coordinate system;
The server generates a face model in the target area of the virtual area according to the region position information, the face data, and the facial coordinates;
The server obtains the video data of the video captured in real time by the main camera; by keeping the virtual camera synchronized with the main camera, the server obtains the facial animation formed by the face model in the virtual area according to the face data;
The server extracts the animation data of the facial animation from the render engine, synthesizes it into the video data in real time according to the 3D coordinate system to obtain composite video data, and outputs the composite video data in real time, so that when the composite video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed composite video.
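The per-frame synthesis implied by the steps above can be sketched as follows. This is a minimal illustration, assuming the render engine exposes the facial-animation layer as an RGBA image registered to the main camera's view; all names are hypothetical and not taken from the patent:

```python
import numpy as np

def composite_frame(video_frame, animation_rgba):
    """Alpha-composite one rendered animation layer onto one video frame.

    video_frame:    H x W x 3 uint8 frame captured by the main camera.
    animation_rgba: H x W x 4 uint8 layer rendered by the virtual camera,
                    fully transparent (alpha 0) outside the facial animation.
    Because the virtual camera mirrors the main camera's pose and
    acquisition parameters, the two images are already registered.
    """
    rgb = animation_rgba[..., :3].astype(np.float32)
    alpha = animation_rgba[..., 3:4].astype(np.float32) / 255.0
    base = video_frame.astype(np.float32)
    # standard "over" compositing: animation over live video
    return (alpha * rgb + (1.0 - alpha) * base).astype(np.uint8)
```

In the real-time setting this function would run once per frame, with the composite frame pushed to the output stream immediately.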
Optionally, before the server extracts the animation data of the skeleton animation from the render engine, the method further comprises:
The server obtains action data and action coordinates acquired in real time by motion-capture equipment, the action coordinates having a correspondence with the 3D coordinate system;
The server generates a skeleton model in the target area of the virtual area according to the region position information, the action data, and the action coordinates;
By keeping the virtual camera synchronized with the main camera, the server obtains the skeleton animation formed by the skeleton model in the virtual area according to the action data;
The step in which the server extracts the animation data of the facial animation from the render engine then further comprises:
The server extracts the animation data of the skeleton animation and of the facial animation from the render engine, synthesizes both into the video data in real time according to the 3D coordinate system to obtain composite video data, and outputs the composite video data in real time, so that when the composite video data is played, the skeleton animation and the facial animation are synthesized at the position corresponding to the target area in the displayed composite video.
Optionally, the action data comprises several pieces of sub-action data, each with a respective node identification; the node identification represents an active node of the action-collection target whose action data is gathered by the action-collection equipment. The server generating a skeleton model in the target area of the virtual area according to the region position information, the action data, and the action coordinates comprises:
The server determines the bone node on the skeleton model corresponding to the node identification;
The server determines the position of the bone node in the skeleton model according to the sub-action data and the corresponding action coordinates;
The server generates the skeleton model in the target area of the virtual area according to the determined positions of the bone nodes in the skeleton model.
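The three steps above can be sketched as follows. The node-to-bone mapping and the data layout are hypothetical; the patent only specifies that each piece of sub-action data carries a node identification:

```python
# Hypothetical mapping from node identifications to bone names; the patent
# only states that each piece of sub-action data carries a node
# identification representing an active node of the captured performer.
NODE_TO_BONE = {0: "hips", 1: "spine", 2: "head", 3: "left_hand", 4: "right_hand"}

def build_skeleton(sub_action_data, scale=1.0):
    """sub_action_data: iterable of (node_id, (x, y, z)) pairs, with
    coordinates in the shared 3D coordinate system.  Returns a dict of
    bone name -> position in the virtual area, applying the proportional
    relationship between the fixed area and the virtual area as `scale`."""
    skeleton = {}
    for node_id, (x, y, z) in sub_action_data:
        bone = NODE_TO_BONE.get(node_id)
        if bone is None:
            continue  # node identification with no bone on this skeleton model
        skeleton[bone] = (scale * x, scale * y, scale * z)
    return skeleton
```

The returned positions would then drive the skeleton model that the render engine places in the target area of the virtual area.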
Optionally, the server establishing the 3D coordinate system of the fixed area, and collecting in real time the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera, comprises:
The server establishes the 3D coordinate system of the fixed area through multiple collection devices arranged in the fixed area, and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera through the multiple collection devices and the reflectors respectively arranged on the at least one camera.
Optionally, the collection devices comprise infrared cameras, and the reflectors comprise infrared reflective devices.
An apparatus for synthesizing animation into video in real time, applied to video acquired in real time. The fixed area in which the video is captured contains at least one camera, and the video is captured by a main camera among the at least one camera. A server establishes a 3D coordinate system for the fixed area and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera. The server uses a render engine to build a virtual area from the fixed area and the 3D coordinate system; the position information of the fixed area and of the virtual area in the 3D coordinate system have a proportional relationship. The server uses the render engine to set up a virtual camera in the virtual area and synchronizes it with the main camera, so that the position information and video-acquisition parameters of the virtual camera remain consistent with those of the main camera in real time. The apparatus comprises:
A position-information acquiring unit, configured to obtain, according to a target area determined for synthesizing animation, the region position information of the target area in the 3D coordinate system;
A face acquiring unit, configured to obtain face data and facial coordinates acquired in real time by face-capture equipment, the facial coordinates having a correspondence with the 3D coordinate system;
A face-model generation unit, configured to generate a face model in the target area of the virtual area according to the region position information, the face data, and the facial coordinates;
A facial-animation acquiring unit, configured to obtain the video data of the video captured in real time by the main camera and, by keeping the virtual camera synchronized with the main camera, to obtain the facial animation formed by the face model in the virtual area according to the face data;
A real-time synthesis unit, configured to extract the animation data of the facial animation from the render engine, synthesize it into the video data in real time according to the 3D coordinate system to obtain composite video data, and output the composite video data in real time, so that when the composite video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed composite video.
Optionally, the apparatus further comprises:
An action acquiring unit, configured to obtain, before the real-time synthesis unit is triggered, action data and action coordinates acquired in real time by motion-capture equipment, the action coordinates having a correspondence with the 3D coordinate system;
A skeleton-model generation unit, configured to generate a skeleton model in the target area of the virtual area according to the region position information, the action data, and the action coordinates;
A skeleton-animation acquiring unit, configured to obtain, by keeping the virtual camera synchronized with the main camera, the skeleton animation formed by the skeleton model in the virtual area according to the action data;
The real-time synthesis unit is further configured to extract the animation data of the skeleton animation and of the facial animation from the render engine, synthesize both into the video data in real time according to the 3D coordinate system to obtain composite video data, and output the composite video data in real time, so that when the composite video data is played, the skeleton animation and the facial animation are synthesized at the position corresponding to the target area in the displayed composite video.
Optionally, the action data comprises several pieces of sub-action data, each with a respective node identification representing an active node of the action-collection target whose action data is gathered by the action-collection equipment. The skeleton-model generation unit comprises:
A node determination subunit, configured to determine the bone node on the skeleton model corresponding to the node identification;
A position determination subunit, configured to determine the position of the bone node in the skeleton model according to the sub-action data and the corresponding action coordinates;
A generation subunit, configured to generate the skeleton model in the target area of the virtual area according to the determined positions of the bone nodes in the skeleton model.
Optionally, the server establishing the 3D coordinate system of the fixed area, and collecting in real time the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera, comprises:
The server establishes the 3D coordinate system of the fixed area through multiple collection devices arranged in the fixed area, and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera through the multiple collection devices and the reflectors respectively arranged on the at least one camera.
Optionally, the collection devices comprise infrared cameras, and the reflectors comprise infrared reflective devices.
As can be seen from the above technical solutions, after the 3D coordinate system of the fixed area and the position information and video-acquisition parameters of the cameras, especially the main camera, are determined, a virtual area corresponding to the fixed area is built by the render engine. The virtual camera in the virtual area stays synchronized with the main camera in the fixed area, so the region captured by the virtual camera in the virtual area and the region captured by the main camera in the fixed area remain constantly consistent, which makes real-time animation synthesis possible. After the position information of the target area for synthesizing animation is determined according to the 3D coordinate system, the render engine of the server builds a face model at the position corresponding to the target area in the virtual area from the face data and facial coordinates gathered by the face-capture equipment, and obtains the facial animation formed by the face model according to the face data through the virtual camera, which is kept synchronized with the main camera. The server extracts the animation data of the facial animation from the render engine and synthesizes it into the video data in real time according to the 3D coordinate system to obtain composite video data, so that when the composite video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed composite video. Through the cooperation of the render engine's own processor with the server's processor, and the synchronization of the virtual camera with the main camera, the function of synthesizing facial animation into video in real time is realized, effectively meeting existing demands for visual communication through video.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for synthesizing animation into video in real time provided by an embodiment of the present invention;
Fig. 2 is an architecture diagram of a video production system for facial animation provided by an embodiment of the present invention;
Fig. 3 is a flow chart of a method for synthesizing animation into video in real time provided by an embodiment of the present invention;
Fig. 4 is an example diagram of a three-dimensional animation model action provided by an embodiment of the present invention;
Fig. 5 is a structure diagram of an apparatus for synthesizing animation into video in real time provided by an embodiment of the present invention;
Fig. 6 is a structure diagram of an apparatus for synthesizing animation into video in real time provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
For live video data, the shooting angle and position of the capturing camera cannot be anticipated or known in advance, so it is difficult to synthesize animation directly into the video. Traditional video-animation synthesis can often only be realized through post-production and is difficult to apply to video that is shot and broadcast live. Even where some companies can already achieve real-time animation synthesis, the shooting angle and position of the capturing camera still cannot be anticipated or known in advance, so changes of the video picture cannot be predicted; to keep the composite from looking abrupt, only short 2D animations or pictures can be synthesized at most, which cannot effectively meet existing demands for visual communication.
To this end, the embodiments of the present invention provide a method and apparatus for synthesizing animation into video in real time. After the 3D coordinate system of the fixed area and the position information and video-acquisition parameters of the cameras, especially the main camera, are determined, a virtual area corresponding to the fixed area is built by a render engine. The virtual camera in the virtual area stays synchronized with the main camera in the fixed area, so the region captured by the virtual camera in the virtual area and the region captured by the main camera in the fixed area remain constantly consistent, making real-time animation synthesis possible. After the position information of the target area for synthesizing animation is determined according to the 3D coordinate system, the render engine of the server builds a face model at the position corresponding to the target area in the virtual area from the face data and facial coordinates gathered by the face-capture equipment, and obtains the facial animation formed by the face model according to the face data through the virtual camera, which is kept synchronized with the main camera. The server extracts the animation data of the facial animation from the render engine and synthesizes it into the video data in real time according to the 3D coordinate system to obtain composite video data, so that when the composite video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed composite video. Through the cooperation of the render engine's own processor with the server's processor, and the synchronization of the virtual camera with the main camera, facial animation can be synthesized into video in real time, effectively meeting existing demands for visual communication through video.
Further, in the embodiments of the present invention, on the basis of the face model, the render engine of the server can also build a skeleton model from the action data and action coordinates acquired in real time by the motion-capture equipment, and obtain the skeleton animation formed by the skeleton model according to the action data through the virtual camera, kept synchronized with the main camera. The server extracts the animation data of the skeleton animation and of the facial animation from the render engine and synthesizes both into the video data in real time according to the 3D coordinate system to obtain composite video data, so that when the composite video data is played, the skeleton animation and the facial animation are synthesized at the position corresponding to the target area in the displayed composite video. Through the cooperation of the render engine's own processor with the server's processor, and the synchronization of the virtual camera with the main camera, skeleton animation and facial animation can both be synthesized into video in real time, further meeting existing demands for visual communication through video.
Embodiment 1
Fig. 1 is a flow chart of a method for synthesizing animation into video in real time provided by an embodiment of the present invention. The method is applied to video acquired in real time; before the method is carried out, the relevant parameters need to be obtained first.
The fixed area in which the video is captured contains at least one camera, and the video is captured by a main camera among the at least one camera. A server establishes a 3D coordinate system for the fixed area and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera. The server uses a render engine to build a virtual area from the fixed area and the 3D coordinate system; the position information of the fixed area and of the virtual area in the 3D coordinate system have a proportional relationship. The server uses the render engine to set up a virtual camera in the virtual area and synchronizes it with the main camera, so that the position information and video-acquisition parameters of the virtual camera remain consistent with those of the main camera in real time.
For example, the fixed area has a clear boundary; it may be a studio, a room, an outdoor location, etc. The 3D coordinate system identifies the fixed area, and in the 3D coordinate system the fixed area has fixed position information, such as coordinate values.
At least one camera may be needed for shooting, and the number of cameras changes with the shooting demand. With multiple cameras, the one outputting video data is the main camera, and the main camera can be switched among the multiple cameras as the shooting demands change. The position, movement, and shooting angle of the main camera, and further the viewing angle and zoom during shooting, can be determined from the position information and video-acquisition parameters of the main camera.
Optionally, the invention provides an efficient and effective way to establish the 3D coordinate system and to determine the camera positions and video-acquisition parameters. The server establishing the 3D coordinate system of the fixed area, and collecting in real time the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera, comprises:
The server establishes the 3D coordinate system of the fixed area through multiple collection devices arranged in the fixed area, and collects, in real time, the position information of the at least one camera in the 3D coordinate system and the video-acquisition parameters of the main camera through the multiple collection devices and the reflectors respectively arranged on the at least one camera.
For example, if the fixed area is indoors, multiple collection devices can be hung on the ceiling and arranged at several corners to establish the 3D coordinate system corresponding to the fixed area and to gather the position information of the fixed area in the 3D coordinate system. A reflector can be mounted on each camera; by reflecting signals emitted by the collection devices, or by other means, the collection devices can determine the position information of the reflector in the 3D coordinate system as well as some video-acquisition parameters such as the shooting angle and camera position. Further, a reflector can be set inside the camera, so that the collection devices can determine lens operations of the main camera, such as zooming in and out, from the behaviour of that reflector. If the fixed area is outdoors, the above functions can also be realized by, for example, suspending multiple collection devices in the air. Position acquisition can be realized relatively effectively with infrared light; therefore, optionally, the collection devices comprise infrared cameras and the reflectors comprise infrared reflective devices. The component arranged inside the main camera to detect the lens may specifically be a sensor device.
The render engine of the server is similar to the graphics card of the server: it has an independent processor, which can be understood as a graphics processing unit (GPU), and can effectively share processing work for the server's processor, which can be understood as a central processing unit (CPU). The server can use the render engine to build a virtual region according to the position parameters obtained above. The position information of the virtual region and that of the fixed area have a proportional relationship; for example, the virtual region can be built with the same size as the fixed area.
The virtual camera can be understood as the viewpoint from which the virtual region is observed. According to the position information and video acquisition parameters of the main camera, the position of the virtual camera in the virtual region can be determined by fourth-order (4×4) matrix computation. By synchronizing the virtual camera with the main camera, the shooting parameters of the virtual camera in the virtual region, such as angle and camera position, are all kept consistent with the corresponding parameters of the main camera in the fixed area. It should be noted that when the proportional relationship between the virtual region and the fixed area is not 1:1, the influence of that proportional relationship must also be taken into account when synchronizing the virtual camera with the main camera.
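The fourth-order matrix computation mentioned above can be illustrated with a minimal sketch, under the assumption that the proportional relationship is a simple uniform scale; the function name and values are illustrative, not from the patent.

```python
import numpy as np

def virtual_camera_pose(camera_position, scale=1.0):
    """Map the main camera's position in the fixed area's 3D coordinate
    system into the virtual region with a 4x4 homogeneous transform.
    `scale` models the proportional relationship between the two regions
    (1.0 means the regions have the same size)."""
    T = np.diag([scale, scale, scale, 1.0])   # 4x4 scaling transform
    p = np.array([*camera_position, 1.0])     # homogeneous point
    q = T @ p
    return tuple(float(v) for v in q[:3] / q[3])

# Main camera at (2.0, 1.5, 3.0) metres; virtual region built at half scale.
print(virtual_camera_pose((2.0, 1.5, 3.0), scale=0.5))  # (1.0, 0.75, 1.5)
```

In practice the 4×4 matrix would also encode the rotation derived from the shooting angle; the sketch keeps only the scaling term to show why a non-1:1 proportional relationship must enter the synchronization.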
As shown in Fig. 1, the method includes:
S101: The server obtains, according to a predetermined target area for synthesizing the animation, the area position information of the target area in the 3D coordinate system.
Since the animation to be synthesized does not actually exist in the fixed area, the target area for synthesizing the animation is generally determined in advance. In this way, when shooting in the fixed area, participants in the program, such as hosts, guests and performers, can be prompted to interact with the animation that will be synthesized in the target area, or to stay out of the target area and of the region between the target area and the main camera, thereby avoiding situations such as occluding the synthesized animation. Since both the fixed area and the virtual region depend on the 3D coordinate system, the area position information can specify the exact location of the target area in both the fixed area and the virtual region.
S102: The server obtains facial data and facial coordinates collected in real time by a facial capture device, where the facial coordinates have a correspondence with the 3D coordinate system.
For example, the facial capture device may be a data collector worn on the face which, through shooting and similar functions, identifies and outputs the changes of feature points on the face, thereby obtaining the facial data and the facial coordinates; the facial coordinates may be the coordinate values of facial feature points.
Next, how the facial data and facial coordinates are collected by the facial capture device is described in detail. Fig. 2 is an architecture diagram of a video production system for facial animation provided by an embodiment of the present invention. The server 1 is connected to the main camera 2 and to the facial capture device 3 (worn, as illustrated, on the person whose face is captured, i.e. the second target body). The connection may be wired or wireless; this application does not specifically limit it.
The main camera 2 is configured to collect video data of a first target body (which may be a participant in the fixed area). Since the first target body is located in the fixed area, the video data of the first target body collected by the main camera 2 includes not only an image of the first target body but also an image of the surrounding scene. A concrete example of such a scenario is a camera shooting the video image of a host presiding over a program.
The facial capture device 3 is worn on a second target body and is configured to obtain the facial data of the second target body. The second target body may be any movable body, such as a human body. The facial data obtained by the facial capture device 3 represents the facial expressions and actions of the second target body. Continuing the above sample scenario, this human body may be another host interacting with the first host.
It should be noted that, during video production, the main camera 2 and the facial capture device 3 collect data continuously and send the collected data to the server 1 in real time. The server 1 can then produce, in real time, a video containing the facial animation according to the received data. It can be understood that a video is composed of multiple frames of video images; the method for producing a video containing facial animation provided by this embodiment of the present invention describes the production process of only a single frame of video image, and the production process of every other frame can likewise refer to the method provided by this embodiment.
S103: The server generates a facial model in the target area of the virtual region according to the area position information, the facial data and the facial coordinates.
For example, the facial model is drawn with a three-dimensional animation drawing tool. It should be noted that the facial model is drawn according to the second target body and is used to simulate the second target body. Of course, when the facial model is displayed in the final target video, it may be at the same scale as the second target body or at a different scale. In addition, the appearance of the facial model may be identical to the second target body, or it may differ from it, for example by using an animal cartoon image to represent facial actions such as expressions.
For example, the facial model may simulate the second target body as follows: according to the facial feature points of the second target body, a corresponding number of facial nodes are arranged in the facial model, each facial node uniquely representing one facial feature point of the second target body. In this way, when a facial feature point moves, the corresponding facial node of the facial model moves accordingly. Since the facial model is composed of facial nodes, the facial animation of the facial model can be obtained once the actions of the facial nodes are determined. It should be noted that one frame of facial animation is one frame of animation image of the facial model.
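The one-to-one mapping between captured feature points and facial nodes can be sketched as follows; all identifiers and coordinate values here are hypothetical, chosen only to illustrate the driving relationship described above.

```python
class FacialModel:
    """Minimal sketch: each facial node uniquely represents one facial
    feature point of the second target body."""
    def __init__(self, feature_ids):
        # one facial node per feature point, keyed by feature id
        self.nodes = {fid: (0.0, 0.0, 0.0) for fid in feature_ids}

    def apply_capture(self, frame):
        """frame: {feature_id: (x, y, z)} from the facial capture device.
        A captured feature point drives exactly its own facial node;
        unknown feature ids are ignored."""
        for fid, coord in frame.items():
            if fid in self.nodes:
                self.nodes[fid] = coord

model = FacialModel(["mouth_left", "mouth_right", "brow_left"])
model.apply_capture({"mouth_left": (1.2, 0.4, 0.1), "brow_left": (0.8, 2.0, 0.2)})
print(model.nodes["mouth_left"])  # (1.2, 0.4, 0.1)
```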
The facial coordinates have a correspondence with coordinates in the 3D coordinate system, so the coordinate value corresponding to a facial coordinate in the 3D coordinate system can be determined from that facial coordinate.
S104: The server obtains the video data of the video collected in real time by the main camera; the server keeps the virtual camera synchronized with the main camera and obtains, in the virtual region, the facial animation formed by the facial model according to the facial data.
Step S103 and step S104 may be performed simultaneously, or S103 may be performed before S104.
If they are performed simultaneously, it can be understood that the facial capture and the shooting of the video are carried out at the same time; that is, while the live broadcast proceeds in the fixed area, the facial capture is simultaneously carried out in another region. The facial model is built from the facial data and facial coordinates obtained in real time, and the virtual camera collects the facial animation in synchronization with the position information and video acquisition parameters of the main camera during the live broadcast.
If S103 is performed before S104, it can be understood that the facial capture is completed first and the facial model is built from the captured parameters. When the live broadcast in the fixed area starts, the facial animation is then collected by the synchronized virtual camera. In the collection process, however, the consistency of the main time axis must be maintained; the related characteristics of time axes are mature technology and are not repeated here.
For example, when video image production needs to start, the relevant personnel can trigger a video production operation on the server, thereby generating an instruction to start video production. After receiving this instruction, the server starts receiving the video data sent by the main camera and the facial data sent by the facial capture device. The video data sent by the main camera is the video image of the first target body that it collects; the facial data sent by the facial capture device is the facial data of the second target body that it captures. The actions of the first target body and the second target body may be interactive.
S105: The server extracts the animation data of the facial animation from the render engine, synthesizes the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthesized video data, and outputs it in real time at the same time, so that when the synthesized video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed synthesized video.
It should be noted that, since the building of the facial model and the collection of the facial animation are essentially completed by the render engine, the processing pressure on the server's processor is effectively reduced. Before the animation synthesis, the server extracts from the render engine the animation data of the facial animation that the render engine has produced, and, under the processing of the server's own processor, synthesizes the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain the synthesized video data. Specifically, in the process of synthesizing the data, the size ratio between the facial animation and the video needs to be taken into account.
For example, the video data of the first target body is one frame of video image, and one frame of facial animation of the facial model is one frame of animation image; the animation image is embedded into the video image, thereby generating the target video containing the facial animation, specifically one frame of target video image. It should be noted that this embodiment of the present invention can be performed repeatedly without interruption, so multiple frames of target video images can be generated, and the multiple frames of target video images are then combined into the target video.
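The per-frame embedding of an animation image into a video image can be sketched as a simple masked overlay; this is only an illustration of the step, not the patent's actual compositing pipeline, and the frame sizes and mask are invented for the example.

```python
import numpy as np

def composite_frame(video_frame, anim_frame, mask, top_left):
    """Embed one animation image into one video image. `mask` is 1 where
    the animation covers the video and 0 where the video shows through;
    `top_left` is the target-area position (row, col) that would be
    derived from the 3D coordinate system."""
    out = video_frame.copy()
    r, c = top_left
    h, w = anim_frame.shape[:2]
    region = out[r:r + h, c:c + w]
    out[r:r + h, c:c + w] = np.where(mask[..., None] == 1, anim_frame, region)
    return out

video = np.zeros((4, 4, 3), dtype=np.uint8)           # one video frame
anim = np.full((2, 2, 3), 255, dtype=np.uint8)        # one animation frame
mask = np.array([[1, 0], [0, 1]])                      # animation coverage
result = composite_frame(video, anim, mask, (1, 1))
print(result[1, 1].tolist())  # [255, 255, 255]
```

Running this per frame, at the server's refresh rate, yields the multi-frame target video described above.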
Outputting the synthesized video data in real time means that, while the on-site video is being collected and broadcast live, the synthesized video data can be output in real time through the real-time synthesis of the animation. Thus, when watching the live video on television or over the network, viewers can see not only the video collected in the fixed area but also the animation synthesized into the video. This improves visual communication capability, especially in live broadcasting.
As can be seen from the above embodiment, after the 3D coordinate system of the fixed area and the position information and video acquisition parameters of the cameras used, especially the main camera, are determined, the render engine builds the virtual region corresponding to the fixed area, and the virtual camera in the virtual region is kept synchronized with the main camera in the fixed area, so that the region collected by the virtual camera in the virtual region remains consistent with the region collected by the main camera in the fixed area, making real-time animation synthesis possible. After the position information of the target area for synthesizing the animation is determined according to the 3D coordinate system, the render engine of the server builds the facial model at the position corresponding to the target area in the virtual region from the facial data and facial coordinates collected by the facial capture device; the virtual camera, kept synchronized with the main camera, obtains the facial animation formed by the facial model according to the facial data; the server extracts the animation data of the facial animation from the render engine and synthesizes it into the video data in real time according to the 3D coordinate system to obtain the synthesized video data, so that when the synthesized video data is played, the facial animation is synthesized at the position corresponding to the target area in the displayed synthesized video. Through the cooperation of the render engine's own processor and the server's processor, and the synchronization of the virtual camera with the main camera, the function of synthesizing facial animation into a video in real time is realized, effectively meeting existing demands for visual communication in video.
Embodiment two
In the embodiment of the present invention, in addition to synthesizing facial animation into a video in real time, a skeleton animation can further be synthesized; the limb actions of the skeleton animation can enhance the visual communication effect of the synthesized animation.
On the basis of the embodiment corresponding to Fig. 1, Fig. 3 is a flow chart of a method for synthesizing an animation into a video in real time provided by an embodiment of the present invention.
S301: The server obtains, according to a predetermined target area for synthesizing the animation, the area position information of the target area in the 3D coordinate system.
S302: The server obtains action data and action coordinates collected in real time by a motion capture device, where the action coordinates have a correspondence with the 3D coordinate system; and the server obtains facial data and facial coordinates collected in real time by a facial capture device, where the facial coordinates have a correspondence with the 3D coordinate system.
It should be noted that the motion capture device may be worn by a third target body, and the person wearing the facial capture device may be the same as or different from the person wearing the motion capture device; that is, the second target body and the third target body may be the same or different.
S303: The server generates a skeleton model in the target area of the virtual region according to the area position information, the action data and the action coordinates; and the server generates a facial model in the target area of the virtual region according to the area position information, the facial data and the facial coordinates.
For example, the skeleton model is drawn with a three-dimensional animation drawing tool. It should be noted that the skeleton model is drawn according to the third target body and is used to simulate the third target body. Of course, when the skeleton model is displayed in the final target video, it may be at the same scale as the third target body or at a different scale. In addition, the appearance of the skeleton model may be identical to the third target body, or it may differ from it, for example by using an animal cartoon image to represent the actions of a human body.
For example, the skeleton model may simulate the third target body as follows: according to the movable nodes of the third target body, a corresponding number of bone nodes are arranged in the skeleton model, each bone node uniquely representing one movable node of the third target body. In this way, when a movable node moves, the corresponding bone node of the skeleton model moves accordingly. Since the skeleton model is composed of bone nodes, the skeleton animation of the skeleton model can be obtained once the actions of the bone nodes are determined. It should be noted that one frame of skeleton animation is one frame of animation image of the skeleton model.
The action coordinates have a correspondence with coordinates in the 3D coordinate system, so the coordinate value corresponding to an action coordinate in the 3D coordinate system can be determined from that action coordinate.
S304: The server obtains the video data of the video collected in real time by the main camera; the server keeps the virtual camera synchronized with the main camera and obtains, in the virtual region, the skeleton animation formed by the skeleton model according to the action data, as well as the facial animation formed by the facial model according to the facial data.
Step S303 and step S304 may be performed simultaneously, or S303 may be performed before S304.
If they are performed simultaneously, it can be understood that the motion capture and the shooting of the video are carried out at the same time; that is, while the live broadcast proceeds in the fixed area, the motion capture is simultaneously carried out in another region. The skeleton model is built from the action data and action coordinates obtained in real time, and the virtual camera collects the skeleton animation in synchronization with the position information and video acquisition parameters of the main camera during the live broadcast.
If S303 is performed before S304, it can be understood that the capture is completed first and the skeleton model is built from the captured parameters. When the live broadcast in the fixed area starts, the skeleton animation is then collected by the synchronized virtual camera. In the collection process, however, the consistency of the main time axis must be maintained; the related characteristics of time axes are mature technology and are not repeated here.
For example, when video image production needs to start, the relevant personnel can trigger a video production operation on the server, thereby generating an instruction to start video production. After receiving this instruction, the server starts receiving the video data sent by the main camera and the bone action data sent by the motion capture device. The video data sent by the main camera is the video image of the first target body that it collects; the bone action data sent by the motion capture device is the action data of the third target body that it captures. The actions of the first target body and the third target body may be interactive.
S305: The server extracts the animation data of the skeleton animation and the animation data of the facial animation from the render engine, synthesizes them into the video data in real time according to the 3D coordinate system to obtain synthesized video data, and outputs it in real time at the same time, so that when the synthesized video data is played, the skeleton animation and the facial animation are synthesized at the position corresponding to the target area in the displayed synthesized video.
It should be noted that, since the building of the skeleton model and the collection of the skeleton animation are essentially completed by the render engine, the processing pressure on the server's processor is effectively reduced. Before the animation synthesis, the server extracts from the render engine the animation data of the skeleton animation that the render engine has produced, and, under the processing of the server's own processor, synthesizes the animation data of the skeleton animation into the video data in real time according to the 3D coordinate system to obtain the synthesized video data. Specifically, in the process of synthesizing the data, the size ratio between the skeleton animation and the video, as well as the size ratio between the skeleton animation and the facial animation, need to be taken into account.
For example, the video data of the first target body is one frame of video image, and one frame of skeleton animation of the skeleton model is one frame of animation image; the animation image is embedded into the video image, thereby generating the target video containing the skeleton animation, specifically one frame of target video image. It should be noted that this embodiment of the present invention can be performed repeatedly without interruption, so multiple frames of target video images can be generated, and the multiple frames of target video images are then combined into the target video.
Outputting the synthesized video data in real time means that, while the on-site video is being collected and broadcast live, the synthesized video data can be output in real time through the real-time synthesis of the animation. Thus, when watching the live video on television or over the network, viewers can see not only the video collected in the fixed area but also the animation synthesized into the video. This improves visual communication capability, especially in live broadcasting.
As can be seen from the above embodiment, on the basis of the facial model, the render engine of the server can further build a skeleton model from the action data and action coordinates collected in real time by the motion capture device; the virtual camera, kept synchronized with the main camera, obtains the skeleton animation formed by the skeleton model according to the action data; the server extracts the animation data of the skeleton animation and the animation data of the facial animation from the render engine and synthesizes them into the video data in real time according to the 3D coordinate system to obtain the synthesized video data, so that when the synthesized video data is played, the skeleton animation and the facial animation are synthesized at the position corresponding to the target area in the displayed synthesized video. Through the cooperation of the render engine's own processor and the server's processor, and the synchronization of the virtual camera with the main camera, the function of synthesizing skeleton animation and facial animation into a video in real time is realized, further meeting existing demands for visual communication in video.
Embodiment three
In practical applications, the motion capture device may collect the action data of the third target body at a preset acquisition frequency. For example, if the preset acquisition frequency is 50 times per second, the motion capture device collects 50 groups of action data in one second. Of course, this value is only an example, and the present invention is not limited thereto. In addition, the server may be preconfigured with a refresh rate, referred to for convenience as the preset refresh rate. The server generates each frame of video image of the target video according to the preset refresh rate. For example, if the refresh rate of the server is 40 times per second, the server generates 40 frames of video images in one second. Again, this value is only an example, and the present invention is not limited thereto.
One situation that arises is that the refresh rate of the server differs from the acquisition frequency of the motion capture device; how to control the bone nodes of the skeleton model according to the received action data is then a technical problem to be solved. To this end, the present invention provides a specific implementation of S103 and S104 in the embodiment corresponding to Fig. 1 above.
For example, the motion capture device collects the action data of the third target body at the preset acquisition frequency; accordingly, the server receives multiple groups of action data sent by the motion capture device at a preset sending frequency. The specific number of groups can be determined by the performance parameters of the motion capture device.
Correspondingly, the specific implementation of determining the bone action of the skeleton model to be rendered includes steps A1 to A2.
Step A1: when the preset refresh rate of the server is lower than the preset sending frequency, extract one group of subject action data from the multiple groups of action data received.
The motion capture device sends action data to the server as soon as it is collected; therefore, the acquisition frequency of the motion capture device can be regarded as the preset sending frequency. After receiving the multiple groups of action data, the server needs to extract one group of data from them; for convenience, this extracted group of action data is called the subject action data. It should be noted that action data can be represented with three-dimensional coordinates, so one group of data contains multiple sub-action data, each sub-action datum marking a coordinate on a different dimension.
It should be noted that the subject action data can be extracted from the multiple groups of action data either randomly or according to a preset rule. The groups of action data sent by the motion capture device are ordered by acquisition time, so the multiple groups can be sorted in that order; correspondingly, the preset rule can be to extract the group at a fixed position in the ordered sequence. For example, if there are 5 groups of action data sorted in order, the third group is always the one extracted.
In practical applications, the specific process of extracting the subject action data according to a preset rule can be that the server, according to its own preset refresh rate, extracts subject action data from the action data sent by the motion capture device at its sending frequency. For example, if the motion capture device sends one group of data every 2 milliseconds and the server refreshes every 5 milliseconds, i.e. generates one frame of target video image every 5 milliseconds, then the server extracts one group of subject action data every 5 milliseconds.
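The rate-matching extraction in step A1 can be sketched as follows. This is one possible policy under the stated assumptions (keep the first group arriving at or after each server refresh tick, discard the rest); the patent leaves the exact selection rule open.

```python
def sample_subject_data(groups, send_period_ms, refresh_period_ms):
    """From groups of action data arriving every `send_period_ms`, keep
    one group per server refresh tick (every `refresh_period_ms`),
    discarding the groups in between."""
    sampled = []
    next_refresh = refresh_period_ms
    for i, group in enumerate(groups):
        arrival = (i + 1) * send_period_ms  # arrival time of this group
        if arrival >= next_refresh:
            sampled.append(group)           # this is the subject action data
            next_refresh += refresh_period_ms
    return sampled

# 10 groups arriving every 2 ms; the server refreshes every 5 ms.
groups = list(range(10))  # stand-ins for groups of action data
print(sample_subject_data(groups, 2, 5))  # [2, 4, 7, 9]
```

So out of 10 groups collected, only 4 drive frames, matching the refresh rate being lower than the sending frequency.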
Step A2: determine, according to the subject action data, the bone action of the skeleton model to be rendered.
The subject action data is one of the multiple groups of action data and represents the action of the third target body, while the skeleton model is drawn according to the third target body. Therefore, the action of the skeleton model can be determined from this subject action data.
It can be seen from the above technical scheme that, in order to solve the problem of the acquisition frequency of the motion capture device differing from the refresh rate of the server, one group of subject action data can be extracted from the multiple groups of action data and used to control the bone action of the skeleton model.
In embodiments of the present invention, the third target body can include several active nodes; for example, if the third target body is a person, the active nodes can be the wrists, elbow joints, shoulder joints and so on. Correspondingly, the motion capture device is provided with several motion capture points; when there are multiple motion capture points, different motion capture points obtain the action data of different active nodes of the third target body. The action data of one active node is called one sub-action datum; since there are several active nodes, the action data obtained by the motion capture device can include several sub-action data. To distinguish different active nodes, each sub-action datum has its own node identifier.
That is, optionally, the action data includes several sub-action data, each sub-action datum has its own node identifier, and the node identifier represents the active node of the capture target from which the action collecting device collects the action data.
The server generating the skeleton model in the target area of the virtual region according to the area position information, the action data and the action coordinates includes:
The server determines the bone node on the skeleton model corresponding to the node identifier.
For example, the skeleton model has bone nodes, which are arranged according to the active nodes of the third target body. Since each received sub-action datum carries a node identifier, the bone node corresponding to that sub-action datum can be determined.
For example, the received sub-action data are sub-action data 1, 2, 3 and 4, whose node identifiers are the left elbow joint, right elbow joint, left knee joint and right knee joint respectively. The skeleton model contains bone nodes corresponding to these joints, so the corresponding bone node can be determined for each of sub-action data 1 through 4.
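Resolving node identifiers to bone nodes is essentially a lookup; the sketch below illustrates this with hypothetical identifier and bone-node names (the `bone_xx` labels are not from the patent).

```python
# Hypothetical mapping from node identifiers to bone nodes of the model.
BONE_NODES = {
    "left_elbow": "bone_07",
    "right_elbow": "bone_08",
    "left_knee": "bone_11",
    "right_knee": "bone_12",
}

def resolve_bone_nodes(sub_action_data):
    """sub_action_data: list of (node_identifier, coordinate) pairs.
    Returns {bone_node: coordinate}, skipping unknown identifiers."""
    return {BONE_NODES[node_id]: coord
            for node_id, coord in sub_action_data
            if node_id in BONE_NODES}

frame = [("left_elbow", (188.9, 113.7, 88.8)), ("right_knee", (99.8, 155.5, 77.7))]
print(resolve_bone_nodes(frame)["bone_07"])  # (188.9, 113.7, 88.8)
```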
The server determines the positions of the bone nodes in the skeleton model according to the sub-action data and the corresponding action coordinates.
The server generates the skeleton model in the target area of the virtual region according to the determined positions of the bone nodes in the skeleton model.
One specific implementation form of a sub-action datum is a 3D physical coordinate under the 3D coordinate system. That is, a sub-action datum contains a three-dimensional coordinate; to distinguish it from the coordinates of the skeleton model below, this three-dimensional coordinate can be called a 3D physical coordinate.
For example, for sub-action datum 1 in the above example, the 3D physical coordinate corresponding to the left elbow joint is (188.9, 113.7, 88.8); for sub-action datum 2, the right elbow joint, it is (127.5, 54.3, 68.9); for sub-action datum 3, the left knee joint, it is (111.1, 158.3, 56.9); and for sub-action datum 4, the right knee joint, it is (99.8, 155.5, 77.7).
Correspondingly, the specific implementation of determining the position of a target bone node in the skeleton model can include steps B1 to B2.
Step B1: determine, according to the proportional relationship between the action coordinate and the virtual space coordinate, the three-dimensional animation coordinate of the action coordinate in the virtual space.
This proportional relationship can also represent the relationship between the size of the skeleton model and the size of the third target body.
Step B2: take the position determined by the three-dimensional animation coordinate as the position of the target bone node.
A three-dimensional animation coordinate comprises three coordinate values, which determine one location point in the skeleton model; this location point is then taken as the position of the bone node corresponding to the three-dimensional animation coordinate. Referring to Fig. 4, an exemplary diagram of a three-dimensional animation model action provided by an embodiment of the present invention, the positions of the left elbow joint, right elbow joint, left knee joint and right knee joint in Fig. 4 are determined according to sub-action data 1, 2, 3 and 4 in the above example, respectively.
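Steps B1 and B2 amount to scaling a 3D physical coordinate by the proportional relationship; the sketch below assumes a uniform scale factor, and the 0.5 ratio is purely illustrative.

```python
def to_animation_coordinate(action_coord, ratio):
    """Step B1: scale an action coordinate (a 3D physical coordinate)
    by the proportional relationship `ratio` between the third target
    body and the skeleton model. Step B2: the returned three-dimensional
    animation coordinate is used directly as the bone node position."""
    x, y, z = action_coord
    return (x * ratio, y * ratio, z * ratio)

# Left elbow joint from the example above, with a hypothetical 0.5 ratio.
print(to_animation_coordinate((188.9, 113.7, 88.8), 0.5))
```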
Embodiment four
Fig. 5 is a structural diagram of an apparatus for synthesizing an animation in a video in real time according to an embodiment of the present invention. The apparatus is applied to a video captured in real time. The fixed area in which the video is captured comprises at least one video camera, and the video is captured by a main camera among the at least one video camera. A server establishes a 3D coordinate system of the fixed area, and the server collects in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera. The server uses a render engine to establish a virtual area according to the fixed area and the 3D coordinate system, where the position information of the fixed area in the 3D coordinate system and the position information of the virtual area in the 3D coordinate system have a proportional relationship. The server uses the render engine to set up a virtual camera in the virtual area and synchronizes the virtual camera with the main camera, so that the position information and video acquisition parameters of the virtual camera remain consistent with those of the main camera in real time.
Optionally, the present invention provides an efficient and effective way to establish the 3D coordinate system and to determine camera positions and video acquisition parameters. The server establishing the 3D coordinate system of the fixed area, and collecting in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera, comprises:
The server establishes the 3D coordinate system of the fixed area through multiple collecting devices arranged in the fixed area, and the server collects in real time, according to the multiple collecting devices and the reflectors respectively arranged on the at least one video camera, the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera.
To illustrate: if the fixed area is indoors, multiple collecting devices may be hung on the ceiling and arranged in several corners, so as to establish the 3D coordinate system corresponding to the fixed area and collect the position information of the fixed area in the 3D coordinate system. For each video camera, a reflector may be arranged on it; by reflecting signals emitted by the collecting devices, or in other ways, the collecting devices can determine the position information of the reflector in the 3D coordinate system together with part of the video acquisition parameters, such as the shooting angle and camera position. Further, a reflector may be arranged inside the camera so that the collecting devices can determine, through the reflector, operations of the main camera lens such as zooming in and zooming out. If the fixed area is outdoors, the above functions can likewise be realized by suspending multiple collecting devices in the air. Position collection can be realized comparatively effectively by infrared light; therefore, optionally, the collecting devices comprise infrared cameras and the reflectors comprise infrared reflective devices. Moreover, what is arranged in the main camera for detecting the main camera lens may specifically be a sensor device.
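One conceivable way for several calibrated collecting devices to recover a reflector's position is least-squares triangulation of the rays from each device toward the reflector. The ray model, the NumPy usage, and the device positions below are all assumptions made for illustration, not the patent's prescribed computation:

```python
import numpy as np

# Hypothetical sketch: each calibrated collecting device contributes an
# origin o_i and a direction d_i toward the reflector; the least-squares
# intersection of these rays estimates the reflector's 3D position.
def triangulate(origins, directions):
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Two devices at known positions, both observing a reflector at (1, 1, 3).
origins = [np.array([0.0, 0.0, 0.0]), np.array([4.0, 0.0, 0.0])]
target = np.array([1.0, 1.0, 3.0])
directions = [target - o for o in origins]
print(np.round(triangulate(origins, directions), 6))  # [1. 1. 3.]
```

With more than two devices the same normal equations simply accumulate, which is why this formulation scales naturally to the "multiple collecting devices" described above.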
The render engine of the server, similar to the server's graphics card, has an independent processor, which can be understood as a graphics processing unit (GPU), and can effectively share processing work for the server's processor, which can be understood as a central processing unit (CPU). The server can use the render engine to build the virtual area from the position parameters obtained earlier. The position information of the virtual area and the fixed area have a proportional relationship; for example, the established virtual area may be the same size as the fixed area.
The virtual camera can be understood as the viewpoint from which the virtual area is observed. According to the position information and video acquisition parameters of the main camera, the position of the virtual camera in the virtual area can be determined using fourth-order (4x4) matrix computations. By synchronizing the virtual camera with the main camera, shooting parameters such as the angle and camera position with which the virtual camera shoots the virtual area are kept consistent with the corresponding parameters with which the main camera shoots the fixed area. It should be noted that when the proportional relationship between the virtual area and the fixed area is not 1:1, the influence of this proportional relationship also needs to be considered when synchronizing the virtual camera and the main camera.
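The fourth-order matrix computation can be illustrated with a homogeneous 4x4 pose matrix. The sketch below is a simplified assumption rather than the patent's exact formula: it composes the main camera's rotation and its position, scaled by the area proportion, into one matrix used to place the virtual camera:

```python
import numpy as np

# Illustrative sketch (not the patent's exact computation): a 4x4
# homogeneous matrix carrying the main camera's pose into the virtual area,
# with a uniform scale applied to positions for a non-1:1 proportion.
def virtual_camera_matrix(rotation3x3, position, scale=1.0):
    m = np.eye(4)
    m[:3, :3] = np.asarray(rotation3x3)
    m[:3, 3] = np.asarray(position) * scale   # scale positions, not rotation
    return m

# Main camera at (2, 0, 5) with identity orientation; virtual area at 1:2 scale.
M = virtual_camera_matrix(np.eye(3), [2.0, 0.0, 5.0], scale=0.5)
origin = np.array([0.0, 0.0, 0.0, 1.0])   # camera origin in homogeneous form
print((M @ origin)[:3])                   # virtual-camera position (1, 0, 2.5)
```

Because rotation entries are left unscaled, the virtual camera keeps the main camera's shooting angle while its position is remapped by the proportional relationship.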
The device comprises:
Location information acquiring unit 501, configured to obtain, according to the determined target area for synthesizing the animation, the area position information of the target area in the 3D coordinate system.
Since no synthesized animation actually exists in the fixed area, the target area for synthesizing the animation is generally determined in advance. When shooting in the fixed area, this makes it convenient to prompt the participants of a program, such as hosts, guests and performers, to interact with the animation that will be synthesized in the target area, and not to enter the target area or the region between the target area and the main camera, avoiding situations such as occluding the synthesized animation. Because both the fixed area and the virtual area depend on the 3D coordinate system, the area position information can specify the exact location of the target area in both the fixed area and the virtual area.
Face acquiring unit 502, configured to obtain the face data and facial coordinates collected in real time by a facial capture device, the facial coordinates having a corresponding relation with the 3D coordinate system.
Reference may be made to the illustration of Fig. 2 in Embodiment one, which is not repeated here.
Mask generation unit 503, configured to generate a mask in the target area of the virtual area according to the area position information, the face data and the facial coordinates.
To illustrate: the mask is drawn using a three-dimensional animation drawing tool. It should be noted that the mask is drawn according to the second target body and is used to simulate the second target body. When the mask is shown in the final target video, it may be at the same scale as the second target body, or at a different scale. In addition, the appearance of the mask may be identical to the second target body, or different from it, for example using an animal cartoon image to represent the actions of a human body.
The facial capture device may be a data collector worn on the face that, through functions such as shooting, identifies and outputs changes of the feature points on the face, thereby obtaining the face data and facial coordinates; the facial coordinates may be the coordinate values of the facial feature points.
Facial animation acquiring unit 504, configured to obtain the video data of the video collected in real time by the main camera; the server keeps the virtual camera synchronized with the main camera, and obtains the facial animation formed in the virtual area by the mask according to the face data.
To illustrate: mask generation unit 503 and facial animation acquiring unit 504 may be triggered simultaneously, or mask generation unit 503 may be triggered before facial animation acquiring unit 504.
If mask generation unit 503 is triggered before facial animation acquiring unit 504, this can be understood as first completing the face capture and establishing the mask from the captured parameters, and then, once the live broadcast of the fixed area starts, collecting the facial animation through the synchronized virtual camera. The collection process, however, requires consistency of the main time axis; the relevant features of time axes belong to mature technology and are not repeated here.
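Consistency of the main time axis can be illustrated by matching each video frame to the animation frame with the nearest timestamp. The frame rates and the nearest-neighbour rule below are illustrative assumptions, not the patent's specified mechanism:

```python
import bisect

# Hypothetical sketch: the animation capture and the video share one main
# time axis; for each video frame we pick the nearest animation frame.
def align_frames(video_timestamps, anim_timestamps):
    """Return, per video frame, the index of the nearest animation frame."""
    pairs = []
    for t in video_timestamps:
        i = bisect.bisect_left(anim_timestamps, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(anim_timestamps)]
        best = min(candidates, key=lambda j: abs(anim_timestamps[j] - t))
        pairs.append(best)
    return pairs

video_ts = [0.00, 0.04, 0.08]             # 25 fps video (assumed)
anim_ts = [0.00, 0.02, 0.04, 0.06, 0.08]  # 50 fps animation capture (assumed)
print(align_frames(video_ts, anim_ts))    # [0, 2, 4]
```
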
Real-time synthesis unit 505, configured to extract the animation data of the facial animation from the render engine, synthesize the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and output it in real time at the same time, so that when the synthetic video data is played, the facial animation is synthesized at the position of the corresponding target area in the displayed synthetic video.
It should be noted that since the establishment of the mask and the collection of the facial animation are essentially all completed by the render engine, the processing pressure on the server's processor is effectively reduced. Before performing the animation synthesis, real-time synthesis unit 505 extracts from the render engine the animation data of the facial animation obtained by the render engine, and, under the processing of the server's own processor, synthesizes the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain the synthetic video data. Specifically, in the process of synthesizing the data, the size ratio between the facial animation and the video needs to be considered.
For example, the video data of the first target body is one frame of video image, and one frame of facial animation of the mask is one frame of animation image; the animation image is embedded into the video image, thereby generating the target video containing the facial animation, specifically one frame of target video image. It should be noted that the embodiment of the present invention can be executed repeatedly without interruption; therefore, multiple frames of target video images can be generated and then combined into the target video.
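The per-frame embedding described above can be sketched as a simple per-pixel blend of one animation image over one video image. The nested-list frame representation and the alpha blend rule are illustrative assumptions; a real implementation would operate on GPU textures:

```python
# Minimal sketch of embedding one animation frame into one video frame.
# Frames are nested lists of RGB pixels; the animation frame carries a
# per-pixel alpha. Sizes and the blend rule are illustrative assumptions.
def composite(video_frame, anim_frame, alpha_frame):
    out = []
    for vrow, arow, mrow in zip(video_frame, anim_frame, alpha_frame):
        row = []
        for v, a, m in zip(vrow, arow, mrow):
            # Standard "over" blend: animation pixel over video pixel.
            row.append(tuple(round(m * ac + (1 - m) * vc)
                             for ac, vc in zip(a, v)))
        out.append(row)
    return out

video = [[(100, 100, 100), (100, 100, 100)]]  # one 1x2 grey video frame
anim  = [[(255, 0, 0), (255, 0, 0)]]          # red animation pixels
alpha = [[1.0, 0.0]]                          # left opaque, right transparent
print(composite(video, anim, alpha))
# [[(255, 0, 0), (100, 100, 100)]]
```

Repeating this per frame, as the text notes, yields the multiple target video frames that are combined into the target video.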
By outputting the synthesized synthetic video data in real time, the synthetic video data can be output in real time through the real-time synthesis of the animation during the process of collecting the video on site and broadcasting it live. Thus, when watching the live video on television or over the network, viewers can see not only the video collected in the fixed area but also the animation synthesized into the video. This improves the visual communication capability, and in particular the capability of real-time visual communication in live broadcasting.
As can be seen from the above embodiment, after the 3D coordinate system of the fixed area and the position information and video acquisition parameters of the cameras used, especially the main camera, are determined, the virtual area corresponding to the fixed area is established by the render engine. The virtual camera in the virtual area keeps synchronized with the main camera in the fixed area, so that the region collected by the virtual camera in the virtual area and the region collected by the main camera in the fixed area can stay consistent at all times, making real-time animation synthesis possible. After the position information of the target area for synthesizing the animation is determined according to the 3D coordinate system, the render engine of the server establishes the mask at the position of the corresponding target area in the virtual area from the face data and facial coordinates collected by the facial capture device, and obtains, through the virtual camera kept synchronized with the main camera, the facial animation formed by the mask according to the face data. The server extracts the animation data of the facial animation from the render engine and synthesizes it into the video data in real time according to the 3D coordinate system to obtain the synthetic video data, so that when the synthetic video data is played, the facial animation is synthesized at the position of the corresponding target area in the displayed synthetic video. Through the cooperation of the render engine's own processor and the server's processor, and the synchronization of the virtual camera and the main camera, the function of synthesizing a facial animation into a video in real time is realized, effectively meeting existing demands for visual communication in video.
Embodiment five
In the embodiment of the present invention, besides synthesizing a facial animation into the video in real time, a skeleton animation can further be synthesized; the limb actions of the skeleton animation can enhance the visual communication effect of the synthesized animation.
On the basis of the embodiment corresponding to Fig. 5, Fig. 6 is a structural diagram of an apparatus for synthesizing an animation in a video in real time according to an embodiment of the present invention, which further comprises:
Action acquiring unit 601, configured to obtain, before real-time synthesis unit 505 is triggered, the action data and action coordinates collected in real time by a motion capture device, the action coordinates having a corresponding relation with the 3D coordinate system.
Skeleton model generation unit 602, configured to generate a skeleton model in the target area of the virtual area according to the area position information, the action data and the action coordinates.
Skeleton animation acquiring unit 603, configured to keep the virtual camera synchronized with the main camera, and obtain the skeleton animation formed in the virtual area by the skeleton model according to the action data.
Real-time synthesis unit 505 is further configured to extract the animation data of the skeleton animation and the animation data of the facial animation from the render engine, synthesize the animation data of the skeleton animation and the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and output it in real time at the same time, so that when the synthetic video data is played, the skeleton animation and the facial animation are synthesized at the position of the corresponding target area in the displayed synthetic video.
By outputting the synthesized synthetic video data in real time, the synthetic video data can be output in real time through the real-time synthesis of the animation during the process of collecting the video on site and broadcasting it live. Thus, when watching the live video on television or over the network, viewers can see not only the video collected in the fixed area but also the animation synthesized into the video. This improves the visual communication capability, and in particular the capability of real-time visual communication in live broadcasting.
As can be seen from the above embodiment, on the basis of the mask, the render engine of the server can also establish a skeleton model from the action data and action coordinates collected in real time by the motion capture device, and obtain, through the virtual camera kept synchronized with the main camera, the skeleton animation formed by the skeleton model according to the action data. The server extracts the animation data of the skeleton animation and the animation data of the facial animation from the render engine and synthesizes them into the video data in real time according to the 3D coordinate system to obtain the synthetic video data, so that when the synthetic video data is played, the skeleton animation and the facial animation are synthesized at the position of the corresponding target area in the displayed synthetic video. Through the cooperation of the render engine's own processor and the server's processor, and the synchronization of the virtual camera and the main camera, the function of synthesizing a skeleton animation and a facial animation into a video in real time is realized, further meeting existing demands for visual communication in video.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium can be at least one of the following media capable of storing program code: read-only memory (ROM), RAM, magnetic disk, optical disc, and the like.
It should be noted that each embodiment in this specification is described in a progressive manner; identical or similar parts of the embodiments can be referred to mutually, and each embodiment focuses on its differences from the other embodiments. In particular, since the device and system embodiments are substantially similar to the method embodiments, their description is relatively simple, and the relevant parts can be referred to in the description of the method embodiments. The device and system embodiments described above are merely schematic; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments, which those of ordinary skill in the art can understand and implement without creative work.
The above is merely a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for synthesizing an animation in a video in real time, characterized in that it is applied to a video captured in real time, wherein the fixed area in which the video is captured comprises at least one video camera, and the video is captured by a main camera among the at least one video camera; a server establishes a 3D coordinate system of the fixed area, and the server collects in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera; the server uses a render engine to establish a virtual area according to the fixed area and the 3D coordinate system, the position information of the fixed area in the 3D coordinate system and the position information of the virtual area in the 3D coordinate system having a proportional relationship; the server uses the render engine to set up a virtual camera in the virtual area and synchronizes the virtual camera with the main camera, so that the position information and video acquisition parameters of the virtual camera remain consistent with those of the main camera in real time; the method comprising:
the server obtaining, according to the determined target area for synthesizing the animation, the area position information of the target area in the 3D coordinate system;
the server obtaining the face data and facial coordinates collected in real time by a facial capture device, the facial coordinates having a corresponding relation with the 3D coordinate system;
the server generating a mask in the target area of the virtual area according to the area position information, the face data and the facial coordinates;
the server obtaining the video data of the video collected in real time by the main camera; the server keeping the virtual camera synchronized with the main camera, and obtaining the facial animation formed in the virtual area by the mask according to the face data;
the server extracting the animation data of the facial animation from the render engine, synthesizing the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and outputting it in real time at the same time, so that when the synthetic video data is played, the facial animation is synthesized at the position of the corresponding target area in the displayed synthetic video.
2. The method according to claim 1, characterized in that, before the server extracts the animation data of the skeleton animation from the render engine, the method further comprises:
the server obtaining the action data and action coordinates collected in real time by a motion capture device, the action coordinates having a corresponding relation with the 3D coordinate system;
the server generating a skeleton model in the target area of the virtual area according to the area position information, the action data and the action coordinates;
the server keeping the virtual camera synchronized with the main camera, and obtaining the skeleton animation formed in the virtual area by the skeleton model according to the action data;
and the server extracting the animation data of the facial animation from the render engine further comprises:
the server extracting the animation data of the skeleton animation and the animation data of the facial animation from the render engine, synthesizing the animation data of the skeleton animation and the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and outputting it in real time at the same time, so that when the synthetic video data is played, the skeleton animation and the facial animation are synthesized at the position of the corresponding target area in the displayed synthetic video.
3. The method according to claim 2, characterized in that the action data comprises several pieces of sub-action data, each piece of sub-action data having a respective node identification, the node identification being used to represent an active node of the action collection target whose action data is collected by the action collecting device; the server generating a skeleton model in the target area of the virtual area according to the area position information, the action data and the action coordinates comprising:
the server determining the bone node on the skeleton model corresponding to the node identification;
the server determining, according to the sub-action data and the corresponding action coordinates, the position of the bone node in the skeleton model;
the server generating, according to the determined position of the bone node in the skeleton model, the skeleton model in the target area of the virtual area.
4. The method according to claim 1 or 2, characterized in that the server establishing the 3D coordinate system of the fixed area, and the server collecting in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera, comprise:
the server establishing the 3D coordinate system of the fixed area through multiple collecting devices arranged in the fixed area, and the server collecting in real time, according to the multiple collecting devices and the reflectors respectively arranged on the at least one video camera, the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera.
5. The method according to claim 4, characterized in that the collecting devices comprise infrared cameras, and the reflectors comprise infrared reflective devices.
6. A device for synthesizing an animation in a video in real time, characterized in that it is applied to a video captured in real time, wherein the fixed area in which the video is captured comprises at least one video camera, and the video is captured by a main camera among the at least one video camera; a server establishes a 3D coordinate system of the fixed area, and the server collects in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera; the server uses a render engine to establish a virtual area according to the fixed area and the 3D coordinate system, the position information of the fixed area in the 3D coordinate system and the position information of the virtual area in the 3D coordinate system having a proportional relationship; the server uses the render engine to set up a virtual camera in the virtual area and synchronizes the virtual camera with the main camera, so that the position information and video acquisition parameters of the virtual camera remain consistent with those of the main camera in real time; the device comprising:
a location information acquiring unit, configured to obtain, according to the determined target area for synthesizing the animation, the area position information of the target area in the 3D coordinate system;
a face acquiring unit, configured to obtain the face data and facial coordinates collected in real time by a facial capture device, the facial coordinates having a corresponding relation with the 3D coordinate system;
a mask generation unit, configured to generate a mask in the target area of the virtual area according to the area position information, the face data and the facial coordinates;
a facial animation acquiring unit, configured to obtain the video data of the video collected in real time by the main camera; the server keeping the virtual camera synchronized with the main camera, and obtaining the facial animation formed in the virtual area by the mask according to the face data;
a real-time synthesis unit, configured to extract the animation data of the facial animation from the render engine, synthesize the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and output it in real time at the same time, so that when the synthetic video data is played, the facial animation is synthesized at the position of the corresponding target area in the displayed synthetic video.
7. The device according to claim 6, characterized in that it further comprises:
an action acquiring unit, configured to obtain, before the real-time synthesis unit is triggered, the action data and action coordinates collected in real time by a motion capture device, the action coordinates having a corresponding relation with the 3D coordinate system;
a skeleton model generation unit, configured to generate a skeleton model in the target area of the virtual area according to the area position information, the action data and the action coordinates;
a skeleton animation acquiring unit, configured to keep the virtual camera synchronized with the main camera, and obtain the skeleton animation formed in the virtual area by the skeleton model according to the action data;
the real-time synthesis unit being further configured to extract the animation data of the skeleton animation and the animation data of the facial animation from the render engine, synthesize the animation data of the skeleton animation and the animation data of the facial animation into the video data in real time according to the 3D coordinate system to obtain synthetic video data, and output it in real time at the same time, so that when the synthetic video data is played, the skeleton animation and the facial animation are synthesized at the position of the corresponding target area in the displayed synthetic video.
8. The device according to claim 7, characterized in that the action data comprises several pieces of sub-action data, each piece of sub-action data having a respective node identification, the node identification being used to represent an active node of the action collection target whose action data is collected by the action collecting device; the skeleton model generation unit comprising:
a node determination subunit, configured to determine the bone node on the skeleton model corresponding to the node identification;
a position determination subunit, configured to determine, according to the sub-action data and the corresponding action coordinates, the position of the bone node in the skeleton model;
a generation subunit, configured to generate, according to the determined position of the bone node in the skeleton model, the skeleton model in the target area of the virtual area.
9. The device according to claim 6 or 7, characterized in that the server establishing the 3D coordinate system of the fixed area, and the server collecting in real time the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera, comprise:
the server establishing the 3D coordinate system of the fixed area through multiple collecting devices arranged in the fixed area, and the server collecting in real time, according to the multiple collecting devices and the reflectors respectively arranged on the at least one video camera, the position information of the at least one video camera in the 3D coordinate system and the video acquisition parameters of the main camera.
10. The device according to claim 9, characterized in that the collecting devices comprise infrared cameras, and the reflectors comprise infrared reflective devices.
CN201510716312.7A 2015-10-28 2015-10-28 Method and apparatus for synthetizing animations in videos in real time Pending CN105338370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510716312.7A CN105338370A (en) 2015-10-28 2015-10-28 Method and apparatus for synthetizing animations in videos in real time


Publications (1)

Publication Number Publication Date
CN105338370A true CN105338370A (en) 2016-02-17

Family

ID=55288597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510716312.7A Pending CN105338370A (en) 2015-10-28 2015-10-28 Method and apparatus for synthetizing animations in videos in real time

Country Status (1)

Country Link
CN (1) CN105338370A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110292034A1 (en) * 2008-11-24 2011-12-01 Mixamo, Inc. Real time concurrent design of shape, texture, and motion for 3d character animation
US8228336B1 (en) * 2008-01-23 2012-07-24 Lucasfilm Entertainment Company Ltd. Integrating a motion synthesis system into a video game system
CN102822869A (en) * 2010-01-22 2012-12-12 索尼电脑娱乐美国公司 Capturing views and movements of actors performing within generated scenes


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106060655A (en) * 2016-08-04 2016-10-26 腾讯科技(深圳)有限公司 Video processing method, server and terminal
CN106780676A (en) * 2016-12-01 2017-05-31 厦门幻世网络科技有限公司 Method and apparatus for displaying animation
CN106780676B (en) * 2016-12-01 2020-03-10 厦门黑镜科技有限公司 Method and device for displaying animation
CN111405299A (en) * 2016-12-19 2020-07-10 广州虎牙信息科技有限公司 Live broadcast interaction method based on video stream and corresponding device thereof
CN106710003A (en) * 2017-01-09 2017-05-24 成都品果科技有限公司 Three-dimensional photographing method and system based on OpenGL ES (Open Graphics Library for Embedded System)
CN106710003B (en) * 2017-01-09 2020-07-10 成都品果科技有限公司 OpenGL ES-based three-dimensional photographing method and system
CN106937154A (en) * 2017-03-17 2017-07-07 北京蜜枝科技有限公司 Method and device for processing a virtual image
CN107274464A (en) * 2017-05-31 2017-10-20 珠海金山网络游戏科技有限公司 Method, device and system for real-time interactive 3D animation
CN109191548A (en) * 2018-08-28 2019-01-11 百度在线网络技术(北京)有限公司 Animation method, device, equipment and storage medium
CN111046198A (en) * 2019-11-29 2020-04-21 腾讯科技(深圳)有限公司 Information processing method, device, equipment and storage medium
CN111046198B (en) * 2019-11-29 2022-03-29 腾讯科技(深圳)有限公司 Information processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105338369A (en) Method and apparatus for synthetizing animations in videos in real time
CN105338370A (en) Method and apparatus for synthetizing animations in videos in real time
CN105704507A (en) Method and device for synthesizing animation in video in real time
CN108986189B (en) Method and system for capturing and live broadcasting of real-time multi-person actions based on three-dimensional animation
CN104883557A (en) Real time holographic projection method, device and system
KR101295471B1 (en) A system and method for 3D space-dimension based image processing
US6084979A (en) Method for creating virtual reality
WO2019041351A1 (en) Real-time aliasing rendering method for 3d vr video and virtual three-dimensional scene
CN105939481A (en) Interactive three-dimensional virtual reality video program recorded broadcast and live broadcast method
CN101872243B (en) System and method for realizing 360-degree panoramic play following real space direction
CN206178657U (en) Interactive display system of AR and interactive display system of museum's historical relic
CN107862718A (en) 4D holographic video capturing method
CN111179392A (en) Virtual idol comprehensive live broadcast method and system based on 5G communication
CN107995481B (en) A kind of display methods and device of mixed reality
CN114125301A (en) Virtual reality technology shooting delay processing method and device
RU2606875C2 (en) Method and system for displaying scaled scenes in real time
CN113706720A (en) Image display method and device
CN113470190A (en) Scene display method and device, equipment, vehicle and computer readable storage medium
CA2393803C (en) Method and apparatus for real time insertion of images into video
CN113315885B (en) Holographic studio and system for remote interaction
CN113259544B (en) Remote interactive holographic demonstration system and method
Cha et al. Immersive learning experiences for surgical procedures
CN103593863A (en) A three-dimensional animation production system
CN114625468A (en) Augmented reality picture display method and device, computer equipment and storage medium
JP2022169177A (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217