CN104933392A - Probabilistic people tracking using multi-view integration

Info

Publication number: CN104933392A
Application number: CN201410275633.3A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Inventors: S. Medasani, Y. Owechko, K. Kim
Original/current assignee: GM Global Technology Operations LLC
Priority claimed from US 14/219,109 (US 9,524,426 B2) and US 14/251,689 (US 2015/0294496 A1)

Abstract

The invention relates to probabilistic people tracking using multi-view integration. A method of constructing a probabilistic representation of the location of an object in a workspace comprises acquiring a plurality of 2D images of the workspace, the 2D images being captured by cameras arranged at different positions about the workspace. Foreground portions are identified in at least two of the 2D images, and each foreground portion is projected onto each of a plurality of parallel, spaced planes. A region is identified in each of the planes where a plurality of the projected foreground portions coincide. The identified regions are combined to form a 3D bounding envelope of the object, and the 3D bounding envelope is the probabilistic representation of the location of the object in the workspace.

Description

Probabilistic people tracking using multi-view integration
Technical field
The present invention relates generally to vision-based surveillance systems for tracking people.
Background
Factory automation is used in many assembly environments. To make manufacturing processes more flexible, systems are needed that allow robots and people to cooperate naturally and efficiently on non-repetitive tasks. Human-robot interaction requires a new level of machine cognition, one that goes beyond conventional record/playback control in which all parts start in known locations. A robot control system must instead understand the position and behavior of people, and must then adjust the robot's actions based on the behavior of those people.
Summary of the invention
A people surveillance system includes a plurality of cameras and a vision processor. The plurality of cameras are arranged about a workspace area, where each camera is configured to capture a video feed that includes a plurality of image frames, the image frames being time-synchronized between the respective cameras.
The vision processor is configured to receive the plurality of image frames from the plurality of vision-based imaging devices and to detect the presence of a person from at least one of the plurality of image frames using pattern matching performed on an input image. The input image for the pattern matching is a sliding-window portion of an image frame that is aligned with a rectified coordinate system, such that a vertical axis in the workspace area is aligned with the vertical axis of the input image.
If a person is detected near automated movable equipment, the system can provide an alert and/or alter the behavior of the automated movable equipment. In one configuration, the system processor can be configured to construct a probabilistic representation of the location of an object/person within the workspace.
A method of constructing a probabilistic representation of the location of an object in a workspace can include acquiring a plurality of 2D images of the workspace, each respective 2D image acquired from a camera disposed at a different location about the workspace. A foreground portion is identified in at least two of the plurality of 2D images, and each foreground portion is projected onto each of a plurality of parallel, spaced planes. A region is identified in each of the plurality of planes where a plurality of the projected foreground portions overlap. The identified regions are combined to form a 3D bounding envelope of the object.
In one configuration, the system can perform a control action if the bounding envelope overlaps a predetermined volume. The control action can include, for example, modifying the behavior of a nearby robot, adjusting the performance of automated equipment, or raising an alert by sound or light.
Additionally, the system can determine a body axis for each identified foreground portion. The body axis is an average centerline of the respective foreground portion and is aligned with a vanishing point of the image. Once determined, the system can map each detected body axis onto a ground plane that coincides with the floor of the workspace. From the positions of the various mapped body axes, the system can determine a location point in the ground plane that represents the position of the object. If the lines do not intersect at a single position, the location point can be selected to minimize a least-squares function across the mapped body axes.
In one configuration, the processor can use the bounding envelope to validate the determined location point. For example, if the location point lies within the bounding envelope, the system can record the coordinates of the location point.
The system can further be configured to assemble a motion trajectory, the motion trajectory representing the position of the location point over time. From this motion trajectory, the system can further identify periods during which the location point is moving within the workspace and periods during which the location point is stationary within the workspace. During a period in which the location point is stationary, the system can be configured to determine an action being performed by the object.
In another configuration, the system can fuse the ground plane and the plurality of planes to form a planar probability map. Additionally, the system can determine a major axis of the bounding envelope, the major axis representing the vertical axis of the person/object. The major axis of the bounding envelope is selected to intersect the ground plane and define a second location point. Once determined, the second location point can be fused with the location point determined via the mapped body axes to form a refined location point.
To form a refined object representation, the bounding envelope can further be fused with a voxel representation or a three-dimensional depth representation of the workspace. The system can, for example, monitor at least one of a velocity and an acceleration of a portion of the refined object representation, and can alter the behavior of automated equipment based on the at least one of the velocity and the acceleration.
According to an aspect of the present invention, there is provided a method of constructing a probabilistic representation of the location of an object in a workspace, the method comprising:
acquiring a plurality of 2D images of the workspace, each respective 2D image acquired from a camera disposed at a different location about the workspace;
identifying a foreground portion in at least two of the plurality of 2D images;
projecting each foreground portion from its respective view onto each of a plurality of parallel, spaced planes;
identifying a region in each of the plurality of planes where the projected foreground portions overlap; and
combining the identified regions from each of the plurality of planes to form a 3D bounding envelope of the object;
wherein the bounding envelope is a 3D probabilistic representation of the location of the object in the workspace.
Preferably, the method further comprises performing a control action if the bounding envelope overlaps a predetermined volume.
Preferably, the method further comprises determining a body axis for each identified foreground portion, the body axis being an average centerline of the respective foreground portion and aligned with a vanishing point of the image;
mapping each detected body axis onto a ground plane that coincides with the floor of the workspace; and
determining a location point in the ground plane, wherein the location point minimizes a least-squares function across the mapped body axes;
wherein the location point represents the position of the object in the workspace.
Preferably, the method further comprises recording the coordinates of the location point if the location point lies within the bounding envelope.
Preferably, the method further comprises assembling a motion trajectory, wherein the motion trajectory represents the position of the location point over a period of time; and
identifying periods during which the location point is moving within the workspace and periods during which the location point is stationary within the workspace.
Preferably, the method further comprises determining an action performed by the object during a period in which the location point is stationary within the workspace.
Preferably, the method further comprises fusing the ground plane with the plurality of planes to form a planar probability map.
Preferably, the method further comprises:
determining a major axis of the bounding envelope, wherein the major axis of the bounding envelope intersects the ground plane to define a second location point; and
fusing the determined location point in the ground plane with the second location point to form a further refined location point.
Preferably, the method further comprises fusing the bounding envelope with a voxel representation of the workspace to form a further refined object representation.
Preferably, the method further comprises determining at least one of a velocity and an acceleration of a portion of the further refined object representation.
Preferably, the method further comprises altering the behavior of automated equipment based on at least one of the velocity and the acceleration.
Preferably, the plurality of parallel, spaced planes comprises at least three planes; and
one of the at least three planes comprises a ground plane.
According to a further aspect of the invention, there is provided a system comprising:
a plurality of cameras disposed at different locations about a workspace, each configured to view the workspace from a different perspective, wherein each respective camera of the plurality of cameras is configured to capture a 2D image of the workspace; and
a processor in communication with each of the plurality of cameras and configured to receive the captured 2D image from each of the plurality of cameras, the processor being further configured to:
identify a foreground portion in at least two of the plurality of 2D images;
project each foreground portion from its respective view onto each of a plurality of parallel, spaced planes;
identify a region in each of the plurality of planes where a plurality of the projected foreground portions overlap;
combine the identified regions from each of the plurality of planes to form a 3D bounding envelope of an object;
wherein the bounding envelope is a 3D probabilistic representation of the location of the object in the workspace.
Preferably, the processor is further configured to:
determine a body axis for each identified foreground portion, the body axis being an average centerline of the respective foreground portion and aligned with a vanishing point of the image;
map each detected body axis onto a ground plane coinciding with the floor of the workspace;
determine a location point in the ground plane, wherein the location point minimizes a least-squares function across the mapped body axes;
wherein the location point represents the position of the object in the workspace.
Preferably, the processor is further configured to record the coordinates of the location point if the location point lies within the bounding envelope.
Preferably, the processor is further configured to:
assemble a motion trajectory, wherein the motion trajectory represents the position of the location point over a period of time; and
identify periods during which the location point is moving within the workspace and periods during which the location point is stationary within the workspace.
Preferably, the processor is further configured to determine an action performed by the object during a period in which the location point is stationary within the workspace.
Preferably, the processor is further configured to fuse the ground plane with the plurality of planes to form a planar probability map.
Preferably, the processor is further configured to:
determine a major axis of the bounding envelope, wherein the major axis of the bounding envelope intersects the ground plane to define a second location point; and
fuse the location point determined in the ground plane with the second location point to form a further refined location point.
The above features and advantages, and other features and advantages, of the present invention are readily apparent from the following detailed description of the best modes for carrying out the invention when taken in connection with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic block diagram of a people surveillance system.
Fig. 2 is a schematic illustration of multiple imaging devices positioned about a workspace area.
Fig. 3 is a schematic block diagram of an activity monitoring process.
Fig. 4 is a schematic process flow diagram for detecting the motion of a person using multiple imaging devices positioned about a workspace area.
Fig. 5A is a schematic illustration of an image frame including a sliding-window input to a model-matching algorithm, the sliding-window input traversing the frame in image coordinate space.
Fig. 5B is a schematic illustration of an image frame including a sliding-window input to a model-matching algorithm, the sliding-window input traversing the frame in rectified coordinate space.
Fig. 5C is a schematic illustration of the image frame of Fig. 5B, with the sliding-window input drawn from a specific region of interest.
Fig. 6 is a schematic diagram showing the manner in which multiple representations of a detected person, each from a different camera, are fused into a common coordinate frame.
Fig. 7 is a schematic high-level flow diagram of a method of activity-sequence monitoring using a people surveillance system.
Fig. 8 is a schematic detailed flow diagram of a method of activity-sequence monitoring using a people surveillance system.
Fig. 9 is a schematic diagram of a people surveillance system used across multiple workspace areas.
Fig. 10 is a schematic diagram of three-dimensional localization using multiple sensor views.
Detailed description
Referring to the drawings, in which like reference numerals refer to like components throughout the several views, Fig. 1 schematically shows a block diagram of a people surveillance system 10 for monitoring a workspace area of an assembly, manufacturing, or similar process. The people surveillance system 10 includes a plurality of vision-based imaging devices 12 for capturing visual images of a designated workspace area. The vision-based imaging devices 12 (shown in Fig. 2) are positioned at various locations and elevations around the automated movable equipment. Preferably, wide-angle lenses or similar wide-field-of-view devices are used to visually cover more of the workspace area. Each vision-based imaging device is offset substantially from the others, capturing images of the workspace area from a viewpoint that is substantially different from those of the other imaging devices. This allows streaming video images to be captured from different viewpoints around the workspace area, for distinguishing people from the surrounding equipment. Because objects and equipment in the workspace area can create visual obstructions (i.e., occlusions), the multiple viewpoints improve the likelihood of capturing a person in one or more of the images when occlusions are present.
As shown in Fig. 2, a first vision-based imaging device 14 and a second vision-based imaging device 16 are spaced substantially apart at elevated overhead positions, so that each captures a high-angle view. The imaging devices 14 and 16 provide high-angle canonical views, or reference views. Preferably, the imaging devices 14 and 16 provide stereo-based three-dimensional scene analysis and tracking. The imaging devices 14 and 16 can include visual imaging, LIDAR detection, infrared detection, and/or any other type of imaging that can be used to detect entities in the area. Additional imaging devices can be positioned overhead, spaced apart from the first and second vision-based imaging devices 14 and 16, to obtain additional high overhead views. For convenience of description, the imaging devices 14 and 16 may generally be referred to as "cameras," with the understanding that such cameras need not be visible-spectrum cameras unless otherwise stated.
Various other vision-based imaging devices 17 ("cameras") are positioned at the sides or corners of the monitored workspace area to capture mid-angle and/or low-angle views. It should be understood that, because the system can work with any number of imaging devices, the number of vision-based imaging devices is reconfigurable, and more or fewer imaging devices than shown in Fig. 2 can be used; it is noted, however, that as the number of redundant imaging devices increases, the level of integrity and redundant reliability improves. Each vision-based imaging device 12 is spaced from the others so as to capture images from substantially different viewpoints and thereby produce three-dimensional tracking of one or more people within the workspace area. The various views captured by the plurality of vision-based imaging devices 12 cooperatively provide alternative views of the workspace area that enable the people surveillance system 10 to identify each person in the workspace area. These different viewpoints provide the opportunity to track each person in three dimensions throughout the entire workspace area, and they enhance the tracking and localization of each person moving through the workspace, in order to detect potentially undesirable interactions between each respective person and the moving automated equipment in the workspace area.
Referring again to Fig. 1, the images captured by the plurality of vision-based imaging devices 12 are transmitted to a processing unit 18 via a communication medium 20. The communication medium 20 can be a communication bus, Ethernet, or another communication link (including wireless).
The processing unit 18 is preferably a host computer implemented with commercial off-the-shelf components (not unlike a personal computer), or a similar device suitably packaged for its operating environment. The processing unit 18 can further include an image acquisition system (possibly including a frame grabber and/or network image acquisition software) for capturing image streams, and for processing and recording the image streams as time-synchronized data. Multiple processing units can be interconnected over a data network using a protocol that ensures message integrity, such as Ethernet-Safe. Data indicating the status of spaces monitored by adjoining processing units can be exchanged in a reliable manner, including the transmission of alerts, objects, signals, and tracking status data for people moving from region to region (across the coverage of multiple systems). The processing unit 18 runs a main processing routine and a plurality of sub-processing routines (i.e., one sub-processing routine per vision-based imaging device). Each respective sub-processing routine is dedicated to a respective imaging device and processes the images captured by that device. The main processing routine performs multi-view integration based on the accumulated captured images (processed by each sub-routine) to carry out real-time monitoring of the workspace area.
In Fig. 1, the detection of workers in the workspace area is facilitated by sub-processing routines that use a plurality of databases 22, which cooperatively detect and identify people in the workspace area in the presence of other moving equipment. The plurality of databases store the data used to detect objects in the workspace area, identify people from the detected objects, and track the identified people. The various databases include, but are not limited to, a calibration database 24, a background database 25, a classifier database 26, a vanishing point database 27, a tracking database 28, and a homography database 30. The data contained in these databases are used by the processing routines to detect, identify, and track the people in the workspace area.
The calibration database 24 provides pattern-based camera calibration parameters (intrinsic and extrinsic) for undistorting distorted objects. In one configuration, the calibration parameters can be determined using a regular pattern, such as a checkerboard, presented orthogonally to the camera's field of view. A calibration routine then uses the checkerboard to estimate the intrinsic and distortion parameters, which can be used to undistort the barrel distortion caused by a wide-angle lens.
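The patent gives no code for this routine; the following is a minimal sketch of the checkerboard-based intrinsic and distortion estimation it describes, using OpenCV. The board dimensions and file paths are assumptions for illustration.

```python
# Sketch of checkerboard-based intrinsic calibration and undistortion,
# assuming a 9x6 inner-corner board; not part of the patent text.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)                                  # inner corners (assumed)
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):             # assumed image location
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# Estimate intrinsic matrix K and lens distortion coefficients.
ret, K, dist, _, _ = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)

frame = cv2.imread("view0.png")                   # assumed test frame
undistorted = cv2.undistort(frame, K, dist)       # removes barrel distortion
```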
The background database 25 stores background models for the different views; a background model is used to segment an image into its constituent background and foreground regions. A background model can be obtained by capturing images/video before any robotic equipment is installed or any dynamic objects are placed in the workspace.
The classifier database 26 contains a cascade of classifiers, and associated parameters, for automatically classifying objects as human or non-human.
The vanishing point database 27 contains the vanishing point information for each camera view, which is used for vanishing-point rectification so that people appear upright in the rectified image.
The tracking database 28 maintains a track for each monitored person; a new track is added to the database when a new person enters the scene, and tracks are deleted when people leave the scene. The tracking database also holds appearance-model information for each person, so that an existing track can easily be associated with tracks at different time steps.
The homography database 30 contains the homographic transformation parameters across the different views and the canonical views. When a person walks into an adjoining region, the appropriate data from the database(s) can be passed to the system monitoring the adjoining region, enabling a seamless transition in tracking the person across multiple systems from region to region.
Each of the above databases can contain parameters that result from various initialization routines performed during installation and/or maintenance of the system. These parameters can be stored, for example, in a format easily accessed by the processor during operation, such as an XML file format. In one configuration, during an initial setup/initialization routine, the system can perform a lens calibration routine, for example by placing a checkerboard image within the field of view of each camera. Using the checkerboard image, the lens calibration routine can determine the correction needed to remove any fisheye distortion. These correction parameters can be stored in the calibration database 24.
Following the lens calibration routine, the system can then determine the homographic transformation parameters, which can be recorded in the homography database 30. This routine can involve placing reference objects in the workspace such that they can be viewed simultaneously by multiple cameras. By correlating the position of an object between the various views (and knowing the fixed positions of the cameras or objects), the different two-dimensional images can be mapped into 3D space.
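As a sketch of this reference-object procedure, the homography from one camera's image to floor coordinates can be estimated from a handful of correspondences; the pixel and floor coordinates below are invented placeholders, not values from the patent.

```python
# Sketch: estimating an image-to-floor homography from reference points.
# The pixel/floor correspondences here are invented placeholders.
import cv2
import numpy as np

# Pixel locations of reference objects seen by one camera (assumed).
img_pts = np.array([[312, 240], [950, 255], [880, 610], [260, 590]],
                   dtype=np.float32)
# The same objects' known floor coordinates in meters (assumed).
floor_pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]],
                     dtype=np.float32)

H, _ = cv2.findHomography(img_pts, floor_pts, method=cv2.RANSAC)

# Map an arbitrary image point into floor coordinates.
pt = cv2.perspectiveTransform(np.array([[[600.0, 400.0]]], np.float32), H)
print("floor position (m):", pt.ravel())
```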
Additionally, the vanishing point for each camera can be determined by placing a plurality of vertical reference markers at different locations in the workspace and analyzing how those markers appear in each camera view. The perspective nature of the camera causes the representations of the respective vertical markers to converge toward a common vanishing point, which can be recorded in the vanishing point database 27.
Fig. 3 shows a block diagram giving a high-level overview of the plant monitoring process flow, including dynamic system integrity monitoring.
In block 32, data streams are collected from the vision-based imaging devices 12, which capture time-synchronized image data. In block 33, system integrity monitoring is performed. The vision processing unit checks for component failures and detects conditions that would prevent the surveillance system from operating correctly and accomplishing its intended purpose. "Dynamic integrity monitoring" detects these degraded or failure conditions, and the system triggers a fail-to-safe mode when it may not be safe to operate, without any undesirable consequence other than the downtime required to make repairs; in the safe mode, system integrity can subsequently be restored and the interacting processes can return to normal.
In one configuration, fiducial targets can be used for geometric calibration and integrity. Some of the fiducial targets can be active, such as flashing IR beacons in the field(s) of view of the sensors. In one configuration, for example, the IR beacons can flash at respective rates. The surveillance system can then verify that a beacon detected in the image is in fact flashing at the expected frequency of the IR beacon. If it is not, the automated equipment may not be safe to operate, and the faulty view can be ignored or disabled, or the equipment can be changed to operate in a safe mode.
An unexpected change in the behavior of a fiducial target can also cause the equipment to be changed to operate in a safe mode. For example, if the fiducial target is a tracked moving target, and the system detects that it disappears from its expected location before it leaves the workspace area, similar precautionary measures can be taken. Another example of an unexpected change of a moving fiducial target is when the fiducial target appears at a first location and subsequently reappears at a second location at an implausibly high speed (i.e., the ratio of distance to time exceeds a preset limit). In block 34 of Fig. 3, if the vision processing unit determines that an integrity problem exists, the system enters a fail-to-safe mode, in which an alert is raised and the system is shut down. If the vision processing unit determines that no integrity problem exists, blocks 35-39 are started in order.
In one configuration, the system integrity monitoring 33 can include quantitatively assessing the integrity of each vision-based imaging device in a dynamic manner. For example, the integrity monitoring can continuously analyze each video feed to measure the amount of noise in the feed or to identify image discontinuities over time. In one configuration, the system can use at least one of an absolute pixel difference, a global and/or local histogram difference, and/or an absolute edge difference to quantify the integrity of the image (i.e., to determine a relative "integrity score" ranging from 0.0 (no reliability) to 1.0 (fully reliable)). The differences mentioned can be determined relative to either a pre-established reference frame/image (such as one acquired during the initialization routine) or the frame acquired immediately before the measured frame. When comparing against a pre-established reference frame/image, the algorithm can focus in particular on one or more background portions of the image (rather than the dynamically changing foreground portions).
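One plausible realization of such an integrity score is sketched below, combining an absolute pixel difference with a global histogram difference against a reference frame; the 50/50 weighting and the 64-bin histogram are illustrative assumptions, not values from the patent.

```python
# Sketch of a frame "integrity score" in [0.0, 1.0], per the measures named
# above; weights and normalizers are illustrative assumptions.
import cv2
import numpy as np

def integrity_score(frame, reference, mask=None):
    """Compare a frame against a reference (e.g., from initialization).
    `mask` can restrict the check to static background regions."""
    f = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    r = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY).astype(np.float32)
    if mask is not None:
        f, r = f * mask, r * mask

    # Mean absolute pixel difference, normalized to [0, 1].
    pixel_term = np.mean(np.abs(f - r)) / 255.0

    # Global histogram difference (correlation of 1.0 means identical).
    hf = cv2.calcHist([f.astype(np.uint8)], [0], None, [64], [0, 256])
    hr = cv2.calcHist([r.astype(np.uint8)], [0], None, [64], [0, 256])
    hist_sim = cv2.compareHist(hf, hr, cv2.HISTCMP_CORREL)

    # Blend the two cues; 0.0 = no reliability, 1.0 = fully reliable.
    score = 0.5 * (1.0 - min(pixel_term, 1.0)) + 0.5 * max(hist_sim, 0.0)
    return float(np.clip(score, 0.0, 1.0))
```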
In block 35, background subtraction is performed, and the resulting image consists of the foreground regions. Background subtraction enables the system to identify those portions of the image that may be moving. These portions of the image frame are then passed to subsequent modules for further analysis.
In block 36, human verification is performed to detect people in the captured images. In this step, the identified foreground images are processed to detect/identify the portions of the foreground that are most likely to be people.
In block 37, the aforementioned appearance matching and tracking are performed. Appearance matching and tracking use the various databases to identify people from the detected objects, and to track the identified people in the workspace area.
In block 38, 3D processing is applied to the captured data to obtain 3D range information for the objects in the workspace area. The 3D range information allows the system to form 3D occupancy grids and voxelizations that reduce false alarms, and allows objects to be tracked in 3D. The 3D measurement processing can be performed, for example, using stereo overhead cameras (such as cameras 14, 16), or using voxel construction techniques based on the projections from the various angled cameras 17.
In block 39, the matched tracks are provided to a multi-view integration and object localization module. The multi-view integration module 39 can fuse the various views to form a probability map of the position of each person in the workspace. Additionally, the 3D processing from the vision-based imaging devices (as shown in Fig. 10) is provided to the multi-view integration and object localization module for determining the position, direction, speed, occupancy, and density of each person in the workspace area. The identified people are tracked for potential interactions with the moving equipment in the workspace area.
Fig. 4 shows a process flow diagram for detecting, identifying, and tracking people using the people surveillance system. In block 40, the system is initialized by the main processing routine for performing multi-view integration in the monitored workspace area. The main processing routine initializes and launches the sub-processing routines. Each respective sub-processing routine is set up to process the data captured by its respective imaging device. The sub-processing routines run in parallel. As described herein, the subsequent processing blocks are synchronized by the main processing routine to ensure that the captured images are time-synchronized with one another. The main processing routine waits for each sub-processing routine to finish processing its respective captured data before performing multi-view integration. The processing time for each respective sub-processing routine is preferably no more than 100-200 milliseconds. A system integrity check (see Fig. 3, block 33) is also executed at system initialization. If the system integrity check is determined to have failed, the system immediately raises an alert and enters the fail-to-safe mode, in which the system is shut down until corrective action is performed.
Referring again to Fig. 4, in block 41, streaming image data is captured by each vision-based imaging device. The data captured by each imaging device consists of (or is converted to) pixels. In block 42, the captured image data is provided to frame buffers, where the images await processing for detecting objects, and more specifically people, in the workspace area amid the moving automated equipment. Each captured image is assigned a timestamp, so that all captured images are synchronized for simultaneous processing.
In block 43, automatic calibration is applied to the captured images to undistort the objects in them. The calibration database provides pattern-based calibration parameters for undistorting distorted objects. The lens distortion caused by wide-angle lenses requires the images to be undistorted by applying the camera calibration. This is necessary because any large distortion of the image can make the homography functions between the views of the imaging devices and the appearance models inaccurate. Imaging calibration is a one-time process; however, recalibration is needed whenever the imaging device setup changes. The image calibration is also checked periodically by the dynamic integrity monitoring subsystem, to detect situations in which an imaging device has moved slightly from its calibrated field of view.
In blocks 44 and 45, background modeling and foreground detection are launched, respectively. Background training is used to separate the background regions of an image from the foreground regions. The results are stored in the background database for use by each sub-processing routine in distinguishing background from foreground. All of the undistorted images are passed through background filtering to obtain the foreground pixels of the digitized image. To distinguish the background in a captured image, the background parameters should be trained using images of the empty workspace viewing area, so that background pixels can be readily distinguished when moving objects are present. The background data should be updated over time. When the captured images are used for detecting and tracking people, the background pixels are filtered from the imaging data to detect the foreground pixels. The detected foreground pixels are converted into blobs by connected component analysis with noise filtering and blob-size filtering.
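The patent does not name a particular background model; the sketch below stands in with OpenCV's MOG2 subtractor, followed by the connected-component and blob-size filtering described above. The 500-pixel area threshold is an assumption.

```python
# Sketch: background subtraction followed by connected-component blob
# extraction. MOG2 and the 500-pixel size filter are assumptions, not
# the patent's prescribed background model.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
MIN_BLOB_AREA = 500  # noise/blob-size filter (assumed)

def extract_blobs(frame):
    fg = subtractor.apply(frame)                     # foreground mask
    fg = cv2.medianBlur(fg, 5)                       # simple noise filtering
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg)
    blobs = []
    for i in range(1, n):                            # label 0 is background
        if stats[i, cv2.CC_STAT_AREA] >= MIN_BLOB_AREA:
            x, y, w, h = stats[i, :4]
            blobs.append({"bbox": (x, y, w, h),
                          "centroid": tuple(centroids[i])})
    return fg, blobs
```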
In block 46, component analysis is activated. In the relevant workspace area, not only moving people but also other moving objects, such as robot arms, carts, or crates, can be detected. Component analysis therefore involves detecting all of the foreground pixels and determining which foreground images (e.g., blobs) are people and which are non-human moving objects.
A blob can be defined as a region of connected pixels (e.g., touching pixels). Component analysis involves the identification and analysis of the respective regions of pixels within the captured image. The image distinguishes pixels by value, and pixels are then identified as foreground or background. Pixels with nonzero values are considered foreground, and pixels with zero values are considered background. Component analysis typically considers various factors, which can include, but are not limited to, blob location, blob area, blob perimeter (e.g., edges), blob shape, blob diameter, length or width, and orientation. The techniques for image or data segmentation are not limited to 2D images, but can also utilize output data from other types of sensors that provide IR images and/or 3D volumetric data.
In block 47, human detection/verification is performed as part of the component analysis, to filter out non-human blobs from the human blobs. In one configuration, this verification can be performed using a swarming domain classifier technique.
In another configuration, the system can use a pattern-matching algorithm, such as a support vector machine (SVM) or a neural network, to pattern-match a foreground blob against a trained human-pose model. Rather than attempting to process the entire image as a single unit, the system can instead scan the image frame 60 with a localized sliding window 62, such as generally shown in Fig. 5A. This can reduce the complexity of the processing and improve the robustness and specificity of the detection. The sliding window 62 can then be used as the input to the SVM for verifying the object.
The module that performs the human detection can be trained using images of different people positioned in different poses (i.e., standing, bending, kneeling) and oriented in different directions. When training the model, the representative images can be arranged so that the person is substantially aligned with the vertical axis of the image. As shown in Fig. 5A, however, the body axis of an imaged person 64 can be angled according to the perspective and vanishing point of the image, and need not be vertical. If the input to the detection model is a window aligned with the image coordinate system, a person represented at an angle can adversely affect the accuracy of the detection.
To account for the skewed appearance of people in the image, the sliding window 62 can be drawn from a rectified space rather than from the image coordinate space. The rectified space can map the perspective view to an orthogonal view registered with the ground. In other words, the rectified space can map vertical lines in the workspace area to vertical alignment in the adjusted image. This is schematically illustrated in Fig. 5B, where a rectified window 66 scans the image frame 60, and the angled person 64 can be mapped to a vertically aligned representation 68 located in an orthogonal space 70. When analyzed using the SVM, this vertically aligned representation 68 can then provide a higher-confidence detection. In one configuration, the rectified sliding window 66 can be facilitated by a correlation matrix that maps, for example, between a polar coordinate system and rectangular coordinates.
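A hedged sketch of the rectified-window idea: warp the frame with a rectifying homography so that workspace verticals become image verticals, then slide a fixed-size window over the warped image and pass each patch to a trained person classifier. Here H_rect and classify_person are assumed inputs, not artifacts defined by the patent.

```python
# Sketch of sliding-window detection in rectified space. H_rect (the
# rectifying homography) and `classify_person` are assumed to exist.
import cv2
import numpy as np

def detect_people_rectified(frame, H_rect, classify_person,
                            win=(64, 128), stride=16):
    """Scan a vertically rectified image with a fixed-size window."""
    rect = cv2.warpPerspective(frame, H_rect,
                               (frame.shape[1], frame.shape[0]))
    hits = []
    for y in range(0, rect.shape[0] - win[1], stride):
        for x in range(0, rect.shape[1] - win[0], stride):
            patch = rect[y:y + win[1], x:x + win[0]]
            score = classify_person(patch)   # e.g., an SVM decision value
            if score > 0:
                hits.append((x, y, score))
    return hits
```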
Although in one configuration the system can perform an exhaustive search over the entire image frame using the sliding-window search strategy described above, this strategy may involve searching image regions where no person could be. Therefore, in another configuration, the system can restrict the search space to a specific region of interest (ROI) 72, such as shown in Fig. 5C. In one configuration, the ROI 72 can represent the observable floor space in the image frame 60, plus an upper-boundary margin to account for people standing at the far edge of the floor space.
In another configuration, the computational requirements can be reduced even further by prioritizing the search within the portions of the ROI 72 where a person blob is expected to be found. In this configuration, the system can use cues to limit or prioritize the search based on side information available to the image processor. This side information can include motion detected in the image frame, trace information from previously identified person blobs, and fused data from the other cameras of the multi-camera array. For example, after the fused ground-plane frame confirms a person's position, the tracking algorithm forms the person's track and maintains the track history over subsequent frames. If environmental obstructions cause the person localization to fail in one instance, the person's position can be computed from the previously tracked trajectory, and the system can quickly recover the person's location by focusing the corrective search within the ROI 72. If the blob cannot be re-identified within several frames, the system can report that the target person has disappeared.
Referring again to Fig. 4, once person blobs have been detected in the various views, a body axis estimate is made for each detected person blob in block 48. The vanishing point in the image (obtained from the vanishing point database) is used to determine the body axis of each person blob. In one configuration, the body axis can be defined by two points. The first is the centroid of the identified person blob, and the second is a respective point near the bottom of the body (which is not necessarily at the bottom of the blob, and may lie outside the blob). More specifically, the body axis is a virtual line connecting the centroid point toward the vanishing point. A respective vertical body axis is determined for each person blob in each respective camera view, as generally shown at 80, 82, and 84 of Fig. 6. In general, this line crosses the image of the person from head to toe. The person-detection score can be used to assist in determining the respective body axis; the score provides a level of confidence that the person and the respective body axis are correctly matched. Each vertical body axis will be used to determine the person's location via homography, as described in detail below.
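In homogeneous image coordinates, the body axis is simply the line joining the blob centroid and the view's vanishing point, and can be computed with a cross product; a short sketch under that formulation follows.

```python
# Sketch: body axis as the homogeneous line through the blob centroid
# and the camera view's vanishing point.
import numpy as np

def body_axis(centroid_xy, vanishing_xy):
    """Return line coefficients (a, b, c) with a*x + b*y + c = 0."""
    c = np.array([centroid_xy[0], centroid_xy[1], 1.0])
    v = np.array([vanishing_xy[0], vanishing_xy[1], 1.0])
    line = np.cross(c, v)                     # line through both points
    return line / np.linalg.norm(line[:2])    # normalize so a^2 + b^2 = 1
```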
Referring again to Fig. 4, in block 49, color profiling is performed. A color appearance model is established for registering the same person across the various views. Color profiling distinguishes and maintains the identity of each respective person in each captured image. In one configuration, the color profile is a vector of the average colors along the body axis within the blob's bounding box.
In blocks 50 and 51, homography and multi-view integration processing are performed, respectively, to coordinate the various views and map the positions of people onto a common plane. A homography (as used herein) is a mathematical concept in which objects are mapped from one coordinate system to a line or plane by an invertible transformation.
The homography module 50 can include at least one of a body-axis submodule and a synergy submodule. In general, the body-axis submodule can use homography to map the detected/computed body axes onto a common plane viewed from an overhead angle. In one configuration, this plane is the ground plane, which coincides with the floor of the workspace. This mapping is illustrated via the ground-plane view at 86 in Fig. 6. Once mapped onto the common ground plane, the various body axes can intersect at or near a single location point 87 in the ground plane. Where the body axes do not perfectly intersect, the system can identify the location point 87 using a minimum-mean-square-error best-fit approximation or a least-median-of-squares method. This location point can represent one estimate of the person's floor position in the workspace. In another embodiment, the location point 87 can be determined by a weighted least-squares method, in which each line can be individually weighted using the integrity score determined for the frame/view from which the line was derived.
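A sketch of the weighted least-squares variant: with each mapped body axis written as a normalized line a*x + b*y + c = 0 and weighted by its view's integrity score, the location point is the least-squares solution of a small linear system. The example numbers are invented.

```python
# Sketch: weighted least-squares intersection of the mapped body axes.
# Each line is (a, b, c) with a^2 + b^2 = 1; weights are integrity scores.
import numpy as np

def location_point(lines, weights):
    A = np.asarray([l[:2] for l in lines])          # N x 2 line normals
    c = np.asarray([l[2] for l in lines])           # N offsets
    w = np.sqrt(np.asarray(weights))
    # Minimize sum_i w_i * (a_i x + b_i y + c_i)^2 over (x, y).
    p, *_ = np.linalg.lstsq(A * w[:, None], -c * w, rcond=None)
    return p                                        # (x, y) on the floor

# Example with three near-concurrent lines (invented numbers):
axes = [(1.0, 0.0, -2.0), (0.0, 1.0, -3.0), (0.7071, 0.7071, -3.6)]
print(location_point(axes, [0.9, 1.0, 0.6]))        # approximately (2, 3)
```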
The synergy submodule can operate similarly to the body-axis submodule, in that it uses homography to map content from the different image views onto respective planes viewed from an overhead perspective. Instead of mapping a single line (i.e., the body axis), however, the synergy submodule maps the entire detected foreground blob onto the planes. More specifically, the synergy submodule uses homography to map the foreground blobs onto a synergy map 88. The synergy map 88 consists of multiple mutually parallel planes, each at a different height relative to the floor of the workspace. The detected blobs from each view can be mapped onto each respective plane using homography. For example, in one configuration, the synergy map 88 can include a floor plane, a mid-plane, and a head plane. In other configurations, more or fewer planes can be used.
In the process of mapping the foreground blobs from each respective view onto a common plane, regions can exist where multiple blob mappings overlap. In other words, when the pixels of an observed blob in one view are mapped onto a plane, each pixel of the original view has a corresponding pixel in that plane. When multiple views are all projected onto the plane, they may intersect in a region, such that a pixel within the intersection region of the plane maps back to multiple original views. An overlap region in a plane reflects a high probability that a person exists at that position and height. In a manner similar to the body-axis submodule, the integrity scores can be used to weight the blob projections from each view onto the synergy map 88. The clarity of the original images can thus affect the precise boundaries of the high-probability regions.
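The accumulation for one plane of the synergy map might look like the following sketch: each view's foreground mask is warped by its view-to-plane homography, and the integrity-weighted warped masks are summed so that cells supported by several views score high. Input names and shapes are assumptions.

```python
# Sketch: building one plane of the synergy map by accumulating warped
# foreground masks. The view-to-plane homographies and the plane raster
# size are assumed inputs.
import cv2
import numpy as np

def synergy_plane(fg_masks, homographies_k, weights, plane_size):
    """fg_masks: per-view binary masks; homographies_k: view-to-plane
    homographies for one height plane; plane_size: (width, height)."""
    acc = np.zeros(plane_size[::-1], np.float32)
    for mask, H, w in zip(fg_masks, homographies_k, weights):
        warped = cv2.warpPerspective(mask.astype(np.float32), H, plane_size)
        acc += w * warped                    # integrity-weighted vote
    return acc / max(sum(weights), 1e-6)     # near 1.0 where views agree

# Thresholding each plane's accumulator and grouping the surviving
# regions that line up vertically across planes yields the bounding
# envelope described below.
```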
Once the blobs from each view have been mapped onto the respective planes, the high-probability regions can be isolated, and regions lying along a common vertical axis can be grouped together. By grouping these isolated high-probability regions at the various heights, the system can construct a bounding envelope that encapsulates the form of the detected person. The position, velocity, and/or acceleration of this bounding envelope can then be used to alter the behavior of nearby automated equipment, such as an assembly robot, or to provide an alert if, for example, a person enters or reaches into a restricted protection zone. For example, if the bounding envelope overlaps or impinges on a designated restricted volume of space, the system can alter the performance of the automated equipment within that restricted volume (e.g., a robot can be slowed or stopped). Additionally, the system can anticipate the motion of an object by monitoring the object's velocity and/or acceleration, and can alter the behavior of the automated equipment when a collision or interaction is anticipated.
In addition to merely identifying the bounding envelope, the envelope as a whole (and/or each plane as a whole) can be mapped down onto the ground plane to determine a likely occupied floor region. In one configuration, this occupied floor region can be used to validate the location point 87 determined by the body-axis submodule. For example, the location point 87 can be validated if it lies within a high-probability occupied floor region (as determined by the synergy submodule). Conversely, if the point 87 lies outside this region, the system can flag an error or reject the location point 87.
In another configuration, a major axis can be drawn through the bounding envelope such that the axis is substantially vertical in the workspace (i.e., substantially perpendicular to the ground plane). The major axis can be drawn through the mean position within the bounding envelope and can intersect the ground plane at a second location point. This second location point can be fused with the location point 87 determined via the body-axis submodule.
In one configuration, the multi-view integration 51 can fuse several different types of information together to improve the probability of an accurate detection. For example, as shown in Fig. 6, the information in the ground-plane map 86 and the information in the synergy map 88 can be fused to form a merged probability map 92. To further refine the probability map 92, the system 10 can additionally fuse a 3D stereo map or a voxel-constructed representation 94 of the workspace into the probability estimate. In this configuration, the 3D stereo map can first be obtained by using the scale-invariant feature transform (SIFT) to extract features and their correspondences. The system can then perform epipolar rectification on the stereo image pair based on the known intrinsic camera parameters and the feature correspondences. A disparity (depth) map can then be obtained in real time using a block-matching method (such as that provided in OpenCV).
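A minimal sketch of the real-time disparity step using OpenCV's block matcher; the patent names only "block matching," so the parameter values and file names here are illustrative.

```python
# Sketch: disparity (depth) map from a rectified stereo pair via OpenCV
# block matching. numDisparities/blockSize are illustrative choices.
import cv2

left = cv2.imread("left_rect.png", cv2.IMREAD_GRAYSCALE)    # assumed files
right = cv2.imread("right_rect.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0
# Larger disparity means closer to the cameras; depth = f * B / disparity.
```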
Similarly, the voxel representation uses the image silhouettes obtained from background subtraction to produce a depth representation. The system projects 3D voxels onto all of the image planes (of the multiple cameras in use) and determines whether each projection overlaps the silhouette (foreground pixels) in most of the images. Because some images may be occluded by robots or plant equipment, the system can use a voting scheme that does not strictly require an overlapping agreement from all of the images. The 3D stereo map and the voxels provide information about how objects occupy the 3D space, and this information can be used to strengthen the probability map 92.
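The silhouette-voting scheme can be sketched as follows: each voxel center is projected into every camera through its 3x4 projection matrix, and a voxel is kept if enough foreground masks cover its projection. The two-thirds vote threshold is an assumption standing in for "most of the images."

```python
# Sketch: voxel occupancy by silhouette voting across views. Projection
# matrices P (3x4) and the 2/3 vote threshold are assumptions.
import numpy as np

def voxel_votes(voxel_centers, proj_mats, fg_masks, vote_frac=2/3):
    """voxel_centers: N x 3 world points; proj_mats: list of 3x4 P;
    fg_masks: list of binary foreground images (one per view)."""
    pts_h = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    votes = np.zeros(len(voxel_centers), int)
    for P, mask in zip(proj_mats, fg_masks):
        uvw = pts_h @ P.T                        # project into this view
        uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
        h, w = mask.shape
        inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
                  (uv[:, 1] >= 0) & (uv[:, 1] < h))
        hit = np.zeros(len(uv), bool)
        hit[inside] = mask[uv[inside, 1], uv[inside, 0]] > 0
        votes += hit                             # occluded views abstain
    return votes >= vote_frac * len(proj_mats)   # occupied voxels
```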
Developing the probability map 92 by fusing the various types of data together can be accomplished in several different ways. The simplest is a "simple weighted mean integration" method, which applies a weighting coefficient to each data type (i.e., the body-axis projections, the synergy map 88, the 3D stereo depth projection, and/or the voxel representation). Moreover, the body-axis projections can further include a Gaussian distribution about each body axis, where each Gaussian distribution represents the distribution of the blob's pixels about the respective body axis. When projected onto the ground plane, these distributions can overlap, which can aid in determining the location point 87, or which can be combined with the synergy map.
A second fusion method can use the 3D stereo map and/or the voxel-representation depth map together with the foreground blob projections to pre-filter the image. Once pre-filtered, the system can perform the multi-plane body-axis analysis in those filtered regions, to provide a higher-confidence extraction of the body axis in each view.
Referring again to Fig. 4, in block 52, one or more motion trajectories can be assembled based on the determined multi-view homography information and the color profiles. These motion trajectories can represent the ordered movement of a detected person through the workspace. In one configuration, a Kalman filter is used to filter the motion trajectories. In the Kalman filter, the state variables are the person's ground position and velocity.
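A compact sketch of the constant-velocity Kalman filter described above, with state [x, y, vx, vy] over the ground-plane location point; the noise covariances are illustrative, not values from the patent.

```python
# Sketch: constant-velocity Kalman filter over the ground-plane location
# point, state = [x, y, vx, vy]. Noise levels are illustrative.
import numpy as np

class TrackKF:
    def __init__(self, xy, dt=0.1):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = 0.01 * np.eye(4)        # process noise (assumed)
        self.R = 0.05 * np.eye(2)        # measurement noise (assumed)

    def step(self, meas_xy):
        # Predict forward one time step.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the observed location point.
        z = np.asarray(meas_xy)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2], self.x[2:]    # filtered position, velocity
```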
In picture frame 53, system can determine that the expectation whether user trajectory mates for specific procedure maybe can accept track.In addition, system also can attempt the intention that " expection " people continues to advance in a certain direction.This intent information can be used on other modules, to calculate closing rate (the closing rate of time of time between people and surveyed area and distance, this is even more important for improvement has the region detection latency in detection of dynamic region, the motion of equipment is followed in described detection of dynamic region, such as robot, conveyer, fork truck and other movable equipments).This is also a kind of important information, it can expect the motion entering the people adjoining monitor area, the data of people can be passed to this and adjoin in monitor area, and adjoin in monitor area at this, receiving system can prepare to note mechanism, to obtain fast at this by the tracking of each individual in the monitor area that enters.
If the activity through determining of people is not verified or can outside reception process, if or people be expected and leave predetermined " safety zone ", then in picture frame 54, system can sound a warning, and warning is conveyed to user.Such as, warning can people walk through workspace areas predetermined safety zone, warning region and hazardous location time display on the display apparatus.Warning region and (and desired configuration any other region in systems in which, hazardous location, comprise dynamic area) be such operating area: people entered respective regions and equipment is slowed down, stop or otherwise avoiding people time warning is provided, this warning as being activated in picture frame 54.Warning region is that first people is warned people and has entered a region and enough near mobile apparatus and region that equipment may be caused to stop.Hazardous location is the position (such as envelope) of design in warning region.When people is in hazardous location, more dangerous warning can be issued, thus people learns that its position is in hazardous location or requestedly leave this hazardous location.These warnings are placed through and prevent troublesome equipment from shutting down and improve the throughput rate of disposal system, and wherein said equipment shutdown is because not knowing that its people close to warning zone accidentally enters into warning zone and causes.These warnings also can such as eliminated by system from the expectation of the conventional load or unload parts of this process interim that interacts.Also possibly, temporarily static people is detected on the path of the dynamic area of moving along his direction.
Except in respective regions time people give a warning, the motion (such as equipment can be stopped, accelerates or slow down) of close automatic equipment can be revised according to the prediction line inbound path of people in workspace areas (or possible dynamic area) or change to warning.Namely the motion of automatic equipment runs under the program of setting, and the program of this setting has the predetermined motion under predetermined speed.By the motion of people in tracking and prediction work area of space, the motion of automatic equipment can be changed (be namely decelerated or accelerate), to avoid may contacting with any of the people in workspace areas.This allows equipment to keep running, and need not close assembling/manufacture process.Current fail-safe operation by the results management based on the task of risk assessment, and needs factory's automatic equipment to stop completely when people usually being detected in hazardous location.Start-up course needs the operator of equipment to reset control to reset assembling/manufacture process.In this process, this unexpected stopping causes the loss of shutdown and throughput rate usually.
Activity command monitors
In one structure, said system may be used for monitoring the sequence of operations performed by user, and whether the process that checking is monitored is executed correctly.Except only analyzing video feed, system can monitor use and the selection of time of the utility appliance such as such as torque gun, nut wrench or screw driver further.
Fig. 7 illustrates in general the method 100 using the sequential monitoring of said system executed activity.As directed, the video of input is processed 102, and to produce inner diagram 104, it catches different types of information, such as scene motion, activity etc.Described diagram is used at 106 place's Study strategies and methods, and sorter produces action mark and action similarity score.At 108 places, this information is organized in together and is converted into semantic description, its subsequently at 110 places compared with known collapsible form, with produce mistake prevention scoring.Semantic and video summary is filed, for reference in future.If produce low scoring (it shows that the task process of order and the expectation be performed is dissimilar) with mating of template, then provide warning 112.
This process may be used for by determine some action when and where be performed and its execution sequence with the activity of verification operation person.Such as, if system identification operator hand extend in the case of concrete location, walks towards the bight of vehicle on assembly line, go down on one's knees and actuate nut runner, then system can determine to there is the high probability that wheel is fixed to vehicle by operator.But, if this order is only fixed with three wheels and terminates, then can indicate/warn this process not complete, because need the 4th wheel.In a similar manner, action can be mated with vehicle inventory by system, required all mounted for the hardware option of vehicle to guarantee.Such as, if systems axiol-ogy reaches for the deckle board with incorrect color to operator, then system can warn user to investigate this parts before continuation action.By this way, people's surveillance can be used as mistake proactive tool, to guarantee to perform required action during assembling process.
System can have enough dirigibilities, to adapt to the multitude of different ways performing a series of task, and can verify this process, as long as the people track final in preset vehicle position and effort scale complete intended target.Although efficiency may not be considered to the factor whether a series of actions correctly meets the target for assembly working station, it can by independent record.By this way, actual movement locus can compared with the movement locus optimized with activity log, and to carry out quantitatively to total departure, this may be used for suggestion process efficiency and improves (such as by showing or printing activity reports).
Fig. 8 provides the more detailed block diagram 120 of activity monitoring scheme.As directed, in picture frame 32, collect video data stream from camera.At 33 places, these data stream are by the transmission of system integrity monitoring module, and its checking image is in normal operating state.If video feed is not in normal operating state, then mistake is issued and system cannot enter safe mode.Next step after system integrity monitors is people's detecting device-tracing module 122, and it is described in the diagram overall above.This module 122 obtains each video feed and detects the people moved in this scene.Once the motion block of candidate can obtain, then system can use sorter process and filter out the situation of non-athletic.The final output of this module is 3D people's track.Next step relates to the diagram suitable from 3D people's trajectory extraction at 124 places.This illustrated scheme be supplementary and comprise for activity presentation analog image pixel 126, represent scene motion space-time interest points (STIP) 128, by actor from the track 130 of background separation and the voxel 132 integrating information multiple view.In these illustrated scheme, each is hereafter being described in detail.
Once the information has been represented in the complementary forms 104 described above, the system extracts features and passes them through a corresponding set of pre-trained classifiers. A temporal SVM classifier 134 operates on the STIP features 128 and produces action labels 136 such as standing, bending the knee, walking, and stooping; a spatial SVM classifier 138 operates on the raw image pixels 126 and produces action labels 140; and the extracted trajectory information 130 is used together with the action labels in dynamic time warping 142, which compares tracks with typical expected tracks and produces action similarity scores 144. A person pose-estimation classifier 146 is trained so that it can take the voxel representation 132 as input and produce a pose estimate 148 as output. The combination of the temporal, spatial, and trajectory comparisons and the voxel-based pose is placed into a spatio-temporal label frame 150, which becomes the structural frame for a language description module 152. This information is then used to decompose any sequence of activities into action elements and to produce an AND-OR graph 154. The extracted AND-OR graph 154 is then compared at 156 with a predetermined activity scroll to produce a match score. A low match score indicates that the observed actions are not typical, and a warning is sent. Semantic and visual summaries are produced at 158 and archived.
Spatio-temporal interest points (STIP) for representing actions
STIP 128 are detected features that exhibit significant local change in the image in space and/or time. Many such interest points are produced while a person performs an action. Using STIP 128, the system can attempt to determine what action is occurring in the observed video sequence. Each extracted STIP feature 128 is passed through the SVM classifier bank 134, and a voting mechanism determines which action the feature is most likely associated with. A sliding window then classifies the STIP detected within a time window and determines the action detected in each frame. Because there are multiple views, this window considers all detected features from all views. The per-frame action information can be rendered as a simplified chart showing the detected action sequence. Finally, this chart can be matched against the chart produced during the SVM training stage to verify the correctness of the detected action sequence.
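A minimal Python sketch of this sliding-window voting follows, assuming each STIP feature has already received an action vote from the SVM bank and that votes are pooled over all views; the window length and data layout are illustrative assumptions rather than details fixed by the disclosure:

    from collections import Counter

    def label_frames(stip_votes, num_frames, window=15):
        # stip_votes: (frame_index, action_label) pairs, one per STIP
        # feature already classified by the SVM bank, pooled over views.
        labels = []
        for f in range(num_frames):
            near = [a for (t, a) in stip_votes if abs(t - f) <= window // 2]
            labels.append(Counter(near).most_common(1)[0][0] if near else None)
        return labels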
In one example, STIP 128 can be produced while observing a person move across the station to use a torque gun at a particular region of the car. This action may involve the person changing from walking to a crouched working pose, holding that pose briefly, and changing back to walking. Because STIP are based on the motion of interest points, the STIP generated while entering and leaving each pose can separate one action from another.
Dynamic time warping
Dynamic time warping (DTW) (performed at 142) is an algorithm for measuring the similarity between two sequences that may vary in time or speed. For example, similarities in walking patterns between two tracks can be detected via DTW in an observation process, even if the person walks slowly in one sequence and quickly in the other, even if there are accelerations, decelerations, or multiple brief stops, and even if the two sequences are shifted along the timeline. DTW can reliably find the optimal match between two given sequences (e.g., time series). The sequences are warped non-linearly in the time dimension so that their similarity can be measured independently of certain non-linear variations in time. The DTW algorithm uses dynamic programming to solve this problem. The first step compares each point in one signal with each point in the second signal, producing a matrix. The second step works through this matrix, starting at the lower-left corner (corresponding to the beginnings of the two sequences) and ending at the upper-right corner (their ends). For each cell, the cumulative distance is calculated by choosing the neighboring cell to the left, below, or diagonally below-left with the lowest accumulated distance, and adding that value to the cost of the current cell. When this process completes, the value in the upper-right cell represents the distance between the two sequence signals along the lowest-cost path through the matrix.
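A compact Python sketch of the matrix computation just described (the cost function is interchangeable, as discussed below):

    import numpy as np

    def dtw_distance(seq_a, seq_b, cost_fn):
        # Fill the cumulative-cost matrix; D[i, j] holds the lowest
        # accumulated cost of aligning the first i+1 and j+1 samples.
        n, m = len(seq_a), len(seq_b)
        D = np.full((n, m), np.inf)
        for i in range(n):
            for j in range(m):
                c = cost_fn(seq_a[i], seq_b[j])
                if i == 0 and j == 0:
                    D[i, j] = c
                    continue
                best = min(D[i - 1, j] if i > 0 else np.inf,
                           D[i, j - 1] if j > 0 else np.inf,
                           D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
                D[i, j] = c + best
        return D[n - 1, m - 1]   # value at the upper-right cell

For one-dimensional samples, for example, dtw_distance(a, b, lambda p, q: abs(p - q)) measures the warped distance between two scalar time series.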
DTW can measure similarity using the trajectory alone or using the trajectory plus position labels. For vehicle assembly, six position labels can be used: FD, MD, RD, RP, FP, and walking, where F, M, and R represent the front, middle, and rear of the car, and D and P represent the driver side and the passenger side, respectively. The distance cost for DTW is calculated as follows:
cost = αE + (1 − α)L, 0 ≤ α ≤ 1
where E is the Euclidean distance between two points on the two trajectories, L is the histogram difference of the position labels within a given time window, and α is a weighting factor, set to 0.8 if both the trajectory and the position labels are used in the DTW measure. Otherwise, for trajectory-only measurement, α equals 1.
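The blended cost can be sketched as follows; the L1 histogram difference is an assumption, since the disclosure does not fix the histogram metric:

    import numpy as np

    def blended_cost(p, q, alpha=0.8):
        # p and q each pair a 2D ground point with a histogram of the
        # six position labels accumulated over the time window.
        (pt_a, hist_a), (pt_b, hist_b) = p, q
        E = np.linalg.norm(np.asarray(pt_a) - np.asarray(pt_b))
        L = np.abs(np.asarray(hist_a) - np.asarray(hist_b)).sum()
        return alpha * E + (1 - alpha) * L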
Action labels using the spatial classifier
A single-image recognition system may be used to distinguish among the many possible whole-body actions visible in the data, such as walking, stooping, crouching, and reaching. These action labels can be determined using the scale-invariant feature transform (SIFT) and an SVM classifier. Underlying most classification techniques is a method of encoding the image in a way that is insensitive to the various nuisance conditions that can arise during image formation (illumination, pose, viewpoint, and occlusion). SIFT descriptors are known in the art to be insensitive to illumination, stable to small variations in pose and viewpoint, and invariant to changes in scale and orientation. A SIFT descriptor is computed at a point in a circular image region at a particular scale, where the scale determines the region radius and the amount of image blur required. After the image is blurred, gradient orientations and magnitudes are found, and a grid of spatial bins covers the circular image region. The final descriptor is the normalized histogram of gradient orientations, weighted by magnitude (with Gaussian weighting decreasing from the center), separated by spatial bin. Thus, if the spatial-bin grid is 4x4 and there are 8 orientation bins, the descriptor has a size of 4*4*8 = 128 bins. Although the position, scale, and orientation of SIFT descriptors can be selected in ways that are invariant to pose and viewpoint, more recent classification techniques use a fixed scale and orientation and arrange the descriptors in a grid of overlapping regions. This not only improves performance but also allows all descriptors in an image to be computed quickly.
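A sketch of the dense, fixed-scale variant in Python using OpenCV (the use of OpenCV, and the grid step and patch size, are assumptions for illustration):

    import cv2

    def dense_sift(gray, step=8, size=16):
        # Keypoints on a fixed grid with fixed scale and orientation,
        # as in the dense approach described above.
        kps = [cv2.KeyPoint(float(x), float(y), size)
               for y in range(0, gray.shape[0], step)
               for x in range(0, gray.shape[1], step)]
        sift = cv2.SIFT_create()
        _, desc = sift.compute(gray, kps)
        return desc   # one 128-dimensional descriptor per grid point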
For a visual category to generalize, there must be some visual similarity among the members of the category and some distinctiveness when compared with non-members. In addition, any large image set will contain a variety of redundant data (walls, floors, etc.). This leads to the concept of "visual keywords": prototype descriptors for clusters, obtained from the whole set of training descriptors using vector quantization techniques (such as the k-means algorithm). Once the set of visual keywords, called the codebook, has been computed, an image can be described uniquely by which keywords occur in it and with what frequency. The k-means algorithm is used to form the codebook. The algorithm finds k centers in the data space, each center representing the set of data points nearest to it in that space. After the k cluster centers (the codebook) are learned from the training SIFT descriptors, the visual keyword for any new SIFT descriptor is simply its nearest cluster center.
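A sketch of codebook construction with scikit-learn's k-means (the library choice and k = 500 are illustrative assumptions):

    from sklearn.cluster import KMeans

    def build_codebook(train_descriptors, k=500):
        # k cluster centers learned from the pooled training descriptors.
        return KMeans(n_clusters=k, n_init=10).fit(train_descriptors)

    def visual_words(codebook, descriptors):
        # Each new descriptor is assigned its nearest cluster center.
        return codebook.predict(descriptors)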
After an image has been decomposed into SIFT descriptors and visual keywords, the visual keywords can be used to form a descriptor for the whole image, which is simply the histogram of all visual keywords in the image. Alternatively, the image can be decomposed into spatial bins, and these image histograms can be separated spatially in the same manner used to compute SIFT descriptors. This adds some loose geometry to the process of learning actions from raw pixel information.
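A sketch of the spatially binned image histogram, assuming a small grid of spatial bins (the 2x2 grid is an illustrative choice):

    import numpy as np

    def image_histogram(words, positions, shape, k, grid=(2, 2)):
        # Histogram of visual words per spatial bin, concatenated and
        # normalized to form the whole-image descriptor.
        h, w = shape
        gy, gx = grid
        hist = np.zeros((gy, gx, k))
        for word, (x, y) in zip(words, positions):
            r = min(int(y * gy / h), gy - 1)
            c = min(int(x * gx / w), gx - 1)
            hist[r, c, word] += 1
        flat = hist.ravel()
        return flat / (flat.sum() + 1e-9)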
The final step in learning a visual classifier is training a support vector machine (SVM) to discriminate among the categories given example image histograms from each.
In the present case, the image-based technique may be used to recognize certain actions, such as stooping, crouching, and reaching. Each "action" can involve a set of consecutive frames taken together, and the system can use only the portion of the image in which the person of interest appears. With multiple simultaneous views available, the system can train an SVM for each view, where each view's SVM evaluates each frame of the action (or is trained with it). A vote tally can then be computed over all SVM frames across all views for a particular action. The action is classified as the category with the highest aggregate vote.
The system can then use the person-tracker module to locate the person in any view at any time and to determine which frames are relevant to the classification process. First, the ground track may be used to determine when the person in a frame is performing an action of interest. Because the only way a person can move significantly is by walking, any frame corresponding to large motion on the ground plane is assumed to contain an image of the person walking. Such frames therefore do not need to be classified by the image-based classifier.
When the motion trajectory is analyzed, a long period of little motion in the midst of periods of motion indicates frames in which the person is performing a non-walking action. The frames corresponding to a long period of little motion are divided into groups, each group forming an unknown action (or a labeled action, if used for training). For these frames, the person tracker provides a bounding box specifying what part of the image contains the person. As mentioned above, the bounding box can be specified in rectified image space to aid more accurate training and recognition.
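A sketch of this walking/stationary segmentation from the ground track; the window length and motion threshold are assumptions, not values from the disclosure:

    import numpy as np

    def stationary_flags(track, window=30, move_thresh=0.2):
        # track: (N, 2) ground positions per frame. A frame is flagged
        # stationary when displacement over the window stays small.
        track = np.asarray(track)
        flags = []
        for i in range(len(track)):
            lo = max(0, i - window // 2)
            hi = min(len(track), i + window // 2 + 1)
            flags.append(np.linalg.norm(track[hi - 1] - track[lo]) < move_thresh)
        return flags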
Once the frames of interest and bounding boxes are found by the person tracker, the process for SVM training is very similar to the traditional case. SIFT descriptors are computed in each motion-image bounding box, for all frames and all views. In each view, the images belonging to one action (i.e., those temporally grouped together) are labeled by hand for SVM training. The k-means algorithm builds the codebook, which is then used to form an image histogram for each bounding box. The image histograms obtained from a view are used to train that view's SVM. In a system with six cameras, for example, there are six SVMs, each of which classifies into the three possible actions.
Given a new sequence, unlabeled action volumes are extracted in the manner described above. These frames and bounding boxes are classified one by one using the appropriate view-based SVM. Each of the SVMs produces a score for each frame of the action sequence. These scores are summed to compute an accumulated score for the action over all frames and all views. The action (category) with the highest score is selected as the label for the action sequence.
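A sketch of this score accumulation, assuming one trained multiclass SVM per view exposing per-class decision scores (as scikit-learn's SVC does with its default one-vs-rest decision function):

    import numpy as np

    def classify_sequence(view_svms, view_frames):
        # view_frames[v]: image histograms for the frames of view v;
        # an empty list marks a fully occluded view (zero votes).
        total = None
        for svm, frames in zip(view_svms, view_frames):
            if len(frames) == 0:
                continue
            scores = svm.decision_function(np.asarray(frames)).sum(axis=0)
            total = scores if total is None else total + scores
        if total is None:
            raise ValueError("no view produced scores")
        return int(np.argmax(total))   # index of the winning action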
Often, a person may be occluded in a particular view but visible in the others. An occluded view's vote counts as zero for all categories. Training uses labeled sequences, and four different sequences are used for testing, to achieve improved accuracy. It is important to note that the same codebook developed during training must be used at test time; otherwise, the resulting image histograms cannot be classified by the SVMs.
The system can use a voxel-based reconstruction method that reconstructs a 3D volumetric space from the foreground moving objects in multiple views by projecting 3D voxels onto each image plane and determining whether the projections overlap the corresponding silhouettes of the foreground objects. Once the 3D volume is reconstructed, the system can, for example, fit cylinder models to its different parts and use the parameters to train the classifier for estimating the person's pose.
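A minimal sketch of the voxel test just described; the per-view projection functions and the minimum-view threshold are assumptions for illustration:

    import numpy as np

    def carve_voxels(voxel_centers, projections, fg_masks, min_views=3):
        # A voxel is kept when its projection lands on foreground
        # in at least min_views of the views.
        votes = np.zeros(len(voxel_centers), dtype=int)
        for project, mask in zip(projections, fg_masks):
            h, w = mask.shape
            for i, X in enumerate(voxel_centers):
                u, v = project(X)
                if 0 <= int(v) < h and 0 <= int(u) < w and mask[int(v), int(u)]:
                    votes[i] += 1
        return np.asarray(voxel_centers)[votes >= min_views]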
The representation and learning steps in the block diagram of Fig. 6 are then combined with any external signals, such as outputs from one or more hand tools (e.g., a torque gun, nut runner, or screwdriver), to form the spatio-temporal labels. This combined information is then used to build the AND-OR graph at 154. In general, an AND-OR graph can describe situations more complex than a simple tree diagram. The graph includes two types of nodes: OR nodes, which are the same as nodes in a typical tree diagram, and AND nodes, which allow a downward path through the tree to split into multiple simultaneous paths. This structure is used to describe the acceptable sequences of actions occurring in a scene. The AND node in this case makes it possible to describe, for example, that action A occurs and then actions B and C occur together, or that action D occurs, which a standard tree diagram cannot express.
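A toy Python sketch of the two node types; it checks only which actions occur and ignores temporal ordering for brevity, so it is a data-structure illustration rather than the full matching performed at 156, and all node names are hypothetical:

    class Node:
        def __init__(self, kind, label=None, children=()):
            self.kind = kind              # 'action', 'or', or 'and'
            self.label = label
            self.children = list(children)

    def satisfied(node, observed):
        # observed: set of action labels seen in the scene.
        if node.kind == 'action':
            return node.label in observed
        if node.kind == 'or':             # any one branch suffices
            return any(satisfied(c, observed) for c in node.children)
        return all(satisfied(c, observed) for c in node.children)  # 'and'

    # "A occurs and B and C occur together, or D occurs"
    graph = Node('or', children=[
        Node('and', children=[Node('action', 'A'), Node('action', 'B'),
                              Node('action', 'C')]),
        Node('action', 'D')])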
In another configuration, instead of the AND-OR graph at 154, the system can use a finite state machine to describe the user's activity. A finite state machine is commonly used to describe a system that has several states and conditions for transitioning between them. After the activity recognition system has temporally segmented a sequence into elemental actions, the system can evaluate the sequence to determine whether it conforms to a set of approved action sequences. The set of approved sequences can also be learned from data, for example by constructing a finite state machine (FSM) from training data and testing any given sequence by running it through the FSM.
Forming an FSM that represents the whole set of valid action sequences is straightforward. Given a set of training sequences (classified with the action recognition system), the FSM nodes are first formed by finding the set of all individual action labels across all training sequences. Once the nodes are formed, the system places a directed edge from node A to node B if, in any training sequence, node B immediately follows node A.
Testing a given sequence is similarly straightforward: the sequence is run through the finite state machine to determine whether it reaches an exit state. If so, the sequence is valid; otherwise it is not.
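Both steps admit a short Python sketch; the exit-state set is an assumption, since the disclosure leaves it unspecified:

    def build_fsm(training_sequences):
        # A directed edge (a, b) exists when b immediately follows a
        # in any training sequence; nodes are the action labels.
        return {(a, b) for seq in training_sequences
                       for a, b in zip(seq, seq[1:])}

    def is_valid(sequence, edges, exit_states):
        # Run the sequence through the machine: every transition must
        # be allowed, and the sequence must end in an exit state.
        ok = all((a, b) in edges for a, b in zip(sequence, sequence[1:]))
        return ok and bool(sequence) and sequence[-1] in exit_states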
Because the system learns where the person is when each activity is performed, it can also include spatial information in the FSM structure. This adds extra detail, making it possible to evaluate activities not only in terms of the sequence of events but also in terms of location.
Video summary
The video summary module 158 of Fig. 8 takes the input video sequence and represents the dynamic motion in a highly efficient and compact form for explanation and archival. The final summary maximizes information by showing multiple activities side by side. In one approach, a background view is selected, and the foreground objects from selected frames are extracted and merged into the base view. Frame selection is based on the action labels obtained by the system and allows selection of the subsequences in which certain actions of interest occur.
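A sketch of the compositing step, assuming per-frame boolean foreground masks are available from the tracker and that all images share the background's size:

    import numpy as np

    def composite_summary(background, frames, masks):
        # Paste the foreground of each selected frame onto the chosen
        # background view; frame selection by action label is upstream.
        summary = background.copy()
        for frame, mask in zip(frames, masks):
            summary[mask] = frame[mask]
        return summary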
Multiple work spaces
The people monitoring system described herein detects and tracks people throughout the work space from multiple different viewpoints, so that occlusion at one or more given viewpoints does not affect the tracking of a person. Moreover, the people monitoring system can automatically adjust and dynamically reconfigure active factory equipment to avoid potential interactions with people in the workspace area, without having to stop the automated equipment. This can include determining new or prohibited travel paths for automated mobile equipment. The people monitoring system can track multiple people within the work space and hand off the tracking state to other systems that oversee adjacent zones, where each zone may be defined for a number of locations within the workspace area.
Fig. 9 shows a view of multiple workspace areas. The sensing devices 12 for a given workspace area are connected to a respective processing unit 18 dedicated to that workspace area. Each respective processing unit identifies and tracks the people moving within its associated workspace area, and the units communicate with one another over network links 170, so that an individual remains tracked as he or she transfers from one workspace area to another. As a result, multiple visual monitoring systems can be coupled to track individuals as they interact across the various workspace areas.
It should be understood that the use of the visual monitoring system in a factory environment as described herein is only one example; the visual monitoring system can be utilized in any applicable setting beyond the factory environment in which the activity and motion of people in an area are tracked and recorded.
The visual monitoring system enables automated time-and-motion studies of activity, which can be used to monitor performance and provide data for improving the efficiency and productivity of work-cell activities. This capability also enables monitoring of activities against a predetermined sequence, in which deviations from the sequence can be identified and recorded, and warnings can be produced for the detection of human task errors. This "error-proofing" capability can be used to prevent a task error from propagating to downstream processes, and to prevent quality and productivity problems caused by errors in the sequence or by incorrect selection of the proper material for a planned task.
It should also be understood that a variant of the people monitoring coverage in the system described herein can monitor restricted areas, which may contain significant automated or other equipment activity and require only periodic maintenance or access. The system monitors access control and integrity for such an area and triggers an alert upon unauthorized access. Because maintenance or routine servicing in such an area requires lockout or other downtime, the system monitors the access and activity of the authorized person (or persons), and if activity unexpectedly stops because of an accident or a medical emergency, it triggers an alert locally or through a remote monitoring workstation. This capability can improve productivity for these types of tasks, where the system can be regarded as part of a "buddy system".
While the best modes for carrying out the invention have been described in detail, those familiar with the art to which this invention relates will recognize various alternative designs and embodiments for practicing the invention within the scope of the appended claims. It is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative only and not as limiting.

Claims (10)

1. A method of forming a probabilistic graphical representation of the position of an object within a work space, the method comprising:
acquiring a plurality of 2D images of the work space, each respective 2D image acquired from a camera disposed at a different location within the work space;
identifying a foreground portion in at least two of the plurality of 2D images;
projecting each foreground portion from its respective view onto each of a plurality of parallel, spaced planes;
identifying a region in each of the plurality of planes in which a plurality of the projected foreground portions overlap; and
combining the identified regions from each of the plurality of planes to form a 3D bounding envelope of the object;
wherein the bounding envelope is a 3D probabilistic representation of the position of the object within the work space.
2. The method of claim 1, further comprising performing a control action if the bounding envelope overlaps a predetermined space.
3. The method of claim 1, further comprising determining a body axis of each identified foreground portion, the body axis being an average centerline of the respective foreground portion and aligned with a vanishing point of the image;
mapping each detected body axis to a floor plane, the floor plane coinciding with the ground of the work space; and
determining a location point within the floor plane, wherein the location point minimizes a least-squares function across each of the mapped body axes;
wherein the location point represents the point position of the object within the work space.
4. The method of claim 3, further comprising recording the coordinates of the location point if the location point lies within the bounding envelope.
5. The method of claim 4, further comprising assembling a motion trajectory, the motion trajectory representing the position of the location point over a period of time; and
identifying portions of the time period during which the location point moves within the work space and portions of the time period during which the location point is stationary within the work space.
6. The method of claim 5, further comprising determining an action performed by the object during a portion of the time period in which the location point is stationary within the work space.
7. The method of claim 3, further comprising fusing the floor plane with the plurality of planes to form a planar probability map.
8. The method of claim 3, further comprising:
determining a principal axis of the bounding envelope, wherein the principal axis of the bounding envelope intersects the floor plane to define a second location point; and
fusing the determined location point within the floor plane with the second location point to form a refined location point.
9. The method of claim 1, further comprising fusing the bounding envelope with a voxel representation of the work space to form a refined object prototype.
10. The method of claim 9, further comprising determining at least one of a velocity and an acceleration of a portion of the refined object prototype.