US20220245914A1 - Method for capturing motion of an object and a motion capture system - Google Patents
- Publication number: US20220245914A1
- Application number: US17/611,021
- Authority
- US
- United States
- Prior art keywords
- marker
- event
- events
- light
- based light
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V10/143—Sensing or illuminating at different wavelengths
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06V20/90—Identifying an image sensor based on its output data
- G06T2207/10016—Video; Image sequence
- G06T2207/30204—Marker
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
Definitions
- Machine vision is a field that includes methods for acquiring, processing, analyzing and understanding images in a wide range of applications, such as security applications (e.g., surveillance, intrusion detection, object detection, facial recognition, etc.), environmental-use applications (e.g., lighting control), object detection and tracking applications, automatic inspection, process control, and robot guidance. Machine vision can therefore be integrated with many different systems.
- motion detection and tracking is useful in detecting the accurate position of a moving object in a scene, especially for Computer-Generated Imagery (CGI) solutions, such as video games, films, sports, television programs, virtual reality and augmented reality, movement science tools and simulators.
- localization systems, also known as motion capture (mo-cap) systems, are commonly used for this purpose to detect and estimate the position of objects equipped with markers.
- Keskin, M. F. et al. present in their article "Localization via Visible Light Systems", Proceedings of the IEEE, 106(6), 1063-1088, a survey on localization techniques based on visible light systems. They mention many articles from the scientific literature describing solutions (e.g. based on signal strength, time of arrival, angle of arrival, etc.) to the problem of estimating the position of light receivers from the signal received from calibrated reference LEDs, which can be identified via their intensity modulation pattern. In particular, they discuss the use of photo-detectors or conventional imaging sensors as light receivers. The limitations of conventional imaging sensors have been mentioned above. Photo-detectors also have a fixed yet much higher data rate.
- An object of the present invention is to provide a new method for capturing motion of an object, adapted to detect and track the pose and orientation of an object with a great accuracy and/or with a high temporal resolution to capture fast movements with high fidelity.
- a method for capturing motion of an object comprises:
- processing the events may comprise:
- the timing coincidence can be detected between events having a time difference of less than 1 millisecond between them.
- the method may further comprise mapping 3D coordinates in the acquisition volume to 2D pixel coordinates in each of the event-based light sensors, wherein determining position of the common marker comprises obtaining 3D coordinates of the common marker that are mapped to the 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received.
- the at least one marker comprises an active marker adapted to emit light.
- the active marker may emit blinking light with a preset blinking frequency or a pseudo-random blinking pattern.
- the at least one marker comprises a passive reflector
- the method further comprises illuminating the acquisition volume with external light, such as infrared light, which the passive reflector is adapted to reflect.
- the at least one marker may be configured to emit or reflect light having wavelength characteristics and the at least two event-based light sensors are provided with optical filters to filter out light not having the wavelength characteristics.
- the at least two event-based light sensors are fixed to a common rigid structure, such as a rigid frame, which can be movable to follow the movement path of a moving object, so as to prevent the object from escaping the sensors' fields of view.
- a motion capture system comprising:
- At least one marker to be carried by an object in an acquisition volume
- the above-mentioned method and system provide a significant improvement of the latency and temporal resolution of localization measurements, which allows higher fidelity movement capture at precision of the order of microseconds and millimeters, while greatly reducing required computational power.
- FIG. 1 illustrates an overall setup of the system according to the invention
- FIG. 2 is a block diagram of an event-based light sensor adapted to implementation of the invention
- FIG. 3 is a flowchart of an illustrative method according to the invention.
- FIGS. 4 a to 4 f illustrate an example of detecting and tracking a marker in 2D coordinates
- FIGS. 5 a to 5 e illustrate another example of detecting and tracking a marker in 3D coordinates.
- FIG. 1 illustrates the overall setup of the motion capture system of the invention.
- the system comprises at least two event-based light sensors 51 , 52 , which may respectively generate events depending on variations of light in the scene they observe.
- the event-based light sensors 51 , 52 are attached to a common rigid structure, such as a rigid frame 8 at a height h above the ground, and they observe a scene with their fields of view 61 , 62 partially overlapped in an acquisition volume 1 , which is adapted to contain an object 3 that can be observed and sensed by the event-based light sensors 51 , 52 .
- the fields of view 61 , 62 overlap, so that the object can be observed by both event-based light sensors simultaneously.
- the event-based light sensors 51 , 52 are arranged around the periphery of the acquisition volume once they are set, with their fields of view 61 , 62 covering the acquisition volume 1 where the object 3 is located.
- the object 3 may be a person, another moving object, or a plurality thereof, whose position, posture and orientation are to be detected and tracked.
- the object 3 carries at least one marker 4 .
- a plurality of markers is fixed on the surface of the object 3 .
- the object 3 is positioned in the acquisition volume 1 , so that the marker can be observed and sensed by the event-based light sensors 51 , 52 .
- the marker 4 is designed to be easily detectable by the event-based light sensors 51 , 52 . It may emit or reflect continuous or varying light that can be detected by the event-based light sensors which then generate the events accordingly.
- the marker 4 in the acquisition volume 1 can be observed by the event-based light sensors 51 , 52 which generate events corresponding to the variations of incident light from the marker 4 .
- the system includes a computing device, not shown in FIG. 1 , such as a desktop, a laptop computer or a mobile device, which is coupled with the event-based light sensors to receive the events and process these events with computer vision algorithms to detect and track the markers.
- the position of the markers 4 and motion of the object 3 in the acquisition volume 1 can thus be acquired.
- FIG. 2 shows an event-based light sensor which comprises an event-based asynchronous vision sensor 10 placed facing a scene and receiving the light flow of the scene through optics for acquisition 15 comprising one or several lenses, which provides a field of view depending on the optics characteristics of the lenses.
- the sensor 10 is placed in the image plane of the optics for acquisition 15 . It comprises an array of sensing elements, such as photosensitive elements, organized into a matrix of pixels. Each sensing element corresponding to a pixel produces successive event signals depending on variations of light in the scene.
- the event-based light sensor comprises a processor 12 which processes the event signal originating from the sensor 10 , i.e. the sequences of events received asynchronously from the various pixels, and then forms and outputs event-based data.
- a hardware implementation of the processor 12 using specialized logic circuits (ASIC, FPGA, . . . ) or chip coupled with the sensor 10 is also possible.
- each time that the luminance observed by the pixel increases by the quantity Q starting from what it was in time t k , a new instant t k+1 is identified and a spike is emitted at this instant t k+1 .
- each time that the luminance observed by the pixel decreases by the quantity Q starting from what it was in time t k a new instant t k+1 is identified and a spike is emitted at this instant t k+1 .
- the signal sequence for the pixel includes a succession of spikes positioned over time at instants t k depending on the light profile for the pixel.
- the output of the sensor 10 is then in the form of an address-event representation (AER).
- the signal sequence typically includes a luminance attribute corresponding to a variation of incident light.
- the activation threshold Q can be fixed or can be adapted as a function of the luminance.
- the threshold can be compared to the variations in the logarithm of the luminance for generating events when exceeded.
- different thresholds can be respectively set for increasing luminance activations and for decreasing luminance activations.
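The event generation scheme described above can be sketched in Python as follows, with separate ON/OFF contrast thresholds applied to the logarithm of the luminance. The function `generate_events`, its sample format, and the threshold names `q_on`/`q_off` are illustrative assumptions, not the sensor's actual interface:

```python
import math

def generate_events(samples, q_on=0.1, q_off=0.1):
    """Simulate one event-based pixel: emit (t, polarity) spikes whenever
    the log-luminance drifts by more than the contrast threshold from the
    level memorised at the last event (hypothetical helper, not the
    sensor's real circuit)."""
    events = []
    t0, l0 = samples[0]
    ref = math.log(l0)               # log-luminance at the last event
    for t, lum in samples[1:]:
        log_l = math.log(lum)
        while log_l - ref >= q_on:   # luminance increased by Q -> ON event
            ref += q_on
            events.append((t, +1))
        while ref - log_l >= q_off:  # luminance decreased by Q -> OFF event
            ref -= q_off
            events.append((t, -1))
    return events
```

With a ramping luminance profile, each crossing of the threshold produces one spike, mirroring the t k / t k+1 description above.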
- the sensor 10 can be a dynamic vision sensor (DVS) of the type described in "A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor", P. Lichtsteiner, et al., IEEE Journal of Solid-State Circuits, Vol. 43, No. 2, February 2008, pp. 566-576, or in patent application US 2008/0135731 A1.
- with this DVS, the dynamics of a retina (minimum duration between action potentials) can be approached.
- the dynamic behaviour surpasses that of a conventional video camera that has a realistic sampling frequency.
- data pertaining to an event originating from a pixel include the address of the pixel, a time of occurrence of the event and a luminance attribute corresponding to a polarity of the event, e.g. +1 if the luminance increases and −1 if the luminance decreases.
- an asynchronous sensor 10 that can be used advantageously in the context of this invention is the asynchronous time-based image sensor (ATIS) of which a description is given in the article “A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS”, C. Posch, et al., IEEE Journal of Solid-State Circuits, Vol. 46, No. 1, January 2011, pp. 259-275.
- data pertaining to an event originating from a pixel include the address of the pixel, a time of occurrence of the event and a luminance attribute corresponding to an estimated value of the absolute luminance.
- the markers 4 can be passive, i.e. emitting no light on their own.
- a retro-reflective reflector reflects external illumination light, e.g. from external infrared light sources.
- the reflected light causes the event-based light sensor to generate events as mentioned above.
- the marker 4 can also be active, i.e. using a power source and emitting light, for example visible or near-infrared light, which may cause the event-based light sensor to generate events.
- the event-based light sensors have a high temporal resolution, making it possible to use a much greater variety of light signals compared to conventional frame-based cameras.
- the light reflected from or emitted by the markers may exhibit specific temporal behaviours, which could then be decoded for various purposes. For instance, using blinking LEDs with specific blinking frequencies enables identifying the markers reliably, making it easier to distinguish similar-looking objects or to disambiguate the orientation of a symmetrical pattern.
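As one possible (hypothetical) decoding of such blinking signals, the dominant blink rate of an event train can be estimated from inter-event intervals and matched against a set of known marker frequencies; `identify_marker`, its timestamp format (microseconds) and its tolerance are illustrative choices, not part of the patent:

```python
import statistics

def identify_marker(timestamps_us, known_freqs_hz, tolerance_hz=20.0):
    """Guess which blinking LED produced a train of ON events by comparing
    the dominant blink rate (1 / median inter-event interval) against a
    set of known marker frequencies. Returns the matched frequency, or
    None if no known marker is close enough."""
    intervals = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    period_us = statistics.median(intervals)   # robust to a few noise events
    freq_hz = 1e6 / period_us
    best = min(known_freqs_hz, key=lambda f: abs(f - freq_hz))
    return best if abs(best - freq_hz) <= tolerance_hz else None
```

For instance, events spaced 1000 µs apart decode to a 1 kHz marker, distinguishing it from markers blinking at other preset frequencies.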
- since the event-based light sensor generates events depending on the variations of light received by the sensing element from the marker that appears in the field of view of the sensor, it is possible to configure the event-based light sensors to detect exclusively the events generated by pre-designed markers. This can be achieved by configuring the marker to emit or reflect light having specific wavelength characteristics, for example within a certain pre-set wavelength range, and adding optical filters to the light sensors so as to filter out light outside that range. Alternatively, this can also be achieved by configuring the event-based light sensors to sense only strong light variations, such as those induced by the markers, while maintaining a fast reaction time.
- At least one marker as mentioned above is installed on the surface of an object, such as the body of a performer or sportsman.
- the object with the marker is located in an acquisition volume.
- the markers are active or passive, as discussed above, and designed to facilitate their detection by event-based light sensors.
- Each marker can be fixed on any part of the object, and in case of human beings, it is usually attached to head/face, fingers, arms and legs.
- At least two event-based light sensors are separately arranged around the periphery of the acquisition volume.
- the event-based light sensors may be fixed to a common rigid structure, so that the relative position between the event-based light sensors is fixed.
- the light sensors are precisely arranged, and their fields of view cover the acquisition volume from different angles.
- the acquisition volume is a space which may contain the object, such as the performer or sportsman, or other objects that move in the acquisition volume.
- the size and shape of the acquisition volume are defined according to the application; a typical arrangement is a cube, such as a room, or a sphere, in which an object can move freely while its motion is captured.
- the acquisition volume may move, for example if the common rigid structure on which the event-based light sensors are fixed is a movable structure.
- the object in the acquisition volume can be observed and thus events can be asynchronously generated by the pixels on the event-based sensors in response to the variations of incident light from the fields of view.
- two event-based light sensors are set above the height of the object with their fields of view tilted down towards the object.
- the light sensors can be configured properly to achieve a high temporal resolution and to filter out light not having certain wavelength characteristics, which guarantees that the events are exclusively generated by the concerned object, hence reducing the required computational power and the latency to a minimum.
- a known pattern of markers such as an asymmetric grid of blinking LEDs
- the event-based light sensors perceive the LEDs, recognize the blinking frequencies, and associate each 2D measurement with an element of the 3D structure.
- 3D points expressed in the coordinate frame of the acquisition volume can then be mapped into their 2D projections in the pixel coordinates of any of the event-based light sensors, resulting in a set of 2D trajectories formed by events as observed by pixels in each sensor.
- models can be inverted to infer the 3D coordinates from a set of corresponding 2D observations, for instance via a classical triangulation approach.
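A classical realisation of this inversion is linear (DLT) triangulation from two calibrated views. The sketch below assumes known 3×4 projection matrices for two sensors; it illustrates the standard technique rather than the patent's specific algorithm:

```python
import numpy as np

def triangulate(p1, p2, x1, x2):
    """Linear (DLT) triangulation: recover the 3D marker position from its
    2D pixel observations x1, x2 in two calibrated sensors with 3x4
    projection matrices p1, p2. Each observation contributes two linear
    constraints; the solution is the null vector of the stacked system."""
    a = np.vstack([
        x1[0] * p1[2] - p1[0],
        x1[1] * p1[2] - p1[1],
        x2[0] * p2[2] - p2[0],
        x2[1] * p2[2] - p2[1],
    ])
    _, _, vt = np.linalg.svd(a)
    xh = vt[-1]                  # homogeneous solution: smallest singular vector
    return xh[:3] / xh[3]        # de-homogenise to 3D coordinates
```

With more than two sensors, the same stacking extends to additional rows, improving robustness to noise in any single view.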
- the event-based light sensors generate events according to the variations of incident light from the markers.
- the events received at step S 3 by the processor 12 are processed (S 4 ) by means of stereo 3D reconstruction, so as to position the marker in the 3D acquisition volume.
- a marker When moving in front of an event-based light sensor, a marker continuously triggers events from pixels of the event-based light sensors. Accordingly, the events in response to the moving marker are generated in each event-based light sensor and are processed separately, for example, by the global processing platform or by a local dedicated embedded system, so as to detect and track marker 2D positions in each event-based light sensor.
- these events, including simultaneous events from the same marker generated by each event-based light sensor, are paired or associated to find their correspondences on the basis of temporal and/or geometric characteristics, so that the 3D position of the marker can be detected and tracked.
- the events received from respective pixels of each event-based light sensor are determined to relate to a common marker on the basis of detection of a timing coincidence between these events from different event-based light sensors.
- the timing coincidence may be detected between events having a time difference of less than 1 millisecond between them.
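A simple way to realise this coincidence test, assuming time-sorted event streams with microsecond timestamps, is a two-pointer sweep; the function name and the `(pixel_index, timestamp_us)` event format are illustrative assumptions:

```python
def coincident_pairs(events_a, events_b, window_us=1000):
    """Pair events from two sensors when their timestamps differ by less
    than the coincidence window (1 ms here, matching the text). Events
    are (pixel_index, timestamp_us) tuples sorted by time; the sweep is
    linear in the total number of events."""
    pairs, j = [], 0
    for ia, ta in events_a:
        # advance past events in stream b that are too old to match ta
        while j < len(events_b) and events_b[j][1] < ta - window_us:
            j += 1
        if j < len(events_b) and abs(events_b[j][1] - ta) < window_us:
            pairs.append((ia, events_b[j][0]))
    return pairs
```

Events whose timestamps fall outside the 1 ms window are never paired, which is what rejects accidental matches between unrelated markers.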
- for passive markers, classical geometric epipolar constraints can be used to associate the events on each event-based light sensor to a common marker. After the events are paired, their correspondences are then processed to position the marker.
- event-based light sensors C 1 , C 2 . . . Cn are arranged to detect and track a marker in response to the marker's movement.
- the light sensor C 1 When the marker is present in the fields of view of the light sensors, the light sensor C 1 generates event ev(i c1 , t 1 ) for a pixel having an address expressed as index i c1 at coordinates (x ic1 , y ic1 ) in the pixel array of light sensor C 1 at a time t 1 , the light sensor C 2 generates event ev(i ic2 , t 2 ) for a pixel having an address expressed as index i ic2 at coordinates (x ic2 , y ic2 ) in the pixel array of light sensor C 2 at a time t 2 , . . .
- the light sensor Cn generates event ev(i cn , t n ) for a pixel having an address expressed as index i cn at coordinates (x icn ,y icn ) in the pixel array of light sensor Cn at a time t n .
- it could be one pixel or a group or spot of pixels adjacent to each other in a light sensor responding to the marker, and the events ev(i c1 , t 1 ), ev(i c2 , t 2 ) . . . ev(i cn , t n ) may respectively contain a set of events generated by each light sensor. These events can be continuously generated in response to a marker's presence in each event-based sensor.
- a group of adjacent pixels detect the marker, and the respective events ev(i c1 , t 1 ), ev(i c2 , t 2 ) . . . ev(i cn , t n ) from the group of pixels in each light sensor can then be separately clustered as clusters CL 1 , CL 2 . . . CLn. Based on the clustering, the marker and its 2D apparent trajectory can be detected and tracked in each sensor. The set of marker tracks detected by each light sensor, with the corresponding trajectories having 2D positions and timestamps, is then processed globally to find correspondences across those light sensors C 1 , C 2 . . . Cn.
- correspondences are made using a data association step based on temporal and geometric constraints, allowing the system to detect and discard spurious candidate tracks, validate consistent ones and assign a unique ID to a confirmed marker. For example, if the time difference between t 1 , t 2 . . . and t n is less than 1 millisecond, a timing coincidence is detected, so that correspondences are found among these events. This means events ev(i c1 , t 1 ), ev(i c2 , t 2 ) . . . ev(i cn , t n ) are determined to originate from the same marker.
- New markers can also be processed in a simple manner. For example, they can be processed by clustering events generated in a small neighbourhood of pixels. A new track can then be created once the cluster reaches a predefined size and/or displays a motion which can be distinguished from the background noise.
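Such neighbourhood clustering could be sketched as follows; the greedy centroid-based scheme and the `radius`/`min_size` values are illustrative choices, not values prescribed by the patent:

```python
def cluster_events(events, radius=3, min_size=5):
    """Greedy spatial clustering of (x, y) events into candidate marker
    detections: each event joins the first cluster whose running centroid
    is within `radius` pixels, else it seeds a new cluster. Only clusters
    reaching `min_size` are reported as new-track candidates."""
    clusters = []  # each: running sums sx, sy and event count n
    for x, y in events:
        for c in clusters:
            cx, cy = c["sx"] / c["n"], c["sy"] / c["n"]
            if abs(x - cx) <= radius and abs(y - cy) <= radius:
                c["sx"] += x; c["sy"] += y; c["n"] += 1
                break
        else:
            clusters.append({"sx": x, "sy": y, "n": 1})
    return [(c["sx"] / c["n"], c["sy"] / c["n"]) for c in clusters
            if c["n"] >= min_size]
```

Isolated noise events seed clusters that never reach `min_size` and are silently dropped, which is how background noise is separated from genuine marker activity.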
- the microsecond accuracy of the event-based light sensor allows decoding the frequency of each marker, which can be used to further improve the reliability of the detection and/or to match the detections across sensors.
- FIGS. 4 a to 4 f show a marker 41 observed and tracked by three event-based light sensors C 1 , C 2 , and C 3 in 2D coordinates and its 3D position is triangulated at every update of the 2D tracks.
- sensors C 1 , C 2 and C 3 that view the marker 41 may separately generate event clusters.
- sensors C 1 and C 2 view the marker 41 and sensor C 3 cannot view the marker 41 due to an occluding obstacle O. Therefore, at first, only sensors C 1 and C 2 generate event clusters CL 1 and CL 2 corresponding to the marker 41 .
- the system uses geometrical constraints, such as epipolar geometry, to check if the cluster in one sensor corresponds to another one in one or more other sensors.
- the clusters CL 1 , CL 2 generated by sensors C 1 and C 2 are determined to correspond to each other, i.e. originating from the same marker. Since no cluster is generated in sensor C 3 , no cluster corresponds to the clusters CL 1 , CL 2 generated in sensors C 1 and C 2 . Accordingly, the marker 41 is triangulated from the sensors C 1 and C 2 in which it is visible.
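The geometrical consistency check between clusters in two sensors can be illustrated with the standard epipolar constraint, assuming a fundamental matrix between the sensors obtained from calibration; the pixel tolerance is a hypothetical choice:

```python
import numpy as np

def epipolar_consistent(x1, x2, f_matrix, tol=2.0):
    """Check whether a cluster centroid x1 in sensor C1 and x2 in sensor
    C2 can come from the same marker: the distance from x2 to the
    epipolar line F @ x1 must be small (x2^T F x1 ~ 0 for a true
    correspondence)."""
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    line = f_matrix @ p1                      # epipolar line of x1 in image 2
    dist = abs(p2 @ line) / np.hypot(line[0], line[1])
    return dist <= tol
```

For a rectified stereo pair the epipolar lines are image rows, so the check reduces to comparing vertical pixel coordinates.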
- the 2D track can be maintained by monitoring new events received in a given spatial neighbourhood of the last known 2D position of this marker 41 in each sensor, when the marker 41 is moving in direction A.
- new events clusters in each sensor including cluster CL 1 ′ in sensor C 1 , cluster CL 2 ′ in sensor C 2 , and cluster CL 3 in sensor C 3 are received. They can belong to the actual movement of the marker 41 or to noise, hence a candidate 2D motion for this marker 41 is created separately in each sensor.
- if this candidate 2D motion, represented by the cluster CL 3 in sensor C 3 where the marker 41 was not visible, corresponds in terms of geometrical and temporal constraints with the candidate 2D motions represented by clusters CL 1 ′ and CL 2 ′ observed in sensors C 1 and C 2 where the marker was visible, a new 2D track is created in sensor C 3 for the considered marker 41 , as shown in FIG. 4 e .
- This can be used to, for example, process disocclusion of the considered marker in one of the sensors.
- 3D tracks can be initialized similarly to the abovementioned example of 2D tracking in FIGS. 4 a -4 b , i.e. using event clustering separately in each sensor and checking their correspondence when the clusters are of sufficient size and/or have a timing coincidence, as shown in FIGS. 5 a -5 b .
- Clusters CL 1 and CL 2 are generated respectively in sensors C 1 and C 2 , and their correspondences are checked.
- a further step of triangulation is applied on the clusters CL 1 and CL 2 in sensors C 1 and C 2 . Therefore, the marker tracks are then maintained in 3D coordinates rather than in 2D coordinates.
- the 3D speed and acceleration of the marker 41 can be estimated, so as to predict its expected future 3D position of the marker 41 .
- the last known 3D position of the considered marker 41 and the predicted one can be projected in all sensors, not only in sensors C 1 and C 2 where the marker was visible, but also in sensor C 3 where the marker was not visible due to the obstacle O.
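A minimal sketch of this prediction-and-projection step, assuming a constant-velocity model over the last two track points (the patent states that speed and acceleration are estimated, not which estimator is used), with 3×4 projection matrices for each sensor:

```python
import numpy as np

def predict_and_project(positions, times, t_next, proj_matrices):
    """Predict the marker's next 3D position from its last two track
    points under a constant-velocity model, then project that prediction
    into every sensor (including ones where the marker is currently
    occluded)."""
    (t0, t1), (p0, p1) = times[-2:], [np.asarray(p, float)
                                      for p in positions[-2:]]
    velocity = (p1 - p0) / (t1 - t0)
    p_next = p1 + velocity * (t_next - t1)
    p_h = np.append(p_next, 1.0)              # homogeneous coordinates
    projections = []
    for p_mat in proj_matrices:
        u = p_mat @ p_h
        projections.append((u[0] / u[2], u[1] / u[2]))
    return p_next, projections
```

The projected prediction gives each sensor a small search region in which a newly appearing cluster, such as CL 3 after disocclusion, can be matched to the existing 3D track.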
- the new event cluster CL 3 is generated, which is close to or overlaps the predicted projection in sensor C 3 .
- the use of event-based light sensors in place of frame-based cameras has a direct impact on the temporal resolution of the system.
- stereo 3D reconstruction has been shown to run at around 1 kHz, which is already a 10 ⁇ improvement compared to existing commercial motion capture systems. This enables the present invention to capture high speed movement, such as the movement of a swinging golf club head carrying one or more markers.
- the event-based light sensors enable marker tracking based on nearest-neighbor approaches in space-time.
- the events generated by a moving marker should be close in time and in image-plane space (events are typically timestamped with microsecond accuracy).
- the method to implement this kind of motion capture is relatively simple, and the amount of unnecessary computation is reduced to a minimum. Combined with the increase of the running frequency, this leads to a significant improvement on the measurement latency compared to commercial systems in the art.
- event-based light sensors lead to significant reductions of the power consumption compared to conventional frame-based cameras.
- the detection and tracking method can leverage the sparsity of event-based data in an extremely efficient way, leading to a reduction of the overall computational power required by the system.
- the above-described method may be implemented using program instructions recorded in non-transitory computer-readable media to implement various operations which may be performed by a computer.
- the media may also include, alone or in combination with the program instructions, data files, data structures, and the like.
- the program instructions recorded on the media may be those specially designed and constructed for the purposes of the illustrative embodiments, or they may be of the well-known kind and available to those having skill in the computer software arts.
- non-transitory computer-readable media examples include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM discs and DVDs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like.
- program instructions include both machine code, such as code produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
- the described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments, or vice versa.
- Implementations of the invention may be used in many applications including computer human interaction (e.g., recognition of gestures, posture, face, and/or other applications), controlling processes (e.g., an industrial robot, autonomous and other vehicles), following movements of a set of interest points or objects (e.g., vehicles or humans) in the visual scene and with respect to the image plane, augmented reality applications, virtual reality applications, access control (e.g., opening a door based on a gesture, opening an access way based on detection of an authorized person), detecting events (e.g., for visual surveillance of people or animals), counting, tracking, etc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
A method for capturing motion of an object, the method comprising: installing at least one marker on the object; bringing the object having the at least one marker installed thereon in an acquisition volume; arranging at least two event-based light sensors such that respective fields of view of the at least two event-based light sensors cover the acquisition volume, wherein each event-based light sensor has an array of pixels; receiving events asynchronously from the pixels of the at least two event-based light sensors depending on variations of incident light from the at least one marker sensed by the pixels; and processing the events to position the at least one marker within the acquisition volume and capture motion of the object.
Description
- Machine vision is a field that includes methods for acquiring, processing, analyzing and understanding images for use in a wide range of applications, such as security applications (e.g., surveillance, intrusion detection, object detection, facial recognition, etc.), environmental-use applications (e.g., lighting control), object detection and tracking applications, automatic inspection, process control, and robot guidance. Therefore, machine vision can be integrated with many different systems.
- Among these above-mentioned applications, motion detection and tracking is useful in detecting the accurate position of a moving object in a scene, especially for Computer-Generated Imagery (CGI) solutions, such as video games, films, sports, television programs, virtual reality and augmented reality, movement science tools and simulators.
- Accordingly, localization systems, also known as motion capture (mo-cap) systems, are commonly used for this purpose to detect and estimate the position of objects equipped with markers.
- In the art, several commercial motion capture systems, such as Vicon's and OptiTrack's, use multiple frame-based cameras equipped with IR lighting to detect passive retro-reflective markers. The exposure settings of these cameras are configured to make the markers stand out in the video streams, so that they may be tracked easily.
- These cameras have a fixed framerate, typically around 120-420 fps, which leads to two important limitations. First, the acquisition process generates a lot of unnecessary data which must still be processed to detect the markers, leading to an unnecessary usage of computational power and hence a limitation on the latency reachable by the system. Second, the fixed framerate, which can be seen as the sampling frequency of the positions of objects, leads to a limitation on the dynamics which can be captured by the system with good fidelity.
- Moreover, in the art, Keskin, M. F., et al. present in their article “Localization via Visible Light Systems”, Proceedings of the IEEE, 106(6), 1063-1088, a survey on localization techniques based on visible light systems. They mention many articles from the scientific literature describing solutions (e.g. based on signal strength, time of arrival, angle of arrival, etc.) to the problem of estimating the position of light receivers from the signal received from calibrated reference LEDs, which can be identified via their intensity modulation patterns. In particular, they discuss the use of photo-detectors or conventional imaging sensors as light receivers. The limitations of conventional imaging sensors have been mentioned above. Photo-detectors also have a fixed, yet much higher, data rate.
- An object of the present invention is to provide a new method for capturing motion of an object, adapted to detect and track the pose and orientation of an object with great accuracy and/or with a high temporal resolution, so as to capture fast movements with high fidelity.
- A method for capturing motion of an object is proposed. The method comprises:
- installing at least one marker on the object;
- bringing the object having the at least one marker installed thereon in an acquisition volume;
- arranging at least two event-based light sensors such that respective fields of view of the at least two event-based light sensors cover the acquisition volume, wherein each event-based light sensor has an array of pixels;
- receiving events asynchronously from the pixels of the at least two event-based light sensors depending on variations of incident light from the at least one marker sensed by the pixels; and
- processing the events to position the at least one marker within the acquisition volume and capture motion of the object.
- In an embodiment, processing the events may comprise:
- determining that events received from respective pixels of the at least two event-based light sensors relate to a common marker based on detection of a timing coincidence between said events; and
- determining position of the common marker based on 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received.
- In particular, the timing coincidence can be detected between events having a time difference of less than 1 millisecond between them.
- In addition, the method may further comprise mapping 3D coordinates in the acquisition volume to 2D pixel coordinates in each of the event-based light sensors, wherein determining position of the common marker comprises obtaining 3D coordinates of the common marker that are mapped to the 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received.
- In another embodiment, the at least one marker comprises an active marker adapted to emit light. As an example, the active marker may emit blinking light with a preset blinking frequency or a pseudo-random blinking pattern.
- Alternatively, the at least one marker comprises a passive reflector, and the method further comprises illuminating the acquisition volume with external light, such as infrared light, so that the passive reflector reflects light from the external source.
- Furthermore, the at least one marker may be configured to emit or reflect light having certain wavelength characteristics, and the at least two event-based light sensors are provided with optical filters to filter out light not having those wavelength characteristics.
- In an embodiment, the at least two event-based light sensors are fixed to a common rigid structure, such as a rigid frame, which can be movable so as to follow the movement path of a moving object and prevent the object from escaping the fields of view of the sensors.
- There is also provided a motion capture system comprising:
- at least one marker to be carried by an object in an acquisition volume;
- at least two event-based light sensors having respective fields of view covering the acquisition volume, wherein each event-based light sensor has an array of pixels configured to asynchronously generate events depending on variations of incident light from the at least one marker sensed by the pixels; and
- a computing device coupled to the at least two event-based light sensors to process the events to position the at least one marker within the acquisition volume and capture motion of the object.
- The above-mentioned method and system provide a significant improvement of the latency and temporal resolution of localization measurements, which allows higher fidelity movement capture at precision of the order of microseconds and millimeters, while greatly reducing required computational power. This makes the motion capture system according to the present invention more flexible and suited to more applications, such as augmented reality (AR) or virtual reality (VR) as well as motion capture for sport analysis, cinema and video games.
- Other features and advantages of the present invention will appear in the description hereinafter, in reference to the appended drawings, where:
- FIG. 1 illustrates an overall setup of the system according to the invention;
- FIG. 2 is a block diagram of an event-based light sensor adapted to implementation of the invention;
- FIG. 3 is a flowchart of an illustrative method according to the invention;
- FIGS. 4a to 4f illustrate an example of detecting and tracking a marker in 2D coordinates; and
- FIGS. 5a to 5e illustrate another example of detecting and tracking a marker in 3D coordinates.
- FIG. 1 illustrates the overall setup of the motion capture system of the invention. The system comprises at least two event-based light sensors.
- In the illustrated embodiment, the event-based light sensors are fixed to a rigid frame 8 at a height h above the ground, and they observe a scene with their fields of view covering an acquisition volume 1, which is adapted to contain an object 3 that can be observed and sensed by the event-based light sensors. The fields of view of the light sensors overlap in the acquisition volume 1 where the object 3 is located.
- The object 3 may be a person, another moving object, or a plurality of the former, whose position, posture and orientation are to be detected and tracked. The object 3 carries at least one marker 4. Typically, a plurality of markers is fixed on the surface of the object 3. The object 3 is positioned in the acquisition volume 1, so that the markers can be observed and sensed by the event-based light sensors.
- The marker 4 is designed to be easily detectable by the event-based light sensors.
- With such an arrangement, the marker 4 in the acquisition volume 1 can be observed by the event-based light sensors, whose pixels generate events depending on variations of incident light from the marker 4.
- Furthermore, the system includes a computing device, not shown in FIG. 1, such as a desktop, a laptop computer or a mobile device, which is coupled with the event-based light sensors to receive the events and process them with computer vision algorithms to detect and track the markers. The position of the markers 4 and the motion of the object 3 in the acquisition volume 1 can thus be acquired.
- FIG. 2 shows an event-based light sensor which comprises an event-based asynchronous vision sensor 10 placed facing a scene and receiving the light flow of the scene through optics for acquisition 15 comprising one or several lenses, which provide a field of view depending on their optical characteristics. The sensor 10 is placed in the image plane of the optics for acquisition 15. It comprises an array of sensing elements, such as photosensitive elements, organized into a matrix of pixels. Each sensing element corresponding to a pixel produces successive event signals depending on variations of light in the scene.
- The event-based light sensor comprises a processor 12 which processes the event signals originating from the sensor 10, i.e. the sequences of events received asynchronously from the various pixels, and then forms and outputs event-based data. A hardware implementation of the processor 12 using specialized logic circuits (ASIC, FPGA, . . . ) or a chip coupled with the sensor 10 is also possible.
- In particular, the asynchronous sensor 10 carries out an acquisition to output a signal which, for each pixel, may be in the form of a succession of instants tk (k=0, 1, 2, . . . ) at which an activation threshold Q is reached. Each time the luminance sensed by the pixel increases by a quantity equal to the activation threshold Q starting from what it was at time tk, a new instant tk+1 is identified and a spike is emitted at this instant tk+1. Symmetrically, each time the luminance observed by the pixel decreases by the quantity Q starting from what it was at time tk, a new instant tk+1 is identified and a spike is emitted at this instant tk+1. The signal sequence for the pixel thus includes a succession of spikes positioned over time at instants tk depending on the light profile for the pixel. Without limitation, the output of the sensor 10 is then in the form of an address-event representation (AER). In addition, the signal sequence typically includes a luminance attribute corresponding to a variation of incident light.
- By way of example, the
sensor 10 can be a dynamic vision sensor (DVS) of the type described in “A 128×128 120dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor”, P. Lichtsteiner, et al., IEEE Journal of Solid-State Circuits, Vol. 43, No. 2, February 2008, pp. 566-576, or in patent application US 2008/0135731 A1. The dynamics of a retina (minimum duration between the action potentials) can be approached with a DVS of this type. The dynamic behaviour surpasses that of a conventional video camera that has a realistic sampling frequency. When a DVS is used as the event-basedsensor 10, data pertaining to an event originating from a pixel include the address of the pixel, a time of occurrence of the event and a luminance attribute corresponding to a polarity of the event, e.g. +1 if the luminance increases and −1 if the luminance decreases. - Another example of an
asynchronous sensor 10 that can be used advantageously in the context of this invention is the asynchronous time-based image sensor (ATIS) of which a description is given in the article “A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS”, C. Posch, et al., IEEE Journal of Solid-State Circuits, Vol. 46, No. 1, January 2011, pp. 259-275. When an ATIS is used as the event-basedsensor 10, data pertaining to an event originating from a pixel include the address of the pixel, a time of occurrence of the event and a luminance attribute corresponding to an estimated value of the absolute luminance. - The
markers 4 can be passive, i.e. emitting no light on their own. - For instance, a retro-reflective reflector reflects external illumination light, e.g. from external infrared light sources. The reflected light causes the event-based light sensor to generate events as mentioned above.
- Alternatively, the
marker 4 can also be active, i.e. using a power source and emitting light, for example visible or near-infrared light, which may cause the event-based light sensor to generate events. - Since the event-based light sensors have high temporal resolution, they make it possible to use of a much greater variety of light signals, compared to conventional frame-based cameras. In particular, the light reflected from or emitted by the markers may exhibit specific temporal behaviours, which could then be decoded for various purposes. For instance, using blinking LEDs with specific blinking frequencies enables identifying the markers reliably, making it easier to distinguish similar-looking objects or to disambiguate the orientation of a symmetrical pattern.
- Moreover, since the event-based light sensor generates events depending on the variations of light received by the sensing element from the marker that appears in the field of view of the sensor, it is possible to configure the event-based light sensors to detect the events exclusively generated by pre-designed markers. This can be achieved by configuring the marker to emit or reflect light having wavelength characteristics, for example in certain pre-set range of wavelength, and adding optical filters to the light sensors so as to filter out light not having the pre-set range of wavelength from the markers. Alternatively, this can also be achieved by configuring the event-based light sensors for sensing only strong light variations, those induced by the markers, while maintaining a fast reaction time.
- Referring to
FIG. 3 , now we discuss hereinafter a method for capturing motion of an object by means of the motion capture system mentioned above. - In the beginning (S1), at least one marker as mentioned above is installed on the surface of an object, such as the body of a performer or sportsman. The object with the marker is located in an acquisition volume. The markers are active or passive, as discussed above, and designed to facilitate their detection by event-based light sensors. Each marker can be fixed on any part of the object, and in case of human beings, it is usually attached to head/face, fingers, arms and legs.
- Meanwhile (S2), at least two event-based light sensors are separately arranged around the periphery of the acquisition volume. The event-based light sensors may be fixed to a common rigid structure, so that the relative position between the event-based light sensors is fixed.
- The light sensors are precisely arranged, and their fields of view cover the acquisition volume from different angles. The acquisition volume is a space which may contain the object, such as the performer or sportsman, or other objects that move in the acquisition volume. The size and shape of the acquisition volume are defined according to the application; a particular arrangement is a cube, such as a room, or a sphere, in which an object can freely move while its motion is captured. The acquisition volume may move, for example if the common rigid structure on which the event-based light sensors are fixed is a movable structure.
- With such an arrangement, the object in the acquisition volume can be observed and thus events can be asynchronously generated by the pixels on the event-based sensors in response to the variations of incident light from the fields of view. In an example, two event-based light sensors are set above the height of the object with their fields of view tilted down towards the object.
- In addition, optionally, the light sensors can be configured properly to achieve a high temporal resolution and to filter out light not having certain wavelength characteristics, which guarantees that the events are exclusively generated by the concerned object, hence reducing the required computational power and the latency to a minimum.
- During the setting of the event-based light sensors, it is also possible to calibrate them to estimate the parameters allowing 3D coordinates in the acquisition volume to be mapped into 2D pixel coordinates, i.e. floating-point pixel addresses, in any of the event-based light sensors.
- For this purpose, as an example, a known pattern of markers, such as an asymmetric grid of blinking LEDs, is moved exhaustively across the acquisition volume and detected by each event-based light sensor. The event-based light sensors perceive the LEDs, recognize the blinking frequencies, and associate each 2D measurement with an element of the 3D structure. 3D points expressed in the coordinate frame of the acquisition volume can then be mapped into their 2D projections in the pixel coordinates of any of the event-based light sensors, resulting in a set of 2D trajectories formed by events as observed by pixels in each sensor. These 2D trajectories, combined with the knowledge of the 3D dimensions of the pattern, then enable estimating the position and orientation of each light sensor via a classical bundle adjustment technique.
- Knowing the parameters mapping 3D coordinates into pixel coordinates, the models can be inverted to infer 3D coordinates from a set of corresponding 2D observations, for instance via a classical triangulation approach. In this regard, it is preferable to use more event-based light sensors, thus enabling higher triangulation accuracy for subsequent positioning.
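The classical triangulation mentioned above can be sketched with the linear (DLT) method. This is an illustrative implementation, not the patent's own; the 3×4 projection matrices are assumed to come from a calibration like the one described above:

```python
import numpy as np

def triangulate(projections, points_2d):
    """Least-squares triangulation of one 3D point from its 2D
    observations in several calibrated sensors (linear DLT method).
    projections: list of 3x4 camera projection matrices P_i
    points_2d:   list of (x, y) pixel coordinates, one per sensor."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # each observation contributes two linear constraints on X
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    # the homogeneous 3D point is the null vector of the stacked system
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With two hypothetical sensors, one at the origin and one translated along x, the marker's 3D position is recovered from its two pixel projections; using more sensors simply adds rows and improves the least-squares estimate, matching the remark above that more sensors enable higher triangulation accuracy.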
- Afterwards (S3), the event-based light sensors generate events according to the variations of incident light from the markers. The events received at step S3 by the processor 12 are processed (S4) by means of stereo 3D reconstruction, so as to position the marker in the 3D acquisition volume.
- Since the data generated by event-based light sensors are substantively different from the data generated by frame-based cameras, a different method for detecting and tracking markers, with specific algorithms, is adopted in the present invention. These algorithms leverage the event-based paradigm and the high temporal resolution to reduce the computational complexity to a minimum.
- An exemplary algorithm for detecting and tracking markers in event-based data is now discussed. When moving in front of an event-based light sensor, a marker continuously triggers events from pixels of the event-based light sensors. Accordingly, the events generated in response to the moving marker in each event-based light sensor are processed separately, for example by a global processing platform or by a local dedicated embedded system, so as to detect and track the marker's 2D position in each event-based light sensor.
- Afterwards, these events, including simultaneous events from the same marker generated by each event-based light sensor, are paired or associated to find their correspondences on the basis of temporal and/or geometric characteristics, so that the 3D position of the marker can be detected and tracked.
- For example, for active markers, the events received from respective pixels of each event-based light sensor are determined to relate to a common marker based on the detection of a timing coincidence between these events from different event-based light sensors. The timing coincidence may be detected between events having a time difference of less than 1 millisecond between them. For passive markers, classical geometric epipolar constraints can be used to associate the events on each event-based light sensor with a common marker. After the events are paired, their correspondences are then processed to position the marker.
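The timing-coincidence pairing for active markers can be sketched as follows. This is an illustrative two-sensor version; the event format and the two-pointer matching strategy are assumptions, with the 1 millisecond window taken from the text:

```python
def pair_by_coincidence(events_a, events_b, max_dt=0.001):
    """Associate events from two sensors when their timestamps differ
    by less than max_dt seconds (two-pointer sweep over time-sorted
    lists). Each event is a (timestamp_seconds, pixel_index) tuple."""
    pairs = []
    j = 0
    for t_a, px_a in events_a:
        # skip events in B that are too old to coincide with t_a
        while j < len(events_b) and events_b[j][0] <= t_a - max_dt:
            j += 1
        if j < len(events_b) and abs(events_b[j][0] - t_a) < max_dt:
            pairs.append((px_a, events_b[j][1]))
            j += 1  # each event in B is matched at most once
    return pairs
```

Because both streams are time-sorted, the sweep is linear in the number of events, which fits the low-latency goal of the method.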
- In an illustrative example, several event-based light sensors C1, C2 . . . Cn are arranged to detect and track a marker in response to the marker's movement.
- When the marker is present in the fields of view of the light sensors, the light sensor C1 generates an event ev(ic1, t1) for a pixel having an address expressed as index ic1 at coordinates (xic1, yic1) in the pixel array of light sensor C1 at a time t1, the light sensor C2 generates an event ev(ic2, t2) for a pixel having an address expressed as index ic2 at coordinates (xic2, yic2) in the pixel array of light sensor C2 at a time t2, . . . , and the light sensor Cn generates an event ev(icn, tn) for a pixel having an address expressed as index icn at coordinates (xicn, yicn) in the pixel array of light sensor Cn at a time tn. In particular, one pixel or a group or spot of adjacent pixels in a light sensor may respond to the marker, and the events ev(ic1, t1), ev(ic2, t2) . . . ev(icn, tn) may respectively contain a set of events generated by each light sensor. These events can be continuously generated in response to the marker's presence in each event-based sensor.
- Usually, a group of adjacent pixels detects the marker, and the respective events ev(ic1, t1), ev(ic2, t2) . . . ev(icn, tn) from the group of pixels in each light sensor can then be separately clustered as clusters CL1, CL2 . . . CLn. Based on the clustering, the marker and its 2D apparent trajectory can be detected and tracked in each sensor. The set of marker tracks detected by each light sensor, with the corresponding trajectories having 2D positions and timestamps, is then processed globally to find correspondences across those light sensors C1, C2 . . . Cn.
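The per-sensor clustering of adjacent pixels can be sketched as follows. This is a simple greedy centroid-based scheme chosen for illustration; the distance threshold and the (x, y)-only event format are assumptions:

```python
def cluster_events(events, max_dist=3):
    """Group events into clusters of spatially adjacent pixels.
    events: list of (x, y) pixel coordinates; each event is greedily
    assigned to the first cluster whose running centroid lies within
    max_dist (Chebyshev distance), otherwise it opens a new cluster."""
    clusters = []  # each cluster: [sum_x, sum_y, count]
    labels = []
    for x, y in events:
        for idx, c in enumerate(clusters):
            cx, cy = c[0] / c[2], c[1] / c[2]
            if max(abs(x - cx), abs(y - cy)) <= max_dist:
                c[0] += x; c[1] += y; c[2] += 1
                labels.append(idx)
                break
        else:
            clusters.append([x, y, 1])
            labels.append(len(clusters) - 1)
    centroids = [(c[0] / c[2], c[1] / c[2]) for c in clusters]
    return labels, centroids
```

The cluster centroids play the role of the 2D positions CL1, CL2 . . . CLn that are subsequently matched across sensors; a time window would additionally be applied in a full event-based implementation.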
- These correspondences are made using a data association step based on temporal and geometric constraints, allowing the system to detect and discard spurious candidate tracks, validate consistent ones and assign a unique ID to a confirmed marker. For example, a timing coincidence is detected if the time differences between t1, t2 . . . and tn are less than 1 millisecond, so that correspondences are found among these events. This means that the events ev(ic1, t1), ev(ic2, t2) . . . ev(icn, tn) and their corresponding clusters CL1, CL2 . . . CLn relate to a common marker. A set of 2D positions in each sensor for this common marker can then be processed to triangulate its 3D position.
- New markers can also be processed in a simple manner. For example, they can be processed by clustering events generated in a small neighbourhood of pixels. A new track can then be created once the cluster reaches a predefined size and/or displays a motion which can be distinguished from the background noise. Alternatively, in case of active markers, it is also possible to encode a unique ID using a specific blinking frequency or a pseudo random blinking pattern for each marker or part of the markers. The microsecond accuracy of the event-based light sensor allows decoding the frequency of each marker, which can be used to further improve the reliability of the detection and/or to match the detections across sensors.
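Decoding a marker ID from its blinking frequency, as mentioned above, can be sketched like this. The ID-to-frequency table, the event format (one timestamp per ON transition) and the tolerance are assumptions for illustration:

```python
def identify_marker(event_times, id_frequencies, tolerance=0.1):
    """Infer a marker ID from the blinking frequency observed in a
    train of same-polarity events (one event per ON transition).
    id_frequencies: dict mapping marker ID -> nominal frequency in Hz.
    Returns the closest ID, or None if no frequency matches within
    the relative tolerance."""
    if len(event_times) < 2:
        return None
    intervals = [b - a for a, b in zip(event_times, event_times[1:])]
    observed = 1.0 / (sum(intervals) / len(intervals))  # mean rate, Hz
    marker_id, freq = min(id_frequencies.items(),
                          key=lambda kv: abs(kv[1] - observed))
    return marker_id if abs(freq - observed) / freq <= tolerance else None
```

Because the event timestamps have microsecond accuracy, even closely spaced blinking frequencies remain distinguishable, which is what allows the IDs to be matched reliably across sensors.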
- A more detailed exemplary implementation of detecting and tracking a marker is now discussed with reference to FIGS. 4a to 4f, which show a marker 41 observed and tracked by three event-based light sensors C1, C2 and C3 in 2D coordinates, with its 3D position triangulated at every update of the 2D tracks.
FIG. 4a , when amarker 41 appears in the acquisition volume, sensors C1, C2 and C3 that view themarker 41 may separately generate event clusters. InFIG. 4a , sensors C1 and C2 view themarker 41 and sensor C3 cannot view themarker 41 due to an occluding obstacle O. Therefore, at first, only sensors C1 and C2 generate event clusters CL1 and CL2 corresponding to themarker 41. - Once a cluster of events in each sensor reaches a pre-set size, the system uses geometrical constraints, such as epipolar geometry, to check if the cluster in one sensor corresponds to another one in one or more other sensors. In addition, it is also possible to further check the timing coincidence between the events in each sensor as mentioned above, to determine if they originate from one common marker. As shown in
FIG. 4b , the clusters CL1, CL2 generated by sensors C1 and C2 are determined to correspond to each other, i.e. originating from the same marker. Since no cluster is generated in sensor C3, no cluster corresponds to the clusters CL1, CL2 generated in sensors C1 and C2. Accordingly, themarker 41 is triangulated from the sensors C1 and C2 in which it is visible. - As shown in
FIG. 4c , when the cluster CL1 in sensor C1 corresponds to a cluster CL2 of sensor C2, a 2D track of themarker 41 in sensors C1 and C2 is created. - Afterwards, the 2D track can be maintained by monitoring new events received in a given spatial neighbourhood of the last known 2D position of this
marker 41 in each sensor, when themarker 41 is moving in direction A. For example, new events clusters in each sensor including cluster CL1′ in sensor C1, cluster CL2′ in sensor C2, and cluster CL3 in sensor C3 (when themarker 41 is no longer invisible in sensor C3 due to the occluding obstacle) are received. They can belong to the actual movement of themarker 41 or to noise, hence a candidate 2D motion for thismarker 41 is created separately in each sensor. - Then, as shown in
FIG. 4d , if this candidate 2D motion represented by the cluster CL3 in sensor C3 where themarker 41 was not visible is corresponding in terms of geometrical and temporal constraints with the candidate 2D motions represented by clusters CL1′ and CL2′ observed in sensors C1 and C2 where the marker was visible, a new 2D track is created in sensor C3 for the consideredmarker 41, as shown inFIG. 4e . This can be used to, for example, process disocclusion of the considered marker in one of the sensors. - Once the 2D tracks have been updated and checked to be corresponding with each other, a new 3D position of the
marker 41 represented by its 3D coordinates is triangulated, as shown inFIG. 4 f. - Alternatively, it is also possible to track the marker in 3D coordinates and to use the 3D coordinates to simplify the matching of 2D tracks across sensors in a similar hardware setting, as illustrated in
FIGS. 5a to 5f , where in the beginning themarker 41 is visible in sensors C1 and C2, and is not visible in sensor C3 due to the occluding obstacle. - 3D tracks can be initialized similarly than in the abovementioned example of 2D tracking in
FIGS. 4a-4b , i.e. using event clustering separately in each sensor and checking their correspondence when the clusters are of sufficient size and/or have a timing coincidence, as shownFIGS. 5a-5b . Clusters CL1 and CL2 are generated respectively in sensors C1 and C2, and their correspondences are checked. Different from the 2D tracking, after the correspondence checking, a further step of triangulation is applied on the clusters CL1 and CL2 in sensors C1 and C2. Therefore, the marker tracks are then maintained in 3D coordinates rather than in 2D coordinates. - Based on the past 3D coordinates or positions, as shown in
FIG. 5c , the 3D speed and acceleration of themarker 41 can be estimated, so as to predict its expected future 3D position of themarker 41. - In this regard, the last known 3D position of the considered
marker 41 and the predicted one can be projected in all sensors, not only in sensors C1 and C2 where the marker was visible, but also in sensor C3 where the maker was not visible due to the obstacle O. When themarker 41 can be viewed by sensor C3, as shown inFIG. 5c , the new event cluster CL3 is generated, which is close to or overlap the predicted projection in sensor C3. Then, it is possible to monitor new event clusters CL1′, CL2′, CL3 received in a spatial neighborhood around the projection to identify candidate 2D motions in each sensor, as shown inFIG. 5 d. - It is then possible to use a robust optimization algorithm to estimate the new 3D position best explaining the observed 2D motions and simultaneously detecting the spurious 2D motion candidates, as shown in
FIG. 5 e. - Advantageously, the use of event-based light sensors in place of frame-based cameras has a direct impact on the temporal resolution of the system. With the arrangement mentioned above, stereo 3D reconstruction has been shown to run at around 1 kHz, which is already a 10 × improvement compared to existing commercial motion capture systems. This enables the present invention to capture high speed movement, such as the movement of a swinging golf club head carrying one or more markers.
- The event-based light sensors enable marker tracking based on nearest-neighbor approaches in space-time: the events generated by a moving marker are close both in time and in image-plane space (event timestamps typically have microsecond accuracy). The method implementing this kind of motion capture is relatively simple, and the amount of unnecessary computation is reduced to a minimum. Combined with the increased running frequency, this leads to a significant improvement in measurement latency compared to commercial systems in the art.
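The space-time nearest-neighbor association just described can be sketched as follows; the pixel and time thresholds are illustrative assumptions, not values from the patent. An incoming event joins an existing marker track if it is within a few pixels and a few hundred microseconds of the track's last event, and otherwise seeds a new track.

```python
import math

def associate(event, tracks, max_dist_px=3.0, max_dt_us=500.0):
    """event: (t_us, x, y); tracks: list of [last_t, last_x, last_y] lists."""
    t, x, y = event
    for track in tracks:
        lt, lx, ly = track
        if t - lt <= max_dt_us and math.hypot(x - lx, y - ly) <= max_dist_px:
            track[:] = [t, x, y]     # event is close in space-time: update track
            return track
    tracks.append([t, x, y])         # no match: start a new track
    return tracks[-1]

tracks = []
associate((0.0, 100.0, 100.0), tracks)      # seeds track 1
associate((200.0, 101.0, 100.5), tracks)    # joins track 1 (close in space-time)
associate((250.0, 300.0, 40.0), tracks)     # far away: seeds track 2
print(len(tracks))  # prints 2
```

Because each event touches only nearby tracks, the work per event is tiny, which is where the latency advantage over frame-based processing comes from.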
- Moreover, event-based light sensors lead to significant reductions in power consumption compared to conventional frame-based cameras. When the system is still, the detection and tracking method can leverage the sparsity of event-based data in an extremely efficient way, reducing the overall computational power required by the system.
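The track-prediction step discussed earlier (estimating the 3D speed and acceleration of the marker from its past positions, FIG. 5c ) can be sketched with finite differences under a constant-acceleration model; the function name and sample values are illustrative assumptions.

```python
import numpy as np

def predict_position(track, t_future):
    """track: list of (t, xyz) samples, oldest first; needs >= 3 samples.
    Finite differences give speed and acceleration; a constant-acceleration
    model extrapolates the marker's expected future 3D position."""
    (t0, p0), (t1, p1), (t2, p2) = track[-3:]
    v1 = (p1 - p0) / (t1 - t0)           # speed over the older interval
    v2 = (p2 - p1) / (t2 - t1)           # speed over the newer interval
    a = (v2 - v1) / ((t2 - t0) / 2.0)    # acceleration estimate
    dt = t_future - t2
    return p2 + v2 * dt + 0.5 * a * dt ** 2

track = [(0.000, np.array([0.000, 0.0, 2.0])),
         (0.001, np.array([0.001, 0.0, 2.0])),
         (0.002, np.array([0.002, 0.0, 2.0]))]
print(predict_position(track, 0.003))  # ≈ [0.003, 0, 2] for this constant-velocity track
```

The predicted position is what gets re-projected into every sensor, including the one where the marker was occluded, to re-acquire it.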
- The above-described method may be implemented using program instructions recorded in non-transitory computer-readable media to implement various operations which may be performed by a computer. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The program instructions recorded on the media may be those specially designed and constructed for the purposes of the illustrative embodiments, or they may be of the well-known kind and available to those having skill in the computer software arts. Examples of non-transitory computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM discs and DVDs; magneto-optical media such as optical discs; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include both machine code, such as code produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules in order to perform the operations of the above-described embodiments.
- Implementations of the invention may be used in many applications including computer human interaction (e.g., recognition of gestures, posture, face, and/or other applications), controlling processes (e.g., an industrial robot, autonomous and other vehicles), following movements of a set of interest points or objects (e.g., vehicles or humans) in the visual scene and with respect to the image plane, augmented reality applications, virtual reality applications, access control (e.g., opening a door based on a gesture, opening an access way based on detection of an authorized person), detecting events (e.g., for visual surveillance of people or animals), counting, tracking, etc. Myriad other applications exist that will be recognized by those of ordinary skill in the art given the present disclosure.
- The embodiments described hereinabove are illustrations of this invention. Various modifications can be made to them without departing from the scope of the invention, which is defined by the appended claims.
Claims (13)
1. A method for capturing motion of an object, the method comprising:
installing at least one marker on the object;
bringing the object having the at least one marker installed thereon in an acquisition volume;
arranging at least two event-based light sensors such that respective fields of view of the at least two event-based light sensors cover the acquisition volume, wherein each event-based light sensor has an array of pixels;
receiving events asynchronously from the pixels of the at least two event-based light sensors depending on variations of incident light from the at least one marker sensed by the pixels; and
processing the events to position the at least one marker within the acquisition volume and capture motion of the object, wherein processing the events comprises:
determining that events received from respective pixels of the at least two event-based light sensors relate to a common marker based on detection of a timing coincidence between said events; and
determining position of the common marker based on 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received, wherein the timing coincidence is detected between events having a time difference of less than 1 millisecond between them.
2. The method of claim 1 , further comprising mapping 3D coordinates in the acquisition volume to 2D pixel coordinates in each of the event-based light sensors,
wherein determining position of the common marker comprises obtaining 3D coordinates of the common marker that are mapped to the 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received.
3. The method of claim 1 , wherein the at least one marker comprises an active marker adapted to emit light, preferably infrared light.
4. The method of claim 3 , wherein the at least one marker is configured to emit blinking light with a blinking frequency or a pseudo random blinking pattern.
5. The method of claim 1 , wherein the at least one marker comprises a passive reflector, the method further comprising illuminating the acquisition volume with external light.
6. The method of claim 5 , wherein the external light is infrared light.
7. The method of claim 1 , wherein the at least one marker is configured to emit or reflect light having wavelength characteristics and the at least two event-based light sensors are provided with optical filters to filter out light not having the wavelength characteristics.
8. The method of claim 1 , wherein the at least two event-based light sensors are fixed to a common rigid structure and the common rigid structure is moveable following a movement path of the object.
9. A motion capture system, comprising:
at least one marker to be carried by an object in an acquisition volume;
at least two event-based light sensors having respective fields of view covering the acquisition volume, wherein each event-based light sensor has an array of pixels configured to asynchronously generate events depending on variations of incident light from the at least one marker sensed by the pixels; and
a computing device coupled to the at least two event-based light sensors to process the events to position the at least one marker within the acquisition volume and capture motion of the object, wherein processing the events comprises:
determining that events received from respective pixels of the at least two event-based light sensors relate to a common marker based on detection of a timing coincidence between said events; and
determining position of the common marker based on 2D pixel coordinates of the respective pixels from which the events having the detected timing coincidence therebetween are received,
wherein the timing coincidence is detected between events having a time difference of less than 1 millisecond between them.
10. The motion capture system according to claim 9 , wherein the at least one marker comprises an active marker which is adapted to emit light, preferably infrared light.
11. The motion capture system according to claim 10 , wherein the at least one marker is configured to emit blinking light with a blinking frequency or a pseudo random blinking pattern.
12. The motion capture system according to claim 9 , wherein the at least one marker comprises a passive reflector, the motion capture system further comprising at least one light source to illuminate the acquisition volume with external light.
13. The motion capture system according to claim 12 , wherein the external light is infrared light.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19305624.9 | 2019-05-16 | ||
EP19305624 | 2019-05-16 | ||
PCT/EP2020/063489 WO2020229612A1 (en) | 2019-05-16 | 2020-05-14 | A method for capturing motion of an object and a motion capture system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220245914A1 true US20220245914A1 (en) | 2022-08-04 |
Family
ID=67902428
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/611,021 Pending US20220245914A1 (en) | 2019-05-16 | 2020-05-14 | Method for capturing motion of an object and a motion capture system |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220245914A1 (en) |
EP (1) | EP3970063A1 (en) |
JP (1) | JP2022532410A (en) |
KR (1) | KR20220009953A (en) |
CN (1) | CN113841180A (en) |
WO (1) | WO2020229612A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3126799B1 (en) * | 2021-09-09 | 2024-05-24 | Xyzed | System for tracking actors on at least one performance stage |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8941589B2 (en) * | 2008-04-24 | 2015-01-27 | Oblong Industries, Inc. | Adaptive tracking system for spatial input devices |
US8965898B2 (en) * | 1998-11-20 | 2015-02-24 | Intheplay, Inc. | Optimizations for live event, real-time, 3D object tracking |
US9019349B2 (en) * | 2009-07-31 | 2015-04-28 | Naturalpoint, Inc. | Automated collective camera calibration for motion capture |
US10095928B2 (en) * | 2015-12-22 | 2018-10-09 | WorldViz, Inc. | Methods and systems for marker identification |
US10949980B2 (en) * | 2018-10-30 | 2021-03-16 | Alt Llc | System and method for reverse optical tracking of a moving object |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6801637B2 (en) * | 1999-08-10 | 2004-10-05 | Cybernet Systems Corporation | Optical body tracker |
EP1958433B1 (en) | 2005-06-03 | 2018-06-27 | Universität Zürich | Photoarray for detecting time-dependent image data |
FR2983998B1 (en) * | 2011-12-08 | 2016-02-26 | Univ Pierre Et Marie Curie Paris 6 | METHOD FOR 3D RECONSTRUCTION OF A SCENE USING ASYNCHRONOUS SENSORS |
-
2020
- 2020-05-14 US US17/611,021 patent/US20220245914A1/en active Pending
- 2020-05-14 JP JP2021568354A patent/JP2022532410A/en active Pending
- 2020-05-14 CN CN202080035560.4A patent/CN113841180A/en active Pending
- 2020-05-14 WO PCT/EP2020/063489 patent/WO2020229612A1/en active Application Filing
- 2020-05-14 EP EP20725546.4A patent/EP3970063A1/en active Pending
- 2020-05-14 KR KR1020217037105A patent/KR20220009953A/en active Search and Examination
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8965898B2 (en) * | 1998-11-20 | 2015-02-24 | Intheplay, Inc. | Optimizations for live event, real-time, 3D object tracking |
US8941589B2 (en) * | 2008-04-24 | 2015-01-27 | Oblong Industries, Inc. | Adaptive tracking system for spatial input devices |
US9019349B2 (en) * | 2009-07-31 | 2015-04-28 | Naturalpoint, Inc. | Automated collective camera calibration for motion capture |
US10095928B2 (en) * | 2015-12-22 | 2018-10-09 | WorldViz, Inc. | Methods and systems for marker identification |
US10949980B2 (en) * | 2018-10-30 | 2021-03-16 | Alt Llc | System and method for reverse optical tracking of a moving object |
Also Published As
Publication number | Publication date |
---|---|
JP2022532410A (en) | 2022-07-14 |
EP3970063A1 (en) | 2022-03-23 |
CN113841180A (en) | 2021-12-24 |
WO2020229612A1 (en) | 2020-11-19 |
KR20220009953A (en) | 2022-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10268900B2 (en) | Real-time detection, tracking and occlusion reasoning | |
CN108156450B (en) | Method for calibrating a camera, calibration device, calibration system and machine-readable storage medium | |
CN107466411B (en) | Two-dimensional infrared depth sensing | |
US7929017B2 (en) | Method and apparatus for stereo, multi-camera tracking and RF and video track fusion | |
Piątkowska et al. | Spatiotemporal multiple persons tracking using dynamic vision sensor | |
WO2021023106A1 (en) | Target recognition method and apparatus, and camera | |
US20080101652A1 (en) | Method and apparatus for tracking objects over a wide area using a network of stereo sensors | |
US9201499B1 (en) | Object tracking in a 3-dimensional environment | |
GB2475104A (en) | Detecting movement of 3D objects using a TOF camera | |
AU2015203771A1 (en) | A method and apparatus for surveillance | |
KR20210129043A (en) | How to process information from event-based sensors | |
JP2011211687A (en) | Method and device for correlating data | |
WO2022127181A1 (en) | Passenger flow monitoring method and apparatus, and electronic device and storage medium | |
Arsic et al. | Applying multi layer homography for multi camera person tracking | |
Herghelegiu et al. | Robust ground plane detection and tracking in stereo sequences using camera orientation | |
Nguyen et al. | Confidence-aware pedestrian tracking using a stereo camera | |
KR20120026956A (en) | Method and apparatus for motion recognition | |
US20220245914A1 (en) | Method for capturing motion of an object and a motion capture system | |
Ghidoni et al. | Cooperative tracking of moving objects and face detection with a dual camera sensor | |
JP2019121019A (en) | Information processing device, three-dimensional position estimation method, computer program, and storage medium | |
Gruenwedel et al. | Low-complexity scalable distributed multicamera tracking of humans | |
Chan et al. | Autonomous person-specific following robot | |
Hadi et al. | Fusion of thermal and depth images for occlusion handling for human detection from mobile robot | |
Lin et al. | Collaborative pedestrian tracking with multiple cameras: Data fusion and visualization | |
Mukhtar et al. | RETRACTED: Gait Analysis of Pedestrians with the Aim of Detecting Disabled People |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: PROPHESEE, FRANCE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BOURDIS, NICOLAS;MIGLIORE, DAVIDE;REEL/FRAME:059464/0625 Effective date: 20200518 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |