WO2020239210A1 - Method, apparatus and computer program for tracking of moving objects - Google Patents

Method, apparatus and computer program for tracking of moving objects

Info

Publication number
WO2020239210A1
Authority
WO
WIPO (PCT)
Prior art keywords
tracklet
data
inertial
inertial measurement
video
Prior art date
Application number
PCT/EP2019/063881
Other languages
French (fr)
Inventor
Roberto HENSCHEL
Timo VON MARCARD
PROF. DR. Bodo ROSENHAHN
Original Assignee
Gottfried Wilhelm Leibniz Universität Hannover
Priority date
Filing date
Publication date
Application filed by Gottfried Wilhelm Leibniz Universität Hannover
Priority to PCT/EP2019/063881
Priority to DE112019007390.7T
Publication of WO2020239210A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects


Abstract

The invention relates to a method for tracking of moving objects within a defined area, wherein the method comprising the steps of: - providing a recorded video sequence of said defined area, the video sequence has a plurality of video frames and a temporal length; - providing inertial data for at least one object, said inertial data for an object has been recorded by an inertial measurement unit arranged on and assigned to the corresponding object; - detecting at least one object in said video frames of said video sequence using an object detector unit; - generating a plurality of tracklets for said at least one detected object based on object detecting in said video sequence, each tracklet includes trajectory data of a trajectory of the corresponding detected object for a certain tracklet time period within the temporal length of said video sequence using a processing unit; - assigning one of said inertial measurement units to one or more tracklets based on the trajectory data of the corresponding tracklet and the inertial data within the tracklet time period of the corresponding tracklet such that the inertial data are consistent with the trajectory data of the respective tracklet using said processing unit.

Description

Method, apparatus and computer program for tracking of moving objects
The invention relates to a method for tracking of moving objects within a defined area using a camera system. The invention relates further to an apparatus for tracking of moving objects within a defined area as well as a computer program for this purpose.
Multiple people tracking (MPT) in video sequences has been an active field of research for decades. Several applications exist where trajectories are required for further analysis and interpretation. This could be to understand social interactions of humans, support urban planning [6], secure areas against dangerous behavior or to provide an automatic analysis of player’s performance in sports.
Most state-of-the-art MPT approaches tackle this problem in two steps: First, a person detector is applied to each frame of the image sequence. Then, an optimization problem is formulated, which clusters all detections such that ideally each cluster represents the trajectory of a person and false detections remain unconsidered.
A crucial part of this strategy is to derive a measure of whether two detections belong to the same person or not. Typically, this involves a motion model or person appearance. A motion model attempts to assign likelihoods to observed person movements. This is very generic and only depends on corner coordinates of detection boxes. However, as soon as the motion becomes more dynamic, simple motion models are insufficient and tracking degrades. In particular, most motion models assume low and constant velocities, which holds for pedestrians only within a short temporal window.
Another complementary strategy is to model relations between detections based on person appearance. Here, CNN-based feature representations are used to evaluate if two detections show the same person. A major advantage of utilizing appearance information over motion models is that it allows relating detections which are temporally far apart. This makes it possible to re-identify people even after long-term occlusions or if they temporarily fall out of the camera view.
Despite the enormous progress with artificial neural network-based appearance features, it remains challenging to differentiate persons wearing similar or identical clothing. A prototypical example for such a situation is sport player tracking, where team members wear almost identical outfits. Another challenge arises if people change appearance throughout a sequence, e.g. they put on a jacket or open an umbrella. Then the assumption of appearance constancy is violated and consequently tracking accuracy degrades.
In W. Jiang and Z. Yin. Combining passive visual cameras and active IMU sensors to track cooperative people. In International Conference on Information Fusion (Fusion), pages 1338-1345, 2015, a method for people tracking in videos is disclosed. Each person to be tracked is equipped with an inertial measurement unit (IMU). An IMU-equipped person has to be manually localized in the first video frame. Then, IMU information is used to recover the trajectory in situations where the visual tracker fails.
However, incorporating additional sensory input for the task of MPT creates a very different problem setup compared to the vision-only methods, because there exists no linking between the persons detected in the video frames and the inertial data recorded by the body-worn IMUs.
Hence, it is an object of the present invention to provide an improved method and apparatus which use both sources of information for tracking objects in an automatic manner without a manual localization. This object is solved by the inventive method according to claim 1, by the inventive apparatus according to claim 10 and by the inventive computer program according to claim 12.
In accordance with claim 1, a method for tracking of moving objects within a defined area is proposed. The inventive method comprises the steps of:
- providing a recorded video sequence of said defined area, the video sequence has a plurality of video frames and a temporal length;
- providing inertial data for at least one object, said inertial data for an object has been recorded by an inertial measurement unit arranged on and assigned to the corresponding object;
- detecting at least one object in said video frames of said video sequence using an object detector unit;
- generating a plurality of tracklets for said at least one detected object based on object detecting in said video sequence, each tracklet includes trajectory data of a trajectory of the corresponding detected object for a certain tracklet time period within the temporal length of said video sequence using a processing unit;
- assigning one of said inertial measurement units to one or more tracklets based on the trajectory data of the corresponding tracklet and the inertial data within the tracklet time period of the corresponding tracklet such that the inertial data are consistent with the trajectory data of the respective tracklet using said processing unit.
The inventive method uses a video sequence with a plurality of video frames (single images) and inertial measurement units attached to one or more persons to be tracked, especially to each person. Conceptually, the idea is to incorporate local inertial measurement unit motion measurements in order to disambiguate the assignment of detections to person trajectories. Since inertial measurement units are body-worn, the corresponding motion measurements are unique for each person. Similar to appearance, this property facilitates tracking and re-identifying persons even after long-term occlusions. Hence, such a tracking approach is predestined for scenarios where it is possible to equip people with an inertial measurement unit and appearance is less informative or not available. The latter could be the case if night-vision is used or privacy concerns prohibit processing or storing color images of people.
Even though motion information is available through IMU measurements, the task still poses a very challenging problem. From IMU data alone it is not possible to generate stable 3D trajectories due to unknown initial states and accumulating drift caused by double integration of acceleration signals. If this were possible, we could easily associate each detection box to the closest IMU trajectory projected to the image. Hence, instead of working on pre-computed IMU trajectories, we have to associate 3D orientation and acceleration measurements to 2D motion information observed in the video. For example, this requires relating IMU orientations, which are elements of SO(3), to image data being a two-dimensional pixel array. Further, IMU measurements often fit to several people at a time step and the person wearing the IMU might be occluded or out of the camera view.
To address this, a recorded video sequence is provided. The video sequence has a plurality of video frames (single video images) and includes a scene of the defined area with the moving objects to be tracked. The moving objects can be humans (people). The moving objects can also be of a non-human nature, e.g. vehicles. Furthermore, inertial data for at least one object within the defined area are provided. The inertial data have been recorded by an inertial measurement unit (IMU) which is attached to the object to be tracked. Each inertial measurement unit is assigned to the corresponding object to which it is attached. By localizing the inertial measurement unit, the corresponding object can be identified.
The video sequence and the inertial data can be recorded first and saved in a data memory (offline mode). However, it is also possible to use live video sequences and live inertial data just recorded. In the first step, at least one object, advantageously more objects or all objects in the video sequence, is detected in the video frames of the video sequence by an object detector unit. The object detector unit identifies the corresponding object by its coordinates (within the video frame) and/or its height and width. In this case, the detected object will be identified by a detection area (called detection box) within the video frame, whereby the detection area includes the image data of the detected object. Sometimes, a colored box is drawn around the object to visualize the detection. A detection of an object within the video sequence is called detected object.
In the next step, for each detected object within the video sequence a plurality of tracklets are generated by a processing unit. For each object which was detected in one or more coherent video images, one tracklet is generated. Each tracklet includes trajectory data of a trajectory of the detected object for a certain tracklet time period within the temporal length of said video sequence. A tracklet time period can have a length of 0.5 seconds to 1.5 seconds. Trajectory data of a tracklet contains the orientation of the object (in relation to the camera view) and/or the position in the defined area or in the video frame. Thus, a tracklet holds information about a short part of the track (trajectory) of the moving object recorded in the defined area. A tracklet can be generated based on a detection of an object in one video frame so that for each video frame in which the object was detected a tracklet is generated. However, in most cases it is more advantageous if a tracklet is generated based on a detection of one object in more than one video frame (e.g. 15 video frames to generate one tracklet).
Then, one of said inertial measurement units is assigned to one or more tracklets based on the trajectory data of the corresponding tracklet and the inertial data within the tracklet time period of the corresponding tracklet. In the best case, an inertial measurement unit is assigned to each tracklet. The assignment is conducted such that the inertial data are consistent with the trajectory data of the respective tracklet. Further, the assignments to the tracklets are performed simultaneously, e.g. by using a mathematical model.
Based on the assignment of the inertial measurement units to the tracklets, the object corresponding to a tracklet can be identified by its inertial measurement unit, because each measurement unit is assigned to exactly one object. Periods in which the object was hidden or was outside the recording area are no longer a problem, since the inventive method allows a re-assignment and re-identification.
According to an embodiment, the step of providing the video sequence includes recording the defined area using a camera system including at least one camera to generate the video sequence with the temporal length.
According to an embodiment, the inertial data are provided for a plurality of objects, wherein each object is equipped with at least one inertial measurement unit. It is possible to equip all objects to be tracked with at least one inertial measurement unit so that an inertial measurement unit is arranged on and is assigned to each object. However, it is also possible that not every object is equipped with an inertial measurement unit.
According to an embodiment, a plurality of objects in the video frames of the video sequence are detected, wherein a plurality of tracklets for each detected object is generated. The tracklets of a detected object are part of the complete trajectory of the detected object. The trajectory data of the tracklets of a detected object can form the trajectory of the detected object.
According to an embodiment, one of said inertial measurement units is assigned to each generated tracklet. According to an embodiment, with respect to one of said tracklets, an assignment probability is calculated for all inertial measurement units based on the inertial data of the inertial measurement units within the respective tracklet time period, the assignment probability indicating how consistent the inertial data of an inertial measurement unit are with the trajectory data of the respective tracklet, wherein one of said inertial measurement units is assigned to the respective tracklet based on the calculated assignment probabilities.
In other words, for each inertial measurement unit an assignment to a tracklet is evaluated to calculate an assignment probability. Then, an inertial measurement unit is assigned to said tracklet based on the calculated assignment probabilities. Further, this evaluation is conducted for all tracklets. The assignment probability could be, for example, a cost function. In an embodiment, the concrete assignment of an IMU to a tracklet is then determined taking into account all assignment probabilities of the IMUs and the tracklet data in a global context.
According to an embodiment, one of said inertial measurement units is assigned to a first tracklet and the same inertial measurement unit is assigned to a temporally following tracklet based on the calculated assignment probabilities, if the trajectory data of the first tracklet and the trajectory data of the second tracklet are reasonable with respect to spatio-temporal aspects and/or if the inertial data of said assigned inertial measurement unit for the tracklet time periods of the first and the second tracklet are reasonable with respect to movement aspects. Movement aspects can be orientation and/or acceleration aspects.
In other words, the trajectory data of the first tracklet and the corresponding inertial data of the inertial measurement unit within the tracklet time period of the first tracklet and the trajectory data of the second tracklet and the corresponding inertial data of the inertial measurement unit within the tracklet time period of the second tracklet have to be reasonable with respect to spatio-temporal aspects and/or movement aspects. According to an embodiment, at least one assignment of one of said inertial measurement units to one of said tracklets is determined based on all assignment probabilities of all inertial measurement units and tracklets and all trajectory data of all tracklets in a global context. This can be realized by using a mathematical model which takes into account all assignment probabilities of all inertial measurement units related to all tracklets as well as the trajectory data of all tracklets. Based on video object detection (e.g. based on the trajectory data), two tracklets that belong to the same detected object and follow each other in time are connected to each other.
According to an embodiment, an orientation is computed for at least one of the objects by inputting image data of the object into an artificial neural network which has learned an assignment of the image data of the object to an orientation of the object, wherein at least one tracklet for said at least one object is further generated based on said detected orientation of said object.
A person orientation, for example, is defined in terms of the normal vector of the torso’s coronal plane projected to the ground plane. The artificial neural network has learned the mapping of image data within the detection box and the orientation of the object shown in the image data of the detection box. However, this 2D projection for the orientation is replaceable by a 3D definition of the torso’s orientation.
Even though the global orientation of an object may be constant, the perceived orientation as seen from the camera varies. According to an embodiment, the detected orientation of the object is corrected based on the position of the detected object within the video frames. Consider a person walking on a straight line: in a global context this person has a constant orientation. However, due to perspective effects the perceived orientation of that person with respect to the viewpoint of the camera is different at every point in the image. This is compensated by considering a correction angle derived from the detection box within the image.
According to an embodiment, for at least one of the objects (advantageously for a plurality of objects or for all objects) a trajectory is determined based on the trajectory data of those tracklets to which the inertial measurement unit of the object has been assigned and the inertial data of the inertial measurement unit of the object.
In accordance with claim 10, an apparatus for tracking of moving objects within a defined area is proposed. The inventive apparatus comprises at least one inertial measurement unit, an object detector unit and a processing unit, wherein the apparatus is arranged for conducting the method as described above.
Further, in accordance with claim 12, a computer program for tracking of moving objects within a defined area is proposed. The computer program is arranged to execute the method as described above.
The invention is described in more detail in the following figures:
Figure 1 schematic representation of the inventive apparatus;
Figure 2 schematic representation of a camera view after the object detection by the object detection box;
Figure 3 graph representation of the generated tracklets;
Figure 4 representation of person orientation;
Figure 5 representation of the visual heading artificial neural network.
Figure 1 shows a representation of the inventive apparatus 10 for tracking of moving objects 200, 220 within a defined area 100. The defined area 100 corresponds to the recording view of the camera 12 depicted in figure 1. In the example of figure 1 there are two objects in the form of persons 200, 220 within the defined area 100. However, the invention is not limited to a certain number of persons.
The camera 12 is recording a video sequence of the defined area 100 including the persons 200 and 220. The recorded video sequence is transferred to a processing device 14 for further processing.
Furthermore, each person 200, 220 to be tracked is equipped with an inertial measurement unit 20, 22. The person 200 is equipped with the inertial measurement unit 20 and the person 220 is equipped with the inertial measurement unit 22. Each inertial measurement unit 20, 22 is recording inertial data of the movement of the corresponding object 200, 220. The recorded inertial data are transferred to the processing device 14.
The processing device 14 has an object detector unit 16 for detecting the objects 200, 220 to be tracked in the video sequence. The object detector unit 16 uses each video frame to recognize the objects 200, 220. Over the whole video sequence, each object can be tracked visually.
The result of the object detection of the object detector unit is transferred to a processing unit 18. Further, the inertial data from the inertial measurement units 20, 22 are also transferred to the processing unit 18 of the processing device 14.
The processing unit 18 is arranged to generate a plurality of tracklets for the detected objects 200, 220 based on object detection in the video sequence. Each tracklet includes trajectory data of a trajectory of the corresponding detected object 200, 220 for a certain tracklet time period which is derived from the object detection.
Furthermore, the processing unit 18 is arranged to assign the inertial measurement units 20, 22 to each tracklet based on the trajectory data of the corresponding tracklet and the inertial data within the tracklet time period of the corresponding tracklet such that the inertial data are consistent with the trajectory data of the respective tracklet.
If an inertial measurement unit has been assigned to each tracklet, a complete trajectory can be calculated based on the trajectory data of the tracklets with the same inertial measurement unit and the inertial data of the inertial measurement unit. The complete trajectory is saved in a digital memory 30.
Figure 2 shows the result of the object detection from the object detection unit 16 on the example of one video frame 41. If the objects 200, 220 are detected, a detection box 300, 320 is drawn around each detected person 200, 220. The detection box 300, 320 includes the image data of the corresponding detected object 200, 220 of the relevant video frame 41. Figures 3 to 5 show in detail the tracking according to the present invention. The invention follows the tracking-by-detection paradigm and groups detections into short tracklets in a first step. Then the tracking task can be formulated as assigning IDs (inertial measurement unit IDs) to tracklets, such that all tracklets with identical IDs correspond to person trajectories in the video.
In the context of the present invention, the tracking task is solved by incorporating motion information from body-worn inertial measurement units (IMUs). A graph labeling problem is formulated to find an optimal assignment of IMU IDs to tracklets, such that the resultant trajectories are visually smooth in the video and consistent with measured IMU orientations and accelerations.
The IMU signals are integrated at different conceptual levels: For each potential detection-to-IMU assignment, we require that the person orientation as seen by the camera is consistent with the corresponding IMU orientation. Orientation consistency alone is very ambiguous and hence the invention enforces spatio-temporal consistency if two detections are associated to the same IMU ID. Here, the complementary characteristics of short-term detection box motion features and long-term IMU acceleration features are employed. Figure 3 illustrates the graph and shows an exemplary labeling solution.
In order to solve the tracking task, an undirected weighted graph G = (V, E, C, L) is created, where V is the vertex set comprising all tracklets of the entire sequence and E is the edge set containing all edges that connect a pair of tracklets. Vertices and edges may obtain a label l ∈ L, where the label set L = {1, 2, 3, ..., P} contains an IMU ID for all P persons wearing an IMU. At this point, the notion of an assignment hypothesis H = (v, l) is introduced, which associates a label l ∈ L to a tracklet v ∈ V. Associated to each hypothesis are assignment costs c_v^l ∈ C and indicator variables x_v^l which take the value 1 if H is selected, and 0 otherwise. Additionally, for pairs of hypotheses sharing the same label and whose vertices are connected by an edge e ∈ E, compatibility costs c_e^l ∈ C are considered, modeling the likelihood that two tracklets belong to the same person.
The tracking task is then to select hypotheses for the entire sequence that minimize the total costs. This can be cast into a binary optimization problem:
\min_{x \in F} \sum_{l \in L} \Big( \sum_{v \in V} c_v^l \, x_v^l + \sum_{e = (v, v') \in E} c_e^l \, x_v^l \, x_{v'}^l \Big)    (1)
where the feasibility set F is subject to
\sum_{l \in L} x_v^l \le 1 \quad \forall v \in V    (2)
\sum_{v \in V_t} x_v^l \le 1 \quad \forall l \in L, \ \forall t    (3)
The subset V_t \subseteq V comprises all tracklets v that contain a detection in frame t. Eq. (2) ensures that each tracklet v is assigned to at most one label and Eq. (3) guarantees that a label is not assigned to more than one tracklet at a time.
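As an illustration of the optimization problem in Eqs. (1)-(3), the following Python sketch enumerates all feasible labelings of a tiny instance by brute force and picks the cheapest one. All tracklets, labels, frame memberships and costs are hypothetical toy values introduced only for this example; a realistic instance is far too large for enumeration and is instead solved as a binary (linear) program, as described further below.

    import itertools

    # Hypothetical toy instance: 3 tracklets, 2 IMU labels.
    V = ["v1", "v2", "v3"]                                  # tracklets (graph vertices)
    L = [1, 2]                                              # IMU IDs (labels)
    frames = {"v1": {0, 1}, "v2": {1, 2}, "v3": {3, 4}}     # frames covered by each tracklet
    E = {("v1", "v3"), ("v2", "v3")}                        # edges between tracklets

    # Unary assignment costs c_v^l (negative = likely assignment) and
    # pairwise compatibility costs c_e^l for hypotheses sharing the same label.
    c_unary = {("v1", 1): -2.0, ("v1", 2): 0.5,
               ("v2", 1): 1.0,  ("v2", 2): -1.5,
               ("v3", 1): -1.0, ("v3", 2): -0.5}
    c_pair = {("v1", "v3", 1): -0.8, ("v1", "v3", 2): 0.3,
              ("v2", "v3", 1): 0.4,  ("v2", "v3", 2): -0.6}

    def feasible(labeling):
        """Eq. (2) holds by construction (one label or None per tracklet).
        Eq. (3): a label may not be used by two tracklets that share a frame."""
        for (v, w) in itertools.combinations(V, 2):
            lv, lw = labeling[v], labeling[w]
            if lv is not None and lv == lw and frames[v] & frames[w]:
                return False
        return True

    def total_cost(labeling):
        cost = sum(c_unary[v, l] for v, l in labeling.items() if l is not None)
        for (v, w) in E:
            l = labeling[v]
            if l is not None and l == labeling[w]:
                cost += c_pair[v, w, l]
        return cost

    best = min(
        (dict(zip(V, assign))
         for assign in itertools.product(L + [None], repeat=len(V))
         if feasible(dict(zip(V, assign)))),
        key=total_cost)
    print("optimal labeling:", best, "cost:", total_cost(best))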
Next, the unary and pairwise potentials are described in detail. Specifically, consistency features are introduced which are later mapped to costs c_v^l and c_e^l. In order to provide a measure for the likelihood of an assignment hypothesis H = (v, l), the person orientation in each detection box of tracklet v is calculated and compared to the temporally aligned orientation measurements of IMU l.
The person orientation is defined as the normal vector of the torso’s coronal plane projected to the ground plane, as illustrated in Figure 4 (a). The projected normal is used as it comprises fewer degrees of freedom and people usually move in a rather upright pose.
Hence, given the image data I_d of detection d, the invention calculates the heading n_d of the person. However, the observed heading in I_d depends on the person position in the image, see Figure 4 (b). To see this, consider a person walking on a straight line parallel to the image plane of a non-moving camera. In a global context this person has a constant orientation. However, due to perspective effects the perceived orientation of that person with respect to the viewpoint of the camera is different at every point in the image. To compensate this, a correction angle derived from the detection box within the image is considered. Let α_d be the angle between the vector defined by the camera center and the box position p_d, and the depth axis of the camera. In order to compensate the perspective influence, the perceived orientation is rotated by α_d to obtain the prediction n_d, cf. Figure 4 (b).
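A minimal numerical sketch of this perspective correction (Python with NumPy) is given below. The pinhole-camera parameters, the detection box position and the predicted heading are hypothetical values, and the sign convention of the rotation depends on the chosen camera and ground-plane coordinate systems; the sketch only illustrates the operation described above, i.e. rotating the perceived ground-plane heading by the angle α_d.

    import numpy as np

    def correction_angle(box_center_x, fx, cx):
        """Angle alpha_d between the ray through the detection box position and the
        camera depth axis, from the horizontal image coordinate (pinhole model)."""
        return np.arctan2(box_center_x - cx, fx)

    def correct_heading(heading_2d, alpha_d):
        """Rotate the perceived ground-plane heading by alpha_d to compensate
        the perspective effect and renormalize it."""
        c, s = np.cos(alpha_d), np.sin(alpha_d)
        R = np.array([[c, -s], [s, c]])
        n = R @ np.asarray(heading_2d, dtype=float)
        return n / np.linalg.norm(n)

    # Hypothetical camera: focal length 1000 px, principal point at x = 960 px.
    alpha = correction_angle(box_center_x=1500.0, fx=1000.0, cx=960.0)
    print(correct_heading([0.0, 1.0], alpha))   # corrected heading on the unit circle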
In order to obtain the person heading from image data, an artificial neural network is used to learn the mapping from the image data I_d to the heading n_d. More specifically, a VGG16 pretrained on ImageNet is used to regress the heading, which also incorporates the aforementioned perspective correction (PC) in the last layer. This network, referred to as the Visual Heading Network (VHN), is shown in Figure 5 as a graphical illustration of the network architecture.
In an example setting of the present invention, IMUs are consistently placed at the back of each person such that the local sensor z-axis corresponds to the normal vector of the torso’s coronal plane. The measured torso orientation vector n_{l,t} of IMU l at time t is defined as:
n_{l,t} = P \, (R_{l,t} \, e_z),
where e_z = (0, 0, 1)^T is the local z-axis vector, R_{l,t} \in SO(3) is the measured IMU orientation mapping the local sensor coordinate frame to the global coordinate frame and P projects the normal vector to the ground plane.
Finally, we define the unary orientation feature representing the likelihood of hypothesis H as
f_{or}(H) = \frac{1}{N_d} \sum_{d \in v} \Phi\big(n_d, \, n_{l, t_d}\big)    (5)
where \Phi denotes the cosine similarity, N_d corresponds to the number of detections of tracklet v and t_d represents the time stamp of a detection d.
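The orientation consistency check can be sketched as follows (Python/NumPy). The function imu_heading implements n_{l,t} = P(R_{l,t} e_z) with the projection P realized as dropping the vertical component, and orientation_feature averages the cosine similarities over the detections of one tracklet; the rotation matrices and predicted headings are synthetic example values.

    import numpy as np

    def imu_heading(R_lt):
        """Project the rotated local z-axis (torso normal) to the ground plane (x, y)."""
        e_z = np.array([0.0, 0.0, 1.0])
        n3d = R_lt @ e_z
        n2d = n3d[:2]                       # P: drop the vertical component
        return n2d / np.linalg.norm(n2d)

    def orientation_feature(pred_headings, imu_rotations):
        """Mean cosine similarity between headings n_d predicted from the video and
        the temporally aligned IMU headings n_{l, t_d}."""
        sims = []
        for n_d, R_lt in zip(pred_headings, imu_rotations):
            n_lt = imu_heading(R_lt)
            n_d = np.asarray(n_d, dtype=float)
            n_d = n_d / np.linalg.norm(n_d)
            sims.append(float(n_d @ n_lt))
        return float(np.mean(sims))

    def heading_rotation(theta):
        """Hypothetical IMU orientation whose local z-axis points horizontally,
        rotated by theta in the ground plane."""
        Rx = np.array([[1.0, 0.0, 0.0],
                       [0.0, 0.0, -1.0],
                       [0.0, 1.0, 0.0]])    # tilt the local z-axis into the ground plane
        c, s = np.cos(theta), np.sin(theta)
        Rz = np.array([[c, -s, 0.0],
                       [s, c, 0.0],
                       [0.0, 0.0, 1.0]])    # heading rotation in the ground plane
        return Rz @ Rx

    # Two detections of one tracklet with slightly rotated IMU readings.
    R1, R2 = heading_rotation(0.0), heading_rotation(0.1)
    pred = [(0.0, -1.0), (0.05, -1.0)]      # headings regressed from the detection crops
    print(orientation_feature(pred, [R1, R2]))   # close to 1.0 for a consistent assignment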
Further, pairwise features are defined which represent the compatibility of two hypotheses H = (v, l) and H' = (v', l). Two hypotheses are said to be compatible if the assignment of a joint label l to v and v' is reasonable with respect to spatio-temporal aspects.
Box Features. Within a short temporal window a person cannot move arbitrarily fast. Hence, the tracklets of a compatible hypothesis pair should be spatially close and corresponding detection boxes should be similar in size. For each detection box d, a rough 3D position estimate is calculated by projecting the detection box foot point to the 3D ground plane of the scene. Hence, for detections d of v and d' of v', let v_{3D}(d, d') denote the velocity in 3D from d to d'. Let N(v, v') be the set of all pairs of detections between H and H' considered for the feature. The velocity feature between H and H' can be defined as
f_{vel}(H, H') = \max_{(d, d') \in N(v, v')} \lVert v_{3D}(d, d') \rVert.
Additionally, we compare the detection box heights of both hypotheses. Let h_d denote the height of detection box d in pixels. The compatibility measure \Delta_h(d, d') is defined based on the heights of detections d and d' according to
\Delta_h(d, d') = \gamma(d, d') \, \frac{\lvert h_d - h_{d'} \rvert}{\max(h_d, h_{d'})},
where the factor \gamma(d, d') in front of the fraction compensates for the temporal distance between d and d'. Finally, a box height feature is defined as
f_{height}(H, H') = \max_{(d, d') \in N(v, v')} \Delta_h(d, d').
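The box features can be sketched as follows in Python. The 3D positions, the frame rate and the temporal damping factor gamma are illustrative assumptions (the exact factor used in the invention is not reproduced here); the sketch only shows how a 3D velocity between two detections and a relative box-height difference could be computed.

    import numpy as np

    def velocity_3d(p_d, p_dp, t_d, t_dp, fps=25.0):
        """3D velocity between the ground-plane positions of detections d and d'
        (positions obtained by projecting the box foot points to the ground plane)."""
        dt = abs(t_dp - t_d) / fps
        return (np.asarray(p_dp) - np.asarray(p_d)) / dt

    def f_vel(pairs):
        """Velocity feature: largest 3D speed over the considered detection pairs N(v, v')."""
        return max(np.linalg.norm(velocity_3d(*pair)) for pair in pairs)

    def height_compatibility(h_d, h_dp, t_d, t_dp, fps=25.0):
        """Relative box-height difference, damped for pairs far apart in time
        (the damping factor used here is a hypothetical choice)."""
        gamma = 1.0 / max(abs(t_dp - t_d) / fps, 1.0 / fps)
        return gamma * abs(h_d - h_dp) / max(h_d, h_dp)

    # Hypothetical detection pair: positions in metres, frame indices, box heights in pixels.
    pairs = [((0.0, 0.0, 0.0), (0.6, 0.1, 0.0), 10, 22)]
    print("f_vel   =", f_vel(pairs))
    print("delta_h =", height_compatibility(180.0, 168.0, 10, 22))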
Both f_{vel} and f_{height} are features which are meaningful within short temporal windows. However, one object of this invention is to focus on sequences where people get occluded or fall out of the camera view quite often and for longer time periods. Hence, in the following we utilize acceleration measurements to link hypotheses which cover larger temporal horizons.
Acceleration Feature. Ideally, the position p_{t_1} \in R^3 at time t_1 of an IMU can be recovered by double integration of the corresponding acceleration signal a according to
p_{t_1} = p_{t_0} + v_{t_0} (t_1 - t_0) + \int_{t_0}^{t_1} \int_{t_0}^{t} a(\tau) \, d\tau \, dt,    (10)
where t_0, p_{t_0} and v_{t_0} denote initial time, initial position and initial velocity, respectively. Please note that a in this case represents the gravity-free acceleration in global coordinates.
Let p_{t_0} be the 3D position of detection d and p_{t_1} the 3D position of d'. After double integration of the acceleration signal, Eq. (10) can be solved for the initial velocity, which is denoted as v_{IMU}(d, d'). Concurrently, a person's velocity v_d at the initial time t_0 is approximated in terms of finite differences of neighboring detections of d. Hence, for a compatible hypothesis pair H and H' the velocity differences
\lVert v_{IMU}(d, d') - v_d \rVert
should be small for all possible detection pairs d \in v and d' \in v'. The acceleration feature is defined as the set of all such differences according to
f_{acc}(H, H') = \big\{ \lVert v_{IMU}(d, d') - v_d \rVert \, : \, d \in v, \ d' \in v' \big\}.
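A small numerical sketch of the acceleration feature is given below. It double-integrates a synthetic, gravity-free acceleration signal in global coordinates, solves Eq. (10) for the initial velocity v_IMU(d, d') and compares it with the finite-difference velocity v_d from the video; the signal, sampling rate and detection positions are hypothetical.

    import numpy as np

    def double_integral(acc, dt):
        """Discrete approximation of the double integral of the acceleration in Eq. (10):
        the displacement contributed by the acceleration alone."""
        vel = np.cumsum(acc, axis=0) * dt              # inner integral
        return np.sum(vel, axis=0) * dt                # outer integral

    def v_imu(p_t0, p_t1, acc, dt):
        """Initial velocity v_{t0} obtained by solving Eq. (10) for v_{t0}."""
        T = acc.shape[0] * dt
        return (np.asarray(p_t1) - np.asarray(p_t0) - double_integral(acc, dt)) / T

    def f_acc(detection_pairs, acc_signal, dt):
        """Velocity differences ||v_IMU(d, d') - v_d|| for the considered detection pairs."""
        return [float(np.linalg.norm(v_imu(p0, p1, acc_signal[i0:i1], dt) - np.asarray(v_d)))
                for (p0, p1, i0, i1, v_d) in detection_pairs]

    # Synthetic example: constant acceleration, so the recovered initial velocity is exact
    # up to discretisation error.
    dt = 0.01
    acc = np.tile(np.array([[0.2, 0.0, 0.0]]), (100, 1))           # 1 s of IMU samples
    p0, v0 = np.zeros(3), np.array([1.0, 0.0, 0.0])
    p1 = p0 + v0 * 1.0 + 0.5 * np.array([0.2, 0.0, 0.0]) * 1.0**2  # end position after 1 s
    print(f_acc([(p0, p1, 0, 100, v0)], acc, dt))                  # difference close to zero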
The graph labeling problem defined in Eq. (1) is a binary quadratic program. This program can be reformulated as an equivalent binary linear program (BLP) by introducing slack variables: each product of variables x_v^l \, x_{v'}^l is replaced by a new variable z_{l,v,v'} and the following constraints are added:
z_{l,v,v'} \le x_v^l, \qquad z_{l,v,v'} \le x_{v'}^l, \qquad z_{l,v,v'} \ge x_v^l + x_{v'}^l - 1.
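For illustration, the sketch below builds exactly this linearized BLP for the same hypothetical toy instance as in the brute-force example above, using the open-source PuLP modeller (an arbitrary choice of solver interface, not prescribed by the invention).

    import pulp

    # Hypothetical toy data (tracklets, labels, frames, costs) as in the brute-force sketch.
    V = ["v1", "v2", "v3"]
    L = [1, 2]
    frames = {"v1": {0, 1}, "v2": {1, 2}, "v3": {3, 4}}
    E = [("v1", "v3"), ("v2", "v3")]
    c_unary = {("v1", 1): -2.0, ("v1", 2): 0.5, ("v2", 1): 1.0,
               ("v2", 2): -1.5, ("v3", 1): -1.0, ("v3", 2): -0.5}
    c_pair = {("v1", "v3", 1): -0.8, ("v1", "v3", 2): 0.3,
              ("v2", "v3", 1): 0.4, ("v2", "v3", 2): -0.6}

    prob = pulp.LpProblem("tracklet_labeling", pulp.LpMinimize)
    x = {(v, l): pulp.LpVariable(f"x_{v}_{l}", cat="Binary") for v in V for l in L}
    z = {(l, v, w): pulp.LpVariable(f"z_{l}_{v}_{w}", cat="Binary") for (v, w) in E for l in L}

    # Objective of Eq. (1) with products x_v^l * x_v'^l replaced by slack variables z.
    prob += (pulp.lpSum(c_unary[v, l] * x[v, l] for v in V for l in L)
             + pulp.lpSum(c_pair[v, w, l] * z[l, v, w] for (v, w) in E for l in L))

    for v in V:                                   # Eq. (2): at most one label per tracklet
        prob += pulp.lpSum(x[v, l] for l in L) <= 1
    for t in set().union(*frames.values()):       # Eq. (3): one tracklet per label and frame
        for l in L:
            prob += pulp.lpSum(x[v, l] for v in V if t in frames[v]) <= 1
    for (v, w) in E:                              # linearization constraints for z
        for l in L:
            prob += z[l, v, w] <= x[v, l]
            prob += z[l, v, w] <= x[w, l]
            prob += z[l, v, w] >= x[v, l] + x[w, l] - 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print({k: int(var.value()) for k, var in x.items() if var.value() > 0.5})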
Tracklet generation. Reliable tracklets can be generated by grouping detections. Temporally subsequent detections can be connected if their intersection over union is above 0.7. For example, the maximal tracklet length can be set to 15 frames.
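A simple Python sketch of such a tracklet generation step is shown below. Detections are assumed to be axis-aligned boxes (x1, y1, x2, y2) stored per frame, the greedy matching strategy is an illustrative assumption, and the IoU threshold of 0.7 and the maximal length of 15 frames are the example values mentioned above.

    def iou(a, b):
        """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def build_tracklets(detections, iou_thr=0.7, max_len=15):
        """Greedily connect temporally subsequent detections into tracklets.
        detections: dict frame -> list of boxes. Returns a list of tracklets,
        each being a list of (frame, box) tuples."""
        tracklets, open_tracklets = [], []
        for t in sorted(detections):
            still_open = []
            unmatched = list(detections[t])
            for tr in open_tracklets:
                last_frame, last_box = tr[-1]
                best = max(unmatched, key=lambda b: iou(last_box, b), default=None)
                if (best is not None and last_frame == t - 1
                        and iou(last_box, best) > iou_thr and len(tr) < max_len):
                    tr.append((t, best))
                    unmatched.remove(best)
                    still_open.append(tr)
                else:
                    tracklets.append(tr)            # close the tracklet
            open_tracklets = still_open + [[(t, b)] for b in unmatched]
        return tracklets + open_tracklets

    # Hypothetical detections for three frames of one slowly moving person.
    dets = {0: [(100, 50, 150, 200)], 1: [(102, 50, 152, 200)], 2: [(104, 51, 154, 201)]}
    print([len(tr) for tr in build_tracklets(dets)])   # -> [3]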
Visual Heading Network. The overall network architecture is depicted in Figure 5. It contains the VGG16 architecture, which is truncated after its last pooling layer. The layers FC1, FC2 and FC3 are fully connected layers with 16, 16, and 2 neurons, respectively. To output an orientation vector n that is within the unit sphere S^1, a hyperbolic tangent activation function is used. The VGG16 is normally trained on ImageNet with an invariance for horizontal flipping. To undo this, the layers FC1, FC2 and FC3 can be trained together with the last convolutional layer of VGG16, while keeping the weights of all other layers fixed. During training, dropout layers with p = 0.3 between the fully connected layers are added to avoid overfitting. Finally, the network parameters are learned by minimizing the cost function (5), for given ground-truth detections and corresponding IMU heading vectors of the VIMPT training sequence.
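A minimal PyTorch sketch of such a network is shown below, assuming torchvision is available. It truncates a pretrained VGG16 after its last pooling layer, adds FC1, FC2 and FC3 with 16, 16 and 2 neurons, a tanh output and dropout with p = 0.3, and freezes all weights except the last convolutional layer and the new head. The ReLU activations between the fully connected layers, the 224 x 224 input size and the omission of the perspective-correction layer are assumptions made for this sketch.

    import torch
    import torch.nn as nn
    from torchvision import models

    class VisualHeadingNetwork(nn.Module):
        """Sketch of the VHN: VGG16 features + FC1/FC2/FC3 (16, 16, 2 neurons) + tanh.
        The perspective correction applied in the last layer of the original network
        is omitted here (it can be applied as a 2D rotation of the output)."""
        def __init__(self):
            super().__init__()
            vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
            self.features = vgg.features              # truncated after the last pooling layer
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 7 * 7, 16), nn.ReLU(), nn.Dropout(p=0.3),   # FC1
                nn.Linear(16, 16), nn.ReLU(), nn.Dropout(p=0.3),            # FC2
                nn.Linear(16, 2), nn.Tanh(),                                # FC3, 2D heading
            )
            # Freeze everything except the last convolutional layer and the FC head.
            for p in self.features.parameters():
                p.requires_grad = False
            for p in self.features[-3].parameters():  # last conv layer of VGG16
                p.requires_grad = True

        def forward(self, x):                         # x: (N, 3, 224, 224) person crops
            return self.head(self.features(x))

    model = VisualHeadingNetwork()
    print(model(torch.randn(1, 3, 224, 224)).shape)   # -> torch.Size([1, 2])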
Graph edge settings. In the graph G, weighted edges e element of E are created between two nodes v and v’ in the following cases. If the shortest temporal dis- tance between all detections of v and v’ is at most 12 frames, a short-term edge can be established associated to costs derived from box features. Similarly, long- term edges can be established associated to costs derived from acceleration fea- tures between all detections of v and v’ if the temporal distance is between 12 and 150 frames.
Feature to cost mapping. In order to transform unary and pairwise features to costs, different strategies can be used. For orientation and box features a logistic regression model is learned that predicts optimal costs based on ground-truth trajectories in the training sequence of the dataset. This did not work satisfactorily for the acceleration feature. We observed that noise in 3D position estimates destroys much of the expressiveness of this feature. Instead, a threshold d can be used to indicate if two hypotheses are highly incompatible. Hence, a high constant cost can be assigned to an edge if min f_{acc}(H, H') > d.
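Both cost-mapping strategies can be sketched as follows (Python with scikit-learn): a logistic regression trained on hypothetical labelled feature values, mapped to costs via the negative log-odds of the predicted probability, and a simple threshold rule for the acceleration feature. The log-odds mapping, the training data and the threshold value are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: feature values with binary "same person" labels,
    # e.g. orientation features of correct (1) and incorrect (0) assignments.
    features = np.array([[0.95], [0.90], [0.85], [0.20], [0.10], [-0.30]])
    labels = np.array([1, 1, 1, 0, 0, 0])

    clf = LogisticRegression().fit(features, labels)

    def feature_to_cost(f):
        """Map a feature value to a cost as the negative log-odds of the
        'same person' probability (negative cost = likely match)."""
        p = clf.predict_proba(np.array([[f]]))[0, 1]
        return float(-np.log(p / (1.0 - p)))

    def acceleration_cost(f_acc_values, threshold=2.0, high_cost=1e3):
        """Threshold rule for the acceleration feature: a high constant cost if even the
        smallest velocity difference exceeds the threshold d, otherwise no extra cost."""
        return high_cost if min(f_acc_values) > threshold else 0.0

    print(feature_to_cost(0.92), feature_to_cost(0.05))
    print(acceleration_cost([2.5, 3.1]), acceleration_cost([0.4, 2.5]))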
Reference numbers
10 apparatus/system
12 camera
14 processing device
16 object detection unit
18 processing unit
20 IMU
22 IMU
30 digital memory
41 video frame
100 defined area
200 object/person
220 object/person
300 detection box
320 detection box

Claims

Patent claims
1. Method for tracking of moving objects within a defined area, wherein the method comprising the steps of:
- providing a recorded video sequence of said defined area, the video sequence has a plurality of video frames and a temporal length;
- providing inertial data for at least one object, said inertial data for an object has been recorded by an inertial measurement unit arranged on and assigned to the corresponding object;
- detecting at least one object in said video frames of said video sequence using an object detector unit;
- generating a plurality of tracklets for said at least one detected object based on object detecting in said video sequence, each tracklet includes trajectory data of a trajectory of the corresponding detected object for a certain tracklet time period within the temporal length of said video sequence using a processing unit;
- assigning one of said inertial measurement units to one or more tracklets based on the trajectory data of the corresponding tracklet and the inertial data within the tracklet time period of the corresponding tracklet such that the inertial data are consistent with the trajectory data of the respective tracklet using said processing unit.
2. Method according to claim 1, wherein the step of providing said video sequence includes recording said defined area using a camera system including at least one camera to generate the video sequence with the temporal length.
3. Method according to claim 1 or 2, wherein inertial data are provided for a plurality of objects, wherein each object is equipped with at least one inertial measurement unit.
4. Method according to one of the preceding claims, wherein a plurality of objects in said video frames of said video sequence are detected, wherein a plurality of tracklets for each detected object is generated.
5. Method according to one of the preceding claims, wherein one of said inertial measurement units is assigned to each generated tracklet.
6. Method according to one of the preceding claims, wherein, with respect to one of said tracklets, an assignment probability is calculated for all inertial measurement units based on the inertial data of the inertial measurement units within the respective tracklet time period, the assignment probability indicating how consistent the inertial data of an inertial measurement unit are with the trajectory data of the respective tracklet, wherein one of said inertial measurement units is assigned to the respective tracklet based on the calculated assignment probabilities.
7. Method according to claim 6, wherein one of said inertial measurement units is assigned to a first tracklet and the same inertial measurement unit is assigned to a temporally following second tracklet based on the calculated assignment probabilities, if the trajectory data of the first tracklet and the trajectory data of the second tracklet are reasonable with respect to spatio-temporal aspects and/or if the inertial data of said assigned inertial measurement unit within the tracklet time periods of the first and the second tracklet are reasonable with respect to movement aspects.
8. Method according to claim 6 or 7, wherein at least one assignment of one of said inertial measurement units to one of said tracklets is determined based on all assignment probabilities of all inertial measurement units and tracklets and all trajectory data of all tracklets in a global context.
9. Method according to one of the preceding claims, wherein an orientation is detected for at least one of the objects by inputting image data of the object into an artificial neural network which has learned an assignment of the image data of the object to an orientation of the object, wherein at least one tracklet for said at least one object is further generated based on said detected orientation of said object.
10. Method according to claim 9, wherein the detected orientation of the object is corrected based on the position of the detected object within the video frames.
11. Method according to one of the preceding claims, wherein for at least one of the objects a trajectory is determined based on the trajectory data of those tracklets to which the inertial measurement unit of the object has been assigned and the inertial data of the inertial measurement unit of the object.
12. Apparatus for tracking of moving objects within a defined area, wherein the apparatus comprises at least one inertial measurement unit, an object detector unit and a processing unit, wherein the apparatus is arranged for conducting the method according to one of the preceding claims.
13. Apparatus according to claim 12, wherein the apparatus comprises a camera system with at least one camera for recording the video sequence.
14. Computer program arranged to execute the method according to one of claims 1 to 11 when the computer program is running on a computer.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2019/063881 WO2020239210A1 (en) 2019-05-28 2019-05-28 Method, apparatus and computer program for tracking of moving objects
DE112019007390.7T DE112019007390T5 (en) 2019-05-28 2019-05-28 Method, device and computer program for tracking moving objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/063881 WO2020239210A1 (en) 2019-05-28 2019-05-28 Method, apparatus and computer program for tracking of moving objects

Publications (1)

Publication Number Publication Date
WO2020239210A1 true WO2020239210A1 (en) 2020-12-03

Family

ID=66826944

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/063881 WO2020239210A1 (en) 2019-05-28 2019-05-28 Method, apparatus and computer program for tracking of moving objects

Country Status (2)

Country Link
DE (1) DE112019007390T5 (en)
WO (1) WO2020239210A1 (en)


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
FARAZI HAFEZ ET AL: "Real-Time Visual Tracking and Identification for a Team of Homogeneous Humanoid Robots", 1 November 2017, INTERNATIONAL CONFERENCE ON FINANCIAL CRYPTOGRAPHY AND DATA SECURITY; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER, BERLIN, HEIDELBERG, PAGE(S) 230 - 242, ISBN: 978-3-642-17318-9, XP047473331 *
LEE DONGHOON ET AL: "OPTIMUS: online persistent tracking and identification of many users for smart spaces", MACHINE VISION AND APPLICATIONS, SPRINGER VERLAG, DE, vol. 25, no. 4, 1 April 2014 (2014-04-01), pages 901 - 917, XP035368589, ISSN: 0932-8092, [retrieved on 20140401], DOI: 10.1007/S00138-014-0607-4 *
NAZARE ANTONIO CARLOS ET AL: "Content-Based Multi-Camera Video Alignment using Accelerometer Data", 2018 15TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), IEEE, 27 November 2018 (2018-11-27), pages 1 - 6, XP033518265, DOI: 10.1109/AVSS.2018.8639468 *
ROBERTO HENSCHEL ET AL: "Simultaneous Identification and Tracking of Multiple People using Video and IMUs", 20 June 2019 (2019-06-20), XP055664340, Retrieved from the Internet <URL:http://openaccess.thecvf.com/content_CVPRW_2019/papers/BMTT/Henschel_Simultaneous_Identification_and_Tracking_of_Multiple_People_Using_Video_and_CVPRW_2019_paper.pdf> [retrieved on 20200203] *
W. JIANG; Z. YIN: "Combining passive visual cameras and active IMU sensors to track cooperative people", INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2015, pages 1338 - 1345, XP033204841

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581761A (en) * 2020-12-07 2021-03-30 浙江宇视科技有限公司 Collaborative analysis method, device, equipment and medium for 5G mobile Internet of things node
CN112581761B (en) * 2020-12-07 2022-04-19 浙江宇视科技有限公司 Collaborative analysis method, device, equipment and medium for 5G mobile Internet of things node
CN114359976A (en) * 2022-03-18 2022-04-15 武汉北大高科软件股份有限公司 Intelligent security method and device based on person identification
CN114359976B (en) * 2022-03-18 2022-06-14 武汉北大高科软件股份有限公司 Intelligent security method and device based on person identification
CN114973153A (en) * 2022-07-27 2022-08-30 广州宏途数字科技有限公司 Smart campus security detection method, device, equipment and storage medium
CN114973153B (en) * 2022-07-27 2022-11-04 广州宏途数字科技有限公司 Smart campus security detection method, device, equipment and storage medium
CN116052095A (en) * 2023-03-31 2023-05-02 松立控股集团股份有限公司 Vehicle re-identification method for smart city panoramic video monitoring

Also Published As

Publication number Publication date
DE112019007390T5 (en) 2022-03-03

Similar Documents

Publication Publication Date Title
US10853970B1 (en) System for estimating a three dimensional pose of one or more persons in a scene
Toft et al. Long-term visual localization revisited
Dai et al. Rgb-d slam in dynamic environments using point correlations
WO2020239210A1 (en) Method, apparatus and computer program for tracking of moving objects
CN112785702B (en) SLAM method based on tight coupling of 2D laser radar and binocular camera
US8711221B2 (en) Visually tracking an object in real world using 2D appearance and multicue depth estimations
Hu et al. A sliding-window visual-IMU odometer based on tri-focal tensor geometry
Huang et al. Structure from motion technique for scene detection using autonomous drone navigation
Henschel et al. Simultaneous identification and tracking of multiple people using video and imus
US10347001B2 (en) Localizing and mapping platform
Ruotsalainen et al. Visual-aided two-dimensional pedestrian indoor navigation with a smartphone
Ruotsalainen et al. Heading change detection for indoor navigation with a smartphone camera
CN110874910B (en) Road surface alarm method, device, electronic equipment and readable storage medium
Antonucci et al. Performance assessment of a people tracker for social robots
Sun et al. When we first met: Visual-inertial person localization for co-robot rendezvous
Nguyen et al. Confidence-aware pedestrian tracking using a stereo camera
US9990857B2 (en) Method and system for visual pedometry
Manderson et al. Texture-aware SLAM using stereo imagery and inertial information
JP2019121019A (en) Information processing device, three-dimensional position estimation method, computer program, and storage medium
Shimizu et al. LIDAR-based body orientation estimation by integrating shape and motion information
CN110052020A (en) Equipment, the control device and method run in mancarried device or robot system
Ingwersen et al. SportsPose-A Dynamic 3D sports pose dataset
US10977810B2 (en) Camera motion estimation
US20200226787A1 (en) Information processing apparatus, information processing method, and program
Minaeian et al. Crowd detection and localization using a team of cooperative UAV/UGVs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19730108

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19730108

Country of ref document: EP

Kind code of ref document: A1