LU101413B1 - Connected entities tracking

Info

Publication number: LU101413B1
Authority: LU (Luxembourg)
Prior art keywords: entities, physical entities, relationships, relationship, entity
Application number: LU101413A
Other languages: French (fr)
Inventor: Juste Guillermo Schwartz
Original Assignee: Clear Image Ai S A R L
Priority date: 2019-09-27
Filing date: 2019-09-27
Publication date: 2021-03-31
Application LU101413A filed by Clear Image Ai S A R L; application granted; published as LU101413B1.

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A video surveillance system includes a camera for providing a stream of video frames of an area under surveillance. The camera includes or is connected with a rangefinder for determining 3D positions of physical entities in the area under surveillance. The camera includes a processing unit configured to identify the physical entities in the video stream, to label the entities once identified, and to track the positions of the labelled entities. The processing unit is further configured to search for and register relationships between labelled entities, where the relationships searched for have to belong to one or more predetermined relationship classes. The searching of the relationships is based on the tracked positions of the labelled entities. The processing unit is further configured to detect relationship events and to issue a notification when such a relationship event occurs.

Description

DESCRIPTION
CONNECTED ENTITIES TRACKING
Field of the Invention
[0001] The invention generally relates to a video surveillance system based on one or more cameras with rangefinding capability (e.g. stereoscopic cameras or single-objective cameras combined with a rangefinder) and a context analysis module. The context analysis module associates together multiple physical entities in the camera field of view when it detects that a relationship exists between these entities, the relationship belonging to at least one among several predetermined relationship classes. Another aspect of the invention relates to a video surveillance method implemented by such a system.
Background of the Invention
[0002] Video surveillance of public and non-public spaces has become increasingly widespread in recent years. However, conventional video surveillance systems produce such prodigious volumes of footage that reviewing the entirety of the material by humans becomes impracticable. Accordingly, there is a pressing demand for intelligent video surveillance systems that pre-digest video data and reduce the information that needs human review to manageable amounts. The ideal system operates autonomously and reliably filters out all irrelevant content.
[0003] The Forbes article by Zak Doffman, “Smarter Cities: Will Autonomous AI Surveillance And IoT Now Automate Law Enforcement?” pictures such an ideal system. Specifically, it mentions autonomous surveillance cameras detecting anomalies in the behavior and movement of people and vehicles and objects, where the cameras themselves learn to distinguish between anomalies and normal situations.
The article of course does not describe an actually existing system but indicates the direction in which this area of technology is heading. Nevertheless, many practical problems still need to be solved before such autonomous surveillance systems become a reality.
[0004] US2007013776 discloses a video surveillance system that extracts video primitives and event occurrences from the video primitives using event discriminators.
The system can undertake a response, such as an alarm, based on extracted event occurrences.
Summary of the Invention
[0005] The present invention goes further than previous video surveillance systems using artificial intelligence (deep learning). Specifically, in an aspect, the invention uses a context analysis module implemented by a processing unit (e.g. a microprocessor, a graphics processing unit, a field-programmable gate array or an application-specific integrated circuit) that associates together multiple physical entities in the camera field of view when it detects that a relationship exists between these entities. The processing unit is able to keep track of that relationship, possibly even when one or several of the entities participating in the relationship get out of the camera field of view and/or re-enter it. The processing unit is configured to evaluate a stream of frames or a single frame against a set of rules defined within an alarm manager. If the analysis of the frames is recognized as a complete chain of events that is deemed potentially relevant to the requirements of the application, an alarm will be raised. The system shall emit the data associated with the sequence of frames or the single frame that caused the alarm to be triggered.
[0006] A video surveillance system according to an aspect of the invention includes a camera for providing a stream of video frames (video feed) of an area under surveillance, the camera including or being connected with a rangefinder for determining three-dimensional positions of physical entities in said area under surveillance. The camera includes a processing unit configured to identify the physical entities in the stream of video frames, to label the physical entities once identified, and to track the positions of the labelled physical entities. The processing unit is further configured to search for and register relationships between labelled physical entities, it being understood that the relationships searched for have to belong to one or more predetermined relationship classes (such as, e.g., group, attachment, ownership, dependence, etc.). The searching of the relationships is based on the tracked positions (i.e. the trajectories) of the labelled physical entities. The processing unit is further configured to detect relationship events and to issue a notification when such a relationship event occurs.
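By way of illustration only, and not as part of the original disclosure, the following minimal sketch shows how the processing chain described in paragraph [0006] (identify, label, track, search for relationships, detect relationship events, notify) could be wired together. All names (Entity, close_together, run) and the toy detections are hypothetical; the detector and rangefinder outputs are replaced by canned data.

```python
"""Minimal, purely illustrative sketch of the processing chain in [0006].
All names are hypothetical; detector and rangefinder are replaced by canned data."""
from dataclasses import dataclass, field

@dataclass
class Entity:
    label: int                 # unique system-internal label
    entity_class: str          # e.g. "person", "luggage"
    trail: list = field(default_factory=list)   # tracked 3D positions

def close_together(a: Entity, b: Entity, max_dist=1.5) -> bool:
    """Toy relationship criterion: last known 3D positions are close together."""
    (xa, ya, za), (xb, yb, zb) = a.trail[-1], b.trail[-1]
    return ((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2) ** 0.5 <= max_dist

def run(frames):
    entities = {}            # label -> Entity
    attachments = set()      # registered (label_a, label_b) relationships
    for detections in frames:                      # one list of detections per video frame
        seen = set()
        for label, cls, pos3d in detections:       # (label, class, 3D position) per detection
            ent = entities.setdefault(label, Entity(label, cls))
            ent.trail.append(pos3d)
            seen.add(label)
        # search for "attachment" relationships between entities of different classes
        for a in entities.values():
            for b in entities.values():
                if (a.label < b.label and a.entity_class != b.entity_class
                        and a.label in seen and b.label in seen
                        and close_together(a, b)):
                    attachments.add((a.label, b.label))
        # relationship event: one side of a registered attachment left the frame
        for a_lbl, b_lbl in list(attachments):
            if a_lbl not in seen or b_lbl not in seen:
                print(f"notification: attachment {a_lbl}-{b_lbl} interrupted")
                attachments.discard((a_lbl, b_lbl))

# toy feed: a person (1) and a bag (2) move together, then the person disappears
run([[(1, "person", (0, 0, 5)), (2, "luggage", (0.3, 0, 5))],
     [(1, "person", (1, 0, 5)), (2, "luggage", (1.2, 0, 5))],
     [(2, "luggage", (1.2, 0, 5))]])
```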
[0007] It will thus be appreciated that the video surveillance system targets events occurring in the relationships between physical entities. One may say that the system tries to identify patterns or anomalies in the context rather than in the individual entities. However, it shall be understood that this does not exclude the possibility of the system additionally targeting events relative to individual physical entities (e.g. facial recognition, object recognition).
[0008] The rangefinder integrated with or connected to the camera could use any technology, e.g. a time-of-flight-based measurement technique, such as radar, lidar, ultrasonic ranging, etc. Preferably, however, the camera is or includes a stereoscopic camera and the rangefinder is, accordingly, a stereoscopic rangefinder.
[0009] The physical entities identified, labelled and tracked by the processing unit could be of any type (entity class), e.g., humans, animals, objects, vehicles, etc. In some applications, the physical entities could include persons.
[0010] One of the relationship events the processing unit may be configured to detect is the ending of a relationship belonging to a specific relationship class.
[0011] According to a preferred aspect of the invention, the processing unit may be configured to classify the physical entities into at least two different predetermined entity classes. The search for relationships then preferably includes searching for relationships of a first relationship class, i.e. relationships linking entities of one entity class to entities of another entity class. Hereinafter, entities of different entity classes that are linked to each other by a relationship will be considered “attached” to each other.
[0012] Additionally, the search for relationships could include searching for relationships of a second relationship class, i.e. relationships linking entities of a specific entity class to entities of the same entity class. Hereinafter, entities of the same entity class that are linked to each other by a relationship will be considered to belong to a same “group”.
[0013] Optionally, the processing unit may infer additional relationships using predefined rules. For instance, if an entity of a first entity class belongs to a group of entities of that class and is at the same time attached to another entity of a second entity class, then there may be a rule to the effect that the entity of the second entity class becomes attached to all the entities of the group. In other words, the property of attachment may be shared among the group members. Such rules may be put into place in a general manner or only for specific relationship classes (groups, attachments, etc.), depending on the purpose of the surveillance.
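Purely as an illustration of the rule described in paragraph [0013], the short sketch below propagates an attachment held by one group member to all members of that group; the function name and data layout are invented for this example.

```python
# Illustrative sketch of the rule in [0013]: an attachment held by one member of a
# group is propagated to all members of that group. All names are hypothetical.

def propagate_attachments(groups, attachments):
    """groups: list of sets of entity labels (same entity class).
    attachments: set of (owner_label, dependent_label) pairs across entity classes.
    Returns the enlarged attachment set after group sharing."""
    shared = set(attachments)
    for owner, dependent in attachments:
        for group in groups:
            if owner in group:
                # every other group member also becomes attached to the dependent entity
                shared.update((member, dependent) for member in group)
    return shared

# A bag (label 20) attached to person 1, who walks in a group with persons 2 and 3:
print(propagate_attachments(groups=[{1, 2, 3}], attachments={(1, 20)}))
# -> {(1, 20), (2, 20), (3, 20)}
```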
[0014] An interesting application of a surveillance system according to the invention may be to detect luggage items becoming abandoned. In that case, the physical entities identified, labelled and tracked by the processing unit would include persons as a first entity class and luggage items as a second entity class. The search for relationships in that case includes searching for associations between luggage items and persons or groups of persons.
[0015] Another application may be the detection and, possibly, the tracking of children who have lost contact with their accompanying persons. In that case, the physical entities identified, labelled and tracked by the processing unit would include adult persons (or, more generally, accompanying persons) as a first entity class and children (or accompanied persons) as a second entity class. The search for relationships in that application would include searching for associations between adult persons (accompanying persons) and children (accompanied persons).
[0016] Other applications could include the monitoring of groups of people, vehicles, etc. It may be worthwhile noting that one video surveillance system may be configured to address different purposes simultaneously (e.g. identifying abandoned luggage and detecting lost children).
[0017] The searching of the relationships based on the tracked positions of the labelled physical entities could include identifying physical entities that enter the field of view of the camera together in close proximity and have the same trajectories. The processing unit is preferably configured to carry out the identification of physical entities, the search for relationships and the detection of relationship events using a deep learning model. Regarding the criteria for deciding when a relationship between physical entities is deemed to exist, the processing unit may use one or more metrics to decide when physical entities are together in close proximity and have the same trajectories.
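The patent does not prescribe a particular metric. As a hedged example only, the sketch below uses two simple, hypothetical criteria (mean 3D distance and mean per-frame speed difference over a window of frames) to decide whether two entities are together in close proximity and follow the same trajectory.

```python
# One plausible metric (not prescribed by the patent) for "together in close proximity
# with the same trajectory": mean 3D distance and mean speed difference over a window.
import math

def mean(values):
    return sum(values) / len(values)

def travelling_together(traj_a, traj_b, max_gap=1.5, max_vel_diff=0.5):
    """traj_a, traj_b: equally long lists of (x, y, z) positions sampled per frame."""
    gaps = [math.dist(pa, pb) for pa, pb in zip(traj_a, traj_b)]
    vel_a = [math.dist(p1, p0) for p0, p1 in zip(traj_a, traj_a[1:])]
    vel_b = [math.dist(p1, p0) for p0, p1 in zip(traj_b, traj_b[1:])]
    vel_diff = [abs(va - vb) for va, vb in zip(vel_a, vel_b)]
    return mean(gaps) <= max_gap and mean(vel_diff) <= max_vel_diff

person = [(0, 0, 5), (0.5, 0, 5), (1.0, 0, 5)]
bag    = [(0.3, 0, 5), (0.8, 0, 5), (1.3, 0, 5)]
print(travelling_together(person, bag))   # True: close together and moving at the same pace
```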
[0018] The processing unit may be configured to implement a border control of the field of view of the camera, the border control comprising defining one or more entrance and/or exit zones in the field of view of the camera, and prioritizing identification and tracking of physical entities within the entrance and/or exit zones. In the remainder of the field of view, identification and tracking of physical entities may be carried out with a lesser priority and/or at a lower rate.
[0019] It will be appreciated that the video surveillance system may comprise a plurality of cameras that provide streams of video frames of different portions of the area under surveillance under different viewing angles.
[0020] Preferably, the system includes a shared database that is accessible by the cameras, wherein features of the physical entities are stored for re-identification of the physical entities at later times and/or by different cameras. The shared database (P2P database) may be distributed over the cameras.
[0021] Another aspect of the invention relates to a video surveillance method, including: obtaining a stream of video frames of an area under surveillance from a camera, the camera including or being connected with a rangefinder; using the rangefinder to determine three-dimensional positions of physical entities in the area under surveillance; identifying the physical entities in the stream of video frames, labelling the physical entities once identified, tracking the positions of the labelled physical entities; searching for relationships between labelled physical entities, the relationships searched for belonging to one or more relationship classes, the searching of the relationships being based on the tracked positions of the labelled physical entities; detecting relationship events, e.g. when an existing relationship belonging to a specific relationship class ends; and issuing a notification when such a relationship event occurs.
[0022] As used herein, the verb “to comprise” shall be understood as open-ended, i.e., to have the same meaning as to include.
[0023] The term "video" refers to motion pictures represented in digital form, collected by the camera(s).
[0024] The term "frame" refers to a particular image or other discrete unit within a video.
[0025] The term “physical entity” or, in short, “entity”, refers to an item of interest appearing in a stream of video frames. Examples of entities include: persons, vehicles, animals, etc.
[0026] The term "relationship event" refers to a change in a relationship that the processing unit has been trained to detect or that the processing unit has learned to be of interest or representing an anomaly. Such a change in a relationship could include the constitution, the ending, the increase in strength or intensity, the decrease in strength or intensity, etc. of a relationship. Any relationship event may be referenced with respect to a location and/or a time and/or the physical entities involved in the relationship.
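For illustration only, a relationship event as defined above could be represented by a record such as the hypothetical one below; none of the field names are taken from the patent.

```python
# Illustrative (hypothetical) record for a "relationship event" as defined in [0026]:
# a change in a relationship, referenced to a time, a location and the entities involved.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class RelationshipEvent:
    relationship_class: str        # e.g. "attachment", "group"
    change: str                    # e.g. "constituted", "ended", "strengthened", "weakened"
    entity_labels: tuple           # labels of the physical entities involved
    location: tuple                # last known 3D position associated with the event
    timestamp: datetime

event = RelationshipEvent("attachment", "ended", (1, 20), (4.2, 0.0, 7.5),
                          datetime(2019, 9, 27, 14, 30))
print(event)
```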
[0027] It is worthwhile noting that the labelling of the physical entities may be for the purpose of system-internal referencing, such that each entity can be identified by a unique label. The action of labelling preferably includes attaching attributes to the physical entity, for instance, a reference to the entity class, estimated size, last known location, etc. Therefore, the action of labelling may be considered as creating and/or updating a profile of the entities, anonymized or not.
Brief Description of the Drawings
[0028] By way of example, preferred, non-limiting embodiments of the invention will now be described in detail with reference to the accompanying drawings, in which:
Fig. 1 is a schematic high-level representation of a video surveillance system according to an embodiment of the invention;
Fig. 2 is a schematic illustration of a possible implementation of an entity occlusion estimation module;
Fig. 3 is a schematic illustration of a possible implementation of an entities in-frame classification module;
Fig. 4 is a schematic illustration of a possible implementation of an entities tracking module;
Fig. 5 is a schematic illustration of a possible implementation of an objects association module;
Fig. 6 is a schematic illustration of the functioning of an embodiment of a paired object registration module;
Fig. 7 is a schematic illustration of a possible implementation of a border control module;
Fig. 8 is a schematic illustration of a possible implementation of an object status control module;
Fig. 9 is a schematic illustration of the use of a P2P database that different cameras of a same video surveillance system have access to.
Detailed Description of Preferred Embodiments
[0029] The video surveillance system described hereinafter uses one or more stereo cameras and machine learning to automatically define multiple clusters of labelled entities as they enter the field of view of the camera, and then uses these cluster labels, entity labels and any associated metadata about the entities to feed a discrete finite automaton algorithm with defined rules that determines the output of the analysis. This makes it possible to fit the pattern of results to a multitude of user interface designs and gives flexibility for designing different purpose interfaces without having to hardcode a single analysis pipeline for each interface. All of this analysis is performed on board the camera and without using an entity registration database to recognize the entities in the field of view.
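As a purely illustrative sketch of the discrete finite automaton idea, the toy state machine below consumes a stream of event symbols and raises an alarm only when a complete chain of events has been observed; the states, symbols and transition rules are invented for this example and are not taken from the patent.

```python
# Toy sketch of the "discrete finite automaton with defined rules" idea in [0029]:
# a tiny state machine consuming per-frame event symbols and raising an alarm when a
# complete chain of events has been observed. States and symbols are invented here.
TRANSITIONS = {
    ("idle",     "attachment_registered"):  "attached",
    ("attached", "owner_left_border_zone"): "suspect",
    ("suspect",  "owner_reidentified"):     "attached",   # owner came back: no alarm
    ("suspect",  "timeout_expired"):        "alarm",
}

def evaluate(events, state="idle"):
    for symbol in events:
        state = TRANSITIONS.get((state, symbol), state)  # unknown symbols leave the state unchanged
        if state == "alarm":
            return True
    return False

print(evaluate(["attachment_registered", "owner_left_border_zone", "timeout_expired"]))      # True
print(evaluate(["attachment_registered", "owner_left_border_zone", "owner_reidentified"]))   # False
```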
[0030] The camera can be configured to process the event of a first entity leaving behind a second entity as it exits the field of view. It can also be configured to signal when another (third) entity tries to take possession of an entity left behind.
[0031] The camera may mix several techniques to trigger the alarm for the aforementioned relationship events. First, the camera, with a pre-trained deep learning model, recognizes and classifies the entities in the field of view. This may be achieved according to several methods. The camera can refer to a predetermined list of entity classes that expresses which types of entities have to be recognized, or it can use statistical inference on the entities that are moving together in the field of view and surpass a certain threshold of co-occurrence.
[0032] Once the previous rules have been established, the entities are tracked in three dimensions by determining their position in space with the help of visual depth estimation using the stereoscopic camera.
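For readers unfamiliar with stereoscopic depth estimation, the following sketch applies the standard pinhole/disparity relations to recover a 3D position from a calibrated stereo pair; the calibration values are made up and the snippet is not part of the patent disclosure.

```python
# Hedged sketch of how a 3D position could be recovered from a calibrated stereo pair,
# using the standard pinhole/disparity relations; none of these numbers come from the patent.

def stereo_to_3d(u, v, disparity, fx, fy, cx, cy, baseline):
    """u, v: pixel coordinates of the entity in the reference image;
    disparity: horizontal pixel shift between the two views;
    fx, fy, cx, cy: camera intrinsics; baseline: distance between the two objectives (m)."""
    z = fx * baseline / disparity          # depth from disparity
    x = (u - cx) * z / fx                  # back-project to camera coordinates
    y = (v - cy) * z / fy
    return (x, y, z)

# Example with made-up calibration values:
print(stereo_to_3d(u=640, v=360, disparity=48.0,
                   fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, baseline=0.12))
```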
[0033] Entities are tracked and, if they follow the same trajectory in space for a pre-established amount of time in the field of view, they are then classified as having a relationship (e.g. attachment, group), provided that they satisfy any other rule defined in the system for the finding of a relationship.
[0034] The camera is also capable of establishing attachment within groups. A group formation algorithm establishes the evolution of a group of entities of the same class (for example a group of people) and defines if the group is really connected or just on the same path due to usual group dynamics, as for example strangers walking at the same pace in a corridor. Even though an entity (like a bag) might have been associated with a particular entity (like a person), if the owner entity passes the attached entity to another member of the group, ownership of the shared entity will be associated with both members of the group. This can be extrapolated to exchanges between more than two members of the group.
[0035] Once attachment has been established between an entity of a first entity class and an entity of another entity class in the field of view, that attachment can be tracked anywhere in the field of view. If one of the entities participating in the attachment relationship leaves the field of view, an alarm will be triggered by the processing unit.
[0036] The unique identity of classified entities is established through a re-identification algorithm (see e.g. “An Improved Deep Learning Architecture for Person Re-Identification,” Ejaz Ahmed, Michael Jones and Tim K. Marks, 2015), but other methods can be applied. For animals or people it can be or include gait recognition or facial recognition. The camera can also use texture recognition, gesture recognition, etc. In summary, any methodology that a neural network classification system can be programmed for may be used for that purpose. The camera can also use a pre-established dataset of pre-trained individuals’ characteristics from a pre-created database like a police facial dataset.
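As an illustration of how re-identification without a pre-prepared register can work, the sketch below compares appearance embeddings, such as a deep-learning re-identification model would produce, by cosine similarity; the embedding extractor itself is out of scope and is replaced by precomputed vectors, and the threshold and vectors are invented.

```python
# Illustrative sketch only: re-identification by comparing appearance embeddings.
# The embedding extractor is replaced here by precomputed, made-up vectors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def reidentify(new_embedding, stored_profiles, threshold=0.8):
    """stored_profiles: {entity_label: embedding}. Returns the matching label or None."""
    best_label, best_score = None, threshold
    for label, emb in stored_profiles.items():
        score = cosine(new_embedding, emb)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

profiles = {1: [0.9, 0.1, 0.4], 7: [0.1, 0.8, 0.2]}
print(reidentify([0.85, 0.15, 0.45], profiles))   # -> 1 (same person re-entering the view)
```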
[0037] Independently of the methodology used (needing an external database or not) the camera classifies individual entities, so if the entity that was attached to an entity that was left behind re-enters the field of view, an alarm can be cancelled and the entities that were previously attached are re-associated.
[0038] The camera also has communication capabilities. Using Wi-Fi, cable or other communication infrastructures, it can connect automatically to other cameras that are part of the same network. When one of the cameras has an entity that detaches itself from the connected object by leaving the field of view, the camera transmits the event to the other cameras and sends along the characteristics that define the entity that left (e.g. re-identification parameters, facial recognition parameters or others) to a P2P database shared by all the cameras. The other cameras of the network compare entities entering their field of view to the defining parameters in the P2P database so that any detached entity can be tracked throughout the area where the network of cameras resides.
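The following sketch illustrates, under simplifying assumptions, the profile-sharing idea: the P2P database is modelled as a plain in-memory store shared by two camera objects, whereas the real system would replicate it over the network. All class names are invented.

```python
# Minimal sketch of the profile-sharing idea in [0038]. The P2P database is modelled as a
# plain in-memory dict shared by two camera objects; a real system would replicate this
# store over the network. All class names are invented for illustration.

class SharedDatabase:
    def __init__(self):
        self.profiles = {}                       # entity_label -> defining parameters

    def publish(self, label, parameters):
        self.profiles[label] = parameters        # e.g. re-identification vector, class, last position

    def match(self, parameters):
        # naive exact match; a real system would use a similarity metric instead
        for label, stored in self.profiles.items():
            if stored == parameters:
                return label
        return None

class Camera:
    def __init__(self, name, db):
        self.name, self.db = name, db

    def entity_left_view(self, label, parameters):
        print(f"{self.name}: entity {label} left the field of view, publishing its profile")
        self.db.publish(label, parameters)

    def entity_entered_view(self, parameters):
        label = self.db.match(parameters)
        if label is not None:
            print(f"{self.name}: re-recognised entity {label} from the shared database")

db = SharedDatabase()
cam_a, cam_b = Camera("camera A", db), Camera("camera B", db)
cam_a.entity_left_view(1, ("person", "reid-vector-001"))
cam_b.entity_entered_view(("person", "reid-vector-001"))
```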
[0039] Fig. 1 is a schematic high-level representation of the system. The processing unit of the system receives the video feed 1001 from the camera. The video is separated into frames and each frame is analyzed at the entities-in-frame (detection and) classification module 1002. This may be done by processing the images with an end-to-end deep learning classification model that can recognize entities belonging to the classes of entities that need to be detected for the purposes of the surveillance system. The information from the entities-in-frame classification module 1002 is passed on to the entity occlusion estimation module 1005 and the entities tracking module 1004.
[0040] The entity occlusion module 1005 registers the entities that have been detected. This information is used to compare the exposed surface of any classified entity as it moves through the field of view to determine if it is being occluded or not. The idea is that the system can keep track of entities even when they disappear, from the camera’s point of view, behind another object. This will be explained further hereinafter with reference to Fig. 2.
[0041] The information from the entities-in-frame classification module 1002 is also sent to the entities tracking module 1004. This module is in charge of tracking the entities (in 3D) in the visual field and defining the parameters that will trigger the alarm when two entities that were considered attached become detached. It uses the results of the entity occlusion estimation module 1005 to be able to take into account entities that are in the field of view but cannot be seen. It feeds information into the group association module 1008, so that the latter can determine more complex attachments between entities, and also feeds the objects association module 1007, which is in charge of the definition of attachment between entities. The entities tracking module 1004 also saves the entities' raw location status to the P2P (peer-to-peer) shared database 1003 and passes it to the re-identification module 1006.
[0042] The peer-to-peer database 1003 is a distributed database shared by all the cameras. It allows them to share any information relevant to the analysis of the scene or the entities in it.
[0043] The re-identification module 1006 uses an end-to-end deep learning model to characterize and recognize entities without the need for a pre-prepared entities register. The parametrization of the entities can also be saved on the P2P database 1003 so that it can be shared between cameras. For example, if an entity (e.g. a person) leaves one field of view and enters the field of view of another camera, the entity can be recognized by the second camera through the re-identification parametrization performed by the first camera. Once the objects association module 1007 analyses the link between the different entities, it determines whether an alarm has to be triggered or not. The entities state link status module 1009 combines the processing logic of the entity tracking module 1004 and the group association module 1008 with the information from the re-identification module 1006 and the analysis from the objects association module 1007. It also controls the “alarm” event logic. If the entities state link status module 1009 estimates that an event has occurred, the alarm is triggered and is sent to the alarm manager 1010. The alarm manager 1010 can have varied behaviour: it can trigger sending information to an app or to a user interface for someone to review, or it can just send the information to the cloud to be stored. The alarm manager 1010 has access to the P2P shared database 1003 so that it can provide entity information to the recipient of the alarm.
[0044] The relevant sub-modules shall now be described in more detail.
[0045] Fig. 2 schematically shows the entity occlusion estimation module 1005. The purpose of this module is to define the level of occlusion of a detected entity in the field of view as an occlusion index. The index is the ratio of the visible area of the entity in the current frame to the maximum area exposed by the entity in any frame. Deep learning algorithms usually have an occlusion index above which they will fail to recognize the entity they are tracking. The detected entities and their 3D locations in a frame are fed to the occlusion estimation module 1005, which takes into consideration the current 2D area overlap and the difference in depth distance (at entities depth distance difference estimation module 2004) to estimate an entity's occlusion index. This can be compared with previous values of the occlusion index to identify rising or falling trends. The occlusion index can be considered as part of the entity's features and/or provided to the entity tracking module. Based on this feature, the software can use this cue either to switch to just tracking the object and override the deep learning classification result until the index grows again beyond a certain lower threshold, or to assume that the entity has disappeared behind an object and has to be kept located even if it is no longer visible. The module receives as input information from the entities classification module 1002 and the entities 3D location module 2002. The 2D area overlap estimation function 2003 uses the classified entities in each video frame and defines the evolution of the area on frame. Function 2006 in the entity occlusion estimation module 1005 uses this area difference estimation in combination with the variations of the distance from the entity to the camera to quantify the real area difference and calculate the occlusion index 2008. This information is then relayed to the P2P database 1003 as part of the entity features and to the entity 3D location module 2002 to keep an accurate account of the state of the entity for tracking purposes.
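As a hedged illustration of the occlusion index, the snippet below computes the ratio of the currently visible area to the maximum area the entity has exposed in any frame, after a simple depth compensation; the exact normalisation used by the system is not specified in the patent, so the 1/depth^2 scaling here is an assumption.

```python
# Sketch of the occlusion index defined in [0045]: the ratio of the currently visible area
# of an entity to the maximum area it has exposed in any frame, compensating for the fact
# that an entity farther from the camera covers fewer pixels. The formulas are illustrative.

def depth_normalised_area(pixel_area, depth, reference_depth=1.0):
    # apparent pixel area scales roughly with 1/depth^2 for a pinhole camera
    return pixel_area * (depth / reference_depth) ** 2

def occlusion_index(history):
    """history: list of (pixel_area, depth) observations for one entity, oldest first.
    Returns a value in [0, 1]; low values mean the entity is strongly occluded."""
    normalised = [depth_normalised_area(a, d) for a, d in history]
    return normalised[-1] / max(normalised)

# A person walks behind a pillar: pixel area drops while the depth barely changes.
print(round(occlusion_index([(12000, 5.0), (11800, 5.1), (4000, 5.2)]), 2))  # ~0.35
```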
[0046] Fig. 3 schematically shows the entities in-frame classification module 1002. Video feed 1001 is acquired through the camera sensors. The video is separated into frames (at frame separation module 3002). Once an entity enters the field of view, its trajectory is determined by looking across frame sequences. Entities whose coordinates do not vary more than a given distance across consecutive frames are merged to form a trajectory. Once such trajectories are initialized, entities in subsequent frames are matched to the trajectories based on a deep association metric. Entities which do not match any present trajectories are considered as candidates for spawning new trajectories. The processing unit selects one of the video feeds to parse the entities in the scene at stereo image fusion module 3003. The images from the other sensor will be used to estimate the depth of each pixel in the image. A deep learning neural network model is used to segment or box the entities present in the image (at segmentation module 3004). The same model classifies the entities at the same time (illustrated as deep learning classification module 3005). The classification of entities and their positions within the frames are relayed to the P2P database 1003 and the entities tracking module 3007 (1004 in Fig. 1).
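The sketch below illustrates the matching of new detections to existing trajectories with a plain nearest-neighbour rule standing in for the deep association metric mentioned above (which would compare learned appearance features rather than raw distances); thresholds and data are invented.

```python
# Illustrative nearest-neighbour association of new detections to existing trajectories,
# standing in for the "deep association metric" mentioned in [0046].
import math

def associate(trajectories, detections, max_jump=0.8):
    """trajectories: {track_id: last (x, y, z)}; detections: list of (x, y, z).
    Returns (assignments, new_tracks): matched detections and detections spawning new tracks."""
    assignments, new_tracks = {}, []
    free_tracks = dict(trajectories)
    for det in detections:
        if free_tracks:
            track_id, last = min(free_tracks.items(), key=lambda kv: math.dist(kv[1], det))
            if math.dist(last, det) <= max_jump:
                assignments[track_id] = det
                del free_tracks[track_id]
                continue
        new_tracks.append(det)       # no trajectory close enough: candidate for a new track
    return assignments, new_tracks

tracks = {10: (0.0, 0.0, 5.0), 11: (3.0, 0.0, 6.0)}
print(associate(tracks, [(0.2, 0.0, 5.0), (7.0, 0.0, 4.0)]))
# -> ({10: (0.2, 0.0, 5.0)}, [(7.0, 0.0, 4.0)])
```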
[0047] Fig. 4 schematically shows the entities tracking module 3007. The entities 3D location module 2002 receives polygons or boxes defining the entities that were detected in the image and also the depth per pixel from the entities in-frame classification module 1002. The 3D location of the entities is calculated and the information is sent to the entity occlusion estimation module 1005, which holds information from the previous video frame. The entity occlusion estimation module 1005 sends back the level of occlusion of the different entities to the 3D location register 4006, which updates the positioning and occlusion estimate of each entity and sends it to the P2P database 1003. Updates of the occlusion estimates are performed at occlusion update module 4004. The 3D location register 4006 sends the information to the group association module 1008 so that the entities' movement evolution with respect to groups of entities in the image can be estimated. It also sends the information to the objects association module 1007 so that the individual attachment between entities can be estimated. The system uses an algorithm for the probabilistic estimation of an entity belonging to a group of entities of the same class. Any algorithm that fulfils the criteria of being able to associate entities into a group based on 3D tracking and can predict when a group has split, has merged or is crossing another group may serve the purpose of the invention. If the individual entity (for example a person) that is attached to the other entity (for example a bag) belongs to a group, the attachment to the object will be shared among the group. The implication of this is that if the entity leaves the group and the field of view but the bag remains with the group, no alarm will be triggered because the object will be considered to be attached to the group.
[0048] The 3D location register 4006 also sends the location estimation and classification code to the re-identification module 1006 so that an entity in the field of view can be parametrized in such a way that the system will be able to recognize it if it leaves the field of view and then returns. The tracking functionality is supplemented to recognize objects that were in the area but that the system cannot track anymore.
[0049] Fig. 5 schematically depicts the objects association module 1007. The object status control function relies on two pieces of information: the function that registers the entities that are associated in the field of view (paired objects registration, 5003) and the function that checks whether an entity is about to leave the field of view (border control, 5004). The paired objects registration module 5003 uses the location of the entities and their trajectories to establish attachment between them: if they enter the field of view together in close proximity and following the same 3D trajectory, then they are defined as attached. Once in the field of view, attachment only changes if the object that is attached to the entity is left in the charge of the group that the entity belongs to, where applicable. The paired objects registration module 5003 also uses the group detection function (of the group association module 1008) to determine whether the attachment is between two entities only or whether there is group sharing of the ownership.
[0050] Once the attachment has been registered, it is monitored by the object status control module 5005. The object status control module 5005 receives information from the border control module 5004, which determines whether an entity is about to leave the field of view or whether an entity entering the field of view was already known to the system. If the entity is coming back into the field of view, the border control module 5004 sends a query to the re-identification module 1006 to check whether the entity had been in the field of view before and, if that is the case, reassociates it with any attached entity that was left behind. The information on the attachment status of all entities in the field of view, which is revised on each frame of the video, is sent back to the paired objects registration module 5003. At that point, the system decides whether an attachment has been broken (at the entities unlink event estimation module 5007). If that is the case, a request is sent to the alarm manager 1010.
[0051] Fig. 6 schematically illustrates the functioning of the paired object registration module 5003. The entities 3D location module 2002 signals that there are new entities entering the field of view (6001) and conveys their 3D trajectories. The entities that are following the same trajectories are either classified as groups if they are of the same class or as attached if they are not of the same class. The pairing of baggage and person module 6002 gets information on potential groups by sending trajectories and entity class information to the group association module 1008 repeatedly as the video frames keep coming in. So, the association is an iterative process as the entities run their initial trajectories. A dependent entity (like a bag) can have two associations, first to the entity of a different class that has the closest trajectory to it and second to the other members of the group of that entity. This ensures that if the attachment is transmitted to a different owner entity in the group the dependent entity is not declared unattached. At the same time the footage of all the entities is being recorded as they enter the scene and registered at different positions and occlusion levels (6004). This will be used to register each entity in the re-identification module later with as much rich data as possible to guarantee a high probability of separating the identity of an entity from any other in the field of view.
[0052] When the entities become stationary (6005) or a certain time has elapsed since they entered the field of view or enough footage has been collected to define entities connections, group dynamics and identification parameters (6006, 6007, 6008, 6009, 6010 ...), all the parameters collected for re-identification as well as the attachment and group information are uploaded to the P2P distributed database 1003.
[0053] Fig. 7 schematically illustrates the border control module 5004 shown in Fig. 5. Border control module 5004 corresponds to the logic the system uses to decide if an entity is leaving the field of view or entering it and if several entities are coming into the field of view on the same 3D trajectory. The module is based on defining an area around the exits from the field of view where image analysis is more intense, in order to make the exit/entry control more efficient. This approach eliminates the need to track and classify the entities in the entire field of view all the time, making computation much more efficient. A relatively small area of interest is thus defined, instead of just using the borders of the image as the border, so that the system has the time to do the different analyses of an entity approaching the border while it is still completely within the image. The area of interest ensures that the algorithms have enough material to do classification, etc. without needing to go back over footage from a just-elapsed time interval, which would make border control quite inefficient. In other words, the area of interest corresponds to a buffer zone that allows the algorithms to work on the live frames to analyze entities in the buffer zone (which is still in the field of view) rather than to work on past frames when an entity is at the very edge of the field of view. Once the border control area is defined, the system checks continuously whether there are entities within that area (7001, 7002, 7003) without attached objects (example: a human without luggage). If an entity starts entering the border area, a complete set of frames is collected for the analysis algorithms to go through. If the entity is not merely at the border in the frame but is actually entering or exiting the frame (7004, 7007), then the status of the entity is updated (by the object status control module 5005).
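Purely as an example of the border-control idea, the following sketch defines rectangular buffer zones near the exits and splits the currently tracked entities into a high-priority set (inside a zone) and a low-priority set; the zone geometry and coordinates are invented.

```python
# Illustrative sketch of the border-control idea in [0053]: a buffer zone is defined just
# inside each exit of the field of view, and entities inside it are analysed with priority.
# Zone geometry and sizes are invented for this example.

def in_zone(position, zone):
    (x, y, _z) = position
    (x0, y0, x1, y1) = zone                      # axis-aligned rectangle in floor coordinates
    return x0 <= x <= x1 and y0 <= y <= y1

def split_by_priority(entities, border_zones):
    """entities: {label: (x, y, z)}. Returns (high_priority, low_priority) label lists."""
    high = [lbl for lbl, pos in entities.items()
            if any(in_zone(pos, z) for z in border_zones)]
    low = [lbl for lbl in entities if lbl not in high]
    return high, low

zones = [(-10.0, 8.0, 10.0, 10.0)]               # a 2 m deep strip in front of the exit
entities = {1: (0.0, 9.0, 5.0), 2: (0.0, 2.0, 5.0)}
print(split_by_priority(entities, zones))        # -> ([1], [2])
```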
[0054] Fig. 8 relates to the object status control module 5005 shown in Figs. 5 and 7. If a change of status event is detected for an entity (for example a detachment) at module 8001 and it is an exiting event (tested at 8003), the change of status is registered on the P2P database 1003 and an estimation of the probability of a detachment between entities is performed (8007). This estimation takes several things into account: previous exits by the same entity, whether the entity belonged to a group, etc. This probability will be passed to the alarm manager, which will apply the heuristic rules or probabilistic rules defined to decide if action has to be taken. If it is not an exiting event (tested at 8003), the system will do a re-identification check (by re-identification module 1006) on the entity entering the field of view to see if it was there before (8008). If it is a new entity, a new profile is created (8011) and uploaded to the P2P database 1003. If it is a re-entry, the code does an unlink/link estimation based on previous records for that entity (8009); if the entity was paired before, its status is updated (at paired objects registration module 8005) and saved to the P2P database 1003. The information is also passed to the alarm manager so that any alarm of detachment can be reversed.
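The toy decision flow below loosely mirrors the status-control logic of Fig. 8 (exit events lead to a possible-detachment message, entry events trigger a re-identification check); the data layout and messages are invented and do not reproduce the actual module 5005.

```python
# Toy decision flow mirroring the status-control logic described for Fig. 8. The data
# layout and messages are invented; the real module 5005 would use the system's own rules.

def handle_status_change(event, known_profiles, attachments, alarm_manager):
    """event: dict with keys 'label', 'kind' ('exit' or 'entry') and optional 'reid_signature'."""
    label = event["label"]
    if event["kind"] == "exit":
        # list dependent entities (e.g. bags) the exiting entity would leave behind
        left_behind = [dep for owner, dep in attachments if owner == label]
        if left_behind:
            alarm_manager(f"possible detachment: entity {label} left {left_behind} behind")
    else:  # entry: check whether the entity was seen before and re-associate it
        previous = known_profiles.get(event.get("reid_signature"))
        if previous is not None:
            alarm_manager(f"entity {previous} re-entered: cancel any detachment alarm")
        else:
            known_profiles[event.get("reid_signature")] = label   # new profile

profiles, pairs = {"sig-42": 1}, {(1, 20)}
handle_status_change({"label": 1, "kind": "exit"}, profiles, pairs, print)
handle_status_change({"label": 3, "kind": "entry", "reid_signature": "sig-42"}, profiles, pairs, print)
```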
[0055] Fig. 9 schematically illustrates the use of the P2P database 1003, which may be implemented as a distributed database on the individual stereo cameras. The cameras can communicate through Wi-Fi or Bluetooth or using any other suitable protocol. When a camera is set up, it automatically looks for other cameras that might be covering the same area. If the cameras belong to the same sub-project, they join automatically and copy the same database structure that the other cameras are sharing. They then automatically acquire all the data that have been collected by the other cameras. The cameras 9002 periodically ping the other cameras on the subnet they share to keep an updated copy of the database. One of the capabilities of the cameras 9002 is to share re-identification parametrizations of the entities that are in their field of view as well as their location, attachment, etc., that is to say complete profiles of the entities. In the example shown in Fig. 9, one can see a person 9004 leaving the field of view of a first camera with his luggage 9003 remaining unattended in the first camera's field of view. Thanks to the P2P subnet, the person 9004 can be followed from one field of view to the next, where he is recognized by the respective camera thanks to the re-identification profile shared within the P2P database 1003. This is not the only use of this capability: it can be used to find lost children in an area full of people, or to orchestrate an evacuation, making sure for example that the key responders are at the right spot of the crisis or are directed to that spot, among many other use cases.
[0056] While specific embodiments have been described herein in detail, those skilled in the art will appreciate that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure. Accordingly, the particular arrangements disclosed are meant to be illustrative only and not limiting as to the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalents thereof.

Claims (15)

1. A video surveillance system, including a camera for providing a stream of video frames of an area under surveillance, the camera including or being connected with a rangefinder for determining three-dimensional positions of physical entities in said area under surveillance, wherein the camera includes a processing unit configured to identify said physical entities in said stream of video frames, labelling said physical entities once identified, and tracking the positions of the labelled physical entities; wherein the processing unit is further configured to search for and register relationships between labelled physical entities, the relationships searched for belonging to one or more relationship classes, the searching of the relationships being based on the tracked positions of the labelled physical entities, the processing unit being configured to detect relationship events and to issue a notification when such a relationship event occurs.
2. The video surveillance system as claimed in claim 1, wherein said camera is or includes a stereoscopic camera and wherein said rangefinder is a stereoscopic rangefinder.
3. The video surveillance system as claimed in claim 1 or 2, wherein the physical entities identified, labelled and tracked by said processing unit include persons.
4. The video surveillance system as claimed in any one of claims 1 to 3, wherein the processing unit is configured to detect when an existing relationship belonging to a specific relationship class ends.
5. The video surveillance system as claimed in any one of claims 1 to 4, wherein the processing unit is configured to classify said physical entities into at least two different entity classes and wherein said search for relationships includes searching for relationships of a first relationship class, wherein relationships of said first relationship class link entities of one entity class to entities of another entity class.
6. The video surveillance system as claimed in claim 5, wherein said search for relationships includes searching for relationships of a second relationship class, wherein relationships of said second relationship class link entities of a specific entity class to entities of the same entity class.
7. The video surveillance system as claimed in claim 5 or 6, wherein the physical entities identified, labelled and tracked by said processing unit include persons as a first entity class and luggage items as a second entity class.
8. The video surveillance system as claimed in claim 7, wherein said search for relationships includes searching for associations between luggage items and persons.
9. The video surveillance system as claimed in claim 5 or 6, wherein the physical entities identified, labelled and tracked by said processing unit include adult persons as a first entity class and children as a second entity class.
10. The video surveillance system as claimed in claim 9, wherein said search for relationships includes searching for associations between adult persons and children.
11. The video surveillance system as claimed in any one of claims 1 to 10, wherein the searching of the relationships based on the tracked positions of the labelled physical entities includes identifying physical entities entering the field of view of the camera together in close proximity and having the same trajectories, the processing unit being configured to carry out the identification of physical entities, the search for relationships and the detection of relationship events using a deep learning model.
12. The video surveillance system as claimed in claim 11, wherein the processing unit is configured to implement a border control of the field of view of the camera, said border control comprising defining one or more entrance and/or exit zones in the field of view of the camera, and prioritising identification and tracking of physical entities within said entrance and/or exit zones.
13. The video surveillance system as claimed in any one of claims 1 to 12, comprising a plurality of cameras, for providing streams of video frames of different portions of the area under surveillance.
14. The video surveillance system as claimed in claim 13, comprising a shared database that is accessible by the cameras, wherein features of the physical entities are stored for re-identification of the physical entities at later times and/or by different cameras.
15. A video surveillance method, including obtaining a stream of video frames of an area under surveillance from a camera, the camera including or being connected with a rangefinder; using the rangefinder to determine three-dimensional positions of physical entities in said area under surveillance; identifying said physical entities in said stream of video frames, labelling said physical entities once identified, tracking the positions of the labelled physical entities; searching for relationships between labelled physical entities, the relationships searched for belonging to one or more relationship classes, the searching of the relationships being based on the tracked positions of the labelled physical entities; detecting relationship events, e.g. when an existing relationship belonging to a specific relationship class ends; and issuing a notification when such a relationship event occurs.

Priority Applications (1)

Application Number: LU101413A
Priority Date: 2019-09-27
Filing Date: 2019-09-27
Title: Connected entities tracking


Publications (1)

Publication Number: LU101413B1
Publication Date: 2021-03-31

Family ID: 68290022


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070013776A1 (en) 2001-11-15 2007-01-18 Objectvideo, Inc. Video surveillance system employing video primitives
US20150281655A1 (en) * 2014-03-25 2015-10-01 Ecole Polytechnique Federale De Lausanne (Epfl) Systems and methods for tracking interacting objects




Legal Events

FG: Patent granted
Effective date: 2021-03-31