WO2022260505A1

WO2022260505A1 - Self-contained distributed video surveillance method and platform for real-time event detection and tracking by means of low-cost interconnected smart cameras

Info

Publication number: WO2022260505A1
Application number: PCT/MA2022/050008
Authority: WO
Inventors: Mohammed AZZAKHNINI; Ahmed AZOUGH; Noureddine EN-NAHNAHI
Original assignee: Université Sidi Mohamed Ben Abdellah
Priority date: 2021-06-11
Filing date: 2022-06-09
Publication date: 2022-12-15
Also published as: MA53520B1; MA53520A1

Abstract

The present invention describes a self-contained distributed video surveillance method and platform for detecting and tracking events occurring in an area monitored by a dispersed collection of low-cost, loosely coupled smart cameras. The proposed method and platform relies on appropriate computer vision algorithms to perform the tasks of detecting, tracking and identifying moving objects. Subsequently, each detected object will be represented by a three-dimensional signature that uniquely identifies it. This signature is constructed by combining the visual aspects of the object as viewed from different angles. In this way, every object tracked will be recognised by the other smart cameras regardless of the angle from which it was detected. The various signatures, which are stored in the local database of each camera, are used to search for objects of interest or to establish the complete trajectory of an object in the monitored area.

Description

Title: Autonomous distributed video surveillance method and platform for real-time event detection and monitoring through low-cost interconnected smart cameras

Field of the invention:

The present invention generally relates to the field of surveillance equipment and technologies, and more particularly wireless video surveillance systems, distributed architectures and computer vision algorithms used for the detection, tracking and identification of moving objects.

State of the prior art

Nowadays, surveillance cameras are everywhere: in public establishments, airports, stations, in the streets and even inside vehicles. This explosion in the number of cameras creates a need for automated video surveillance systems. An example of such a system is disclosed in the US patent entitled "Cognitive Tracker -- Appliance for Enabling Camera-to-Camera Object Tracking in Multi-Camera Surveillance Systems" (US20180308243). This patent discloses a system for cognitive tracking of objects of interest observed in multi-camera surveillance systems, which continuously calculates the salient characteristics of the objects detected in each frame of data, and assigns them a unique identifier for a predetermined duration. Each time a tracked object appears in or disappears from the camera systems, the transfer register, which contains information about the tracked objects with their identifiers, is updated. A message is then shared with a set of well-defined neighboring cameras so that they update their own transfer registers.

However, the system described above is still insufficient and does not offer an ideal solution. One limitation is that the salient features must be calculated continuously for each image, which can degrade response time and performance and requires very powerful processing units, since the system must operate in real time. In addition, because objects are removed from the database as soon as they leave the monitored area, it is not possible to retrace the path traveled by each detected object.

Another object tracking system in a multi-camera environment is described in the patent entitled “Method and System for tracking object of interest in real-time in multi-camera environment” (US20200175697A1). The system described in said patent, which consists of a client module and a server module, makes it possible to follow an object of interest through a set of cameras. Each time, the server module determines a subnet of neighboring cameras that communicate with each other based on the appearance of the object of interest in the monitored area. Once the object has been detected in a camera via the client module, the server module defines a set of adjacent neighboring cameras which communicate with each other in order to ensure tracking of this object in the monitored area. However, such a system shows its limits in several respects. The repeated sharing of information about the configuration of the sub-network of adjacent cameras between the server module and the client module increases the quantity of data circulating in the network, which overloads the network bandwidth and slows down processing. In addition, this system relies on the server module, which must each time reinitialize the sub-network of communicating cameras based on the last appearance of the object. Such centralized processing makes the server module a single point of failure if it crashes.

Description of the invention:

The field of video surveillance has developed considerably in recent years. Today, CCTV cameras are easily found everywhere: in hospitals, shopping malls, parking lots, train stations and even on the street. CCTV applications make it possible to observe many different positions at the same time and quickly recognize abnormal events in a scene.

However, the proliferation of cameras installed in public or private spaces makes it increasingly difficult for human operators to use and analyze such a quantity of data produced by these systems.

Many automatic video analysis techniques have been studied from a research perspective, and some are even commercialized in industrial solutions. However, most of these solutions are based on the client/server principle and use highly centralized architectures with high-cost servers in order to process the continuous videos sent by the various cameras which monitor a specific area. Clients in these architectures are limited to image acquisition, while all processing tasks are performed by the central server.

Highly centralized architectures are not always the ideal solution. Companies must invest resources in order to buy and maintain servers which should be powerful enough to process video streams from different cameras. Furthermore, the failure of the central server leads to the failure of the entire system. The other problem is linked to the processing time, which can be long due to a possible overload either on the server or on the network bandwidth. Systems based on distributed architectures are more efficient, less prone to failure, and less expensive. This makes them the ideal solution to implement in an autonomous video surveillance system.

Our invention aims to remedy the problems and limitations that existing video surveillance systems have. Indeed, the proposed method and platform is based on a distributed architecture so that each intelligent camera is autonomous and totally independent of the other elements of our proposed platform.

Brief description of the different views of the drawings:

Figure 1 represents the distribution of cameras in the monitored area and their organizations according to their roles in the platform.

Figure 2 is a flowchart showing the operations of detection, tracking and identification of moving objects in a smart camera, as well as the synchronization process between neighboring smart cameras and the central server.

Figure 3 is a simplified diagram describing the components of the system.

Figure 4 is a flowchart showing a processing operation consisting in identifying the object contained in a request and visualizing its path in the monitored area.

Figure 5 is a flowchart showing the process of merging the different captures of an object detected from several angles of view in order to build a 3D signature.

Figure 6 is a simplified diagram that shows a few daemons running in real time in the smart camera, each performing a specific task.

Detailed description of the invention:

This invention describes an autonomous distributed video surveillance method for detecting and monitoring events in real time through low-cost interconnected intelligent cameras.

According to figure 2, our method of detecting and monitoring events, object of the present invention, comprises the following steps:

• A step of detecting monitored objects in the access areas of the monitored area using intelligent access cameras (1);

• A step of building a multi-view 3D signature -3DSiD- which models each detected object in a unique way;

• A step of monitoring the various monitored objects in the field of vision of each smart camera agent (2);

• An identification step for each object as it passes through the different zones monitored by intelligent agent cameras;

• A step of storing the complete path traveled by each monitored object by grouping together its different positions marked by the different smart cameras;

• A step of updating the signature and the position of the object monitored by the cameras as it moves in the monitored area;

• A step of searching for and retrieving objects of interest, if necessary, through the central database; and

• A step of sharing geographical information of monitored objects using a geographical location system linked to smart cameras.

The detection step makes it possible to analyze and extract any object present in each video stream. The objects are then classified and their movements are analyzed to detect abnormal behavior and so prevent dangerous situations. The detected objects are then tracked during their movements in the monitored area using an appropriate computer vision algorithm. Finally, the identification step makes it possible to assign each object of interest a unique identifier (ID) which distinguishes it from other objects. Indeed, each detected moving object is analyzed to extract a signature that characterizes it according to its visual aspects. This signature represents for an object what the fingerprint represents for a human being. These signatures are stored in our platform database and linked to identifiers (ID).

According to the invention, the step of detecting monitored objects comprises the following steps:

• Analysis of each object passing through the access areas of the monitored area by the intelligent access cameras (1) installed at the entrances and exits of the monitored area; and

• Detection of each object passing in front of a smart access camera (1) using an appropriate computer vision algorithm executed by the internal system of the smart access camera.

The step of constructing a multi-view 3D signature, on the other hand, is done by:

• the construction of the characteristic vectors of the detected objects from the visual aspects extracted from different angles of view of each object monitored by each intelligent access camera (1);

• the merging of the different characteristic vectors extracted from the different viewing angles of the monitored object in order to construct a multi-view 3D signature -3DSiD- which uniquely represents each monitored object; and

• storage of the multi-view 3D signature -3DSiD- in the central database hosted in the central server (3) and association of a unique identifier (ID) with each stored signature.
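The construction steps above can be sketched as follows. This is only an illustration under stated assumptions: the description does not specify the feature extractor or the fusion rule, so the sketch assumes that each viewing angle already yields a fixed-length characteristic vector and that "merging" simply keeps the normalised vectors of all views side by side in one record. The names `MultiViewSignature`, `build_3dsid` and `l2_normalize` are hypothetical.

```python
import math
import uuid
from dataclasses import dataclass, field


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


@dataclass
class MultiViewSignature:
    """Hypothetical 3DSiD record: one unit-length vector per viewing angle."""
    views: dict  # angle label -> normalised characteristic vector
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def build_3dsid(view_features):
    """Fuse per-view characteristic vectors into one multi-view signature.

    Assumption: fusion keeps every normalised view; the source leaves the
    actual fusion rule open.
    """
    if not view_features:
        raise ValueError("at least one view is required")
    views = {angle: l2_normalize(vec) for angle, vec in view_features.items()}
    return MultiViewSignature(views=views)


# Two toy 2-dimensional views of the same object (made-up numbers).
sig = build_3dsid({"front": [3.0, 4.0], "left": [0.0, 2.0]})
print(sig.object_id, sig.views)
```

Keeping the views separate, rather than averaging them, preserves what the object looks like from each angle, which is what later allows a single-view 2D signature to be matched against it.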

The step of tracking the different monitored objects in the field of view of each agent smart camera (2) is done by constructing a 2D signature -2DSiD- which uniquely characterizes each object using an appropriate computer vision method, then following each object monitored by each agent smart camera and recording the trajectory it travels in the area monitored by that agent smart camera (2).

The identification of each monitored object during its passage through the various zones monitored by agent smart cameras (2) is done by comparing the 2D signature -2DSiD- of the detected object with all the 2D signatures -2DSiD- stored locally in the smart camera's local database, and proceeds as follows:

• If no match is found, the 2D signature -2DSiD- is sent to the central server database (3) to compare it with the stored multi-view 3D signatures -3DSiD-;

• If a match is found either in the local database or in the central database, the corresponding unique identifier ID is assigned to the object; otherwise, the object is marked as new and its signature is added to the local and central databases with a new unique identifier ID.
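The identification cascade described above (local lookup, then central lookup, then registration as a new object) can be sketched as follows. The comparison method is not specified in the description; this sketch assumes cosine similarity with a fixed threshold, and it assumes that a central match is also cached in the local database. All function names and the threshold value are hypothetical.

```python
import math
import uuid


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query, candidates, threshold):
    """Return the ID whose stored vectors best match `query`, or None."""
    best_id, best_score = None, threshold
    for object_id, vectors in candidates.items():
        score = max(cosine(query, v) for v in vectors)
        if score >= best_score:
            best_id, best_score = object_id, score
    return best_id


def identify(sig_2d, local_db, central_db, threshold=0.9):
    """Local 2DSiD lookup first, then central 3DSiD lookup, else a new ID.

    local_db:   object ID -> single 2D signature vector
    central_db: object ID -> list of per-view vectors (assumed 3DSiD layout)
    """
    object_id = best_match(sig_2d, {k: [v] for k, v in local_db.items()}, threshold)
    if object_id is not None:
        return object_id, "local"
    object_id = best_match(sig_2d, central_db, threshold)
    if object_id is not None:
        local_db[object_id] = sig_2d  # assumption: cache the central match locally
        return object_id, "central"
    object_id = uuid.uuid4().hex      # no match anywhere: register as new
    local_db[object_id] = sig_2d
    central_db[object_id] = [sig_2d]
    return object_id, "new"


local_db = {"a": [1.0, 0.0]}
central_db = {"a": [[1.0, 0.0]], "b": [[0.0, 1.0], [0.6, 0.8]]}
first = identify([1.0, 0.05], local_db, central_db)
second = identify([0.0, 1.0], local_db, central_db)
third = identify([-1.0, 0.0], local_db, central_db)
print(first, second, third[1])
```

Only the signature vector travels to the server in the second branch, which matches the description's point that no images are exchanged during identification.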

The storage of the complete path traveled by each monitored object is carried out by tracing the trajectory traveled by each monitored object in the area monitored by each intelligent camera and the continuous recording of the trajectory of each monitored object traced by the smart cameras in the database.
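As a rough sketch of how a complete path might be assembled from the positions recorded by different cameras, the example below groups sightings by object identifier and orders them by time. The record fields (`object_id`, `camera_id`, `timestamp`, `position`) are assumptions made for illustration; the description does not define a storage schema.

```python
from collections import defaultdict


def build_trajectories(sightings):
    """Group sightings by object ID and order each path chronologically."""
    paths = defaultdict(list)
    for s in sorted(sightings, key=lambda s: s["timestamp"]):
        paths[s["object_id"]].append((s["timestamp"], s["camera_id"], s["position"]))
    return dict(paths)


# Made-up sightings reported by three cameras, deliberately out of order.
sightings = [
    {"object_id": "obj-1", "camera_id": "agent-2", "timestamp": 20, "position": (5, 1)},
    {"object_id": "obj-1", "camera_id": "access-1", "timestamp": 10, "position": (0, 0)},
    {"object_id": "obj-2", "camera_id": "agent-2", "timestamp": 15, "position": (4, 2)},
]
paths = build_trajectories(sightings)
print(paths["obj-1"])
```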

According to the invention, the updating of the signature and the position of the monitored object is done by comparing the new 2D signature -2DSiD- extracted by the intelligent camera which ensures the monitoring with that already stored and built by another smart camera, and if they are different, the two signatures are merged to obtain a new signature which better represents the object.
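The merge rule itself is not given in the description. One simple possibility, shown here purely as an assumption, is a running average weighted by the number of observations already folded into the stored signature; the function name `merge_signatures` and the tolerance value are hypothetical.

```python
import math


def merge_signatures(stored, new, stored_count=1, tolerance=1e-6):
    """Merge a newly extracted signature into the stored one.

    Returns (signature, observation_count). If the two vectors are equal
    within `tolerance`, the stored signature is returned unchanged.
    """
    if len(stored) != len(new):
        raise ValueError("signatures must have the same length")
    distance = math.sqrt(sum((s - n) ** 2 for s, n in zip(stored, new)))
    if distance <= tolerance:
        return list(stored), stored_count
    total = stored_count + 1
    merged = [(s * stored_count + n) / total for s, n in zip(stored, new)]
    return merged, total


merged, count = merge_signatures([1.0, 0.0], [0.0, 1.0])
same, same_count = merge_signatures([1.0, 0.0], [1.0, 0.0])
print(merged, count, same, same_count)
```

Tracking the observation count keeps one unusual view from overwriting a signature that has already been refined by many cameras.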

The search and recovery of monitored objects is done by creating the multi-view 3D signature -3DSiD- of the object to be searched for from its images given as input to the central server (3), and then comparing this multi-view 3D signature -3DSiD- with all multi-view 3D signatures -3DSiD- stored in the central database in order to find a match. If a match is found, the unique identifier -ID- of the object in question is returned; otherwise, a message is displayed indicating that the object sought does not exist in the database.
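A possible shape for this search is sketched below. It assumes that a multi-view signature is a list of per-view vectors and that two such signatures are compared by averaging, over the query views, the best cosine match among the stored views. That scoring rule, the threshold and the names `multiview_similarity` and `search_object` are illustrative assumptions, not the comparison method of the invention.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def multiview_similarity(query_views, stored_views):
    """Average, over query views, of the best match among the stored views."""
    return sum(
        max(cosine(q, s) for s in stored_views) for q in query_views
    ) / len(query_views)


def search_object(query_views, central_db, threshold=0.85):
    """Return the matching ID, or a 'not found' message if nothing matches."""
    best_id, best_score = None, threshold
    for object_id, stored_views in central_db.items():
        score = multiview_similarity(query_views, stored_views)
        if score >= best_score:
            best_id, best_score = object_id, score
    if best_id is None:
        return {"found": False, "message": "object not present in the database"}
    return {"found": True, "object_id": best_id}


central_db = {"bag-7": [[1.0, 0.0], [0.0, 1.0]], "car-2": [[-1.0, 0.0]]}
hit = search_object([[0.9, 0.1], [0.1, 0.9]], central_db)
miss = search_object([[0.0, -1.0]], central_db)
print(hit, miss)
```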

The sharing of geographic information of monitored objects is done by linking smart cameras to a geographic location system and the storage and sharing of geographic information of each monitored object based on a geographic database shared between all of the smart cameras.

The other aspect of the invention relates to a platform comprising low-cost intelligent cameras and a lightweight central server (Fig 3). Each smart camera, which performs the method of detecting, identifying and tracking monitored objects, consists of a camera sensor (4), processing units (6) (central processing unit (CPU) / graphics processing unit (GPU) / video processing unit (VPU)), connection entities (7) to ensure connectivity between the different elements of the platform, as well as memory areas (5) which provide space for data storage. The camera sensor (4) is capable of recording video scenes in real time and providing, at each time interval, frames which are processed by the processing units (6). The processing units run appropriate computer vision algorithms for detecting, tracking and identifying moving objects. Indeed, these units make it possible to run the method of detection, identification and object tracking, as well as the various daemons for detecting objects of interest and dangerous behavior. The memory areas (5) host a local database for each intelligent camera, where all the characteristic vectors of each detected object are stored together with the geolocation information concerning each object.

The central server (3) is likewise composed of a processing unit (6) (CPU/GPU/VPU), making it possible to run the object detection, identification and tracking method, to search for a specific object in the central database, and to run the various daemons for detecting objects of interest and dangerous behavior; connection entities (7), making it possible to ensure communication between the intelligent cameras and the central server; and a memory area (5) of higher capacity than that of the smart cameras, used as the central database, making it possible to store the three-dimensional characteristic vectors of each detected object (3DSiD) as well as the geolocation information concerning each object.

Smart cameras play the role of a security guard. They monitor the entire monitored area, identify each new object of interest that has just appeared for the first time, and effectively track it as it moves through the monitored area and through the smart cameras, by exchanging relevant information between all the smart cameras that make up our platform (Fig 1). The different processing operations (Fig 2) that constitute our proposed method as well as the architecture of the platform are described in the following paragraphs:

The platform is based on a lightweight central server (3) and a set of lightweight smart cameras (1,2) distributed in the monitored area which aim to monitor objects of interest. The notion of monitored object or object of interest in this patent means any object in which the user is interested. It can be living beings, vehicles, robots, luggage or any other object that needs to be monitored.

In our platform, smart cameras are grouped into two main categories: access smart cameras (1) and agent smart cameras (2).

Access smart cameras (1) are the main source of information stored in the central database. They are installed in the entry and exit areas of the monitored area (e.g. in front of doors). Their main purpose is to identify any moving object entering or leaving the monitored area by performing our proposed event detection and tracking method.

The operation of these cameras, which is described in Figure 5, is as follows. To access a monitored area (for example, an airport, a university campus, a train station, etc.), each object must pass through the access zones. When it does, the smart access cameras (1) detect it using appropriate detection algorithms. This process constitutes the first detection step in our proposed method. Then, the visual characteristics of each detected object are extracted from several angles of view, merged, and used to build a 3D identification signature (3DSiD), as shown in Figure 5. Such a 3D signature allows any object of interest to be recognized and subsequently identified by the various smart cameras, regardless of the field of view through which it is filmed. The high performance of these smart access cameras, their positions and the angle from which they capture the various incoming objects give them a closer view of the objects and ensure the extraction of more details. This allows them to build a more relevant and robust characteristic vector, which remains invariant to the various changes that may occur in the appearance of the object during its movement in the monitored area, due for example to changes in luminosity. Subsequently, each 3DSiD signature is stored in the database (local database as well as on the central server) and linked to a unique identifier (ID) specific to each monitored object. This ID is used to uniquely identify monitored objects. In addition, the smart access cameras (1) are used to detect and prevent possible dangerous situations thanks to specific daemons that run permanently in each camera (Fig 6). An example of such a daemon is one that compares the signature (3DSiD) of each detected object with a set of signatures (3DSiD) stored in the database for forbidden objects (for example, suspicious or wanted persons, animals prohibited in certain establishments, etc.) in order to find possible similarities. If a similarity is found, an alarm signal is sent to the authorities indicating the source of the danger.
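A watchlist daemon of this kind could be sketched as a background worker that consumes detected signatures from a queue and raises an alarm on a match. For brevity the sketch reduces each signature to a single vector and uses cosine similarity with a fixed threshold. These choices, the queue-based design and the names `watchlist_daemon` and `on_alarm` are assumptions made for illustration only.

```python
import math
import queue
import threading


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def watchlist_daemon(detections, watchlist, on_alarm, threshold=0.9):
    """Consume (camera_id, signature) items until a None sentinel arrives."""
    while True:
        item = detections.get()
        if item is None:
            break
        camera_id, signature = item
        for label, forbidden in watchlist.items():
            if cosine(signature, forbidden) >= threshold:
                on_alarm(camera_id, label)  # e.g. notify the authorities


detections = queue.Queue()
alarms = []
worker = threading.Thread(
    target=watchlist_daemon,
    args=(detections, {"wanted-01": [0.0, 1.0]},
          lambda cam, label: alarms.append((cam, label))),
    daemon=True,
)
worker.start()
detections.put(("access-1", [0.1, 0.9]))  # close to the forbidden signature
detections.put(("access-1", [1.0, 0.0]))  # unrelated object
detections.put(None)                      # sentinel so the example terminates
worker.join()
print(alarms)
```

In a real camera the worker would run indefinitely; the sentinel exists only so that the example ends.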

In turn, the agent smart cameras (2) are distributed throughout the monitored area to ensure optimal coverage of the area. Each agent smart camera (2) performs our event detection and tracking method. It ensures the detection of moving objects and then builds a 2D identification signature (2DSiD) for each monitored object. The 2DSiD represents for a moving object what the fingerprint represents for a human being. It characterizes each moving object in a unique way and enables smart cameras to distinguish between different moving objects in the monitored area. Once built, the 2DSiD is stored in the local database of the agent smart camera (2). In order to facilitate the tasks of searching for and tracking objects of interest, each signature stored in the database (local database as well as on the central server) is linked to a unique identifier (ID) specific to each monitored object. This ID is used to identify monitored objects (Fig 2). The signature (2DSiD) of each monitored object extracted by the agent smart camera (2) is compared with those stored in the local database of the agent smart camera which monitors the area. If the same signature is present in the local database, the ID corresponding to the 2DSiD extracted from the object is assigned to the latter during its tracking. Otherwise, the camera sends the 2DSiD to the central server (3) which, in turn, performs a search to find a match in the list of 3DSiD signatures. The server compares the 2DSiD sent by the agent smart camera with the 3DSiD signatures stored in the central database using an appropriate comparison method. Once the search for a similar signature in the central database has been carried out, the central server sends the corresponding ID to the agent smart camera. If neither the camera nor the server can find a match for the signature extracted from the monitored object, it is considered a new object. Its signature is then stored in the local and central databases, and a new unique identifier (ID) is assigned to it (Fig 2).

In addition, the smart cameras are connected to a geographical location system to ensure the recording of the geographical location of each monitored object during its tracking. These cameras work collectively by exchanging relevant information with each other (Fig 2). The 2DSiDs of monitored objects are shared between the set of smart cameras using an appropriate communication protocol. Thus, we track every object from one smart camera to another and guarantee a plot of the complete path traveled by each monitored object in the area over time. For the identification and tracking processes, no images are exchanged either between the smart cameras among themselves or between them and the central server. Only the identification signatures are exchanged in order to guarantee a fast and light exchange of information, and to ensure good management of the network bandwidth. The exchange of images is only carried out subsequently for archival and legal purposes.
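To illustrate the idea of exchanging signatures rather than images, the sketch below serialises a sighting as a compact JSON message. The communication protocol and message layout are not specified in the description, so the field names, the JSON encoding and the sample coordinates are all assumptions.

```python
import json


def encode_sighting(camera_id, object_id, signature, position, timestamp):
    """Serialise one sighting; the message carries no image data."""
    message = {
        "camera_id": camera_id,
        "object_id": object_id,
        "signature": [round(x, 6) for x in signature],
        "position": {"lat": position[0], "lon": position[1]},
        "timestamp": timestamp,
    }
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_sighting(payload):
    """Parse a sighting message received from a neighbouring camera."""
    return json.loads(payload.decode("utf-8"))


payload = encode_sighting(
    "agent-3", "a1b2", [0.25, 0.5, 0.125], (34.0331, -5.0003), 1700000000.0
)
message = decode_sighting(payload)
print(len(payload), message["object_id"])
```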

To make our system more scalable and efficient, the 3DSiD and 2DSiD signatures are updated each time the object enters the field of view of a new smart camera. This update is done in order to make the signature more robust by gathering more details about the visual appearance of each tracked object. For performance reasons and to avoid possible storage issues, the local database of each smart camera is constantly updated and contains only a specific set of data. Cleanup of the local database is based on time. The time elapsed since the object last appeared in the camera's field of view is calculated; if it exceeds a threshold, which will be set during the experimentation phase, the object data is archived in the central database, by sending the signature and geolocations to the central server, and then deleted from the local database.
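The time-based cleanup can be sketched as follows. The threshold is left to the experimentation phase in the description, so `max_age_s` is a placeholder; the record layout and the `archive` callback (standing in for sending data to the central server) are assumptions.

```python
import time


def purge_stale(local_db, archive, max_age_s, now=None):
    """Archive, then delete, every object not seen for more than max_age_s seconds.

    local_db: object ID -> {"signature": ..., "positions": ..., "last_seen": ...}
    archive:  callable(object_id, record), assumed to forward to the central server
    """
    now = time.time() if now is None else now
    stale = [oid for oid, rec in local_db.items() if now - rec["last_seen"] > max_age_s]
    for oid in stale:
        archive(oid, local_db[oid])  # archive first so no data is lost
        del local_db[oid]
    return stale


# Made-up records; timestamps are plain seconds for readability.
local_db = {
    "obj-1": {"signature": [0.1, 0.9], "positions": [(34.03, -5.0)], "last_seen": 100.0},
    "obj-2": {"signature": [0.7, 0.3], "positions": [(34.04, -5.01)], "last_seen": 950.0},
}
archived = []
removed = purge_stale(
    local_db, lambda oid, rec: archived.append(oid), max_age_s=600, now=1000.0
)
print(removed, sorted(local_db))
```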

Our described method allows offline searching for specific objects and returns the identifier and full path that an object has traveled over time in the monitored area. The image of the searched object is given as input to the central server, which constructs the 2DSiD which corresponds to it and compares it with the 3DSiDs already present in the database.

The platform we offer is designed for use with mobile camera sensors. In this mode, smart cameras, which are connected to a geographical location system, can be installed for example on any vehicle or in the streets and allow shared surveillance of controlled areas (FIG 7). A geographical database shared by all the cameras can thus be accessed and updated in real time by all the intelligent cameras analyzing the monitored area in order to centralize the information on a single system. Such a platform will be a good contribution to the development of smart cities by improving municipal services and citizens' lives.

Applications would include, for example, searching for free parking spaces in real time by equipping private or public transport vehicles with this type of camera, avoiding traffic jams and reporting accidents by sharing the geographical location of traffic jams in the city, or ensuring the security of cities by monitoring and searching for wanted people or cars (FIG 8).

Industrial application:

The present invention can be applied in wireless video surveillance systems.

Claims

What is claimed is:

1. A method for detecting, tracking and identifying monitored objects through an autonomous and distributed video surveillance platform characterized in that it comprises the following steps:

• Detection of monitored objects in the access areas of the monitored area thanks to intelligent access cameras;

• Construction of a multi-view 3D signature -3DSiD- which models each detected object in a unique way;

• Tracking of the various monitored objects in the field of vision of each agent smart camera;

• Identification of each object as it passes through the various zones monitored by intelligent agent cameras;

• Storage of the complete route traveled by each monitored object by grouping together its different positions marked by the different smart cameras;

• Update of the signature and the position of the object monitored by the cameras as it moves in the monitored area;

• Search and recovery of objects of interest, if necessary, through the central database; and

• Sharing geographic information of monitored objects using a geographic location system linked to smart cameras.

2. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of detecting monitored objects in the access areas of the monitored area using smart access cameras comprises the following steps:

• Analysis of each object passing through the access areas of the monitored area by the intelligent access cameras installed at the entrances and exits of the monitored area; and

• Detection of each object passing through a Smart Access Camera using an appropriate computer vision algorithm executed by the Smart Access Camera's internal system.

3. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of construction of a multi-view 3D signature -3DSiD- comprises the following steps:

• Construction of characteristic vectors of detected objects from visual aspects extracted from different angles of view of each object monitored by each smart access camera;

• Fusion of the different characteristic vectors extracted from the different viewing angles of the monitored object in order to build a multi-view 3D signature -3DSiD- which uniquely represents each monitored object; and

• Storage of the multi-view 3D signature -3DSiD- in the central database hosted in the central server and association of a unique identifier (ID) with each stored signature.

4. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of tracking the various monitored objects in the field of view of each intelligent agent camera comprises the following steps:

• Construction of a 2D signature -2DSiD- which uniquely characterizes each object using an appropriate computer vision method; and

• Tracking of each object monitored by each agent smart camera and recording of its trajectory traveled in the area monitored by each agent smart camera.

5. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of identifying each monitored object as it passes through the different areas monitored by agent smart cameras is done by comparing the 2D signature -2DSiD- of the detected object with all the 2D signatures -2DSiD- stored locally in the local database of the smart camera, and proceeds as follows:

• If no match is found, the 2D signature -2DSiD- is sent to the central server database to compare it with the stored multi-view 3D signatures -3DSiD-;

• If a match is found either in the local database or in the central database, the corresponding unique identifier ID is assigned to the object, otherwise, the object is marked as new and its signature is added to the local and central databases with a new unique identifier ID.

6. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of storing the complete path traveled by each monitored object comprises the following steps:

• Tracing of the trajectory traveled by each monitored object in the area monitored by each smart camera;

• Continuous recording of the trajectory of each monitored object traced by the intelligent cameras in the database.

7. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of updating the signature and the position of the monitored object is done by comparing the new 2D signature -2DSiD- extracted by the smart camera that tracks with the one already stored and constructed by another smart camera, and if they are different, the two signatures are merged to obtain a new signature that better represents the object.

8. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of searching for and retrieving the monitored objects comprises the following steps:

• Creation of the multi-view 3D signature -3DSiD- of the object to be searched from its images given as input to the central server;

• Comparison of the multi-view 3D signature -3DSiD- of the object to be searched generated by the central server with all the multi-view 3D signatures -3DSiD- stored in the central database in order to find a match; and

• Sending of the unique identifier -ID- of the object in question in case of a match; otherwise, display of a message indicating that the object sought does not exist in the database.

9. Method for detecting, identifying and tracking monitored objects, according to claim 1, characterized in that the step of sharing the geographical information of the monitored objects comprises the following steps:

• Linking smart cameras to a geographic location system; and

• Storage and sharing of geographic information of each monitored object based on a geographic database shared between all smart cameras.

10. An autonomous and distributed video surveillance platform which executes methods of detection, tracking and identification of moving objects in real time, object of any one of the preceding claims, characterized in that it comprises the following elements:

• A set of intelligent cameras that performs the method of detecting, tracking and identifying monitored objects comprising:

i. A camera sensor (4) for monitoring the area and providing frames in each time frame;

ii. A memory zone (5) used as a local database, making it possible to store the multi-view 3D signatures of each detected object (3DSiD) as well as the geolocation information concerning each object;

iii. A processing unit (6) making it possible to run the object detection, identification and tracking method, as well as running the various daemons for detecting objects of interest and dangerous behavior;

iv. A connection entity (7) making it possible to link and communicate the various cameras with each other as well as to ensure communication between the intelligent cameras and the central server.

• A central server which executes the method of detection, identification and tracking of monitored objects, runs daemons in real time and communicates with all the smart cameras, comprising:

i. A processing unit (6) making it possible to run the object detection, identification and tracking method, to search for a specific object in the central database, as well as to run the various daemons for detecting objects of interest and dangerous behavior;

ii. A memory zone (5) used as a central database, making it possible to store the multi-view 3D signatures of each detected object (3DSiD) as well as the geolocation information concerning each object;

iii. A connection entity (7) making it possible to ensure communication between the intelligent cameras and the central server.