GB2553105B - Method and system for determining a relationship between two regions of a scene - Google Patents
- Publication number
- GB2553105B (application GB1614321.6A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- descriptors
- descriptor
- visual
- region
- camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/292—Multi-camera tracking
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10024—Color image
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Closed-Circuit Television Systems (AREA)
Description
METHOD AND SYSTEM FOR DETERMINING A RELATIONSHIP BETWEEN TWO
REGIONS OF A SCENE
FIELD OF THE INVENTION
The present invention relates in general to the automatic determination of a logical topology in video surveillance systems. In particular, the present invention is directed to a method and a system for determining a relationship, for instance a target tracking or recognition relationship or an event or flow propagation relationship, between two image data sets of at least one media stream captured by at least one camera of a multi-camera networking system, the two image data sets representing two regions of a scene.
BACKGROUND OF THE INVENTION
The topology of a system of network cameras, for instance a video surveillance system, may be represented by a graph where a vertex or node is associated with a set of image data corresponding to at least a region (i.e. a portion) of a scene filmed by a network camera. In the following description, a scene is an environment comprising areas with which different cameras may be associated. Each area filmed by a given camera at a given time may be segmented into at least one region so as to obtain a finer granularity. Alternatively, a region may correspond to an entire area.
Two nodes of the topology graph may represent either two regions of the same area filmed by the same network camera or two areas of a scene filmed by two different network cameras at a given time, or two areas of different scenes filmed by two different network cameras.
In the topology graph, an edge between two nodes represents a link or relationship between two image data sets. The set of relationships existing between all the nodes of the graph forms the topology.
Usually, the topology takes into account whether or not a target is captured by several network cameras in the system. Thus, two network cameras recording the same target are linked. Conversely, two network cameras are not linked (i.e. have no relationship) when they never capture the same target.
Depending on the algorithm considered for determining the relationships between nodes (i.e. the conditions to set a link between two nodes and the robustness of this link according to at least one criterion), different topologies may exist for the same set of nodes. This algorithm defines a type of model of transition between nodes. A model of transition associated with a node quantifies the robustness of all links existing between this node and the other nodes. In other words, the model of transition associated with a node is a distribution of probabilities that an object seen by this node has been seen (and is thus formalized by an “incoming link” which corresponds to an incoming edge in the graph) or will be seen (and is thus formalized by an “outgoing link”, which corresponds to an outgoing edge in the graph) by another node.
The algorithm may include temporal considerations. In this case, the model of transition associated with a node is the probability that an object captured at a first node is or will be also captured at a second node before a predetermined amount of time elapses (transition probability).
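Such a time-bounded transition probability can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: it assumes each node logs, per tracked object, the timestamp at which the object was captured, and the function name and data layout are invented for the example.

```python
def estimate_transition(appearances_a, appearances_b, window):
    """Fraction of objects seen at node A that reappear at node B within `window`.

    appearances_a / appearances_b: dicts mapping object id -> capture timestamp.
    """
    if not appearances_a:
        return 0.0
    hits = 0
    for obj_id, t_a in appearances_a.items():
        t_b = appearances_b.get(obj_id)
        # count a transition only if B saw the object after A, within the window
        if t_b is not None and 0 <= t_b - t_a <= window:
            hits += 1
    return hits / len(appearances_a)
```

Under this sketch, the per-node transition model is simply this value computed against every other node.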
More advanced and complex algorithms may take into account other criteria, so that the model of transition associated with a node is, for example, the probability that the same object will be visible in the second region after a given amount of time, the probability that the same object will be visible in one region if it is visible in another one, or the probability that it is not visible at all.
In order to reduce the installation cost of large video surveillance systems, automatic topology determination algorithms have been developed. They often rely on the extraction of visual descriptors from video contents that provide information on the captured scene, for example about colours (a descriptor can be a histogram of colours in the form of a matrix comprising 25 elements or components), textures, structures (e.g. shapes), and specific points of interest of the scene. These algorithms generally comprise a step of comparing visual descriptors relating to different regions (or areas) of a given scene. The results of the comparison are then used to determine the relationships between the nodes.
An issue with algorithms (and thus models of transition) based on a temporal probability distribution is that the longer the amount of time considered as valid for the transition between two regions, the more data there are to compare, and thus the more arduous the comparing step becomes.
For illustrative purposes only, consider several cameras each capturing images comprising the same number of regions. Visual descriptors are extracted from a region and are compared to visual descriptors extracted from other regions during a considered temporal window. In order to identify relationships between one region and all the other regions, a high number of visual descriptor comparisons has to be performed. In particular, the resulting number of comparisons increases quadratically with the number of descriptors.
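As a rough back-of-the-envelope illustration (not taken from the patent; the function and the chosen numbers are invented for the example), the comparison count grows with the square of the number of descriptors per region:

```python
def comparison_count(cameras, regions_per_camera, descriptors_per_region):
    """Pairwise descriptor comparisons across all ordered pairs of regions."""
    n_regions = cameras * regions_per_camera
    d = descriptors_per_region
    # each ordered pair of distinct regions contributes d * d comparisons
    return n_regions * (n_regions - 1) * d * d

small = comparison_count(4, 4, 10)  # 10 descriptors per region -> 24000
large = comparison_count(4, 4, 20)  # doubling descriptors quadruples the work
```

Doubling `descriptors_per_region` multiplies the total by four, which is the quadratic growth mentioned above.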
In a distributed context in which network cameras perform the relationship estimation in a distributed manner, i.e. each network camera determines its own transition model with all other nodes (or in the case of links between regions, the transition models of all the corresponding nodes with all the other nodes), all cameras send their visual descriptors to all other cameras. Hence, a high number of visual descriptors will be transmitted over the network.
Given that the number of cameras, the number of regions and the length of the considered temporal window are usually fixed by the design of the system, the only parameters that can be optimized in order to reduce the computation cost and the amount of data to transmit over the network are the number of visual descriptors extracted during the length of the temporal window for a given region, the number of visual descriptors required to perform the relationship estimation and the size of the visual descriptors.
In the literature, the paper “Decentralized discovery of camera network topology” (Farrell, R., & Davis, L. S., ACM/IEEE Second International Conference on Distributed Smart Cameras (ICDSC), 2008) claims that appearance distinctiveness is key for efficient learning and proposes a decentralized approach for estimating a camera network’s topology using an information-theoretic appearance model for weighting observations (i.e. captured data for a detected object). This appearance model, computed offline prior to topology estimation during a so-called “Modelling Phase”, gives the prior probability of observing a given appearance (i.e. descriptor) as a density function. From the appearance model, a distinctiveness weight can be computed for a given appearance. This weight determines how much to emphasize distinctive appearances when matching different observations to estimate the underlying transition model.
However, a drawback of this method is that many observations are required to obtain a good appearance model (and thus good results), so the start of topology determination is delayed by the time required to construct the pre-built model. Alternatively, if the model is pre-built from other video sequences, it may not be relevant to the cameras concerned by the topology estimation. Another drawback is that this method does not allow a reduction in the number of observations to compare, since all the observations are used, with weighting values.
Consequently, there is a need for improving known automatic topology estimation methods. In particular, there is a need to provide an automatic topology estimation method that allows a reduction in the computation cost and in the amount of data to transmit over the network (distributed context).
SUMMARY OF THE INVENTION
The present invention has been devised to address one or more of the foregoing concerns.
According to a first aspect of the invention, there is provided a method for determining a relationship, for instance a target tracking or recognition relationship or an event or flow propagation relationship, between two image data sets of at least one media stream captured by at least one camera of a multi-camera networking system, the two image data sets representing two regions of a scene. The method comprises:
- extracting a first set of visual descriptors characterizing visual features of a first region, based on a first image data set, each visual descriptor being associated with a first extraction date;
- obtaining a second set of visual descriptors characterizing visual features of a second region, based on a second image data set;
- determining, for each visual descriptor of the first set, a number of occurrences and a second extraction date based on a distribution model generated based on other visual descriptors;
- selecting a subset of visual descriptors from among visual descriptors of the first set in function of their determined number of occurrences and second extraction date;
- comparing at least one visual descriptor of the subset with at least one visual descriptor of the second set;
- based on the result of the comparing step, determining a relationship between the first region and the second region.
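The claimed steps can be sketched end to end under strong simplifying assumptions: descriptors are plain float vectors, the distribution model is a list of reference entries with an occurrence count and a reference date, "similar" means small Euclidean distance, and all function names, field names and thresholds are invented for illustration, not taken from the patent.

```python
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lookup(model, descriptor):
    """Occurrence count and reference date of the closest reference entry."""
    best = min(model, key=lambda entry: l2(entry["ref"], descriptor))
    return best["count"], best["date"]

def select_subset(first_set, model, max_count, min_age, now):
    """Keep descriptors whose closest reference is both rare and not recent."""
    subset = []
    for descriptor, extraction_date in first_set:
        count, ref_date = lookup(model, descriptor)
        if count <= max_count and now - ref_date >= min_age:
            subset.append((descriptor, extraction_date))
    return subset

def count_matches(first_set, second_set, model, max_count, min_age, now, match_dist):
    """Compare only the selected subset against the second region's descriptors."""
    subset = select_subset(first_set, model, max_count, min_age, now)
    return sum(1 for d1, _ in subset
               for d2 in second_set if l2(d1, d2) <= match_dist)
```

In this sketch the relationship between the two regions would be derived from the match count, e.g. turned into a transition probability; only the selected subset is ever compared (or transmitted), which is the point of the method.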
Therefore, the method of the invention makes it possible to reduce the complexity of the topology determination, hence the computation cost. Indeed, the number of comparisons is reduced since it is lower than the number of extracted descriptors. Also, the amount of data to be transmitted for performing such topology determination in a distributed context is reduced.
Advantageously, the solution provided applies to both intra-camera logical topology determination (determination of relationships between regions of images captured by one camera) and inter-camera logical topology determination (determination of relationships between regions of images or full images captured by different cameras). Also, it may be efficient for topology determination with overlapping cameras/regions as well as non-overlapping cameras/regions.
Optional features of the invention are further defined in the dependent appended claims.
According to embodiments, said other visual descriptors used for generating the distribution model are visual descriptors previously extracted from the first region.
For instance, the distribution model may be generated using well-known machine learning methods, such as artificial neural networks, or other data fitting techniques for regression.
According to embodiments, the two image data sets are from the same media stream captured by the same camera.
According to embodiments, each of the two image data sets is from a different media stream captured by a different camera.
According to embodiments, the distribution model comprises at least one reference descriptor associated with a number of occurrences and a reference date.
According to embodiments, each reference descriptor is based on visual descriptors previously extracted from the first region.
According to embodiments, the determination step comprises, for each visual descriptor of the first set, a comparison between said visual descriptor and at least one reference descriptor of the distribution model.
According to embodiments, the determining step comprises the following steps:
- computing a distance between each visual descriptor of the first set and the at least one reference descriptor of the distribution model; and
- associating each visual descriptor of the first set with the reference descriptor having the minimum distance to this visual descriptor.
According to embodiments, the method further comprises a step of updating the distribution model based on the computed distances.
According to embodiments, the step of updating the distribution model comprises, for at least one visual descriptor of the subset, setting the reference date of the associated reference descriptor to the first extraction date of the associated visual descriptor.
According to embodiments, the step of updating the distribution model comprises comparing the computed distance between each visual descriptor and the associated reference descriptor to a threshold, and when the computed distance is above the threshold, adding the visual descriptor to the distribution model as a new reference descriptor associated with a reference date set to the first extraction date of the added visual descriptor.
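The update rule just described can be sketched as follows. This is a minimal sketch under assumptions not mandated by the text: Euclidean distance, dict-based reference entries, and the `count`/`date` field names are all illustrative.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def update_model(model, descriptor, extraction_date, threshold):
    """Refresh the nearest reference entry, or add the descriptor as a new one."""
    if model:
        best = min(model, key=lambda e: dist(e["ref"], descriptor))
        if dist(best["ref"], descriptor) <= threshold:
            best["count"] += 1
            best["date"] = extraction_date  # set reference date to extraction date
            return model
    # farther than `threshold` from every reference: add as new reference descriptor
    model.append({"ref": list(descriptor), "count": 1, "date": extraction_date})
    return model
```

Calling this for every extracted descriptor enriches the distribution model over time, so that later selections reflect what the region has already seen.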
According to embodiments, a new relationship between the first region and the second region is determined by performing again the steps according to the method aforementioned using the updated distribution model.
Hence, the distribution model used during the new performance of the method is a function of past visual descriptors, i.e. those extracted from the first region during the previous performance of the method.
According to embodiments, the steps of the method aforementioned may be iteratively repeated until convergence or each time a noticeable event occurs.
The regions may, for instance, each correspond to a block of pixels.
According to embodiments, the number of occurrences of a given visual descriptor is based on the result of the comparing step between said visual descriptor and at least one reference descriptor of the distribution model.
According to embodiments, visual descriptors are histograms of quantized values of chroma components.
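A descriptor of this kind can be sketched as below, assuming pixels are already available as 8-bit (Y, Cb, Cr) tuples; the bin count and flat layout are illustrative choices (with `bins=5` the result has the 25 components mentioned earlier as an example), and the function name is invented.

```python
def chroma_histogram(pixels, bins=5):
    """Histogram over quantized (Cb, Cr) chroma pairs -> bins * bins components.

    pixels: iterable of (Y, Cb, Cr) tuples with 8-bit components; luma is ignored.
    """
    hist = [0] * (bins * bins)
    for _y, cb, cr in pixels:
        qb = min(cb * bins // 256, bins - 1)  # quantize Cb into `bins` levels
        qr = min(cr * bins // 256, bins - 1)  # quantize Cr into `bins` levels
        hist[qb * bins + qr] += 1
    return hist
```

Ignoring luma makes the descriptor somewhat robust to lighting changes between cameras, which is one common motivation for chroma-only histograms.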
According to a second aspect of the invention, there is provided a system for determining a relationship, for instance a target tracking or recognition relationship or an event or flow propagation relationship, between two image data sets of at least one media stream captured by at least one camera of a multi-camera networking system, the two image data sets representing two regions of a scene, the system comprising computing means configured for:
- extracting a first set of visual descriptors characterizing visual features of a first region, based on a first image data set, each visual descriptor being associated with a first extraction date;
- obtaining a second set of visual descriptors characterizing visual features of a second region, based on a second image data set;
- determining, for each visual descriptor of the first set, a number of occurrences and a second extraction date based on a distribution model generated based on other visual descriptors;
- selecting a subset of visual descriptors from among visual descriptors of the first set in function of their determined number of occurrences and second extraction date;
- comparing at least one visual descriptor of the subset with at least one visual descriptor of the second set;
- based on the result of the comparing step, determining a relationship between the first region and the second region.
According to embodiments, at least one of the cameras of the multi-camera networking system is a Pan Tilt Zoom (PTZ) camera.
The second aspect of the present invention has optional features and advantages similar to the first aspect mentioned above.
Since the present invention may be implemented in software, the present invention may be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium, and in particular a suitable tangible carrier medium or suitable transient carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device or the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figures 1a to 1c illustrate an application context with one video surveillance camera in which embodiments of the invention may be implemented;
Figures 1d to 1g illustrate an application context with several video surveillance cameras in which embodiments of the invention may be implemented;
Figure 2 is a flowchart illustrating general steps of a method according to embodiments of the invention;
Figure 3 is a flowchart illustrating steps for updating the transition estimation data necessary for computing the model of transition of a given node according to embodiments of the invention;
Figure 4 is a flowchart illustrating steps for determining a set of unusual visual descriptors according to embodiments of the invention;
Figure 5 is a flowchart illustrating steps for determining a set of processed visual descriptors according to embodiments of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
In the following description, a system comprising a plurality of interconnected network cameras (e.g. video surveillance cameras) is considered.
According to general embodiments, it is provided to determine the logical links (i.e. relationships) existing between image data sets representing either different parts of an image captured by a given camera (“intra-camera” topology) or different images captured by different cameras (“inter-camera” topology) by comparing a subset of visual descriptors (“descriptors”) extracted from captured data and describing these respective image parts or images at the (first) extraction date. In the following description, the word “region” refers equally well to a part or portion of image and to an image.
Hence, the number of descriptors to be compared during the topology determination is reduced since only a subset of them is considered. It is recalled that a set SA is a (strict) subset of a set SB if SA is contained in SB, that is, all elements of SA are also elements of SB but SB comprises at least one element which is not included in SA. In addition, in contexts where the topology estimation is distributed, the number of descriptors to be exchanged between cameras is decreased.
The mentioned subset is selected according to the relevancy of the considered descriptors in a considered region. The relevancy is a function of a number of occurrences, for instance the occurrence probability of the descriptors in the considered region, and also of the point in time (“second extraction date”) at which extractions of similar descriptors were made. In particular, the relevancy of a considered descriptor is preferably evaluated based on:
- an occurrence value associated with a reference descriptor (of a distribution model) similar to the considered descriptor;
- the last time that a descriptor similar to that reference descriptor was extracted. In this case, the second extraction date is characterized by the reference date associated with the reference descriptor.
When a considered descriptor is similar to a recently extracted (i.e. the second extraction date is recent) or frequently observed (high number of occurrences) visual descriptor, it is not selected to be part of the subset. Hence, the most outstanding (i.e. discriminatory, distinguishable or distinctive) visual descriptors are selected for topology determination.
According to embodiments, the distribution (hereafter called “distribution model”) of the visual descriptors previously extracted at a given node is enriched or refined over time.
Advantageously, the number of false positive matches observed when the selected descriptors are compared is reduced, and the efficiency of logical topology determination is thus improved.
According to embodiments, an image is considered and the automatic topology determination comprises determining a region segmentation (or partition) for this image, and automatically estimating the relationship between the regions of the segmentation.
For illustrative purposes only, a situation where a camera films two entrances and two exits is considered. The image filmed at each instant by this camera may be segmented into four regions, each comprising either an entrance or an exit. Embodiments of the present invention allow determining that there is no relationship between the two regions each comprising an entrance, whereas there are relationships between each region comprising an entrance and each region comprising an exit. Hence, when a moving target enters via an entrance, determining which exit it takes does not require processing the region corresponding to the other entrance.
According to embodiments, instead of regions, a subset of cameras is selected for the search (or tracking) based on the topology. These cameras are good candidates for searching for (and tracking) a target observed by a first camera, which means that good results may be achieved when considering this subset only (instead of all the available cameras). The computation cost of the search is thus reduced, and tracking and re-identification are faster and more effective.
The determined topology can also be used in proactive systems: when something abnormal (or at least unusual) is automatically detected by a camera, the system can use the topology model to predict to which other camera(s) (and so where) the anomaly may propagate.
It may control Pan Tilt Zoom (PTZ) cameras (or other moving cameras) to point at a given area to obtain a better view of the anomaly. It may also trigger alarms in areas corresponding to current or future locations of the anomaly, or warn security agents. The system may also provide a priority list of locations to be checked based on transition times and/or probabilities in the topology transition model. The transition time may be defined as the time required for an object to move from one location (camera/area) to another one, e.g. minimum time or, alternatively, average time. It may be further refined for different types of objects (pedestrian, car, etc.).
Advantageously, the setup of a video surveillance system is facilitated. There is no need for the installation operator to determine and setup manually the relationships between the cameras (or regions) in a large system. Hence the installation cost is reduced.
In general terms, the “model of transition” or “transition model” refers to the results of the computing/processing of transition estimation data. The transition estimation data are used to determine the model. They are thus different from the model itself. For instance, the transition estimation data may include a count of descriptors matched and a count of descriptors not matched, while the model may be the resulting probability of matching.
The transition estimation data associated with a node quantify the robustness of at least part of the relationships existing between this node and the other nodes of the topology (e.g. incoming or outgoing links). Based on these transition estimation data, a probability that a relationship exists between the node and another node is computed for each other node of the system. Hence, a distribution of probabilities is obtained: this is the model of transition of this node. The set of transition models associated with all the nodes of the system forms the topology of the system.
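The separation between the raw transition estimation data (counts) and the derived model (a probability) can be sketched as follows; the class and field names are invented for the example, and the example uses the match/no-match counts mentioned above as the estimation data.

```python
class TransitionEstimator:
    """Raw transition estimation data and the probability derived from it."""

    def __init__(self):
        self.matched = 0      # descriptors of this node also seen at the peer node
        self.not_matched = 0  # descriptors never re-identified at the peer node

    def record(self, was_matched):
        """Update the estimation data with one comparison outcome."""
        if was_matched:
            self.matched += 1
        else:
            self.not_matched += 1

    def transition_probability(self):
        """The transition model for this link: probability of a match."""
        total = self.matched + self.not_matched
        return self.matched / total if total else 0.0
```

A node would keep one such estimator per other node (per link direction); the collection of resulting probabilities is that node's transition model.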
In the following description, the topology is determined in a distributed way, i.e. each node computes its own model of transition. However, the present invention is not limited to distributed implementations and may apply to a system where a central processing unit (e.g. a server) aggregates the image data captured by the cameras, or the descriptors extracted by the cameras, and computes the transition estimation data for all the nodes. To this end, the processes described below for a distributed topology determination may be easily adapted by the person skilled in the art.
Figures 1a, 1b and 1c relate to a first example of application context of the present invention with one video surveillance camera.
In particular, Figure 1a graphically illustrates this application context, where a camera 101 captures image data representing a scene 102 that is analyzed to estimate a logical topology.
The camera typically comprises capturing means for capturing media data (video and/or audio) useful for specific event detection, for instance related to a target (typically motion detection of the target), and preferably a storage unit (e.g. an SD card or a hard disk) for storing media data captured locally.
Figure 1b schematically introduces some scene analysis steps for logical topology determination in the application context of Figure 1a. In this example, the scene analysis comprises two steps: region segmentation 104 and then logical topology estimation 105 for the regions thus defined. Optionally, the scene analysis may contain a feedback loop 106 used to refine the region segmentation and the logical topology determination.
According to embodiments, the region segmentation 104 may use regular grid segmentation, or it may be determined manually or automatically using a region segmentation algorithm.
Alternatively, the region segmentation 104 may be performed together with the logical topology determination 105. For instance, this may be done by starting topology determination on a fixed grid segmentation of fine granularity (e.g. blocks of size 8x8, 16x16 or 32x32 pixels), and then merging together areas whose transition probabilities are higher than a given threshold and whose transition times are smaller than a second threshold, for example.
Alternatively, the region segmentation may be done by counting the number of unusual or outstanding visual descriptors detected in each block during a given (predetermined) period of time, and then segmenting the grayscale image resulting from a normalized representation of the counters, using a conventional image segmentation algorithm such as histogram thresholding followed by connected component extraction, or a mean-shift algorithm.
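This counter-based variant can be sketched as below: threshold a grid of per-block counters, then label 4-connected components. The flood fill is a generic stand-in for the connected component extraction mentioned above, and the function name and 4-connectivity are illustrative choices.

```python
def segment_counters(grid, threshold):
    """Label 4-connected components of blocks whose counter exceeds `threshold`."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if grid[i][j] > threshold and labels[i][j] == 0:
                next_label += 1
                stack = [(i, j)]  # iterative flood fill from this seed block
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < h and 0 <= x < w
                            and labels[y][x] == 0 and grid[y][x] > threshold):
                        labels[y][x] = next_label
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels, next_label
```

Each labelled component would then become one region of the segmentation; blocks with few unusual descriptors stay unlabelled.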
Figure 1c graphically illustrates steps of the topology determination performed in the application context of Figure 1a. In this example, the captured scene 107 is analyzed to determine the logical region segmentation 108. The logical relationships between these regions are analyzed to determine a graph 109 of the possible paths between regions that an object may follow when moving in the scene. The analysis may also comprise the computing of probabilities that an object follows a given path.
Figures 1d, 1e, 1f and 1g relate to a second example of application context of the present invention with several video surveillance cameras.
In particular, Figure 1d graphically illustrates this application context with a system comprising several surveillance cameras 111, 112, 113, 114 and 115 (or network cameras) that are connected to a surveillance network.
Some cameras (e.g. 111, 112, 113 and 114) are capturing areas of the same scene with logical relations (e.g. 116), and other cameras (e.g. 115) are capturing unrelated scenes (e.g. 117).
As illustrated on Figure 1e, the captured scenes 118 are analyzed to determine a graph 119 of the possible paths between cameras that an object may follow when moving in the scenes. In this context, embodiments allow determination of existing spatiotemporal relationships between the cameras for objects displacements.
Figure 1f shows an example of such a system with several network video surveillance cameras 121, 122 in which embodiments may be implemented. In this example, the system is centralized which means that the network cameras 121 and 122 are connected to a central processing unit 123 via a network 124 thanks to respective network interfaces 125, 126 and 127. The network 124 is for instance an IP-based local or wide area network (LAN, WAN) and the communication protocol may be an IP protocol such as HTTP over TCP or RTP over UDP. It allows each camera to transmit information to the central processing unit 123. The cameras 121 and 122 may directly send their videos to the central processing unit 123. Preferably, the cameras 121 and 122 may locally perform processing on their own video (i.e. the sequenced frames they capture) and only send meta-data describing the content of their video to the central processing unit 123. In this example, the logical topology of the cameras is determined inside the central processing unit 123.
Figure 1g provides another example of system with several video surveillance cameras in which embodiments may be implemented. In this example, the system is a distributed system where several network cameras 131, 132 and 133 are connected together through a network 134, thanks to their respective network interfaces 135, 136 and 137. In such a system, the cameras generally do not exchange their videos directly with each other but rather perform computations on their own videos to analyze their content and exchange meta-data with the other cameras to inform them of specific events, or object tracking results for example. This is because cameras have limited resources and network bandwidth is also limited.
In such distributed systems, one principle often targeted is to avoid redundant processing: compared to the centralized approach presented in Figure 1f, the overall processing is distributed between cameras. For instance, each camera estimates the relations between itself and each other camera. As previously mentioned, topology relationships are often estimated in two directions because they are not necessarily symmetric: the probability that an object seen by a first node has been seen before by a second node (incoming link) is not the same as the probability that an object seen by the first node will be seen after by the second node (outgoing link). To avoid computation redundancy, a camera may then only estimate the transition models (i.e. the full set of probabilities for a given node) in one direction: only incoming links or only outgoing ones, the other direction being handled by the other camera.
The examples described with reference to Figures 2 to 5 are based on a distributed system such as the one presented in Figure 1g. However, the present invention is not limited thereto and a skilled person may easily apply the same principles to a centralized or semi-centralized (several clusters of cameras with central servers in the clusters) system.
Figure 2 is a flowchart illustrating steps of an automatic topology estimation method.
The algorithm starts at step 201.
At step 202, a set of nodes for which a topology has to be estimated is obtained. When the kind of transition model considered is a probability that something seen in a first node (i.e. a camera/region) is or will be seen in the given (second) node (i.e. the one on which the transition estimation is performed), the transition estimation data comprise a number of visual descriptors observed by the second node and also previously observed by the first node (and received from said first node).
When the transition model is a probability conditioned by a transition time, the transition estimation data further comprise data for computing an average of the transition time, i.e. the time before the second node re-identifies an object descriptor previously observed by the first node.
The transition estimation data may also comprise a histogram of the transition time, or another way to represent multimodal transition times (such as data to represent a Gaussian mixture model for example). The transition model may also be time dependent, in order to take into account variations occurring during a day or a week.
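As an illustrative sketch of the transition estimation data described above, the per-link data might be held in a small structure combining a match counter, a transition-time sum for computing the average, and a transition-time histogram. The field names, the bin width and the histogram size are assumptions, not taken from the description:

```python
from dataclasses import dataclass, field

@dataclass
class TransitionEstimationData:
    """Hypothetical per-link data for estimating a transition model."""
    received: int = 0          # descriptors received from the first node
    matched: int = 0           # of those, re-identified by the second node
    time_sum: float = 0.0      # sum of observed transition times (seconds)
    histogram: list = field(default_factory=lambda: [0] * 12)  # 5 s bins

    def record_match(self, transition_time: float) -> None:
        """Account for one re-identification with its transition time."""
        self.matched += 1
        self.time_sum += transition_time
        bin_index = min(int(transition_time // 5), len(self.histogram) - 1)
        self.histogram[bin_index] += 1

    def transition_probability(self) -> float:
        return self.matched / self.received if self.received else 0.0

    def average_transition_time(self) -> float:
        return self.time_sum / self.matched if self.matched else 0.0
```

The histogram field allows representing multimodal transition times; a Gaussian mixture model, as mentioned above, could replace it.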
The set of nodes obtained at step 202 represents the regions in the topology graph for which relationships have to be characterized (i.e. transition models in the graph). In the case of a system performing distributed topology estimation as shown in Figure 1g, the relationships to estimate are:
- the transition models for links between other cameras/regions and the regions of the current camera, and/or
- the transition models for links between the regions of the current camera.
In the case of a system performing centralized topology estimation as shown in Figure 1f, these are the models for the transitions between any regions of any cameras.
At step 204, an initial set of transition estimation data is obtained for all the nodes of the set. These data are useful for updating the transition models that have to be estimated by the algorithm.
According to an embodiment, the initial transition estimation data may be such that all the visual descriptors coming from a camera have a given probability (0, or 0.5, or 1 for example) of being found in another one.
Once these data have been obtained and until a stopping condition is fulfilled at step 205, an iterative process is repeated over time (back arrow 206) for all nodes of the set.
At each stage 208 of this iterative process, the transition estimation data of the set are updated thanks to newly captured image data. Transition models of nodes can thus be updated based on these data.
According to embodiments, during this step 208, the transition estimation data for each node of the set are updated based on the comparison of visual descriptors selected according to their occurrence probability in the captured image data of the corresponding region, and preferably their (first) extraction date from the captured image data (i.e. the date of the frame from which the descriptor has been extracted).
In some cases, the update is performed simultaneously (in parallel) for several nodes of the set or for all of them if sufficient computing resources are available.
In some cases, the relationships obtained at step 204 and updated at step 208 are incoming links (i.e. transition models associated with directed edges from other nodes to the given node in the graph representation). This means that visual descriptors obtained in the past in other nodes are compared to the current visual descriptors obtained in the given node.
In some cases, the relationships obtained at step 204 and updated at step 208 are outgoing links (i.e. transition models associated with directed edges from a given node to other nodes, in the graph representation). This means that visual descriptors obtained in the past in the given node are compared to the current visual descriptors obtained in other nodes.
In some cases, the relationships obtained at step 204 and updated at step 208 are bidirectional links (i.e. both incoming and outgoing links).
When the stopping condition is fulfilled at step 205, the algorithm ends at step 209 after the updating of the transition models based on transition estimation data.
The stopping condition to fulfill at step 205 in order to end the algorithm may be a condition on the stability of the estimated topology (i.e. relationships and transition models). For instance, the stability may be checked by testing that a sum of the absolute differences of estimated values between successive iterations is below a given threshold.
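The stability test mentioned above can be sketched as follows. The keys identifying links (e.g. node pairs) and the threshold value are assumptions for illustration:

```python
def topology_is_stable(previous, current, threshold=0.01):
    """Stopping test: the sum of the absolute differences of the estimated
    transition probabilities between two successive iterations must be
    below a given threshold.

    `previous` and `current` map a link identifier, e.g. a
    (from_node, to_node) pair, to its estimated transition probability."""
    links = set(previous) | set(current)
    total_change = sum(abs(current.get(link, 0.0) - previous.get(link, 0.0))
                      for link in links)
    return total_change < threshold
```

A link present in only one of the two iterations counts with its full probability, so the appearance of a new link also delays the stopping condition.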
The algorithm may be restarted later if required, for instance if cameras have been added into the system, or if the kind of transition model considered has been changed, for example due to a behavior change of filmed objects.
In an alternative embodiment, there is no stopping condition if it is known that the topology will change over time and the system wants to track this change without interruption.
In the case of a centralized system as shown in Figure 1f, or in the case of a camera estimating the logical topology between segmented regions of captured images, the update of step 208 can be made in the same process or device, and potentially with memory and computation resource optimizations. The communications between nodes of the same process/device may also be directly implemented by using a shared memory space for example.
In the case of a distributed system as shown in Figure 1g, the topology of the overall system may be obtained by combining relationships estimated in all the processing units of the system (i.e. all the cameras in the case of the system illustrated in Figure 1g). The knowledge of the overall topology may be used for illustrative purposes (graphical user interface, for example), or sometimes for advanced proactive notifications.
In some cases where each node only estimates the incoming links rather than the outgoing links, a given camera transmits to other cameras a representation of the incoming links, and preferably of the associated transition models, estimated for its own node(s) corresponding to region(s) of its own images. This leaves to the other cameras/nodes the possibility of sending notifications to the current camera if they detect something (for example, to warn that something will or may arrive), i.e. the choice of interpreting the provided relationships and of deciding whether or not to send notifications.
In some cases, a first camera transmits to a second camera information about the estimated incoming links concerning the second camera only (i.e. the information for links incoming from that second camera).
In some cases, the first camera that detects noteworthy incoming links with a second camera, may directly subscribe to the second camera to receive notifications about specific events occurring in this second camera (in order to not leave to the second camera the task of interpreting the transition model).
In some cases, a first camera may subscribe to any other camera of the system (or of a cluster of cameras) to receive notifications about events relating to other cameras, even if these cameras are not especially directly related to the first camera. This allows the use of conditional transition models.
In some cases, at least one of the cameras of the surveillance system is a Pan Tilt Zoom (PTZ) camera. PTZ cameras can move their capturing means: the displacements they can perform allow them to cover a larger scene than conventional cameras, and their zooming ability provides higher rendering precision.
However, at a given time, only a part of the scene is captured by the PTZ cameras. Hence, when a link cannot be established between image data corresponding to this part of the scene and for instance another region captured by another camera, this either means that there is no link between the two cameras or that there is a link but the relevant data to determine it are located in another part of the scene that is not captured at the given time.
Therefore, in the case of PTZ cameras, the transition estimation data (and thus the transition models, hence the topology) are updated only when a correspondence/matching between descriptors is established, and not when no correspondence/matching is determined. This avoids asserting the non-existence of a link while it remains uncertain.
According to a variant, the transition estimation data (and thus the transition models, hence the topology) are updated for the visible areas. For example, if a region is visible for a time longer than the descriptor tracking time window, it is certain that the received and tracked descriptors not matched in that region over the period of the time window have no correspondence, and the corresponding estimation data can thus be updated (e.g., the number of descriptors received from the sending node and not matched). This allows determining nodes for which no link has to be established.
In practice, the location of the part of the scene currently captured by a PTZ camera in the complete scene may be deduced from the PTZ parameters.
According to embodiments, the region segmentation used with a PTZ camera is a region segmentation of each specific view associated with a given configuration of the PTZ parameters (i.e. there is a specific set of nodes for each, preferably quantized, pair of PTZ position/zoom values). In these cases, there are several predefined (or sampled) positions of the PTZ camera (referred to as PTZ position/zoom) and there is a segmentation into regions specific to each corresponding position. In the simplest case, there is a unique region per position/zoom.
In some cases with PTZ cameras, the logical relation links may be associated with (or alternatively weighted by) a reliability score that depends on the percentage of the time the PTZ has effectively captured the associated region.
Figure 3 is a flowchart illustrating steps for determining/updating the relationships between a given node and other nodes according to embodiments of the invention. These steps may be performed during step 208 shown in Figure 2. In this example, a camera may be associated with a unique node (inter-cam) or several nodes (intra-cam). Thus, in this example, the steps are performed by each camera to determine the relationships involving nodes with which it is itself associated.
Hence, in the following example, the steps are performed by a camera (or a processing device associated with it) but the descriptors are associated with regions, each region being represented by a node in the topology, and the algorithm aims thus at computing transition estimation data for the nodes in order to determine the links (and their robustness, i.e. the transition probabilities) between a considered node and the others. It should be noted that the transition estimation data are associated with links but in practice they are stored in memories associated with nodes.
In other words, the topology of the network cameras is formalized by a directed graph composed of nodes. A process manages a region of the scene represented by a node. According to an embodiment, a network camera has to process one region which also corresponds to the area captured (inter-cam). In that case, said network camera is associated with one node of the graph. According to another embodiment, a network camera has to deal with several distinct regions of a same area (intra-cam), and said network camera is associated with several nodes of the graph.
According to general embodiments, a method for determining/updating relationships between regions/areas comprises:
- extracting a first set of visual descriptors describing some visual features of a first region, based on a first image data set, each visual descriptor being associated with a first extraction date;
- obtaining a second set of visual descriptors describing visual features of a second region different from the first region or partially overlapping it, based on a second image data set;
- determining, for each visual descriptor of the first set, a number of occurrences and a second extraction date based on a distribution model generated based on other visual descriptors;
- selecting a subset of visual descriptors from among visual descriptors of the first set according to their determined number of occurrences and second extraction date;
- comparing at least one visual descriptor of the subset with visual descriptors of the second set;
- based on the result of the comparing step, determining/updating a relationship between the first area and the second area.
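These steps can be sketched end to end with toy one-dimensional descriptors. The function and parameter names, the thresholds, and the shape of the distribution model are all illustrative assumptions, not taken from the description:

```python
def update_relationship(first_frames, second_frames, model, match_dist=1.0):
    """One pass of the general method above, with toy 1-D descriptors.

    `first_frames` / `second_frames`: lists of (descriptor, extraction_date)
    pairs for the two regions; `model` maps a reference descriptor to a
    (occurrence_count, last_seen_date) pair."""
    no_entry = (0, float('-inf'))
    # Select, from the first region, descriptors that the distribution
    # model considers both rare and not recently seen.
    subset = [(d, t) for d, t in first_frames
              if model.get(d, no_entry)[0] <= 1          # low occurrence count
              and t - model.get(d, no_entry)[1] > 5.0]   # not seen recently
    # Compare each selected descriptor with the second region's descriptors.
    matched = sum(1 for d, _ in subset
                  if any(abs(d - d2) < match_dist for d2, _ in second_frames))
    # The (matched, compared) pair feeds the transition estimation data
    # for the link between the two regions.
    return matched, len(subset)
```

A frequently seen descriptor (high occurrence count) is filtered out before the comparison, which is the point of the selection step: only distinctive descriptors are worth matching across regions.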
For these purposes, an exemplary algorithm starts at step 301. At step 302, a set of extracted descriptors is obtained by the camera. At step 303, a set of tracked descriptors is obtained by the camera. At step 304, the transition estimation data of the considered node are obtained. Next, during a step 305, a set of unusual descriptors is determined. This step is described in detail with reference to Figure 4. At step 306, a set of processed descriptors is determined. This step is described in detail with reference to Figure 5. At step 307, once the set of processed descriptors has been determined, it is sent to other nodes.
At step 308, a set of the processed descriptors received from the other nodes is also obtained. At step 309, the set of tracked descriptors is updated. At step 310, the transition estimation data (and, as a consequence the transition models) of the given node are updated. The algorithm ends at step 311. Optionally, the transition model may also be computed based on these updated data at each iteration (between steps 310 and 311). In a variant, the transition model may be updated only at the end of the loops, i.e. based on the last transition estimation data.
This algorithm is now described in detail.
At step 302, a set of extracted descriptors is obtained by the camera. These descriptors are or have been extracted based on image data captured locally by the camera in a given region represented by a given node, either by the camera itself or by a processing unit, using a video analytics algorithm. The descriptors provide information on the captured region or part of it (e.g. a sub-block of pixels or a salience point). For instance, they may comprise information computed on a single frame:
- the colour (a mean or a median colour, one or more histograms, etc.),
- the texture (histogram of gradients, covariance matrix, etc.),
- the structure (shape, graph of segmented elements, etc.),
- specific points of interest (such as SIFT or SURF descriptors for example), or
- a combination thereof.
Descriptors may also contain information computed on a sequence of frames and further include temporal information, such as motion behaviour or average speed.
It is possible for the extraction algorithm to consider only parts of images, named “pixel blocks” in the following (i.e., a set of contiguous pixels, not necessarily rectangular in shape), detected thanks to a background subtraction algorithm applied to the current frame of the video. The extracted descriptors may also be related to higher level analysis (e.g., Video Content Analysis), such as descriptors specific to detected target objects (human, bag, etc.).
For illustrative purposes, an example where descriptors are histograms of quantized values of the (Cb, Cr) pair extracted from the YCbCr (a.k.a YUV) representation of pixel values is considered. This kind of descriptor is robust to luminosity changes that may occur between captures of the same entity, for example in different cameras or in different regions captured by the same camera. For example, a given histogram is a matrix of 25 elements. In this illustrative example, the process of step 302 comprises segmenting a region (corresponding to a node) into blocks of pixels and obtaining a matrix of 25 elements for each block. Preferably some blocks overlap, and have various sizes (e.g. 16x16, 32x32 and 64x64).
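The 25-element histogram of this example can be computed as follows; a 5x5 quantization of the (Cb, Cr) plane is assumed here to obtain the 25 elements (the description does not state the quantization explicitly):

```python
def cbcr_histogram(block, bins=5):
    """Histogram of quantized (Cb, Cr) pairs for one pixel block.

    `block` is a list of (cb, cr) pairs with values in [0, 255].
    Returns a flat list of bins*bins counts, L1-normalized so that
    histograms of blocks of different sizes remain comparable."""
    hist = [0.0] * (bins * bins)
    for cb, cr in block:
        qcb = min(cb * bins // 256, bins - 1)   # quantize Cb to [0, bins)
        qcr = min(cr * bins // 256, bins - 1)   # quantize Cr to [0, bins)
        hist[qcb * bins + qcr] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

Discarding the Y (luma) component, as this descriptor does, is what gives the robustness to luminosity changes mentioned above.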
At step 303, a set of tracked descriptors is obtained by the camera. In the case of incoming transition models (i.e. topological links incoming from other nodes), these descriptors form a subset of the descriptors that have been extracted from image data captured in regions other than the one corresponding to the considered node (e.g., by performing a step similar to step 302 in these other nodes) and received from them.
In this case, the tracked descriptors obtained at step 303 correspond to received descriptors that have been selected by other nodes as particularly relevant and that are not too old compared to a predefined duration for considering that there is a relationship between two nodes. Alternatively, at least some of the other nodes may not implement the invention, and the received descriptors correspond to all their extracted descriptors.
The matching of these tracked descriptors obtained at step 303 with a subset of the descriptors obtained at step 302 (i.e. from locally captured image data) is checked over a predetermined number of iterations of the algorithm (on successive frames). These tracked descriptors are associated with a node identifier identifying the sending node, to allow updating of the transition estimation data with the results of the descriptor tracking.
If the outgoing transition models are determined (i.e., transition models determined for outgoing links), the tracked descriptors obtained at step 303 correspond to descriptors obtained in a previous iteration of step 302 (i.e. in a previous frame, and in the same region) and that were previously selected as particularly relevant.
The matching of these tracked descriptors obtained at step 303 with a subset of the descriptors received at step 308 (i.e. from other nodes) is checked over a predetermined number of iterations of the algorithm (on successive frames). The received descriptors are associated with a node identifier identifying the sending node, to allow updating of the transition estimation data with the results of the descriptor tracking.
According to embodiments, both incoming and outgoing transition models are determined and thus the tracked descriptors obtained at step 303 comprise both previously selected descriptors describing the region of the current node and the regions of other nodes stored in two distinct subsets. The first subset will be used for estimating incoming links while the other set will be used for estimating the outgoing links.
At step 304, the transition estimation data of the considered node are obtained. They are the result of the update of the relationships (steps 301 to 311 or equivalently step 208 of Figure 2) performed during a previous iteration, or initialized at the beginning of the topology estimation algorithm.
Next, at step 305, a set of unusual descriptors is determined based on the descriptors obtained at step 302. This step is described in detail with reference to Figure 4.
In practice, the descriptors obtained at step 302 may be filtered in order to select a subset of the extracted descriptors. This selection is based on a distribution model comprising at least one reference descriptor associated with an occurrence probability and a reference date (i.e. a reference point in time). Alternatively, the distribution model may be generated using well-known machine learning methods, such as artificial neural networks for example, or other data fitting techniques.
Thus the distribution model allows determining, for a considered descriptor, at least an occurrence probability (or more generally a value relating to the novelty of a descriptor) and a reference date (or more generally a value relating to the recentness of similar descriptors previously extracted). These values obtained/determined for a considered descriptor and a given distribution model are also referred to as the result of comparing the considered descriptor to the given distribution model.
At the beginning of the topology estimation (i.e. during the first iteration of the present algorithm), the distribution model may be arbitrarily initialized, for instance using arbitrarily chosen reference descriptors, such as descriptors extracted during a preliminary step based on training video samples, and/or based on statistics computed on the descriptors extracted from the current frame or a previous one, for instance the first one.
During the process, the distribution model may be updated based on the set of descriptors extracted at step 302, on the set of unusual descriptors determined at step 305, or on the set of processed descriptors determined at step 306.
For instance, the selection comprises the following steps:
- computing a distance (or dissimilarity distance) between each visual descriptor extracted at step 302 and each reference descriptor of the distribution model;
- associating each visual descriptor extracted at step 302 with the reference descriptor having the minimum distance to this visual descriptor (i.e., a given visual descriptor is associated with the reference descriptor most similar to it);
- selecting a subset of extracted visual descriptors depending on the occurrence probability and reference date of the associated reference descriptors.
Thus, the distribution model may be updated based on the computed distances. For instance, updating the distribution model may comprise, for each visual descriptor extracted at step 302, the steps of: determining if the distance with the associated reference descriptor is below a threshold (i.e. they are considered to be similar), setting the reference date of the associated reference descriptor to the first extraction date of the associated visual descriptor and incrementing the occurrence counter of the associated reference descriptor.
The incrementing of the occurrence counter may be performed when the extraction date of the associated visual descriptor is later than the reference date of the associated reference descriptor plus an arbitrarily (e.g. empirically) chosen amount of time (one second for example), in which case no descriptor similar to the reference descriptor has been extracted too recently. This avoids counting several times the same descriptor extracted in successive frames.
In another example, updating the distribution model may comprise comparing the computed distance between each visual descriptor and the associated reference descriptor to a threshold, and when the computed distance is above the threshold (i.e. the visual descriptor is not similar to the nearest reference descriptor), adding the visual descriptor to the distribution model as a new reference descriptor associated with a reference date set to the extraction date of the added visual descriptor and with an occurrence counter set to one.
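The two update rules above (counting a similar descriptor, adding a novel one as a new reference) can be sketched together. The threshold values and the list-based model layout are illustrative assumptions:

```python
def update_distribution_model(model, descriptor, date,
                              sim_threshold=0.2, refractory=1.0):
    """Update the distribution model with one extracted descriptor.

    `model` is a list of reference entries [reference_vector,
    reference_date, occurrence_counter]; the distance is Euclidean,
    matching the histogram example."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    if model:
        closest = min(model, key=lambda ref: distance(ref[0], descriptor))
        if distance(closest[0], descriptor) < sim_threshold:
            # Similar to an existing reference: increment its counter only
            # if no similar descriptor was extracted within the last
            # `refractory` seconds, to avoid counting the same descriptor
            # on successive frames; then refresh the reference date.
            if date > closest[1] + refractory:
                closest[2] += 1
            closest[1] = date
            return
    # No similar reference: add the descriptor as a new reference with a
    # counter set to one and a reference date set to its extraction date.
    model.append([list(descriptor), date, 1])
```

The refractory period implements the rule of the preceding paragraph: two extractions of the same descriptor on consecutive frames count once.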
The distribution model may also be updated by adding new reference descriptors or replacing existing reference descriptors with descriptors resulting from a processing based on several extracted visual descriptors (e.g. the mean value of several extracted descriptors), on at least one extracted visual descriptor and at least one reference descriptor (e.g. mean shift), or on several reference descriptors (e.g. the mean value of several reference descriptors, for reducing the number of reference descriptors). Their reference date may be the most recent time at which one of the descriptors on which the resulting descriptor is based has been extracted, and the occurrence counter may result from the sum of:
- the value of the occurrence counter of each of the reference descriptors on which the resulting reference descriptor is based, and
- the number of extracted descriptors on which the resulting reference descriptor is based.
The occurrence probability of a given visual descriptor is estimated based on the occurrence counter of the associated closest reference descriptor and the sum of the values of all the occurrence counters of all the reference descriptors. If the distance to the closest reference descriptor is above a threshold (i.e. they are not considered similar), the probability is estimated to be very small (or even null) because the visual descriptor is considered to be novel (no similar descriptor in the model).
Back to the illustrative example with histograms having 25 elements, the 25 elements of a histogram are stored in a vector of 25 values (i.e. a matrix with a single row or a single column). The (dissimilarity) distance between two histograms is thus the Euclidean distance between the two corresponding vectors. The distance between the histogram obtained at step 302 and each reference histogram of the distribution model is computed in order to determine the closest reference histogram in the distribution model. If the corresponding minimum distance is lower than a predetermined threshold, it is checked whether the closest reference descriptor in the model has a low occurrence probability and a rather old extraction date compared to the current time (the order of these checks is not essential); if this is the case, the considered descriptor may be selected for the set of unusual descriptors. When the closest reference descriptor is not close enough (distance above the threshold), the considered descriptor may be directly selected, regardless of the occurrence probability and extraction date of the closest reference descriptor. The distribution model is updated with the selected descriptors, or preferably with all the extracted descriptors as previously illustrated.
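The selection logic of this example can be sketched as a single test (the threshold values are illustrative assumptions; the model layout matches the [reference_vector, reference_date, occurrence_counter] entries used above):

```python
def is_unusual(model, descriptor, now, sim_threshold=0.2,
               prob_threshold=0.05, age_threshold=30.0):
    """Selection test of the illustrative example, with Euclidean distances.

    A descriptor far from every reference is selected directly; otherwise
    it is selected when its closest reference is both rare (low occurrence
    probability) and has a rather old reference date."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    if not model:
        return True  # nothing in the model yet: everything is novel
    closest = min(model, key=lambda ref: distance(ref[0], descriptor))
    if distance(closest[0], descriptor) >= sim_threshold:
        return True  # no similar reference descriptor: directly selected
    total = sum(ref[2] for ref in model)
    rare = closest[2] / total < prob_threshold   # low occurrence probability
    stale = now - closest[1] > age_threshold     # rather old reference date
    return rare and stale
```

As the text notes, the order of the rarity and recentness checks is not essential; here both are simply combined with a conjunction.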
At step 306, a set of processed descriptors is determined based on the set of unusual descriptors previously determined at step 305, and possibly new descriptors computed based on them.
It may be predetermined at step 306 that the set of processed descriptors contains the same descriptors as the set of unusual descriptors determined at step 305, but preferably the determination of the processed descriptors at step 306 is a post-filtering of the unusual descriptors determined at step 305. It allows selection of the most unusual (i.e. noteworthy, discriminatory) descriptors. This step is described in detail with reference to Figure 5.
In practice, the unusual descriptors may be ordered in a priority list according to priority rules for instance depending, for each descriptor, on a value relating to its novelty, on a value relating to the recentness of similar descriptor(s) extraction, and/or on a value relating to a confidence in the model for the descriptor.
All these values are issued from the distribution model, and may be retrieved while selecting unusual descriptors.
The value relating to the novelty of a descriptor reflects the resemblance of a given descriptor to the usually extracted descriptors. It may be, for instance, an estimation of the probability of extracting similar descriptors (the more probable they are, the less novel they are), or equivalently an estimation of the frequency of encountering similar descriptors (the more frequent they are, the less novel they are). The frequency is in general related to the probability but is not bounded between zero and one. Alternatively, the value relating to the novelty of a descriptor may directly be a novelty score (as in “novelty detection theory” for example), not necessarily directly related to a probability or a frequency: it may be a value obtained by regression, for example. As an example, the frequency/probability of extracting similar descriptors may be estimated in the same way as the occurrence probability of a given visual descriptor.
For a given descriptor, the value relating to the recentness of similar descriptor(s) extraction reflects the amount of time elapsed since the last time a descriptor similar to the given one has (or may have, if it is estimated) been seen. It may be estimated using a regression method, based on a model iteratively built using values of previously extracted descriptors and their acquisition times, for example. Or it may be obtained, for instance, using the reference date of the associated reference descriptor if the distance to the closest reference descriptor is below a threshold (i.e. they are considered similar); otherwise the value may be set to a very high value, or infinity for instance, because the given descriptor is considered to be novel (no similar descriptor in the model).
The value relating to a confidence in the model for a given descriptor reflects how well the given descriptor is represented by the model. It may be estimated using a regression method, based on a model iteratively built using values of previously extracted descriptors, for example. This model could reflect the density of the values of the extracted descriptors, and the value relating to a confidence in the model could be a value reflecting the density of the model at the location of the given descriptor. For instance, this value can be the distance to the closest reference descriptor. Depending on the chosen modelling method, using the value relating to a confidence in the model may be optional, in particular if the value relating to the novelty is a sufficiently precise estimation, since the value relating to the confidence may be understood as a refinement of the value relating to the novelty.
More specifically, the extracted descriptors may be associated with a class (predefined or not) using known classification algorithms, for instance based on automatic learning, identifying to which of a set of classes each descriptor belongs. Priorities may be associated with each class, depending on the number of occurrences of encountered (extracted) descriptors that belong to that class. For instance, a class may include all descriptors that have the same (closest) reference descriptor and have their distance to this (closest) reference descriptor below a threshold. Typically, another specific class may also be defined to include all the descriptors that have their distance to the closest reference descriptor above the threshold: this class could be understood as containing any novel descriptor and may be associated with a maximum priority (not depending on the number of occurrences of novel descriptors).
The interest of using a priority list is that it makes it possible to directly extract a given number of the most noteworthy unusual descriptors. This given number may be predetermined (e.g. five descriptors) or may change over time according to the network load, the available memory, and/or an estimated quantity of activity in the frame (for example based on a motion analysis).
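Extracting a given number of top-priority descriptors can be sketched with a heap-based selection; how the priority score is derived from the novelty, recentness and confidence values is left outside this sketch:

```python
import heapq

def most_unusual(prioritized, limit=5):
    """Extract up to `limit` descriptors with the highest priority.

    `prioritized` is a list of (priority, descriptor) pairs, where a
    higher priority means a more noteworthy unusual descriptor."""
    best = heapq.nlargest(limit, prioritized, key=lambda entry: entry[0])
    return [descriptor for _, descriptor in best]
```

`heapq.nlargest` runs in O(n log limit), which matters when many unusual descriptors are produced per frame but only a handful are transmitted.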
At step 307, once the set of processed descriptors has been determined, it is sent to other nodes.
According to embodiments, all the processed descriptors are sent to all the other nodes at step 307.
According to other embodiments, it is possible to send only a subset of the processed descriptors (preferably the ones that were first in the priority list) if the network load is too high. In this case, processed descriptors may be buffered to be sent later (in a following frame for example) or they may be dropped (not sent). If they are buffered, the buffer may be configured as a priority queue updated at each iteration with newly determined processed descriptors, in order to transmit in priority the most unusual descriptors even if they have been extracted more recently. In this case, it is preferable that descriptors buffered for longer than a given period of time are dropped (i.e. they are considered outdated, because they can introduce bias in the transition estimation data if sent too late).
Alternatively, the processed descriptors may be sent to only a subset of the other nodes selected based on the network traffic, or on a topology determination strategy aiming at reducing the network load (for example by estimating only a subset of the links simultaneously).
According to embodiments, the extraction date of the sent processed descriptors is associated with each of them. This allows a receiving node to track and handle correctly a descriptor received at a later date.
According to embodiments, the sent descriptors are filtered to avoid redundant transmissions of the same processed descriptors in successive iterations. For example, the sent processed descriptors are stored in the node for a given amount of time (10 seconds for example). When a new processed descriptor is determined, it is compared to the stored descriptors. If the distance between the new unusual descriptor and the closest stored one is below a threshold (e.g. 0.1), it is dropped; otherwise it is added to the stored descriptors. If the given amount of time is reached for a stored descriptor, it is sent.
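The redundancy filter above can be sketched as follows. The class name, the Euclidean distance choice, and the way flushed descriptors are returned are assumptions for illustration; the 0.1 threshold and 10-second window mirror the examples in the text.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class RedundancyFilter:
    """Drops descriptors too close to one already accepted recently."""

    def __init__(self, threshold=0.1, hold_time=10.0):
        self.threshold = threshold
        self.hold_time = hold_time
        self.stored = []  # (descriptor, time_stored)

    def offer(self, descriptor, now):
        """Offer a new descriptor; return descriptors whose hold time elapsed."""
        # Flush descriptors whose retention time has elapsed: they are "sent".
        sent = [d for d, t in self.stored if now - t >= self.hold_time]
        self.stored = [(d, t) for d, t in self.stored
                       if now - t < self.hold_time]
        # Compare the new descriptor to the ones still stored.
        if any(euclidean(descriptor, d) < self.threshold
               for d, _ in self.stored):
            return sent  # redundant: the new descriptor is dropped
        self.stored.append((descriptor, now))
        return sent
```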
At step 308, a set of the (processed) descriptors received from the other nodes is also obtained. The set contains processed descriptors that have been sent by other nodes implementing steps 301 to 311 of the invention or by other nodes/cameras not implementing the invention, but that are able to transmit descriptors to other nodes.
According to embodiments, the set of received processed descriptors obtained at step 308 corresponds to the result of a descriptor transformation function applied to each processed descriptor received from (any) nodes. The parameters of the transformation function are specific to the nodes that previously sent the received descriptors. This transformation function is used to compensate for the impact, on the received processed descriptors, of a difference of acquisition context/parameters between a sending node and the current node.
Typically, for histograms or colour-based descriptors, the transformation function is similar to a brightness transfer function that allows compensation of the colour or illumination changes between an object (or a part of an object) captured in the image region associated with the other node, and the same object (or a part thereof) captured in the image region associated with the current node.
For other kinds of descriptor, the transformation may also take into account the difference of acquisition geometry between the two nodes for example. Typically, a difference of acquisition geometry may occur between two parts of an “area” captured by a same camera if a wide angle lens (fisheye for example) is used. In such a case, there are lens distortions that produce geometrical deformations that differ depending on the positions of an object in the captured area. Similarly, there may be other geometrical deformations that may affect descriptors, such as scale changes. These geometrical deformations may depend on the field, on the depth and/or on the zoom factor of the acquisition. As another example, rotation and perspective deformations may depend on the orientation of the camera. The difference in acquisition geometry between two nodes is similar to the difference between the geometrical deformations of the two nodes. Preferably, the transformation will compensate for the impact of these differences on the descriptors.
According to embodiments, the parameters of the transformation function (which are specific to a given node) are determined once and for all by using the difference of camera acquisition parameters.
Alternatively, these parameters are dynamically determined and updated/optimized, in each one of the iterations of the loop 206, depending on matching results obtained at step 310.
According to embodiments, obtaining a set of received processed descriptors at step 308 is an asynchronous task. This means that the node does not wait for the complete reception of the descriptors sent by the other nodes (they will be handled in a subsequent iteration). The obtained processed descriptors are therefore the descriptors already received but not yet handled; they may relate to past frames in another camera.
According to embodiments, the set of received processed descriptors obtained at step 308 does not necessarily comprise all the descriptors received. An additional filtering step is performed in order to remove received descriptors considered as less or not noteworthy. This filtering makes it possible not to take into account descriptors that are possibly considered as unusual descriptors by another node implementing the invention, but that are usual (or non-discriminating) for the current node, and thus for which there is a higher risk of a false positive match in the current node.
According to embodiments, this filtering step is performed by comparing each one of the received descriptors to the distribution model described in detail with reference to Figure 4. The comparison of a received descriptor to the distribution model may be performed with the same process as the one used to perform the comparison of an extracted descriptor with the distribution model. It is used to obtain the value relating to the novelty of a descriptor, for instance by obtaining the occurrence probability of the closest reference descriptor. According to embodiments, there may be several distribution models for a same node (i.e. image region), and the distribution models may be specific to parts of an image for example. In such a case, a received descriptor may be compared to all of the several distribution models. Depending on the comparison result(s), the received descriptor may be discarded from the set of received processed descriptors. This may be the case when the received descriptor is considered as not distinguishable enough in at least one of the parts of the image.
According to embodiments, the set of received descriptors obtained at step 308 has a limited size (due to resource limitations). In such a case, all the received descriptors are compared to the distribution model of the descriptors, as described with reference to Figure 4. Only the most noteworthy received descriptors are stored according to the comparison results and to the maximum size of the set.
Alternatively, received descriptors are associated with data relating to their comparison with the distribution model obtained at the sending node and that caused them to be detected as processed descriptors by the sending nodes. The data relating to their comparison with the distribution model obtained at the sending node may include the value relating to the novelty of the descriptor, the value reflecting the recentness of the descriptor and/or the value relating to the confidence on the model, as computed and sent by the sending node. These data relating to their comparison with the distribution model obtained at the sending node are used to order the received descriptors, potentially coming from several other nodes, in a similar way as in the ordered lists used in steps 306 and 307, and to keep only the most unusual ones in the constrained size set of received descriptors.
According to embodiments, a node may also limit the number of received processed descriptors to be kept per sending node or per sending camera.
At step 309, the set of tracked descriptors is updated.
According to embodiments where the set of tracked descriptors comprises received processed descriptors (i.e. incoming links are estimated), the set of tracked descriptors is updated at step 309 by adding each one of the received processed descriptors in the set obtained at step 308 from other cameras/regions. When a descriptor is added to the set of tracked descriptors, it is associated with a matching result initialized to a specific value to indicate that it has not been compared to locally extracted descriptors yet. For instance this specific value may be a ‘zero’ value if the matching result will contain a matching score (i.e. the greater the value, the better the matching), or it may be an ‘infinity’ value if the matching result will contain a comparison distance (i.e. the smaller the value, the better the matching). This matching result will be updated later and at each iteration with the best matching results of the comparisons between the tracked descriptor and locally extracted descriptors.
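The initialization and update of the matching result described above can be captured by two small helpers; the Boolean flag selecting between score and distance semantics is an assumption for illustration.

```python
import math

def initial_match_result(use_distance):
    """Value indicating 'not yet compared': 'infinity' for distances
    (smaller is better), 'zero' for scores (greater is better)."""
    return math.inf if use_distance else 0.0

def update_match_result(current, new, use_distance):
    """Keep the best of the current and the new comparison result."""
    return min(current, new) if use_distance else max(current, new)
```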
According to embodiments where outgoing links from the given node to other nodes are determined and where the set of tracked descriptors obtained at step 303 comprises processed and/or unusual descriptors that were previously locally determined in the node, the set of tracked descriptors is updated at step 309 by adding all or some (the most distinguishable according to the same ordering rules as the ones used to select processed descriptors in step 306, for instance) locally determined processed and/or unusual descriptors.
At step 310, the transition estimation data (and thus the transition models) between the given node and the other nodes are updated.
According to embodiments where the incoming links are determined by the node, the update of the transition estimation data is performed using the unusual or processed descriptors, and the set of tracked descriptors.
In general, there are not so many processed descriptors and some descriptors identified as outstanding by other nodes may appear to be less relevant (merely unusual) to the given node.
Embodiments in which only processed descriptors are considered for the matching are particularly advantageous for limited resources systems.
According to embodiments where the incoming links are determined by the node, this step 310 comprises comparing each one of the unusual or processed descriptors locally determined by the given node with each one of the tracked descriptors received from another node and still tracked. Then, for each tracked descriptor, the associated matching result is updated with the result of the best of the matches (according to comparison results) with the unusual descriptors, if it is better than its current value (obtained in a previous iteration). The update of the set of tracked descriptors at step 309 may also include a step of removing the descriptors that are too old (i.e. the associated extraction date indicates that they are older than a given tracking time limit, e.g. 60 seconds). If a descriptor is removed, the relationship between the current node and the node from which the descriptor comes (the other node) is updated according to the results of the best descriptor match. For example, if the transition estimation data comprise the number of descriptors obtained by the current node from the other node and the number of descriptors from the other node that have been re-identified in the current node, a counter of obtained descriptors ‘O’ is incremented by one and, if the best matching result for the given tracked descriptor is better than a given threshold (e.g. greater than 90% in the case of a matching score, or lower than 0.1 in the case of a matching distance), a counter of re-identified descriptors ‘R’ is incremented by one; the transition probability can then be estimated as ‘R/O’.
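The ‘R/O’ counting scheme above can be sketched as follows. The class and method names are illustrative; the 90% score threshold mirrors the example in the text (a matching-distance variant would test `< 0.1` instead).

```python
class TransitionEstimator:
    """Per-link counters: 'O' descriptors obtained from the other node,
    'R' of them re-identified in the current node."""

    def __init__(self, score_threshold=0.9):
        self.score_threshold = score_threshold
        self.obtained = 0       # 'O'
        self.reidentified = 0   # 'R'

    def on_tracked_descriptor_expired(self, best_matching_score):
        """Called when a tracked descriptor is removed (too old)."""
        self.obtained += 1
        if best_matching_score > self.score_threshold:
            self.reidentified += 1

    def transition_probability(self):
        """Estimate the transition probability as R/O."""
        return self.reidentified / self.obtained if self.obtained else 0.0
```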
According to embodiments where outgoing links are determined, the update step 310 uses the set of tracked descriptors and the set of received descriptors. This step comprises comparing each one of the tracked descriptors (just or previously extracted locally by the node) to each one of the processed descriptors just received from another camera/region and, for each received unusual descriptor, associating with it the best of its matching results with any of the tracked descriptors. The update of the set of tracked descriptors at step 309 may also include a step of removing the descriptors that are too old (i.e. an associated acquisition date indicates that a tracked descriptor is older than a given tracking time limit, e.g. 60 seconds). The transition estimation data, and so the relationship with a given node, are updated for each one of the descriptors received from this given node.
For example, a counter of the number of unusual descriptors received from that given node ‘O’ is incremented by one for each processed descriptor received from that node (and kept for the comparison), and a counter of the number of descriptors re-identified for that node ‘R’ is incremented by one each time the received descriptor from that node obtained a best matching result better than a given threshold (e.g. greater than 90% for a matching score, or lower than 0.1 for a matching distance); the transition probability can then be estimated as ‘R/O’.
According to embodiments, each one of the processed descriptors sent at step 307 is associated with pieces of information containing data relating to their comparison with the distribution model and that caused them to be detected as processed descriptors. The data relating to their comparison with the distribution model may include the value relating to the novelty of the descriptor, the value reflecting the recentness of the descriptor and/or the value relating to the confidence in the model. Especially for embodiments where a limited size is available for the set of tracked descriptors obtained in 303 (limited use of memory and processing resources), this allows the receiving nodes to preferentially keep the most noteworthy ones. These pieces of information could also be used to weight the matching results when performing the update of the transition estimation data at step 310, in order to give more importance to the least usual ones. The advantage of transmitting this information is that it allows a receiving node to select the descriptors that it will track without having to compare them to its own distribution model, thereby reducing the use of computing resources.
The algorithm ends at step 311.
According to embodiments where incoming links are determined, the graph representation of the topological relationships may also include source nodes that do not directly represent a camera or a region. Such a node is not processed by the loop 206 of Figure 2. It is used to model incoming links from a source for the estimation of a transition model for object/descriptors appearing that have not been previously seen elsewhere during the considered time window. Thus, at the update step 208 of Figure 2, the processing for a given node further comprises updating a probability model for a link incoming from a source.
According to embodiments, this is done by counting the number of processed descriptors detected for the given node, that are not similar (according to a threshold based on the comparison result) to any of the descriptors in the set of tracked descriptors 303.
According to embodiments where outgoing links are estimated, the graph representation of the topological relationships may also include sink nodes that do not directly represent a camera or a region. Such a node is not processed by the loop 206 of Figure 2. It is used to model outgoing links to a sink for the estimation of a model for object/descriptors disappearing (i.e. that are no more seen elsewhere) during the considered time window. Thus, at the update step 208 of Figure 2, the processing for a given node further comprises updating a probability model for a link outgoing to a sink.
According to an embodiment, this is done by counting the number of processed descriptors in the set of tracked descriptors 303 that are not matched at the end of the considered time window.
According to embodiments where the transition model includes probabilities that something captured in a first region associated with a first node is or was visible in a second region associated with a second node, the relationships are based on a count of the number of processed descriptors determined in the current node (during a given period of time for example), and, for any other node, a count of the number of processed descriptors received from that node that were matched with a descriptor of the current node. The ratio of these two numbers allows estimation of the probability that a processed descriptor extracted by the current node was previously seen in the other node.
According to embodiments, the transition model includes probabilities that an object detected in a first node will be next detected in a second node, and/or the probabilities that an object detected in the second node is directly coming from the first node, such that only direct transitions are considered for these probabilities, i.e. for the nodes that are contiguous on the path of an object in the topological graph. To perform the estimation of the probabilities, communications between nodes are used to determine which one of the nodes first matched a given descriptor (coming from a given node). Then the probabilities are computed as in other embodiments, but using the number of firstly matched descriptors for a given node instead of using any matched descriptor.
Figure 4 is a flowchart illustrating steps for determining unusual descriptors according to embodiments of the invention. It corresponds to a possible implementation of step 305 of Figure 3.
The algorithm starts at step 401.
First, a distribution model of the descriptors is obtained at step 402. The set of extracted descriptors (obtained at step 302) is compared to this distribution model at step 403. It corresponds to comparing each one of the extracted descriptors to the distribution model. According to comparison results, a set of unusual descriptors is determined at step 404. At step 405, each one of the determined unusual descriptors is associated with the result of its comparison with the distribution model. Finally, and optionally, the distribution model is updated using the extracted descriptors according to comparison results at step 406. The algorithm ends at step 407.
This algorithm is now described in detail.
At step 402, a distribution model of the descriptors is obtained. As already mentioned, this distribution model may comprise a given number of reference descriptors, each associated with a reference date (e.g. it corresponds to the last time of extraction of a descriptor similar to the reference descriptor) and an occurrence probability (e.g. it corresponds to a number of occurrences of descriptors similar to the reference descriptor and previously extracted). As explained in the following description, other reference descriptors may be added to the distribution model during the processing (i.e. the distribution model may be updated).
The distribution model may be initialized at each video frame handled during the topology estimation algorithm as shown in Figure 3, for instance with statistics on the extracted descriptors on that frame. In a variant, it is initialized only at the beginning of the topology estimation algorithm, using the first video frame or a background image model of the camera handled during the topology estimation algorithm.
Alternatively, the initialization of the distribution model may be performed using statistics computed offline on descriptors extracted from videos samples (not especially coming from the camera), and/or on video data acquired by the camera previously to the start of the topology estimation.
The distribution model obtained at step 402 may comprise different kinds of occurrence probabilities that are used at step 403 to estimate the relevancy of an extracted descriptor.
For instance, with a nearest neighbour modelling, each reference descriptor may have a given occurrence probability and a given reference date.
In practice, classes/clusters are defined based on known unsupervised learning algorithms. Then, each given (extracted) descriptor is associated, at step 403, with one class, for instance by calculating the distance between the considered descriptor and each reference descriptor representative of each class in the case of a nearest-neighbour algorithm.
Other classification methods may be used as well, for clustering the values of the extracted descriptors. Such classification methods may include support vector machine or neural network based algorithms as well as fast/accelerated closest neighbour searches relying on locality sensitive hashing.
The occurrence probability associated with a given class is the occurrence probability associated with the corresponding reference descriptor (e.g. the one representative of that class).
The reference date associated with a given class is the reference date associated with the corresponding reference descriptor (e.g. the one representative of that class). A “recentness” score may be computed, at step 403, based on the reference date of the reference descriptors of the class, and based on the extraction date of an extracted descriptor. An example of recentness score is described below.
The clustering of the possible descriptor values (i.e. of any value that an extracted descriptor could take) into classes/clusters is particularly advantageous in cases where extracted descriptors may have more than hundreds of possible values, which is the case with real-valued descriptors and/or with high-dimensionality descriptors (e.g. matrices of 25 elements), e.g. when using histograms of quantized (Cb, Cr) pairs for instance. Clustering avoids storing statistics for each possible value of a descriptor, which would require a lot of memory (and would even be impossible in the case of real-valued descriptors for instance), and also a lot of extracted descriptors for learning statistics before having a good estimation of the distribution model. Thus, clustering reduces the memory usage and the number of extracted descriptors necessary to have a good estimation of the distribution. The cost of this clustering is linked to the cost of determining the cluster to which a given descriptor belongs, when comparing the given descriptor to the distribution model, or when updating the model statistics with the given descriptor. For instance, in the case of the previously presented nearest neighbour clustering, the given descriptor is compared to each one of the reference descriptors of the model, in order to find the nearest one and so the cluster to which the given descriptor belongs and the associated statistics.
The occurrence probability may be based on a simple count of occurrences. In practice, each reference descriptor is associated with a value corresponding to the number of times that an extracted descriptor similar to the reference descriptor is identified at step 403. When comparing an (extracted) descriptor to the distribution model, instead of obtaining an occurrence probability, a score of “distinctiveness” or a score of “habituation” may be computed and obtained instead.
The habituation score obtained for a given extracted descriptor depends on the value of the occurrence counter associated with the closest reference descriptor and on the distance, denoted distance_ref, between the extracted descriptor and the reference descriptor. The higher the counter, the higher the habituation score. Inversely, the higher the distance, the lower the habituation score. For instance, the habituation score may be computed according to a formula of the following form (one possible form consistent with these properties): habituation = N × e^(−distance_ref), where N is the occurrence counter of the closest reference descriptor.
Similarly, the distinctiveness score obtained for a given extracted descriptor depends on the value of the occurrence counter associated with the closest reference descriptor and on the distance between the extracted descriptor and the reference descriptor. However, the higher the counter, the lower the distinctiveness score. Inversely, the higher the distance, the higher the distinctiveness score. For instance, the distinctiveness score may be computed according to a formula of the following form (one possible form consistent with these properties): distinctiveness = distance_ref / (1 + N).
Hence, the higher the distinctiveness score is (respectively, the lower the habituation score is), the more the extracted descriptor is different from the previously extracted descriptors (according to the distribution model).
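As an illustration only, the two scores can be computed with simple assumed forms that respect the stated monotonicities; these exact formulas are illustrative choices, not the patented ones.

```python
import math

def habituation_score(occurrence_count, distance_ref):
    # Grows with the occurrence counter, decays with the distance
    # to the closest reference descriptor (assumed form).
    return occurrence_count * math.exp(-distance_ref)

def distinctiveness_score(occurrence_count, distance_ref):
    # Grows with the distance, decays with the occurrence counter
    # (assumed form).
    return distance_ref / (1.0 + occurrence_count)
```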
According to another example, the distinctiveness score obtained for a given extracted descriptor (at step 403) may be computed thanks to a “k-closest neighbours search”.
For the given extracted descriptor, the reference descriptors are ranked according to their distance (e.g. Euclidean distance between the histograms) to the given extracted descriptor, and the k-closest neighbours are the k reference descriptors having the smallest distances to the given extracted descriptor.
Each given reference descriptor is associated with an ‘A’ value quantifying the number of extracted descriptors previously used to update the distribution model in step 406 (in previous iterations of the algorithm) and having the given reference descriptor as first closest neighbour. The model is also associated with a ‘B’ value equal to the sum of the ‘A’ values of all the reference descriptors (i.e. it is equal to the total number of extracted descriptors previously used to update the distribution model in step 406). Alternatively, the ‘A’ value may correspond to the number of extracted descriptors previously used to update the distribution model in step 406 and having the given reference descriptor as one of their k-closest neighbours. The ‘B’ value still corresponds to the sum of all the ‘A’ values (i.e. it is equal to ‘k’ times the total number of previously extracted descriptors).
Using these ‘A’ and ‘B’ values, a regressive model of the scores of “habituation” or of “distinctiveness” is obtained using an extrapolation process. This process is performed for any extracted descriptor value based on the ‘A’ and ‘B’ values of its closest neighbours. For example, a score of “habituation” H for a given extracted descriptor is computed according to a formula of the following form (one possible form, using the variables defined below): H = Σk (Ak / B) × e^(−dk), the sum being taken over the k-closest reference descriptors.
Where:
- Ak is the A value associated with the k-th closest reference descriptor of the given extracted descriptor;
- B is the sum of the A values of all the reference descriptors;
- dk is the distance between the k-th closest reference descriptor and the given extracted descriptor.
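Assuming an exponential kernel decreasing with the distances dk (one possible choice among the functions the text allows), the regressive habituation score can be sketched as:

```python
import math

def habituation(A_values, distances, B):
    """Regressive habituation from the K closest reference descriptors.

    Assumed form: each neighbour contributes its relative occurrence
    weight A_k / B, attenuated by a kernel exp(-d_k) decreasing with
    its distance to the extracted descriptor.
    """
    return sum((a / B) * math.exp(-d)
               for a, d in zip(A_values, distances))
```

With all distances at zero (the descriptor coincides with its neighbours), the score reduces to the fraction of past descriptors covered by those neighbours.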
Accordingly, a “distinctiveness” score could be computed according to a formula of the following form (one possible form): D = 1 − H = 1 − Σk (Ak / B) × e^(−dk).
According to embodiments, the “habituation” and “distinctiveness” scores could be computed according to other functions, e.g. a function decreasing with the distances ‘dk’ for “habituation”, or a function increasing with the distances ‘dk’ for “distinctiveness”.
Also, a “recentness” score may be computed based on the reference date of the reference descriptors.
For example, the recentness score may be the time difference between the extraction date associated with the extracted descriptor and the reference date associated with the closest reference descriptor.
Alternatively, it may be computed as a function of each one of the distances between the extracted descriptor and each one of the k-closest reference descriptors, of the extraction date associated with the extracted descriptor, and of the reference date associated with each one of the k-closest reference descriptors. This is a regressive model / extrapolation method.
For example, the “recentness” is computed according to a formula of the following form (one possible form, using the variables defined below): recentness = Σk e^(−dk) × (T − tk) / Σk e^(−dk), the sums being taken over the K-closest reference descriptors;
where:
- K is the number of considered closest neighbours (K-closest neighbours);
- T is the extraction date of the extracted descriptor;
- tk is the extraction date associated with the k-th closest reference descriptor;
- dk is the distance between the extracted descriptor and the ‘k’-th closest reference descriptor.
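A possible regressive estimate of the recentness, assuming the same exponential weighting of the k-closest neighbours (an assumption for illustration, not the patented formula), is:

```python
import math

def recentness(T, ref_dates, distances):
    """Kernel-weighted estimate of the time since a similar sighting.

    Assumed form: average of (T - t_k) over the K closest reference
    descriptors, weighted by exp(-d_k) so that closer neighbours
    dominate the estimate.
    """
    weights = [math.exp(-d) for d in distances]
    return (sum(w * (T - t) for w, t in zip(weights, ref_dates))
            / sum(weights))
```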
At step 403, the set of extracted descriptors obtained at step 302 is compared with the reference descriptors of the distribution model obtained at step 402.
This step 403 depends on the nature of the distribution model obtained at step 402. According to the distribution model of descriptors used in embodiments, each one of the extracted descriptors obtained in 302 (or a subset of them) is compared to the distribution model (i.e. classification/regression/extrapolation is performed for each value of the extracted descriptors) to obtain a “habituation” or a “distinctiveness” score, as well as a “recentness” value.
For instance, an artificial neural network is used to determine a number of occurrences (or directly an estimation of the habituation score). A visual descriptor (e.g. in the form of a list of numbers) is provided as an input of the artificial neural network (i.e. to be used by its first input layer). In response to the given input, the output layer of the artificial neural network provides one value which is an estimation of the number of occurrences of the visual descriptor.
The artificial neural network could also be used to estimate a “recentness” value by adding a second output value to the output layer that would provide an extraction date (“second extraction date”). But preferably a k-nearest neighbour algorithm is used to provide that extraction date estimate. The distribution of the descriptors is thus modelled by using the synaptic weights of each layer of the artificial neural network previously trained/updated to estimate the habituation score, and by a set of reference descriptors associated with an extraction date for the estimation of the “recentness” value.
Also, a “distinguishable” score may be computed when comparing an extracted descriptor to the model during step 403. This score represents the dissimilarity of the extracted descriptor with the reference descriptors of the model. It indicates how badly the extracted descriptor is represented by the model: the higher the value, the less well the descriptor is described by the model. For instance, the distinguishable score may be the distance between an extracted descriptor and its closest reference descriptor.
The distinguishable score may be useful to determine whether the distribution model is to be updated using the given extracted descriptor (i.e. the given extracted descriptor is considered as novel and is to be learnt), if it is above a given threshold (e.g. 50%) for example. For example, in the case of the closest neighbour based model, it means that the given extracted descriptor is to be added as a new reference descriptor at step 406.
At step 404, according to comparison results from step 403 (i.e. based on the habituation or distinctiveness score, the recentness value and/or the distinguishable score), a set of unusual descriptors is determined.
If it has been determined at step 403 that the model is to be updated using a given extracted descriptor (based on the distinguishable score), this given extracted descriptor is considered as unusual and is added to the set of unusual descriptors. Otherwise, if the result of the comparison of a given descriptor with the distribution model makes it possible to obtain (if appropriate/available) a recentness score that is below a threshold, the extracted descriptor is considered as recent. In this case, it is ignored and it is NOT added to the set of unusual descriptors. This makes it possible to avoid considering the same descriptor/object several times in successive frames for example. Otherwise, if the comparison allows obtaining a habituation score that is below a threshold, the extracted descriptor is added to the set of unusual descriptors.
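The decision logic of step 404 can be sketched as follows. The threshold values and the dictionary-based representation of the comparison results are assumptions for illustration; the 0.5 distinguishable threshold mirrors the example given above.

```python
def classify_extracted(scores, unusual_set,
                       distinguishable_thr=0.5,
                       recentness_thr=2.0,
                       habituation_thr=0.3):
    """Decide what to do with one extracted descriptor (step 404 sketch).

    scores: dict with 'distinguishable', 'recentness' and 'habituation'
    comparison results. Appends to unusual_set when appropriate and
    returns the decision taken.
    """
    if scores['distinguishable'] > distinguishable_thr:
        unusual_set.append(scores)   # novel: to be learnt and reported
        return 'learn'
    if scores['recentness'] < recentness_thr:
        return 'recent'              # seen very recently: ignored
    if scores['habituation'] < habituation_thr:
        unusual_set.append(scores)   # rarely seen: unusual
        return 'unusual'
    return 'usual'
```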
According to embodiments, at the beginning of the step 404, the set of unusual descriptors is empty. Alternatively, at the beginning of the step 404, the set of unusual descriptors initially contains a subset of the unusual descriptors of the previous iterations (the most discriminative ones) that were not sent.
At step 405, each one of the determined unusual descriptors is associated with the result of its comparison with the distribution model, i.e. the habituation or distinctiveness score, the recentness value and/or the distinguishable score.
Finally, and optionally, but preferably, the distribution model is updated with the unusual descriptors according to their comparison results at step 406.
Where each reference descriptor (or at least each class) of the distribution model is associated with a reference date, this reference date may be updated as follows:
- if an extracted descriptor has been identified as belonging to a given class, the reference date associated with this class (e.g. representing the last date at which a descriptor belonging to the given class was identified) is updated with the extraction date of the extracted descriptor (e.g. the date of the current frame);
- if an extracted descriptor has been identified as being similar to a reference descriptor (distance to this reference descriptor below a minimum distance), and that reference descriptor belongs to the k closest ones (or alternatively, that reference descriptor is the closest one), the reference date associated with this reference descriptor (e.g. representing the last date at which the same descriptor was identified) is updated with the extraction date of the extracted descriptor (e.g. the date of the current frame), if it is more recent than the reference date;
- in the case of a regressive determination of the reference date, the update depends on the regressive model used and on implementation choices. For example, in the case of a regressive model using a k-closest-neighbour search, the update of the reference date associated with the reference descriptors of the model may be performed by adding to them their difference with the current date, weighted by their contribution in the regressive determination of the reference date (e.g. for a given reference descriptor, a weight is used to scale its time difference with the time of the current frame according to the minimum distance dk obtained for any extracted descriptor of the current frame to which the reference descriptor was a closest neighbour). Alternatively, the update of the reference date is performed by simply setting to the date of the current frame the reference date of each reference descriptor that has been found to be the closest neighbour of at least one extracted descriptor in the current frame.
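The simple closest-neighbour variant of the reference-date update described above can be sketched as follows. This is an illustrative sketch only; the function and variable names are hypothetical and the weighted regressive variant would depend on the chosen model:

```python
import numpy as np

def update_reference_dates(ref_descriptors, ref_dates, extracted, frame_date):
    """Simple variant of the reference-date update: for each extracted
    descriptor, the closest reference descriptor gets the current frame
    date as its new reference date (dates only ever move forward)."""
    refs = np.asarray(ref_descriptors, dtype=float)
    dates = list(ref_dates)
    for desc in extracted:
        dists = np.linalg.norm(refs - np.asarray(desc, dtype=float), axis=1)
        i = int(np.argmin(dists))             # closest reference descriptor
        dates[i] = max(dates[i], frame_date)  # never move a date backwards
    return dates
```

For instance, with two reference descriptors whose dates are 1 and 2, an extracted descriptor close to the first one at frame date 5 updates only the first reference date.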
The item of information used to compute the “habituation” score or “distinctiveness” score (e.g. the ‘A’ values and, implicitly, the ‘B’ value) is also updated.
When each reference descriptor (or at least each class) of the distribution model is associated with an occurrence counter, this occurrence counter may be updated as follows:
- if an extracted descriptor has been identified as belonging to a given class, the occurrence counter associated with this class (e.g. representing the number of extracted descriptors identified as belonging to the given class) is incremented by one;
- in the case of a regressive determination of the “habituation” or “distinctiveness” score, the update depends on the regressive model used and on implementation choices. For example, in the case of a regressive model using a k-closest-neighbour search, the update of the occurrence counter associated with the reference descriptors of the model may be performed by incrementing them by a constant value (e.g. one), weighted by their contribution in the regressive determination of the “habituation” or “distinctiveness” score, for each extracted descriptor for which they are one of the k closest neighbours (e.g. for a given reference descriptor, a weight is used to increment its occurrence counter according to the distance dk obtained for a given extracted descriptor to which the reference descriptor was a closest neighbour).
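The weighted counter update above can be sketched as follows. All names are hypothetical, and the exponential weighting kernel is an assumption standing in for whatever weighting the regressive model uses:

```python
import numpy as np

def update_occurrence_counters(ref_descriptors, counters, extracted, k=3, sigma=1.0):
    """Increment the counters of the k closest reference descriptors for
    each extracted descriptor, weighting the increment by similarity so
    that nearer references gain more (exp(-d/sigma) is an assumed kernel)."""
    refs = np.asarray(ref_descriptors, dtype=float)
    out = np.asarray(counters, dtype=float).copy()
    for desc in extracted:
        dists = np.linalg.norm(refs - np.asarray(desc, dtype=float), axis=1)
        nearest = np.argsort(dists)[:k]          # indices of the k closest
        for i in nearest:
            out[i] += np.exp(-dists[i] / sigma)  # distance-weighted increment
    return out
```

A reference descriptor identical to the extracted one receives a full increment of one, while distant neighbours receive a near-zero increment.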
According to embodiments, other incremental/online learning methods based on classification algorithms may be used to model the distribution of the descriptors, such as support vector machines or neural networks. Some of these methods may directly perform a probabilistic/statistical modelling of the descriptors (i.e. directly estimating the occurrence probability), as in the novelty detection literature. The distribution model is then updated by directly applying the incremental learning procedure of the algorithm to the newly extracted descriptors. Other methods that only perform a clustering of the extracted descriptors need to rely on a per-class/cluster statistical modelling, using, for instance, occurrence counters as previously described. In such a case, the update requires performing the incremental learning of the extracted descriptors and then updating the occurrence counter of the classes/clusters according to the classification of each one of the extracted descriptors. Most classification methods also provide a confidence score on the classification of a given extracted descriptor. That confidence score can be seen as similar to the distance between an extracted descriptor and the closest reference descriptor in the case of the k-nearest-neighbour search. Thus, the confidence score (or a function of it) can be used directly as a distance, and so can be used for weighting the update of the occurrence counters of the different classes.
According to embodiments, two distinct models (and possibly two different modelling methods) are used for the distribution model of the descriptors. A first one is used for modelling the temporal information (i.e. a specific model is used to obtain a reference date for a given extracted descriptor) and another one is used for modelling the occurrence frequency of the descriptors (i.e. a specific model is used to obtain the “habituation” score or the “distinctiveness” score for a given extracted descriptor). This means that the data (e.g. the reference descriptors or the classes) used internally by the two models are not necessarily the same. This allows having more dedicated and efficient models, and also offers more degrees of freedom when tuning the system (e.g. processing and memory consumption versus topology estimation efficiency).
According to embodiments where the “recentness” value is used, the occurrence counters are preferably updated considering only the extracted descriptors that were not ignored because of the “recentness”, thus making it possible not to consider an unusual descriptor as becoming usual if it stays a long time at the same position for example.
In the update step 406, according to embodiments where a dynamic clustering or regressive model is used, new classes or reference descriptors (or other modelling information) are added to the model if extracted descriptors that were compared to the model were not well represented (i.e. the “distinguishable” score is above a threshold). In the case of a k-closest-neighbour search, for example, this consists in adding new reference descriptors to the model.
The model has a limited size, in order to limit the memory used and the computation performed when comparing an extracted descriptor to the model. If the size limit is reached, classes or reference descriptors (or other modelling information) are removed. For instance, the choice of reference descriptors or classes to remove may be based on the “habituation” or “distinctiveness” associated with the class or with the reference descriptor (e.g. based on the occurrence counter ‘A’ value). The least usual class(es)/reference descriptor(s) (e.g. the one(s) with the lowest occurrence counter(s)) are removed before adding new class(es)/reference descriptor(s). In another example, the number of classes/reference descriptors is reduced by merging some of them. For these purposes, the closest ones are merged (e.g. counters are summed, temporal information, if any, is set to the most recent one, and the representative descriptor of the cluster is computed as a weighted sum of the reference descriptors according to the counters). Alternatively, if temporal information is used in the model, the classes or reference descriptors associated with the oldest time information are removed.
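The counter-based eviction policy above can be sketched as follows, assuming the model is a simple list of entries (field names are hypothetical):

```python
def add_with_size_limit(model, new_entry, max_size):
    """model: list of dicts like {'desc': [...], 'count': int}. When the
    size limit is reached, the reference descriptor with the lowest
    occurrence counter is evicted before the new one is added."""
    if len(model) >= max_size:
        worst = min(range(len(model)), key=lambda i: model[i]['count'])
        model.pop(worst)  # remove the least usual reference descriptor
    model.append(new_entry)
    return model
```

The time-based alternative mentioned above would simply evict the entry with the oldest reference date instead of the lowest counter.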
According to embodiments, only some of the extracted descriptors are compared to the distribution model. In this case, the extracted descriptors of a given frame, for instance, are placed into clusters/classes (e.g. similarly to what is done for the distribution model) and a representative descriptor is chosen for each class (e.g. similarly to the reference descriptors of the model); only the representative descriptors of all the classes are compared to the distribution model at step 403. Likewise, if the statistical model is dynamically updated, only the representative descriptors of all the classes are used to update the distribution model at step 406. Hence, the number of descriptors to consider and the complexity of the computing are reduced.
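One way the per-frame clustering into representative descriptors could be done is a greedy radius-based clustering; this is a sketch under assumptions (the radius value and running-mean representative are illustrative choices, not mandated by the description):

```python
import numpy as np

def representatives(descriptors, radius=1.0):
    """Greedy clustering: each descriptor joins the first cluster whose
    representative lies within `radius`, otherwise it starts a new cluster.
    Each representative is the running mean of its cluster members."""
    reps, counts = [], []
    for d in descriptors:
        d = np.asarray(d, dtype=float)
        for i, r in enumerate(reps):
            if np.linalg.norm(r - d) <= radius:
                counts[i] += 1
                reps[i] = r + (d - r) / counts[i]  # update running mean
                break
        else:
            reps.append(d)   # no close cluster: start a new one
            counts.append(1)
    return reps
```

Only the returned representatives would then be compared with, and used to update, the distribution model.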
If the extracted descriptors are too noisy, i.e. there is a large random component in the extracted descriptors, the descriptors are preferably filtered (possibly using the distribution model of the descriptors obtained at step 402, to improve de-noising) to reduce the noise and improve the efficiency of the algorithm.
According to embodiments, the distribution model is specific to pixel blocks that correspond to a part of an image smaller than the region associated with the corresponding node. Thus, if a descriptor is extracted at a location in the video belonging to a given pixel block, it is compared to the specific distribution model of that pixel block at step 403. In the case of a dynamic distribution model (i.e. updated over time), the update of the distribution model at step 406 is also performed for each specific pixel block. The result of the comparison, and more precisely the set of unusual descriptors, is still associated with the node. A possible implementation would be to run the algorithm of Figure 4 for each pixel block of a given node, and to merge the sets of unusual descriptors obtained for each pixel block at step 404 into one set of unusual descriptors.
For example, the pixel blocks result from a splitting of the camera field of view into rectangular pixel blocks of fixed size. An advantage of splitting into pixel blocks is that each distribution model may have a smaller size, so that the comparison of an extracted descriptor with the distribution model is faster. Another advantage is that each model is more specific to its zone, so occurrence statistics allow detecting locally unusual/outstanding descriptors that would be considered usual if the whole region were considered. This is better for anomaly detection, for example. If no background subtraction algorithm is used prior to descriptor extraction, the splitting into pixel blocks allows directly modelling the background information locally, and thus efficiently ignoring the background because it is “usual” or “recent”. A background subtraction algorithm is therefore not necessary, and the algorithm starting at step 401 performs as efficiently as if background subtraction were used.
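Mapping a descriptor's pixel location to its fixed-size block, so the right block-specific model can be looked up, could be sketched as follows (a minimal illustration; block and frame sizes are assumptions):

```python
def block_index(x, y, block_w, block_h, frame_w):
    """Map a pixel location (x, y) to the index of its fixed-size
    rectangular block, in row-major order over the camera frame."""
    blocks_per_row = (frame_w + block_w - 1) // block_w  # ceil division
    return (y // block_h) * blocks_per_row + (x // block_w)
```

For a 640-pixel-wide frame split into 64x64 blocks, a descriptor at (130, 70) falls in the third block of the second row.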
Figure 5 is a flowchart illustrating steps for determining processed descriptors according to embodiments of the invention. These steps may be performed during step 306 shown in Figure 3. In this example, a camera may be associated with a unique node (inter-cam) or several nodes (intra-cam). In this example, the steps can be performed by each camera to determine the relationships between nodes with which it is itself associated.
The algorithm starts at step 501. A set of descriptors is obtained at step 502. This set of descriptors is updated based on the current frame at step 503. At step 504, processed descriptors are extracted from the updated set of descriptors. The algorithm ends at step 505.
This algorithm is now described in detail.
At step 502, a set of descriptors is retrieved. This set may be empty or, alternatively, it may comprise some unusual descriptors determined in previous frames (if any), for instance at step 305 of Figure 3, or a combination of them (e.g. a mean value of several descriptors). This set of descriptors may have an arbitrary size that allows limiting the memory use and the computation resources necessary when comparisons or ordering are performed.
According to embodiments, the set of descriptors obtained at step 502 is ordered according to the comparison results associated with the unusual descriptors at step 405 of Figure 4. This simplifies the management of the set and in particular the removal of descriptors if a limited (arbitrary) size set is used: when a descriptor is ordered outside of the limited size, it can be ignored/removed. For these purposes, the descriptors of the set may be ordered in a priority queue according to ordering rules (or criteria). For instance, these ordering rules may give the highest priority to the descriptors whose distance to the reference descriptors of the distribution model is above a threshold (e.g. 0.5), giving a lower priority, among them, to the descriptors with the highest distance to the reference descriptors. For the other descriptors (e.g. below the threshold), the ordering rules may give the highest priority to the least frequent ones (i.e. the ones with the lowest “habituation” score or, equivalently, the highest “distinctiveness”). If descriptors have the same frequency, the rules may, for instance, give more priority to the descriptor associated with the oldest extraction date or, alternatively and preferably, to the ones with the lowest “recentness” value. If the considered descriptors have the same frequency and extraction date (or, alternatively, “recentness” value), the priority may then be determined as a function of the distance between these descriptors and the closest reference descriptor. Other ordering rules, combining any value that can be deduced from the descriptors and from their previous comparison to the distribution model, could be used as well.
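The ordering rules above can be expressed as a composite sort key; this is a sketch in which all field names and the 0.5 threshold are taken from the example rules, with sorting in ascending key order placing the highest-priority descriptor first:

```python
def priority_key(desc):
    """Composite ordering key for descriptors (field names hypothetical).
    Smaller keys mean higher priority."""
    far = desc['min_distance'] > 0.5  # distance to references above threshold
    return (
        0 if far else 1,                        # far descriptors come first
        desc['min_distance'] if far else 0.0,   # among them, lower distance first
        desc.get('habituation', 0.0),           # then the least frequent ones
        desc.get('recentness', 0.0),            # then the least recently seen
    )

def order_descriptors(descriptors):
    """Return the descriptors ordered from highest to lowest priority."""
    return sorted(descriptors, key=priority_key)
```

Python's tuple comparison evaluates the criteria in order, so later fields only break ties among earlier ones, matching the cascade of rules described above.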
At step 503, the set of descriptors is updated based on the current frame.
If descriptors are kept between frames, any outdated descriptor (i.e. any descriptor whose extraction date is older than a maximum age threshold, e.g. 10 seconds) is removed. Next, each one of the unusual descriptors determined at step 303 of Figure 3 for the current frame and the current node/region (and in all pixel blocks of the region, if any) is added to the set obtained at step 502.
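The removal of outdated descriptors can be sketched as a simple filter (field name and time unit are assumptions):

```python
def drop_outdated(descriptors, now, max_age=10.0):
    """Keep only descriptors whose extraction date is no older than
    max_age (e.g. 10 seconds) relative to the current time."""
    return [d for d in descriptors if now - d['extraction_date'] <= max_age]
```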
When the set of descriptors has a limited (arbitrary) size which is reached, the least noteworthy descriptors (according to the same ordering rules as previously described) are removed from the set.
When the set of descriptors is ordered, the priority queue is updated each time a new descriptor is added to the set. Alternatively, the reordering may be performed after all the descriptors have been added. A pre-filtering may be performed before adding unusual descriptors to the set: if the priority of an unusual descriptor is below a given threshold, it is not added to the set of descriptors. The threshold may be pre-established or, alternatively, it may change over time (e.g. at each iteration). For instance, the threshold may correspond to the lowest priority of the descriptors already in the queue or, alternatively, to the priority of the descriptor halfway down the queue if the queue is more than half full. Instead of, or in addition to, using a threshold on the priority, a maximum number of the unusual descriptors with the highest priority may be considered. In the case where the set of unusual descriptors is already ordered, only the first ones are considered. This maximum number of unusual descriptors may be, for instance, half of the total size of the set (priority queue) of descriptors, or half of the size of the set of unusual descriptors.
At step 504, processed descriptors are extracted from the set of descriptors updated at step 503.
In practice, the processed descriptors thus extracted correspond to the most unusual descriptors of the set. When the updated set of descriptors is ordered as a priority queue, extracting the processed descriptors is performed by taking the ones with the highest priority, for example a number N of descriptors having the highest priority in the set, or the descriptors having a priority above a given threshold.
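Both extraction variants (top-N or priority threshold) can be sketched together; the `priority` field and parameter names are hypothetical, and the set is assumed already ordered from highest to lowest priority:

```python
def extract_processed(ordered_descriptors, n=None, threshold=None):
    """Extract processed descriptors from a priority-ordered set: either
    the N highest-priority ones, or all those whose priority value is at
    least the given threshold."""
    if n is not None:
        return ordered_descriptors[:n]
    return [d for d in ordered_descriptors if d['priority'] >= threshold]
```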
This number N or threshold may be dynamically adjusted, for instance based on the network load, on the number of descriptors currently tracked by the node, and/or according to an estimation of the total number of descriptors tracked in the system and/or in the cameras. It may also be determined according to an estimation of the stability of the transition model, determined by the stability of the values in the transition estimation data obtained at step 304. For instance, the stability may be checked by testing that the variance of a sum of the absolute differences of estimated values between successive iterations is below a given threshold.
According to embodiments where the set of descriptors is kept between frames, the processed descriptors that are sent at step 307 are also removed from the set of descriptors, so that they are no longer considered in the next iterations of the algorithm.
The algorithm ends at step 505.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive, the invention not being restricted to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in putting into practice (i.e. performing) the claimed invention, from a study of the drawings, the disclosure and the appended claims. For instance, some steps may be performed in a different order or merged to be performed simultaneously while obtaining similar results.
Also, when steps are made dependent on others by using or setting common data, the data of a previous step may also be used in a context where the algorithm is iteratively repeated, and so the steps may also be reordered to perform optimizations.
In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used. Any reference signs in the claims should not be construed as limiting the scope of the invention.
Claims (18)
1. A method for determining a relationship, for instance a target tracking or recognition relationship or an event or flow propagation relationship, between two image data sets of at least one media stream captured by at least one camera of a multi-camera networking system, the two image data sets representing two regions of a scene, the method comprising: - extracting a first set of visual descriptors characterizing visual features of a first region, based on a first image data set, each visual descriptor being associated with a first extraction date; - obtaining a second set of visual descriptors characterizing visual features of a second region, based on a second image data set; - determining, for each visual descriptor of the first set, a number of occurrences and a second extraction date based on a distribution model generated based on other visual descriptors; - selecting a subset of visual descriptors from among visual descriptors of the first set in function of their determined number of occurrences and second extraction date; - comparing at least one visual descriptor of the subset with at least one visual descriptor of the second set; - based on the result of the comparing step, determining a relationship between the first region and the second region.
2. The method of claim 1, wherein said other visual descriptors used for generating the distribution model are visual descriptors previously extracted from the first region.
3. The method of claim 1 or 2, wherein the two image data sets are from the same media stream captured by the same camera.
4. The method of claim 1 or 2, wherein each of the two image data sets is from a different media stream captured by a different camera.
5. The method of any one of claims 1 to 4, wherein the distribution model comprises at least one reference descriptor associated with a number of occurrences and a reference date.
6. The method of claim 5, wherein said reference descriptor is based on visual descriptors previously extracted from the first region.
7. The method of claim 5 or 6, wherein the determination step comprises, for each visual descriptor of the first set, a comparison between said visual descriptor and at least one reference descriptor of the distribution model.
8. The method of any one of claims 5 to 7, wherein the determining step comprises the following steps: - computing a distance between each visual descriptor of the first set and the at least one reference descriptor of the distribution model; and - associating each visual descriptor of the first set with the reference descriptor having the minimum distance to this visual descriptor.
9. The method of claim 8, further comprising a step of updating the distribution model based on the computed distances.
10. The method of claim 9, wherein the step of updating the distribution model comprises, for at least one visual descriptor of the subset, setting the reference date of the associated reference descriptor to the first extraction date of the associated visual descriptor.
11. The method of claim 9 or 10, wherein the step of updating the distribution model comprises comparing the computed distance between each visual descriptor and the associated reference descriptor to a threshold, and when the computed distance is above the threshold, adding the visual descriptor to the distribution model as a new reference descriptor associated with a reference date set to the first extraction date of the added visual descriptor.
12. The method of any one of claims 9 to 11, wherein a new relationship between the first region and the second region is determined by performing again the steps of the method according to claim 1 using the updated distribution model.
13. The method of claim 7, wherein the number of occurrences of a given visual descriptor is based on the result of the comparing step between said visual descriptor and at least one reference descriptor of the distribution model.
14. The method according to any one of claims 1 to 13, wherein visual descriptors are histograms of quantized values of chroma components.
15. A system for determining a relationship, for instance a target tracking or recognition relationship or an event or flow propagation relationship, between two image data sets of at least one media stream captured by at least one camera of a multi-camera networking system, the two image data sets representing two regions of a scene, the system comprising computing means configured for: - extracting a first set of visual descriptors characterizing visual features of a first region, based on a first image data set, each visual descriptor being associated with a first extraction date; - obtaining a second set of visual descriptors characterizing visual features of a second region, based on a second image data set; - determining, for each visual descriptor of the first set, a number of occurrences and a second extraction date based on a distribution model generated based on other visual descriptors; - selecting a subset of visual descriptors from among visual descriptors of the first set in function of their determined number of occurrences and second extraction date; - comparing at least one visual descriptor of the subset with at least one visual descriptor of the second set; - based on the result of the comparing step, determining a relationship between the first region and the second region.
16. The system of claim 15, wherein at least one of the cameras of the multi-camera networking system is a Pan Tilt Zoom (PTZ) camera.
17. A computer program product for a programmable apparatus, the computer program product comprising instructions for carrying out each step of the method according to any one of claims 1 to 14 when the program is loaded and executed by a programmable apparatus.
18. A computer-readable storage medium storing instructions of a computer program for implementing the method according to any one of claims 1 to 14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1614321.6A GB2553105B (en) | 2016-08-22 | 2016-08-22 | Method and system for determining a relationship between two regions of a scene |
Publications (3)
Publication Number | Publication Date |
---|---|
GB201614321D0 GB201614321D0 (en) | 2016-10-05 |
GB2553105A GB2553105A (en) | 2018-02-28 |
GB2553105B true GB2553105B (en) | 2019-09-04 |
Family
ID=57045615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1614321.6A Active GB2553105B (en) | 2016-08-22 | 2016-08-22 | Method and system for determining a relationship between two regions of a scene |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2553105B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10694106B2 (en) | 2013-06-14 | 2020-06-23 | Qualcomm Incorporated | Computer vision application processing |
CN111797839A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Feature extraction method and device, storage medium and electronic equipment |
CN112256809A (en) * | 2020-11-13 | 2021-01-22 | 珠海大横琴科技发展有限公司 | Data processing method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009086762A (en) * | 2007-09-27 | 2009-04-23 | Fujifilm Corp | Similarity calculation device and similarity display device |
US20160012595A1 (en) * | 2014-07-10 | 2016-01-14 | Ditto Labs, Inc. | Systems, Methods, and Devices for Image Matching and Object Recognition in Images Using Image Regions |
Also Published As
Publication number | Publication date |
---|---|
GB201614321D0 (en) | 2016-10-05 |
GB2553105A (en) | 2018-02-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Reducto: On-camera filtering for resource-efficient real-time video analytics | |
Zhou et al. | Attention-driven loss for anomaly detection in video surveillance | |
US9767570B2 (en) | Systems and methods for computer vision background estimation using foreground-aware statistical models | |
Arroyo et al. | Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls | |
Bouwmans et al. | Statistical background modeling for foreground detection: A survey | |
US9323991B2 (en) | Method and system for video-based vehicle tracking adaptable to traffic conditions | |
US10096117B2 (en) | Video segmentation method | |
US10445885B1 (en) | Methods and systems for tracking objects in videos and images using a cost matrix | |
JP2018022475A (en) | Method and apparatus for updating background model | |
AU2015203666A1 (en) | Methods and systems for controlling a camera to perform a task | |
GB2553105B (en) | Method and system for determining a relationship between two regions of a scene | |
US20220301275A1 (en) | System and method for a hybrid approach for object tracking across frames. | |
US20220076022A1 (en) | System and method for object tracking using feature-based similarities | |
US11354819B2 (en) | Methods for context-aware object tracking | |
US10713913B2 (en) | Managing copies of media samples in a system having a plurality of interconnected network cameras | |
Roy et al. | Foreground segmentation using adaptive 3 phase background model | |
Liu et al. | Scene background estimation based on temporal median filter with Gaussian filtering | |
Conte et al. | An experimental evaluation of foreground detection algorithms in real scenes | |
CN102314591A (en) | Method and equipment for detecting static foreground object | |
Zhang et al. | Multi-domain collaborative feature representation for robust visual object tracking | |
Roy et al. | A comprehensive survey on computer vision based approaches for moving object detection | |
KR101492059B1 (en) | Real Time Object Tracking Method and System using the Mean-shift Algorithm | |
KR101826669B1 (en) | System and method for video searching | |
Desurmont et al. | Image analysis architectures and techniques for intelligent surveillance systems | |
CN117474947A (en) | Method, apparatus and storage medium for target tracking |