GB2584717A - Autonomous search and track using a wide FOV

Autonomous search and track using a wide FOV

Info

Publication number: GB2584717A
Authority: GB (United Kingdom)
Prior art keywords: sensor, reward, area, surveillance, manager
Legal status: Granted; Active
Application number: GB1908503.4A
Other versions: GB201908503D0 (en), GB2584717B (en)
Inventor: Sykes Stephen
Current Assignee: Thales Holdings UK PLC
Original Assignee: Thales Holdings UK PLC


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0094 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot involving pointing a payload, e.g. camera, weapon, sensor, towards a fixed or moving target
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

A surveillance system (105) comprises a sensor manager 155, a target detection module 175, and a sensor 165. The target detection module receives imagery developed from the sensor and is configured to detect objects of interest within the imagery. The sensor manager sends command messages indicating where the sensor should survey. The system is configured to analyse the detected objects of interest using an exploration-exploitation algorithm to determine where to survey with the sensor. The system could be used on unmanned aerial vehicles (UAVs) equipped with sensors to survey areas from a remote location. The sensor could be a camera mounted on a Pan Tilt Zoom (PTZ) apparatus allowing it to point in any direction and to control its zoom. The objects of interest could be associated with rewards and the system configured to maximise or minimise the reward. Also described is an associated sensor manager and a method of operating a surveillance system.

Description

AUTONOMOUS SEARCH AND TRACK USING A WIDE FOV
FIELD OF INVENTION
Described herein is a surveillance system. The present invention also relates to an associated method of controlling a surveillance system to survey an area.
BACKGROUND OF INVENTION
Unmanned aerial vehicles (UAVs) equipped with sensors can be used to survey areas from a remote location. Typically the sensor is a camera, also called an Electro-Optic / Infra-Red (EO/IR) sensor, mounted on a Pan Tilt Zoom (PTZ) apparatus allowing the sensor to point in any direction and to control its zoom. UAVs can be used to survey large areas quickly. This can be useful, for example, in search and rescue operations.
Increasing the number of UAVs can increase the speed at which a region is surveyed. However, increasing the number of operators may not be practical. Further, operating an aerial sensor typically requires an operator's undivided attention, since the operator must both analyse and understand the imagery and control where the sensor points.
It is at least one objective of at least one embodiment of the present disclosure to provide an improved surveillance system, such as an improved surveillance system for surveying an area.
SUMMARY OF INVENTION
Aspects of the present disclosure are defined by the independent claims appended herewith. Preferred features are defined by the dependent claims appended herewith.
According to a first aspect of the present disclosure there is provided a surveillance system that comprises a sensor manager, a target detection module and at least one sensor. The target detection module receives imagery developed from the at least one sensor and is configured to detect objects of interest within the imagery.
The sensor manager may be configured to send command messages indicating where the sensor will survey. The surveillance system may be configured to analyse the detected objects of interest, e.g. using an exploration-exploitation algorithm, to determine where, e.g. an area, to survey with the at least one sensor.
The sensor manager may be configured to control where to point the sensor, e.g. by issuing the command messages to a sensor controller that drives the sensor. The sensor may be a pan-tilt-zoom (PTZ) sensor. The sensor controller may drive pan, tilt and zoom mechanisms of the sensor. The sensor manager may be configured to issue command messages to control the sensor to survey the determined area.
The surveillance system may be configured to balance searching for new objects of interest with the observation of objects already found. A surveillance system that only observed objects already found, but spent no time searching for new ones, would fail to discover important objects; conversely, a surveillance system that only searched for new objects, but spent no time observing those already found, would fail to provide the operator with useful information about the objects that have been found. The operator may therefore be best served by the present surveillance system, which may balance the search for new objects of interest with the observation of known ones.
The surveillance system may be encouraged to balance the search for new objects of interest with the observation of known ones by (1) defining a system of reward that the surveillance system aims to maximise, wherein the observation of an object of interest leads to the surveillance system accruing a notional reward, (2) partitioning the ground into a set of non-overlapping areas that may be surveyed, such that observation of an area delivers the sum of the rewards associated with any objects of interest found there, and (3) the use of an exploration-exploitation algorithm which may operate as follows: for each of the plurality of areas of ground, the surveillance system defines the distribution of reward that is expected should the area be surveyed, wherein a 'distribution of reward' may be a function whose argument is a reward value and whose result is a probability value, and wherein 'defining the distribution' may mean that the surveillance system stores the parameters that define the reward distribution; the surveillance system may compute an estimate of an upper quantile of the reward distribution, which we call the potential reward, so that an area of ground that might deliver a high reward value when it is surveyed is given a high value of potential reward; the surveillance system may select the area of ground having the greatest value of potential reward and survey that area; and the surveillance system may update the reward distributions in the light of the detections that are made in the course of the surveys, and in the light of the passage of time.
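By way of illustration only, the selection step of such an exploration-exploitation algorithm can be sketched in a few lines of Python. The names used here (Area, potential_reward, choose_area_to_survey) and the mean-plus-spread approximation of the upper quantile are assumptions made for the sketch, not taken from the application:

    from dataclasses import dataclass

    @dataclass
    class Area:
        # Hypothetical per-area state: summary of the current reward distribution.
        name: str
        expected_reward: float   # mean of the reward distribution
        spread: float            # dispersion of the reward distribution

    def potential_reward(area: Area) -> float:
        # An optimistic estimate of an upper quantile of the reward
        # distribution, here approximated as mean plus a multiple of spread.
        return area.expected_reward + 1.645 * area.spread

    def choose_area_to_survey(areas: list[Area]) -> Area:
        # Exploration-exploitation step: pick the area that *might* deliver
        # the most reward, not merely the one with the highest mean. Broad,
        # uncertain areas therefore compete with known high-reward areas.
        return max(areas, key=potential_reward)

A never-surveyed area with a broad distribution can thereby out-score a well-known area of moderate reward, which is precisely the exploration behaviour described above.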
Each object of interest may be associated with a reward, that may be accrued when the object is observed. The reward for an area may be dependent on the number of objects of interest in the area. The reward for an area may be or may depend on the sum of the reward values for each of the objects of interest in that area.
The value may depend on a type of object of interest. The value may be set, pre-set or user defined, for example. The value may be selectable or adjustable, e.g. by a user, which may be used to bias the sensor manager towards particular objects of interest or types of object of interest.
The surveillance system may be configured with a goal of maximising or optimising the reward for detectable objects of interest within areas that are surveyable or presently surveyable using the sensor. The surveillance system may comprise at least one situation awareness manager (SAM). The SAM may be configured to receive target detection reports from the target detection module. The target detection reports may be indicative of objects of interest identified by the target detection module. The SAM may be configured to compute the values of the potential reward based on the target detection reports. The SAM may be configured to provide potential reward values to the sensor manager. The sensor manager may be configured to determine where to survey with the sensor taking into account or balancing both searching for new objects of interest and observing already identified objects of interest, e.g. based on the potential reward values from the SAM.
The system may be configured to provide tracking and/or observing, e.g. wide field of view tracking and/or wide field of view observing, of already identified objects of interest and searching, e.g. wide field of view searching, for previously unidentified objects of interest (e.g. targets).
The analysis of the detected objects of interest using the exploration-exploitation algorithm may comprise computing an estimate of potential reward associated with surveying of each of a plurality of areas. The sensor manager may be configured to identify and select an area having the greatest potential reward to be an area to be surveyed by the sensor. The potential reward may be indicative of how large a reward for that area might potentially be. Surveying an area may comprise videoing an area or capturing one or more images of an area over a set period of time. By selecting the area with the highest potential reward, the exploration-exploitation algorithm learns which areas are most valuable. For example, an area of ground might have a high potential reward simply because it has never been observed, so that the reward distribution is very broad, representing an appreciable probability that there are objects of interest present there. By observing it, the uncertainty is resolved. Thus, by focusing attention on the areas that might deliver a large reward, the system finds those that really do.
The reward distribution for areas that are found to have a high reward may be modified accordingly, which may increase the likelihood of those areas that have been identified as yielding a high reward being surveyed again in the future relative to those identified as yielding a lower reward.
However, by adjusting the reward distribution for an area with time since that area was last surveyed towards a long term or steady state distribution, the sensor manager may take into account the need to look for new objects of interest or changes in reward (i.e. the algorithm may perform 'exploration' for new objects of interest / changes in reward as well as 'exploitation' of areas previously identified as yielding a high reward).
The sensor manager may be configured to determine or track a cumulative reward. The cumulative reward may be the sum of the reward of each surveyed area. The area with the largest potential to increase or maximise the cumulative reward with time may be the area selected to be surveyed. Increasing or maximising the cumulative reward with time may increase or maximise the number of targets or objects of interest detected by the surveillance system and/or the value of the targets or objects of interest over time.
The surveillance system, e.g. the SAM, may be configured to determine a reward distribution for each area. The reward distribution may comprise a distribution of the probability of the reward for that area. In other words, the reward distribution for an area may comprise a distribution of probabilities of an area having various possible values of reward. The surveillance system may be configured to compute the potential reward for a given area and time. The surveillance system may be configured to determine the potential reward based on the modelled reward distribution. The determination of the potential reward by the surveillance system may comprise, upon surveying a given area, adjusting the potential reward for the given area according to the results of the survey, e.g. to correspond to the reward for the area determined by the survey. The determination of the potential reward by the surveillance system may comprise, between surveys, varying the potential reward associated with an area towards a steady state value for that area depending on the time elapsed since the respective area was last surveyed.
The potential reward for any given area may comprise or be an estimate of an upper quantile of a reward distribution for the area. For example, the potential reward may be an estimate of the reward such that the probability of obtaining a higher reward is a given percentage, e.g. 5%. In this way, the potential reward indicates how large a reward for the area might potentially be.
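As a concrete sketch (assuming, as for the steady state of the birth-death model described below, that the reward distribution is Poisson, and assuming SciPy for the quantile function):

    from scipy.stats import poisson

    def potential_reward_poisson(mean_reward: float, quantile: float = 0.95) -> float:
        # Estimate the reward value such that the probability of obtaining a
        # higher reward is a given percentage, e.g. 5% for quantile = 0.95.
        return float(poisson.ppf(quantile, mean_reward))

    # A broad distribution (large mean) yields a high potential reward:
    print(potential_reward_poisson(4.0))   # -> 8.0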
The potential reward may be referred to as an upper confidence bound (UCB) in reference to the exploration-exploitation algorithm developed by Lai and Robbins in their 1985 paper "Asymptotically Efficient Adaptive Allocation Rules", the contents of which are incorporated by reference in their entirety as if set out in full herein. The present algorithm differs from that of Lai and Robbins in that their UCB values are not adjusted for the passage of time. In contrast, for a surveillance problem, the reward distribution associated with observing a patch of ground varies with time, since new targets may arrive or depart, or become concealed or revealed. Thus, application of fixed UCB values, as in Lai and Robbins, may result in unacceptable performance in at least some situations. For the purpose of the present surveillance problem, therefore, the potential reward values are adjusted as a result of the passage of time: e.g. over time since an area was last surveyed, the system may be configured to adjust the reward distribution for the area towards its steady state distribution, leading to the potential reward value converging on a steady state value.
The reward distribution may be adjusted as a function of time as follows. The reward distribution (that is, the distribution of reward that might be obtained by surveying an area) may be represented using a birth-death model. A birth-death model, or birth-death process, is a Markov process that models the number of objects present in a context where objects may arrive and depart randomly. For example, if at a certain time there are 5 objects, then at a future time the number may be different because new objects may have arrived and others departed. The concept of a birth-death model is defined, for example, in "MANY SERVER QUEUEING PROCESSES WITH POISSON INPUT AND EXPONENTIAL SERVICE TIMES", by Samuel Karlin and James McGregor, 1958, the contents of which are incorporated by reference as if set out in full herein.
The surveillance system may maintain a birth-death model for each of the plurality of areas of ground; and the surveillance system may represent the reward associated with surveying an area by the number of objects present in the birth-death model for that area. For example, the probability of obtaining a reward of exactly 2 may be modelled by the probability of there being exactly two objects within the birth-death model. The model has the advantage of being physically plausible for a problem in which objects of interest may arrive or depart randomly.
The default reward for an object of interest may be defined to be 1, so that default objects in the real world are mapped to objects in the birth-death model on a one-to-one basis, with the reward being equal to the number of objects. Where an object of interest has a reward other than 1, it may be represented by the presence of multiple objects of value 1; so for example, an object of value 10 may be represented by 10 objects of value 1. The latter represents an approximation of the real world in that the model believes a real-world object with value 10 will depart in units of one, rather than as a single step of value 10; this approximation is held to be acceptable for the purpose of deciding where a sensor will point. In order to use this approach, the reward for an object of interest may be constrained to be a positive whole number.
The reward distribution for an area may be represented by the probability distribution associated with the birth-death model. For example, the probability of achieving a reward of five may be modelled as the probability of finding 5 objects within the birth-death model. This probability may depend on earlier observations, e.g. if an area is surveyed then the state of the birth-death model for that area at the time of the survey may be set so as to correspond to the reward identified for that area by the survey. For example, if at a certain time the area is surveyed and the reward is 3, then the surveillance system may set the state of the birth-death model to 3 at that time; and a rate at which objects enter and leave the birth-death model can be used to compute the probability of the state, e.g. being 5, at a later time.
The behaviour of a birth-death model may depend on how quickly objects tend to arrive and depart. A birth-death model may be parameterised by an arrival rate, together with a mean object lifetime. Over time, the distribution of the number of objects within a birth-death model may tend towards a steady state distribution. The steady-state distribution may be Poisson, for example, assuming the mean arrival rate and the mean lifetime are fixed, and the mean of the distribution may depend on the arrival rate and the mean lifetime.
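For a birth-death model of this kind (constant arrival rate, independent exponential lifetimes, often written M/M/infinity), the state distribution at a time t after a survey has a well-known closed form: survivors of the objects counted at the survey are binomially thinned with probability exp(-t / mean lifetime), while newcomers are Poisson. The sketch below illustrates this under those assumptions; it is not the application's code:

    import math
    from scipy.stats import binom, poisson

    def state_probability(n0: int, n: int, t: float,
                          arrival_rate: float, mean_lifetime: float) -> float:
        # P(N(t) = n | N(0) = n0) for the birth-death model: each of the n0
        # objects observed at the survey survives to time t with probability
        # p = exp(-t / mean_lifetime), and newcomers arrive as a Poisson
        # stream; the state is the sum of survivors and newcomers.
        p = math.exp(-t / mean_lifetime)
        newcomer_mean = arrival_rate * mean_lifetime * (1.0 - p)
        return sum(binom.pmf(k, n0, p) * poisson.pmf(n - k, newcomer_mean)
                   for k in range(min(n0, n) + 1))

    # Example from the text: a reward of 3 is observed at the survey; the
    # probability that the reward is 5 one mean lifetime later:
    print(state_probability(n0=3, n=5, t=600.0,
                            arrival_rate=0.01, mean_lifetime=600.0))

As t grows, the survivor term vanishes and the distribution converges on the Poisson steady state whose mean is the arrival rate multiplied by the mean lifetime, as stated above.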
Modelling the reward distribution using a birth-death process has the advantage that the variation in reward over time may be conveniently represented within the system in a manner that is efficient to compute, consistent mathematically, and reasonably representative of the real world. That is, variation in the real world occurs naturally over time as objects enter and leave an area; and the use of a birth-death model enables the system to capture this behaviour effectively.
The surveillance system, e.g. the SAM, may be configured to model the reward distribution of each area. The model for an area may be or comprise a birth-death model for that area. The model for an area may model the appearance and disappearance of objects of interest in that area. The surveillance system, e.g. the SAM, may be configured to update the model for an area based on surveys of the area, e.g. to reflect or learn which areas tend to provide the greatest reward. The surveillance system, e.g. the SAM, may be configured to update the model for each area over time.
The model for an area may comprise objects of interest arriving into that area, e.g. at random times. The model for an area may comprise objects of interest arriving into the area according to an average arrival rate that is stored within the surveillance system, e.g. the SAM. The model for an area may comprise objects of interest leaving the area, e.g. at random times. The model for an area may comprise objects of interest leaving the area according to a mean lifetime (or equivalently, a half-life or other equivalent parameter) that is stored within the surveillance system, e.g. within the SAM. For the purpose of the current invention, the appearance of an object for any reason, such as emerging from a building, may be regarded as an 'arrival'. Similarly an object that disappears from an area for any reason may be regarded as having 'left'.
The surveillance system may learn values for the average arrival rate and/or the mean lifetime for each area by updating these values according to the detections that are found. The surveillance system, e.g. the SAM, may initialise the average arrival rate and the mean lifetime to default values. In the preferred embodiment, only the arrival rate is learnt over time, while the mean lifetime is held fixed, representing the rate at which targets disappear from an area.
The model may not require objects of interest to enter an area from an adjacent area. This may more accurately reflect the behaviour of objects of interest in appearing from and/or disappearing into buildings, for example. The surveillance system, e.g. the SAM, may be configured to determine and/or update the model's reward distribution. The surveillance system, e.g. the SAM, may be configured to determine the potential reward from the reward distribution.
The surveillance system may be configured to re-determine, adjust or update the reward distribution for an area following (e.g. immediately after) a survey of that area, e.g. using the reward for that area determined by the survey. The reward for the area determined by the survey may comprise or be dependent on the sum of the rewards of the objects of interest detected in the given area by the survey. The re-determination, adjustment or update of the reward distribution for the given area may comprise narrowing the reward distribution for that area, e.g. around the reward for that area determined by the survey: the uncertainty in the reward for the area immediately after a survey may be low, which may be reflected by the narrowing of the reward distribution. The re-determined, adjusted or updated reward distribution for an area following the survey of that area may comprise a delta or other narrow function around the value of the reward for the area determined by the survey. The potential reward of an area immediately following the survey of that area may be equal to the reward for that area identified by the survey.
The surveillance system, e.g. the SAM, may be configured to change the reward distribution with time since the last survey. The surveillance system may be configured to broaden the reward distribution with time since the last survey. The surveillance system may be configured to change (e.g. progressively change) the reward distribution for an area with time since the last survey, e.g. towards a steady-state model and/or a steady-state reward distribution for that area. The steady state reward distribution may comprise a Gamma-Poisson mixture or other distribution. The spread of the reward distribution for an area may be decayed over time since the last survey from the re-determined, adjusted or updated reward distribution based on the survey towards the steady state reward distribution.
The surveillance system, e.g. the SAM, may learn the steady state reward distribution for an area based on successive observations of the area. The steady state distribution of the birth-death process for an area may be Poisson, and therefore completely described by its mean. This mean value may be equal to an average arrival rate multiplied by a mean lifetime, wherein the average arrival rate and the mean lifetime may be associated with the birth-death model. Therefore, learning a steady state reward distribution may be equivalent to learning the mean value of the steady state reward distribution; and this in turn may be equivalent to learning the average arrival rate, where the mean lifetime may be held fixed. The steady state mean reward of the birth-death process may initially be unknown and its possible range of values may be modelled, e.g. according to a distribution. The surveillance system, e.g. the SAM, may specify a broad initial distribution for this quantity, to be replaced by successively narrower distributions as information about the rewards associated with the area accumulates, e.g. based on surveys of the area. Thus, each time an area is surveyed, the surveillance system may update the parameters that define the distribution of the steady state mean of the birth-death process for that area, e.g. based on the results of the survey.
The surveillance system, e.g. the SAM, may store, for each area, parameters that define the distribution of the steady state mean reward of the birth-death process for that area, e.g. in the form of a Gamma distribution. For each area, the surveillance system may store parameters for the Gamma distribution called the shape parameter and the scale parameter. The Gamma distribution is well known to those skilled in the art and is documented, for example, in the "NIST/SEMATECH e-Handbook of Statistical Methods", NIST being an agency of the U.S. Department of Commerce (see https://www.itl.nist.gov/div898/handbook/eda/section3/eda366b.htm). We call the shape parameter alpha, and we call the scale parameter beta, in line with the naming convention used within the Wolfram Mathematica language; although a person skilled in the art will appreciate that the parameters could be represented by any symbols without changing the intent of the invention. The parameter names used by the Wolfram Mathematica language are described within the documentation for that language, here: https://reference.wolfram.com/language/ref/GammaDistribution.html.
Each time an area is surveyed, the surveillance system may update the parameters of the steady state reward distribution, e.g. alpha and beta, for that area based on the results of the survey, e.g. to reflect improved knowledge of the likely range of steady state mean values associated with that area. The choice of a Gamma distribution may lead to an efficient update rule, because the Gamma distribution is the conjugate prior of the Poisson; and as stated above, in embodiments, the steady state reward distribution of the birth death process, for a fixed value of the steady state mean, is Poisson.
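Because of that conjugacy, the per-survey update reduces to arithmetic on the two stored parameters. The following sketch shows the standard Gamma-Poisson update in the shape/scale (alpha/beta) convention named above, under the assumption that each survey contributes one Poisson observation of the steady state mean; it is an illustration rather than the application's implementation:

    def update_steady_state_parameters(alpha: float, beta: float,
                                       observed_reward: int) -> tuple[float, float]:
        # Standard Gamma-Poisson conjugate update (shape alpha, scale beta):
        # a survey yielding a reward of k updates the distribution of the
        # steady state mean from Gamma(alpha, beta) to
        # Gamma(alpha + k, beta / (1 + beta)).
        return alpha + observed_reward, beta / (1.0 + beta)

    # Successive surveys narrow the distribution: the shape accumulates the
    # observed rewards while the scale shrinks, so the expected steady state
    # mean (alpha * beta) converges towards the observed average.
    alpha, beta = 2.0, 5.0           # broad initial distribution, mean 10
    for reward in (3, 0, 4):
        alpha, beta = update_steady_state_parameters(alpha, beta, reward)
    print(alpha, beta, alpha * beta)  # -> 9.0 0.3125 2.8125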
The update process described above may constitute a learning process in which the surveillance system learns which areas are most valuable to observe. This longer-term learning process stands in contrast to the shorter-term process by which each reward distribution decays towards its steady state. Thus, the surveillance system may have both long- and short-term memory. The long-term memory may comprise, for each area, parameters that define the distribution of the steady state mean reward of the birth-death process for that area, e.g. in the form of the parameters of the Gamma distribution, indicating which areas of ground typically deliver high rewards. The short-term memory may comprise the instantaneous reward distribution, which decays over time towards a steady state (the steady state itself being based upon the long-term memory). An observable benefit is that the sensor manager may know that a particular area often contains high rewards (long-term memory), so will tend to revisit it more often; and if it observes the area and finds nothing, it will ignore that area for a while (short-term memory), but will tend to return once enough time has passed for new targets to have appeared (short-term memory giving way to long-term memory).
Further, the sensor manager may be encouraged to visit unexplored areas by initialising the statistical model so that the potential reward associated with an area is initially high, until the area has been observed. This technique is known as 'optimistic initialisation' (see http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/XX.pdf). To this end, for embodiments in which the steady state mean reward is modelled by a Gamma distribution, the sensor manager may set the initial values of alpha and beta so that the value of their product (alpha multiplied by beta), being the expected value of the steady state mean reward, is initially high, e.g. high with respect to what the measured values are likely or anticipated to be. As the sensor manager may be configured to survey the area with the largest potential reward which is within the range and/or field of view and/or within a proximity, this arrangement may encourage visits to previously unexplored areas.
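A minimal sketch of such an optimistic initialisation, under the same Gamma parameterisation as above (the numeric values are illustrative assumptions):

    # 'Optimistic initialisation': start every area with an expected steady
    # state mean reward (alpha * beta) well above what surveys are likely to
    # report, so unexplored areas win the potential-reward comparison until
    # they have actually been observed at least once.
    ANTICIPATED_MEAN_REWARD = 2.0                  # hypothetical level
    initial_alpha, initial_beta = 2.0, 5.0         # alpha * beta = 10 >> 2
    gamma_parameters = {area_id: (initial_alpha, initial_beta)
                        for area_id in ("A1", "A2", "A3")}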
This allows the sensor manager to effectively learn the most likely area for finding maximum reward over time. By updating the reward distribution for an area once that area has been surveyed, then decaying the reward distribution towards the long term average over time since the last survey, and using an expected reward derived from the distribution, e.g. a function of the distribution such as the potential reward, to determine which area to survey, a balance can be achieved between surveying an area based on high rewards identified for that area in the past (i.e. exploitation) and exploring areas that have not been surveyed for some time and whose associated reward may have changed (i.e. exploration), in a dynamically changing reward environment (i.e. the reward in each area need not be static).
The surveillance system may comprise one or a plurality of surveillance platforms. The sensor may be comprised in a surveillance platform of the surveillance system. The surveillance platform may be, comprise or be comprised in an aerial vehicle, such as an unmanned aerial vehicle, for example a drone. The surveillance system may be operable to selectively survey one or more areas from a plurality of different areas.
The surveillance system may comprise a sensor controller, for controlling the sensor. The sensor manager may be configured to communicate with the sensor controller. The sensor controller may be configured to receive the command messages sent by the sensor manager. The sensor controller may be configured to control the sensor to perform the survey with the sensor responsive to the command messages.
The sensor may be, comprise or be comprised in an electro-optical/infrared sensor (EO/IR sensor).
The sensor manager, sensor controller, target detection module and sensor may be comprised in a sensor module. The system may comprise one or more of the sensor modules. Each sensor, being part of one of the sensor modules, may be carried by one of the surveillance platforms. Each surveillance platform may comprise one or more sensors. Each of the surveillance platforms may comprise one of the situation awareness managers (SAMs). The SAM on a respective platform may be configured to receive the target detection reports from each target detection module that is part of a sensor module having its sensor on the respective surveillance platform. The target detection reports may be indicative of objects of interest identified by the respective target detection module. The SAM may be configured to compute the values of the potential reward based on the target detection reports. The SAM may be configured to provide potential reward values to each sensor manager that is part of a sensor module having a sensor on the respective surveillance platform.
Each situation awareness manager mounted on one of the surveillance platforms may transmit target detection reports to the situation awareness managers mounted on other surveillance platforms, e.g. by means of a communications subsystem. The SAMs may transmit the target detection reports to other SAMs directly or indirectly, e.g. via the control station. The communications subsystem may comprise a wireless communications subsystem. Each of the respective situation awareness managers mounted on respective surveillance platforms may compute, at least in part, an estimate of the potential reward based on the target detection reports received from the situation awareness managers mounted on other surveillance platforms.
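The report-sharing arrangement might be sketched as follows; the message fields and the direct peer-to-peer broadcast are assumptions made for illustration, not a format fixed by the application:

    from dataclasses import dataclass

    @dataclass
    class TargetDetectionReport:
        # Hypothetical report contents; the application does not fix a format.
        area_id: str
        object_type: str      # e.g. 'person', 'vehicle'
        reward: int           # a positive whole number, as constrained above
        timestamp: float

    class SituationAwarenessManager:
        def __init__(self, peers=None):
            self.peers = peers or []   # SAMs on other surveillance platforms
            self.reports = []

        def on_local_detection(self, report: TargetDetectionReport) -> None:
            # Record locally, then broadcast so every platform can compute
            # potential rewards independently and asynchronously, avoiding
            # request-response latency and any single point of failure.
            self.reports.append(report)
            for peer in self.peers:
                peer.on_remote_report(report)

        def on_remote_report(self, report: TargetDetectionReport) -> None:
            self.reports.append(report)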
A surveillance platform may comprise a route manager. The SAM mounted on the surveillance platform may provide the potential reward values to the route manager. The route manager may compute a route for the surveillance platform based on maximising the potential reward reported by the situation awareness manager. The surveillance platform may travel the route computed by the route manager.
The surveillance system may comprise a centralised route manager configured to compute routes for at least one surveillance platform. For example, the centralised route manager might be located at a ground control station. The surveillance system may comprise a SAM co-located with the centralised route manager which provides the potential reward values to the centralised route manager. The centralised route manager may compute a route for one or more surveillance platforms based on maximising the potential reward reported by the SAM. The surveillance platforms may travel the route computed by the centralised route manager.
The situation awareness manager and/or sensor manager may be configured to implement the exploration-exploitation algorithm. The situation awareness manager or sensor manager may be configured to probabilistically model a reward distribution of each area and determine the potential reward based on the modelled reward distribution.
The sensor manager may be configured to issue instructions to perform the surveillance. The sensor manager may be configured to instruct the sensor controller to control the sensor to perform the surveillance. The sensor manager may be configured to issue an instruction to the sensor controller to observe a patch of ground chosen by the sensor manager. The sensor manager may be configured to issue an instruction to the sensor controller to set the field of view of the sensor. The sensor manager may be configured to instruct the sensor controller to control the sensor to observe an entire area, using what is termed a wide field of view, or to observe one target, using what is termed a narrow field of view. Directing the sensor may comprise controlling a pan-tilt-zoom (PTZ) apparatus to direct the sensor towards a patch of ground selected to be surveyed by the sensor manager.
The target detection module may be configured to detect targets or objects of interest within the surveyed area. Images of the surveyed area may be communicated from the sensor and/or sensor controller and/or the sensor manager and/or the sensor module to the target detection module. The target detection module may be configured to communicate the objects of interests detected for a surveyed area to the sensor manager and/or SAM, e.g. for use in determining or updating the reward, model or reward distribution for that area. The target detection module may be configured to detect targets on the basis of their motion and/or appearance. The target detection module may be configured to compare consecutive images or video frames of the surveyed area and may be configured to detect changes in consecutive images or video frames of the surveyed area. The target detection module may be configured to compensate for changes in the consecutive images or video frames of the surveyed area due to movement of the sensor. The target detection module may be configured to use image recognition or pattern matching techniques or comparison with specifications or templates to identify targets and/or objects of interest.
The target detection module may be configured to detect target or object motion using video moving target indication (VMTI). The target detection module may be configured to align successive images of a surveyed area, for example to compensate for movement of the surveillance system during the surveillance of the surveyed area. The target detection module may be configured to perform background subtraction from successive images. The target detection module may be configured to perform target or object detection followed by tracking. The target detection module may be configured to determine a target or object of interest type, for example using image recognition.
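That detection chain (alignment of successive frames, then background subtraction) might be sketched with OpenCV as below. This is a generic video moving target indication illustration, assuming greyscale uint8 frames; it is not the target detection module's actual implementation:

    import cv2
    import numpy as np

    def moving_target_candidates(prev_gray: np.ndarray, curr_gray: np.ndarray):
        # 1. Align the previous frame to the current one, compensating for
        #    sensor motion by estimating an affine warp between the frames.
        warp = np.eye(2, 3, dtype=np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 50, 1e-4)
        _, warp = cv2.findTransformECC(curr_gray, prev_gray, warp,
                                       cv2.MOTION_AFFINE, criteria)
        aligned_prev = cv2.warpAffine(prev_gray, warp,
                                      (curr_gray.shape[1], curr_gray.shape[0]))
        # 2. Background subtraction: difference the aligned frames and
        #    threshold, leaving blobs where something has moved.
        diff = cv2.absdiff(curr_gray, aligned_prev)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        # 3. Each remaining blob is a candidate moving target, to be passed
        #    on for tracking and type recognition.
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return [cv2.boundingRect(c) for c in contours]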
The sensor manager may be configured to control the sensor to survey an area such that targets or objects in the surveyed area are identified and/or identifiable. For example, the sensor may be controlled such that targets or objects are sufficiently large and/or clear from the survey to allow target or object recognition.
One skilled in the art will understand how a particular sensor surveying a particular region and/or area for particular targets or objects will need to be controlled (for example, what resolution is required) to ensure that the particular targets or objects are likely to be detected. One skilled in the art will also understand that these parameters (for example, the resolution required to ensure targets or objects are identifiable in the survey of the area) will determine the size of each area. For example, each area may be of the order of a hundred metres across if the targets are vehicles.
The method may comprise controlling the sensor such that the resolution of the area obtained during the surveillance is as high as possible, e.g. the area fills as much of the field of view of the sensor as possible. The surveillance system may be configured to control the sensor such that the resolution of the area obtained during the surveillance is as high as possible (e.g. the area fills as much of the field of view of the sensor as possible).
At least part of the surveillance system (e.g. the surveillance platform(s)) may be movable. The movement of the surveillance platform(s) may be controlled using a flight module, a route manager and/or a navigation module. The surveillance platform(s) may be movable over, around, through and/or across the plurality of different areas.
The sensor manager may be configured to survey the area with the largest potential reward which is within the range and/or field of regard and/or within a proximity of the surveillance platform and/or the sensor. The sensor manager may be configured to survey the area with the largest potential reward towards which the sensor can be directed, for example without moving the surveillance platform or without changing the current navigation path of the surveillance platform. The sensor manager may be configured to select, for surveying, an area with the largest potential reward towards which the sensor can be directed using one or more or each of pan, tilt and zoom functions of the sensors. The sensor manager may be configured to use the position of the surveillance platform to determine which areas are within the range and/or field of regard and/or within proximity of the surveillance system and/or the sensor. The position of the surveillance platform may be communicated to the sensor manager and/or sensor controller by the flight module, route manager and/or navigation module. The sensor manager may be configured to use the position of the surveillance platform from the flight module, route manager and/or navigation module to determine where to direct the sensor to survey the area of largest potential reward.
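Combining the reachability constraint with the potential-reward maximisation might look like this minimal sketch, where the reachable predicate stands in for whatever range / field-of-regard test the platform applies (all names are hypothetical):

    def choose_reachable_area(areas, potential, reachable):
        # Restrict the choice to areas the sensor can currently be pointed at
        # (within range and/or field of regard of the platform), then select
        # the largest potential reward among those candidates.
        candidates = [area for area in areas if reachable(area)]
        return max(candidates, key=potential) if candidates else None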
The situation awareness manager (SAM) may be configured to evaluate the potential reward of each area, and may be configured to store the information regarding the detected targets or objects. For example, the information regarding the detected targets or objects may include one or more of target or object position, target or object track, target or object type (e.g. person, vehicle etc.), target or object identity, target or object rewards.
The SAM may be part of the sensor manager, or it may be a separate entity. The use of a separate SAM is generally preferred because it simplifies the sharing of data within the system, but the present disclosure is not limited to this arrangement.
Each surveillance platform may comprise a sensor, physical and electrical support for the sensor, and a communications module. More than one surveillance platform may perform surveillance. The surveillance platforms may be coordinated to survey the plurality of different areas. For example, more than one surveillance platform, e.g. in the form of an aerial vehicle such as an unmanned aerial vehicle (UAV) or drone, may be coordinated to perform a surveillance. A swarm of drones may be coordinated to perform the surveillance. The results of a survey of an area may be communicated, e.g. wirelessly communicated, from the surveillance platform that performed the survey of the area to one or more of the other surveillance platforms.
The results of each survey performed by each surveillance platform may be communicated, e.g. wirelessly communicated, to each other surveillance platform. The results of a survey of an area may be received from another surveillance platform which performed the survey of the area.
Any feature described in relation to a single surveillance platform may be equally applicable to each of more than one or multiple surveillance platforms.
The results of the survey of the area may be communicated to other surveillance platforms and/or other sensor managers and/or other situation awareness managers. Each surveillance platform may comprise a communication module for communicating the results of the survey of the area e.g. to one or more other surveillance platforms and/or one or more other sensor managers. For example, the results of the survey of the area may be communicated to another surveillance platform which may be, comprise or be comprised in an aerial vehicle, such as an unmanned aerial vehicle.
The results of the survey of an area may be received from other surveillance platforms and/or other sensor managers and/or other situation awareness managers.
The sensor manager and/or SAM may be configured to receive the results of a survey of an area from another surveillance platform, and use the received results to update the potential reward of the area that was surveyed by the other surveillance platform.
The results of surveys may be communicated from the surveillance platform which performed the surveillance to the other surveillance platforms following, such as immediately following, the surveillance of the surveyed area.
The results of surveillances performed by the surveillance platform may be stored, for example for a set period of time or until after communication between the surveillance platforms is possible. The stored results of surveillances from the surveillance platform which performed the surveillances may be communicated to the other surveillance platforms while communication between the surveillance platforms is possible.
Each surveillance platform may be configured to independently evaluate the potential of each area at least partially on the basis of detection results communicated between them in addition to the detection results determined by that surveillance platform. This arrangement has the benefit that the values of the potential reward are up to date at the point of use, compared with alternative designs in which the potential values are computed at one node in the communications network and then communicated to the others. This is because the surveillance platforms make tasking choices asynchronously with respect to each other, so that one surveillance platform does not generally know the precise moment when a different surveillance platform requires a value of the potential reward. In addition, communications systems may have appreciable latency, so that a system reliant on requesting potential rewards from other platforms in a request-response manner would suffer delays in the provision of potential reward values, resulting in either the delaying of tasking decisions, or the use of older (and therefore less accurate) potential reward values. A further advantage of having the potential rewards independently computed by each surveillance platform is that there may be no single point of failure, such as would be present in a system that relied on a single entity to compute potential rewards; and each surveillance platform may continue to operate even if the communication system (e.g. a wireless communication system) is temporarily unavailable.
The surveillance platforms may be configured to move away from the last known position of other surveillance platforms. For example, if there is a loss of communication between surveillance platforms, the surveillance platforms may attempt to move apart from one another. This may advantageously reduce the risk of different surveillance platforms which are not in communication with each other from surveying the same area at the same time or within a set period of time. This may prevent the same area being surveyed by different surveillance platforms in an unnecessarily short period of time.
The surveillance platforms may be controllable using a control station, such as a ground control station, which may be configured to issue instructions to the surveillance platform(s). For example, the control station may be used to bias the surveillance platform(s) towards particular areas and/or targets, to define the region and/or the plurality of areas for surveying, and/or the like. However, this is optional and the surveillance platforms may be used autonomously.
The results of the surveillance from the surveillance platforms may be communicated (e.g. wirelessly communicated) to the control station, which may be in real time, e.g. as the surveillance is performed, or in batches, e.g. when such communication is possible. For example, if communication links between the surveillance platform and the control station are lost, the surveillance platform may store the results of the surveillance, and communicate the stored results of the surveillance once communication between the surveillance platform and the control station is restored. The surveillance platform may advantageously continue with the surveillance whilst communications between the surveillance platform and the control station are unavailable. The surveillance system may advantageously continue to coordinate the results of surveys between different surveillance platforms whilst communications between the surveillance platform and the control station are unavailable.
An area may be partly surveyed. For example, an area may be partially surveyed during the survey of an adjacent area. The sensor manager may be configured to evaluate the potential reward of a partially surveyed area following the partial survey. For example, the potential reward of the area may be equal to or larger than the reward for the area following the partial survey of the area. The surveillance system may be configured to evaluate the potential reward of an area following more than one partial survey of that area. The surveillance system may be configured to only use the partial survey of an area to update the reward distribution if doing so increases the potential reward. The surveillance system may be configured to determine the potential reward with the partial survey of the area included and without the partial survey of the area included, and to use whichever gives the largest potential reward, as sketched below.
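A compact sketch of that rule, using a simple stand-in potential function (mean plus a multiple of the Poisson standard deviation); in the application the comparison would be between full reward distributions rather than the simplified means used here:

    def fold_in_partial_survey(current_mean: float, mean_with_partial: float) -> float:
        # Compute the potential reward with and without the partial survey
        # folded in, and keep whichever state gives the larger potential, so
        # a glimpse of part of an area can raise but never lower its potential.
        def potential(mean: float) -> float:
            return mean + 1.645 * mean ** 0.5   # stand-in upper quantile
        return max(current_mean, mean_with_partial, key=potential)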
Information from the survey of the area with the largest potential reward may be displayed, for example to an operator of the surveillance system, e.g. on a display. The plurality of areas may be displayed with an indication of the potential reward of the area, e.g. by colouring or otherwise displaying each area according to the potential reward of the area.
A region, such as a region for surveillance, may be discretised into a plurality of different areas. The region may be discretised such that each area is non-overlapping (i.e. does not overlap) with each other area. Each area may be a regular polygon. Each area may be the same shape and size as each other area, with the optional exception of areas on the boundary of the region. Each area may be a hexagon, such as a regular hexagon. The region may be discretised into a plurality of tessellating hexagons, such as a honeycomb structure. The region may be discretised such that each part of the region is in an area, and/or no part of the region is in more than one area. Discretising the region into tessellating hexagons may allow the whole region to be discretised into non-overlapping areas whilst advantageously minimising the difference in perceived size, such as the perceived width and depth, when the area is surveyed from different directions.
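Generating such a honeycomb of areas is straightforward; the sketch below computes centre points of flat-topped regular hexagons covering a rectangular region. The orientation and the roughly 100 m sizing for vehicle-sized targets are assumptions consistent with the text, not specified by it:

    import math

    def hex_centres(width: float, height: float, radius: float):
        # Centres of flat-topped regular hexagons (circumradius 'radius')
        # tessellating a width x height region. Alternate columns are offset
        # by half a row, which makes the hexagons tile without overlap.
        dx = 1.5 * radius               # horizontal spacing between columns
        dy = math.sqrt(3.0) * radius    # vertical spacing between rows
        centres, col, x = [], 0, 0.0
        while x <= width:
            y = 0.0 if col % 2 == 0 else dy / 2.0
            while y <= height:
                centres.append((x, y))
                y += dy
            x += dx
            col += 1
        return centres

    # Areas of the order of a hundred metres across, as suggested below
    # for vehicle-sized targets:
    cells = hex_centres(width=2000.0, height=1500.0, radius=50.0)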
The targets or objects of interest may be people, vehicles and/or buildings generally. The targets or objects of interest may be specific individual people, specific vehicles or types of vehicles and/or specific buildings or types of building.
The surveillance system may be configured to track and/or model the movement of targets or objects from one surveyed area to an adjacent area.
The surveillance system may be configured to model the performance of the target detection, to determine the effectiveness of the target detection. For example, if the number of detected targets is much higher or lower than a number of expected targets from a model, the surveillance system may be configured to report this or raise an alarm to an operator for further input or investigation.
The surveillance system may be configured to permit re-setting the potential of one or more areas as the surveillance continues. This may advantageously allow the surveillance to reset, thereby removing any biases which have been incorporated into the surveillance system from target detections in particular areas. This could allow areas of the region which have not been surveyed in a long time to be surveyed again. The surveillance system may be configured to model the region, such as the shape of the ground of the region and any occlusions in it. This may advantageously allow the surveillance system to consider that areas contain targets which are not in the current field of view of the surveillance system.
According to a second embodiment of the present disclosure, the surveillance system of the first aspect may be integrated into the system or combined with the sensor manager disclosed in GB patent application 1806410.5. GB patent application 1806410.5 is herein incorporated in its entirety by reference.
The operation of the surveillance system in accordance with the first aspect may be one of the selectable tasks of the sensor manager of GB 1806410.5.
A method according to a second aspect may comprise: receiving imagery developed from at least one sensor; detecting objects of interest within the imagery; and analysing the detected objects of interest, e.g. using an exploration-exploitation algorithm, to determine an area to survey with the at least one sensor.
The method may comprise controlling the at least one sensor to survey the determined area to survey, which may comprise sending one or more command messages that command the surveying of the determined area by the at least one sensor.
The method may involve balancing searching for new objects of interest with the observation of objects already found. The method may comprise balancing the search for new objects of interest with the observation of known ones, e.g. by (1) defining a system of reward that the surveillance system aims to maximise, wherein the observation of an object of interest leads to the surveillance system accruing a notional reward, (2) partitioning the ground into a set of non-overlapping areas that may be surveyed, such that observation of an area delivers the sum of the rewards associated with any objects of interest found there, and (3) the use of an exploration-exploitation algorithm. The use of the exploration-exploitation algorithm may comprise the following: for each of the plurality of areas of ground, defining the distribution of reward that is expected should the area be surveyed, wherein a 'distribution of reward' may be a function whose argument is a reward value and whose result is a probability value, and wherein 'defining the distribution' may mean that the surveillance system stores the parameters that define the reward distribution; computing an estimate of an upper quantile of the reward distribution, which we call the potential reward, so that an area of ground that might deliver a high reward value when it is surveyed is given a high value of potential reward; selecting the area of ground having the greatest value of potential reward and surveying that area; and updating the reward distributions in the light of the detections that are made in the course of the surveys, and in the light of the passage of time.
The analysis may be performed, at least in part, by a sensor manager and/or a situation awareness manager (SAM). The imagery may be received by a target detection module, which may perform the detection of the objects of interest.
Each object of interest may be associated with a reward that is accrued when the object is observed. The method may comprise determining a reward for at least one or each object of interest. The method may comprise determining a reward for an area, which may be dependent on the number of objects of interest in the area. The determining of the reward for an area may comprise summing the reward values for each of the objects of interest in that area. The value may depend on a type of object of interest. The method may comprise setting, pre-setting or defining the values for one or more or each type of object of interest. The method may comprise selecting or adjusting the value, e.g. to bias the sensor manager towards particular objects of interest or types of object of interest.
The surveillance system may be configured with a goal of maximising or optimising the reward for detectable objects of interest within areas that are surveyable or presently surveyable using the sensor. The method may comprise receiving target detection reports, e.g. at the SAM, from the target detection module. The target detection reports may be indicative of objects of interest identified by the target detection module. The method may comprise computing the values of the potential reward based on the target detection reports, e.g. using the SAM. The method may comprise providing potential reward values, e.g. from the SAM to the sensor manager.
The determining of where to survey with the sensor may take into account or balance both searching for new objects of interest and observing already identified objects of interest.
The system may be configured to provide tracking and/or observing, e.g. wide field of view tracking and/or wide field of view observing, of already identified objects of interest and searching, e.g. wide field of view searching, for previously unidentified objects of interest (e.g. targets), e.g. based on the potential reward values from the SAM.
The analysis of the detected objects of interest using the exploration-exploitation algorithm may comprise computing an estimate of potential reward associated with surveying of each of a plurality of areas. The method may comprise identifying and selecting an area having the greatest potential reward to be an area to be surveyed by the sensor. The potential reward may be indicative of how large a reward for that area might potentially be. Surveying an area may comprise videoing an area or capturing multiple images of an area for a set period of time.
The method may comprise modifying the reward distribution for areas that are found to have a high reward, e.g. so as to increase the likelihood of those areas that have been identified as yielding a high reward being surveyed again in the future relative to those identified as yielding a lower reward.
However, by adjusting the reward distribution for an area with time since that area was last surveyed towards a long term or steady state distribution, the sensor manager may take into account the need to look for new objects of interest or changes in reward (i.e. the algorithm may perform 'exploration' for new objects of interest / changes in reward as well as 'exploitation' of areas previously identified as yielding a high reward).
The method may comprise determining or tracking a cumulative reward. The cumulative reward may be the sum of the reward of each surveyed area. The area with the largest potential to increase or maximise the cumulative reward with time may be the area selected to be surveyed. Increasing or maximising the cumulative reward with time may increase or maximise the number of targets or objects of interest detected by the surveillance system and/or the value of the targets or objects of interest over time. The method may comprise determining a reward distribution for each area. The reward distribution may comprise a distribution of the probability of the reward for that area. In other words, the reward distribution for an area may comprise a distribution of probabilities of an area having various values of reward. The method may comprise computing the potential reward for a given area and time. The method may comprise probabilistically modelling the reward distribution of each area. The method may comprise determining the potential reward based on the modelled reward distribution.
The method may comprise, upon surveying a given area, adjusting the potential reward for the given area according to the results of the survey, e.g. to correspond to the reward for the area determined by the survey. The determination of the potential reward by the surveillance system may comprise, between surveys, varying the potential reward associated with an area towards a steady state value for that area depending on the time elapsed since the respective area was last surveyed.
The potential reward for any given area may comprise, be represented by and/or be in the form of an upper or uppermost quantile of a reward distribution for the area. For example, the potential reward may be an estimate of the reward such that the probability of obtaining a higher reward is a given percentage, e.g. 5%.
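For example, where the reward distribution for an area is Poisson (as for the steady state distribution described later), the upper quantile may be computed by summing probability mass terms, as in the following illustrative sketch; the function name and the pure-Python approach are assumptions, not part of this disclosure.

```python
import math

def poisson_upper_quantile(mean, tail=0.05):
    """Smallest k such that P(reward > k) <= tail for a Poisson(mean) reward.

    Sketch only: sums pmf terms directly rather than using a library quantile.
    """
    cumulative, k = 0.0, 0
    pmf = math.exp(-mean)      # P(reward = 0)
    while cumulative + pmf < 1.0 - tail:
        cumulative += pmf
        k += 1
        pmf *= mean / k        # Poisson recurrence: pmf(k) = pmf(k-1) * mean / k
    return k

# e.g. an area whose steady-state mean reward is 2.0:
# poisson_upper_quantile(2.0) -> 5, so the potential reward would be 5.
```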
The potential reward may be referred to as an upper confidence bound (UCB) in reference to the exploration-exploitation algorithm developed by Lai and Robbins in their 1985 paper "Asymptotically Efficient Adaptive Allocation Rules", the contents of which are incorporated by reference in their entirety as if set out in full herein. The method may comprise varying the reward distribution associated with observing a patch of ground with time. The method may comprise adjusting the potential reward values as a result of the passage of time, e.g. over time since an area was last surveyed. The method may comprise adjusting the reward distribution for the area towards its steady state distribution, leading to the potential reward value converging on a steady state value.
The method may comprise adjusting the reward distribution as a function of time as follows. The method may comprise representing the reward distribution (that is, the distribution of reward that might be obtained by surveying an area) using a birth-death model. The concept of a birth-death model is defined, for example, in "Many Server Queueing Processes with Poisson Input and Exponential Service Times", by Samuel Karlin and James McGregor, 1958, the contents of which are incorporated by reference as if set out in full herein.
The method may comprise maintaining a birth-death model for each of the plurality of areas of ground. The method may comprise representing the reward associated with surveying an area by the number of objects present in the birth-death model for that area.
The method may comprise defining the default reward for an object of interest to be 1, so that the representation scheme maps default objects in the real world to objects in the birth-death model on a one-to-one basis, with the reward being equal to the number of objects. The method may comprise representing an object of interest that has a reward other than 1 by multiple objects of value 1. The reward for an object of interest may be constrained to be a positive whole number.
The method may comprise representing the reward distribution for an area by the probability distribution associated with the birth-death model. This probability may depend on earlier observations.
The behaviour of a birth-death model may depend on how quickly objects tend to arrive and depart. A birth-death model may be parameterised by an arrival rate, together with a mean object lifetime. Over time, the distribution of the number of objects within a birth-death model may tend towards a steady state distribution. The steady-state distribution may be Poisson, and its mean may depend on the arrival rate and the mean lifetime.
The method may comprise modelling the reward distribution of each area. The model for an area may be or comprise a birth-death model for that area. The model for an area may model the appearance and disappearance of objects of interest in that area. The method may comprise updating the model for an area based on surveys of the area, e.g. to reflect or learn which areas tend to provide the greatest reward. The method may comprise updating the model for each area over time. The model for an area may comprise objects of interest arriving into that area, e.g. at random times. The model for an area may comprise objects of interest arriving into the area at an average arrival rate, which may be a known average rate. The average arrival rate may be updated based on surveys of the area. The model for an area may comprise objects of interest leaving the area, e.g. at random times. The model for an area may comprise objects of interest leaving the area according to a half-life or other equivalent parameter, which may be known, e.g. based on or updated in view of surveys of the area. The model may not require objects of interest to enter an area from an adjacent area. This may more accurately model objects of interest appearing from and/or disappearing into buildings, for example. The method may comprise determining and/or updating the reward distribution from the model. The method may comprise determining the potential reward from the reward distribution.
The method may comprise re-determining, adjusting or updating the reward distribution for an area following (e.g. immediately after) a survey of that area, e.g. using the reward for that area determined by the survey. The re-determined, adjusted or updated reward distribution for an area following the survey of that area may reflect the reward for that area determined by the survey. The reward distribution for the area determined by the survey may comprise or be dependent on the sum of the rewards of the objects of interest detected in the given area by the survey. The re-determination, adjustment or update of the reward distribution for the given area may comprise narrowing the reward distribution for that area, e.g. around the reward for that area determined by the survey. The re-determined, adjusted or updated reward distribution for an area following the survey of that area may reflect the reward for that area determined by the survey and the uncertainty in the reward for the area immediately after a survey may be low, which may be reflected by the narrowing of the reward distribution. The potential reward of an area immediately following the survey of that area may be equal to the reward for that area identified by the survey.
The method may comprise changing the reward distribution with time since the last survey. The method may comprise broadening the reward distribution with time since the last survey. The method may comprise changing (e.g. progressively changing) the reward distribution for an area with time since the last survey, e.g. towards a steady-state model and/or a steady-state reward distribution for that area. The spread of the instantaneous reward distribution for an area may be greater than the spread of the reward distribution immediately after an area has been surveyed, which may be indicative of an increased uncertainty in the reward. The re-determined, adjusted or updated reward distribution for an area following the survey of that area may comprise a delta or other narrow function around the value of the reward for the area determined by the survey. The steady state reward distribution may comprise a Gamma-Poisson distribution. The spread of the instantaneous reward distribution for an area may be decayed over time since the last survey from the re-determined, adjusted or updated reward distribution based on the survey towards the steady state reward distribution.
The method may comprise learning the steady state reward distribution for an area based on successive observations of the area. The steady state distribution of the birth-death process for an area, assuming the birth-death model has fixed parameters, may be Poisson, and therefore completely described by its mean. This mean value may be equal to an average arrival rate multiplied by a mean lifetime, wherein the average arrival rate and the mean lifetime may be associated with the birth-death model. Therefore, learning a steady state reward distribution may be equivalent to learning the mean value of the steady state reward distribution; and this in turn may be equivalent to learning the average arrival rate, where the mean lifetime may be held fixed. The steady state mean reward of the birth-death process may initially be unknown and its possible range of values may be modelled, e.g. according to a distribution. The method may comprise specifying a broad initial distribution for this quantity, to be replaced by successively narrower distributions as information about the rewards associated with the area accumulates, e.g. based on surveys of the area. Thus, the method may comprise, each time an area is surveyed, updating the parameters that define the distribution of the steady state mean of the birth-death process for that area, e.g. based on the results of the survey.
The method may comprise storing, for each area, parameters that define the distribution of the steady state mean reward of the birth-death process for that area, e.g. in the form of a Gamma distribution. The method may comprise, for each area, storing parameters for the Gamma distribution called the shape parameter and the scale parameter. We call the shape parameter alpha, and we call the scale parameter beta, in line with the naming convention used within the Wolfram Mathematica language; although a person skilled in the art will appreciate that the parameters could be represented by any symbols without changing the intent of the invention. The parameter names used by the Wolfram Mathematica language are described within the documentation for that language, here: https://reference.wolfram.com/language/ref/GammaDistribution.html, the contents of which are incorporated by reference.
The method may comprise, each time an area is surveyed, updating the parameters of the steady state reward distribution, e.g. alpha and beta, for that area based on the results of the survey, e.g. to reflect improved knowledge of the likely range of steady state mean values associated with that area.
The update process thus described may constitute a learning process in which the sensor manager learns which areas are most valuable to observe. This longer-term learning process stands in contrast to the shorter-term process by which each reward distribution decays towards its steady state. Thus, the method may provide both long- and short-term memory. The long-term memory may comprise the parameters of the Gamma distribution, indicating which areas of ground typically deliver high rewards. The short-term memory may comprise the instantaneous reward distribution, which decays over time towards a steady state (the steady state itself being based upon the long-term memory). An observable benefit is that it may be known that a particular area often contains high rewards (long-term memory), so the particular area will be revisited more often; and if the area is observed and nothing is found, the method results in that area being ignored for a while (short-term memory), but the system will tend to return to it once enough time has passed for new targets to have appeared (short-term memory giving way to long-term memory).
Further, the method may encourage visits to un-explored areas by initialising the statistical model so that the potential reward associated with an area is initially high, until the area has been observed, i.e. 'optimistic initialisation' (see http://www.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/XX.pdf, the contents of which are incorporated by reference as if set out in full herein). For embodiments in which the steady state mean reward is modelled by a Gamma distribution, the method may comprise setting the initial values of alpha and beta so that the value of their product (alpha multiplied by beta), being the expected value of the steady state mean reward, is initially high, e.g. high with respect to what the measured values are likely or anticipated to be.
The method may comprise operation of a surveillance system, which may comprise one or a plurality of surveillance platforms. The sensor may be comprised in one of the surveillance platforms. The surveillance platform may comprise or be comprised in an aerial vehicle, such as an unmanned aerial vehicle, for example a drone. The method may comprise selectively surveying one or more areas from a plurality of different areas.
The method may comprise controlling the sensor to perform the survey, e.g. to survey the determined area. The sensor may be, comprise or be comprised in an electro-optical/infrared sensor (EO/IR sensor).
The method may comprise receiving target detection reports from each target detection module that is part of a sensor module having its sensor on a respective surveillance platform. The target detection reports may be indicative of objects of interest identified by the respective target detection module. The method may comprise computing the values of the potential reward based on the target detection reports. The method may comprise providing potential reward values to each sensor manager that is part of a sensor module having a sensor on the respective surveillance platform.
The method may comprise transmitting target detection reports from a situation awareness manager on a platform to the situation awareness managers mounted on other surveillance platforms, e.g. by means of a communications subsystem. The method may comprise computing, at least in part, an estimate of the potential reward based on the target detection reports received from the situation awareness managers mounted on other surveillance platforms.
The method may comprise providing the potential reward values to a route manager. The method may comprise using the route manager to compute a route for the surveillance platform based on maximising the potential reward reported by the situation awareness manager.
The method may comprise operating a centralised route manager to compute routes for at least one surveillance platform. For example, the centralised route manager might be located at a ground control station. The method may comprise operating a SAM co-located with the centralised route manager to provide the potential reward values to the centralised route manager. The method may comprise using the centralised route manager to compute a route for one or more surveillance platforms based on maximising the potential reward reported by the SAM. The surveillance platforms may travel the route computed by the centralised route manager. The method may comprise implementing the exploration-exploitation algorithm using the situation awareness manager and/or sensor manager. The method may comprise probabilistically modelling a reward distribution of each area and determining the potential reward based on the modelled reward distribution.
The method may comprise issuing instructions to perform the surveillance. The method may comprise instructing the sensor controller to control the sensor to perform the surveillance. The method may comprise issuing an instruction to the sensor controller to observe a patch of ground chosen by the sensor manager. The method may comprise issuing an instruction to the sensor controller to set the field of view of the sensor. The method may comprise instructing the sensor controller to control the sensor to observe an entire area, using what is termed a wide field of view, or to observe one target, using what is termed a narrow field of view. The method may comprise directing the sensor by controlling a pan-tilt-zoom (PTZ) apparatus to direct the sensor towards a patch of ground selected to be surveyed by the sensor manager.
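Purely as an illustrative assumption, a command message of the kind described might carry the ground coordinates to be observed together with the requested field of view; the field names below are hypothetical and do not form part of this disclosure.

```python
# Hypothetical sketch of a sensor-manager command; field names are
# illustrative assumptions, not the actual message format.
from dataclasses import dataclass

@dataclass
class SensorCommand:
    latitude: float            # ground coordinates to point the sensor at
    longitude: float
    wide_field_of_view: bool   # True: survey a whole area; False: one target

def command_survey(area_centre):
    """Build a wide field of view command for the centre of a chosen area."""
    lat, lon = area_centre
    return SensorCommand(latitude=lat, longitude=lon, wide_field_of_view=True)
```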
The method may comprise detecting targets or objects of interest within the surveyed area. The method may comprise communicating images of the surveyed area from the sensor and/or sensor controller and/or the sensor manager and/or the sensor module to the target detection module. The method may comprise detecting targets on the basis of their motion and/or appearance. The method may comprise comparing consecutive images or video frames of the surveyed area. The method may comprise detecting changes in consecutive images or video frames of the surveyed area. The method may comprise compensating for changes in the consecutive images or video frames of the surveyed area due to movement of the sensor. The method may comprise using image recognition or pattern matching techniques or comparison with specifications or templates to identify targets and/or objects of interest.
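By way of illustration only, a toy change detector in the spirit of the comparison of consecutive images described above might be sketched as follows, assuming numpy and greyscale frames, and omitting the compensation for sensor movement that a practical system would apply first.

```python
import numpy as np

def detect_changes(prev_frame, frame, threshold=25):
    """Toy change detector: flag pixels that differ appreciably between
    consecutive greyscale frames. A real pipeline would first align the
    frames to compensate for sensor movement, then group changed pixels
    into candidate targets.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold   # boolean mask; True where change is suspected
```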
The method may comprise detecting target or object motion using video moving target indication (VMTI). The method may comprise aligning successive images of a surveyed area, for example to compensate for movement of the surveillance system during the surveillance of the surveyed area. The method may comprise performing background subtraction from successive images. The method may comprise performing target or object detection followed by tracking. The method may comprise determining a target or object of interest type, for example using image recognition. The method may comprise surveying the area with the largest potential reward which is within the range and/or field of regard and/or within a proximity of the surveillance platform and/or the sensor. The method may comprise surveying the area with the largest potential reward towards which the sensor can be directed, for example without moving the surveillance platform or without changing the current navigation path of the surveillance platform. The method may comprise selecting, for surveying, an area with the largest potential reward towards which the sensor can be directed using one or more or each of pan, tilt and zoom functions of the sensors. The method may comprise using the position of the surveillance platform to determine which areas are within the range and/or field of regard and/or within proximity of the surveillance system and/or the sensor. The method may comprise using the position of the surveillance platform to determine where to direct the sensor to survey the area of largest potential.
The method may comprise evaluating the potential reward of each area, and may comprise storing the information regarding the detected targets or objects. For example, the information regarding the detected targets or objects may include one or more of: target or object position, target or object track, target or object type (e.g. person, vehicle etc.), target or object identity, target or object rewards.
The method may comprise communicating the results of the survey of the area between surveillance platforms and/or other sensor managers and/or other situation awareness managers.
According to a third aspect of the present disclosure there is provided a computer program product that, when run on a processing system and/or a surveillance system of the first aspect of the present disclosure, causes the processing system or surveillance system to implement the method of the second aspect of the present disclosure. The computer program product may be embodied on a non-transient computer readable medium.
The computer program may comprise computer-executable instructions that, when executed by a processor, enable a computer comprising the processor to perform the method of the second aspect of the present disclosure.
According to a fourth aspect of the present disclosure is a processing system comprising at least one processor, data storage and a communications system, the processing system being configured to implement the method of the second aspect.
According to a second embodiment of the present disclosure, the computer program may comprise, or may be comprised in, the computer program disclosed in GB patent application 1806410.5. GB patent application 1806410.5 is herein incorporated in its entirety by reference.
According to a fifth aspect of the present disclosure, which corresponds to the second embodiment described below, is a sensor manager for selecting a task to be carried out by a sensor, the sensor manager comprising a task list store operable to store a list of candidate tasks that can be carried out by the sensor; wherein the task list store is operable to store a description of each respective task, the description of each respective task defining an expected reward stream representing the expected rewards likely to be obtained as a function of the time for which the task is carried out; wherein the task list store comprises: a first class of tasks representing tasks that are time-sensitive, wherein a time-sensitive task that is put into effect succeeds with a certain probability or fails, wherein if the task succeeds, it delivers a stream of reward while the task is carried out, the expected rewards being characterised by the expected reward stream stored in respect of the task by the task list store, wherein the probability that the task succeeds reduces exponentially according to the time not spent in operation since a point in time identified by a timestamp stored in respect of the task; and a second class representing tasks that are non-time-sensitive, wherein a non-time-sensitive task that is put into effect delivers a stream of reward while the task is carried out, the expected rewards being characterised by the expected reward stream stored in respect of the task by the task list store; the sensor manager being operable to select one of the candidate tasks on the basis of maximising the total expected discounted reward that would be accrued if the sensor manager were to continue forever to select tasks from the list of candidate tasks currently stored in the task list store; the sensor manager being further operable to control the sensor to put the selected task into effect. At least one of the tasks, e.g. a wide field of view search and track task, comprises the method of the second aspect.
It should be understood that the individual features and/or combinations of features defined above in accordance with any aspect of the present disclosure or below in relation to any specific embodiment of the disclosure may be utilised, either separately and individually, alone or in combination with any other defined feature, in any other aspect or embodiment of the disclosure.
Furthermore, the present disclosure is intended to cover apparatus configured to perform any feature described herein in relation to a method, and/or a method of using, producing or manufacturing any apparatus feature described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
At least one embodiment of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic of a surveillance system;
Figure 2 is a schematic of a sensor module of the surveillance system of Figure 1;
Figure 3 is a schematic of a sensor manager, for a second embodiment, of the sensor module of Figure 2;
Figure 4 is a schematic of a time sensitive task module of the sensor manager of Figure 3;
Figure 5 is a schematic of a non-time sensitive task module of the sensor manager of Figure 3;
Figures 6A and 6B show a region discretised into areas, and how the potential of those areas may change following surveillance of those areas;
Figures 7A to 7D show the region of Figures 6A and 6B, and how the potential of those areas may change following the detection of targets;
Figures 8A and 8B show how the potential of an area may change with surveys of that area;
Figure 9 shows a state diagram for the birth-death model;
Figure 10 shows how the potential of an area changes with time following a survey of that area; and
Figure 11 is a flow chart showing a simplified overview of the process for determining which of the areas of Figures 6A and 6B to survey.
DETAILED DESCRIPTION OF THE DRAWINGS
The disclosure relates to a surveillance system which is configured to autonomously or semi-autonomously choose between surveying particular areas in a region to track targets within those areas, and surveying different areas to search for new targets within the region. The autonomous or semi-autonomous choice is made using the results of the surveys of the areas and any user preferences and input.
The surveillance system comprises at least one, and preferably more than one, surveillance platform, in this example in the form of unmanned aerial vehicles (UAVs, also known as drones). When the surveillance system includes multiple UAVs, the UAVs share the results of their surveys with each other, such that each UAV is able to optimise its next survey using all available data. The surveillance system is controlled from the ground by a user using a ground control station.
It is desirable for the surveillance system to know where targets are located within the region the surveillance system is surveying, and if possible, to track those targets. If the region is too large to be completely surveyed at all times, as is often the case, there is a compromise to be made between tracking known targets and searching for new targets.
Initially, the surveillance system may survey each area in turn, to discover the reward associated with each area. The surveillance system can then use the reward associated with each area to prioritise which areas to further survey, such that the targets within those areas can be tracked. The higher the reward obtained in previous observations of an area, the more likely it is that the surveillance system will continue to survey that area.
However, there is also value to the user of the surveillance system in resurveying areas from which little reward was previously obtained. This is because targets may have appeared in these areas since these areas were last surveyed. The longer the period of time since an area was surveyed, the higher the uncertainty in the reward present in each of those areas, and therefore the greater the value to the system in re-surveying those areas.
The surveillance system therefore uses the results of surveys of each area and the length of time since areas were previously surveyed to prioritise areas for surveying. In this way the surveillance system is able to balance the value to the user of continually surveying particular areas, to track known targets within those areas, against the value of surveying different areas to search for new targets.
The surveillance system uses a modelling system to track the number of targets which may be present in each area. In particular, each target (i.e. object of interest) may be associated with a reward and the surveillance system determines a reward distribution for each area. The reward for an area may be dependent on the number of targets in the area. The reward distribution comprises a distribution of the probability of the reward for that area. In other words, for each of a plurality of areas, the surveillance system stores and maintains a set of parameters that define the reward distribution for the area, wherein the reward distribution is a mathematical function whose argument is a reward value and whose result is a probability value, indicating the probability of obtaining the value of reward passed as argument.
The model may initially assume the expected reward in each area is high, to initially prioritise surveying each area (as the areas haven't previously been surveyed).
As each area is surveyed, the model for that area is updated using the reward obtained from the survey, because at the moment of the survey, the reward within that area becomes known. In this case, immediately after a survey of an area, the spread of the reward distribution for that area is low (i.e. the distribution is narrow as the uncertainty is low) and the distribution is based around the reward for that area determined by the survey. The reward distribution is then varied with time since the last survey towards a steady state distribution. The varying of the reward distribution with time therefore generally comprises increasing the spread of the distribution to reflect the increased uncertainty, and adjusting the mean of the distribution toward the steady state mean. For example, if an area is surveyed and no targets are found, then the potential reward associated with the area immediately after the survey is conducted is zero; and subsequently, as time passes, the uncertainty regarding the reward that might be obtained from the area increases, and this is reflected in an increased potential reward for that area. Eventually, the modelled potential reward is sufficiently high for the system to prioritise re-surveying that area, and the model is again updated with the results of the re-survey.
In this way, the model includes a steady-state condition for each area, which is the steady-state distribution of modelled reward within that area. As time passes following a survey of an area, the model changes from the known reward in that area at the time of the survey to the steady-state distribution of reward within that area.
The steady-state distribution is also updated using the results of the survey.
This means that if an area repeatedly returns no targets, the steady-state mean will move towards zero, and the area will become less of a priority for re-surveying. Likewise, the steady-state mean of an area which repeatedly returns a high reward will tend towards a high value, and as such the area will more often be selected for surveying.
Figures 1 and 2 show schematics of a first embodiment of the surveillance system and of components thereof, Figures 6 and 7 describe the concept of the model used to prioritise areas for surveying, and Figures 8 to 11 describe a particular model. Figures 3 to 5 show additional schematics for a second embodiment, such that the second embodiment may be associated with all of Figures 1 to 11.
Figure 1 shows a surveillance system 105 which comprises a control station 110 and one or more (i.e. N, where N ≥ 1) surveillance platforms 115. In this example, each surveillance platform 115 comprises an aerial vehicle such as a drone, but it will be appreciated that the techniques described herein are also applicable to other surveillance platforms such as land-based and aerial automated and manually controlled systems. In this case, although only one surveillance platform 115 is shown, it will be appreciated that the surveillance system 105 may comprise any suitable number of surveillance platforms 115 that communicate with the control station 110 via a wireless communications link 120 such as a radio communications link.
The control station 110 provides a GUI 125 that provides an interface for one or more operators (not shown) to provide manual control commands to the surveillance platforms 115 and to receive data such as images therefrom. The control station 110 further comprises a communications module 130, such as a wireless communications module for communicating with the surveillance platforms 115 via the communications link 120. In this example, the communications module 130 is configured to communicate wirelessly using radio communications, but it will be appreciated that other communications techniques could potentially be used.
Each surveillance platform 115 (e.g. each aerial drone) may comprise a corresponding communications module 135 for receiving communications (e.g. comprising commands) from the control station 110 and for providing data such as images to the control station 110 via the communications link 120.
Each surveillance platform 115 further comprises a situation awareness manager (SAM) 140, a sensor module 145 and a flight module 150. The flight module 150 controls the flight of the surveillance platform 115. The SAM 140 receives target detection reports from the sensor module 145, the target detection reports being indicative of objects of interest (i.e. targets) identified by the sensor module 145. The SAM 140 computes the values of the potential reward for the areas based on the target detection reports. The SAM 140 in turn provides potential reward values to the sensor module 145.
Commands are communicated from the operator and/or the GUI 125 to the platforms 115 along the communications link 120. Images, information and reports are sent from the platform 115 to the GUI 125 back along the communications link 120. Commands, such as target priorities, are communicated from the operator and/or the GUI 125 to the SAM 140 of the platform 115, and/or to a sensor manager 155 of the sensor module 145 (see Figure 2), along the communications link 120.
Target detections, rewards and/or potentials of surveyed areas are communicated from the SAM 140 to the GUI 125 along the communications link 120.
The ground control station 110 is optionally in communication with, and/or in control of, more than one platform 115. For example, a single operator using a single ground control station may control twenty platforms 115.
The communications module 135 can also be used to share target detections, rewards and/or potentials of surveyed areas between different platforms 115 operating cooperatively and/or controlled from the ground control station 110.
The sensor module 145 is illustrated in more detail in Figure 2. In particular, the sensor module 145 comprises a sensor manager 155, sensor controller 160, sensor 165, image handler 170 and target detector 175.
The sensor controller 160 is configured to operate the sensor 165 and in particular control the sensor 165 to point to a particular location. The sensor manager 155 is in communication with the sensor controller 160 to send commands to the sensor controller 160. The commands include coordinates of a point at which the sensor 165 should be directed. The sensor 165 is a pan-tilt-zoom (PTZ) camera comprising a PTZ unit and an electro-optical/infrared (EO/IR) camera.
The sensor controller 160 is configured to receive a navigation report 180 (see Figure 1) from the flight module 150, and to use the information in the navigation report 180 to determine the relative position of the surveillance platform 115 (i.e. the aerial vehicle) to the coordinates at which the sensor manager 155 has commanded the sensor controller 160 to point the sensor 165 (i.e. the PTZ camera). The sensor controller 160 is configured to send a PTZ command to the PTZ unit, which is configured to direct the camera towards the coordinates. The sensor controller 160 is configured to command the PTZ unit in a manner which responds to the pitch and roll of the aerial vehicle, such that the PTZ camera can be directed towards the same coordinates on the ground as the aerial vehicle 115 moves.
The images of the coordinates captured by the sensor 165 (i.e. camera) are processed by the image handler 170 and then sent to the target detector 175. The target detector 175 is configured to perform image analysis, specifically Video Moving Target Indication (VMTI), on the received images to detect any targets at the coordinate. Although VMTI is provided as a useful example of an object detection process, it will be appreciated that other suitable object detection algorithms could be used. The sensor module may optionally further comprise a classifier (not shown), to classify the targets, e.g. by type, identity, and the like. Optionally, different reward values can be associated with different types of target and the classifier can be used to determine the type of target so that the appropriate reward value for the target can be determined. The target detector 175 is configured to send information regarding the detected targets to the sensor manager 155. The sensor manager 155 is configured to send the information regarding the detected targets to the SAM 140, which can then use the information on the detected targets to update the parameters of the reward distribution of the area which was surveyed. Following this, the sensor manager 155 may obtain a new list of potential rewards from the SAM, comprising a potential reward for each area presently surveyable by the sensor, and issue a new command to direct the sensor controller 160 to point the sensor at an area (which may potentially be the same area previously surveyed).
As shown in Figures 1 and 2, a communication channel is provided from the sensor controller 160, directly through the communications module 135 to the GUI 125 via the communications link 120, by which live video can be streamed, and by which the operator can take direct control of the sensor 165 if required. In addition, a local image data store 185 is provided on the platform 115 and accessible by the sensor controller 160 to store imagery developed from the sensor 165 for later download upon request. This helps avoid the situation where multiple platforms 115 swamp the communications link 120 with too much imagery.
The flight module 150 shown in Figure 1 comprises a route manager and a navigation module. The flight module 150 is configured to control the flight of the surveillance platform 115. The route manager and the navigation module are configured to communicate with each other. The navigation module is configured to send the navigation report 180 to the sensor module 145. The sensor manager 155 can then use the navigation information in the navigation report 180 received from the navigation module to determine where to direct the sensor 165. The navigation reports from the navigation module are also passed to the sensor controller 160.
The situation awareness manager (SAM) 140 is coupled to the sensor manager 155 and the route manager of the flight module 150. The situation awareness manager 140 is configured to maintain the surveillance platform's 115 knowledge of the location and reward of each target. The situation awareness manager 140 is further configured to store the parameters of a reward distribution for each of a plurality of non-overlapping areas on the ground. The situation awareness manager 140 is available to all sensors 165 on the surveillance platform 115, as well as to the flight module 150 and/or the route manager. Target detections and reward levels can be communicated from the situation awareness manager 140 to the sensor manager 155 of the sensor module 145 along a suitable communication link.
Figure 1 shows the surveillance system 105 comprising more than one platform 115, in this case N platforms, where N > 1. As noted above each of the platforms 115 in this example is an aerial vehicle, each of which comprises a sensor module as shown in Figure 2 that in turn comprises the sensor controller 160 for that platform 115 connected to the EO/IR sensor 165 of the platform 115. Although the platform 115 is preferably an aerial vehicle it could alternatively be ground based.
The platform 115 may optionally comprise a radar sensor module (not illustrated), in addition to or in place of the EO/IR sensor module 145, wherein the radar sensor module comprises a radar, and wherein the radar sensor module is configured to detect objects of interest on the ground, and wherein the radar sensor module is configured to send the information regarding the detected targets to the SAM 140.
The surveillance system is configured to be controlled by an operator using the control station 110, which comprises a control system that is implemented on a computer system. The SAMs 140 of the surveillance platforms 115 are configured to communicate with each other and with the computer system of the control station 110 via a wireless network that includes the wireless communications link 120.
The EO/IR sensors 165 of platforms 115 are able to survey areas of a region, as described above. The radar is able to perform radar detection of the region, or of all or some of the areas of the region, being surveyed by the platforms 115. Information defining the areas sensed by the radar, and any targets detected therein, can be sent to the SAM 140 on the platform 115 carrying the radar; and the SAM 140 may send this information to any other SAMs 140 carried by other surveillance platforms 115 with which it can communicate via the wireless network; and the SAMs 140 that receive this information may use it to update the reward distribution of the areas fully or partially sensed by the radar, to improve the surveillance of the region performed by platforms 115.
Although, in the present example, a sensor controller 160, sensor manager 155 and SAM 140 are preferably provided on each platform 115, other arrangements are possible. For example, in another arrangement, a sensor controller 160 is provided on each platform 115 but the sensor manager 155 and SAM 140 are provided at the ground control station 110. This latter arrangement may be particularly beneficial for retrofitting to certain legacy systems that may have insufficient computational capacity. However, having the sensor controller 160, sensor manager 155 and SAM 140 on each platform may result in a more robust system that can better cope with communication interruptions between the platform 115 and the control station 110.
A method of controlling a swarm of drones is explained with reference to Figures 6A and 6B, which show a region 605 that is discretised into non-overlapping, tessellating, regular hexagons. Each hexagon represents an area 610.
In Figure 6B, the brighter an area 610 is coloured, the larger the potential of that area 610. The darker an area 610 is coloured, the lower the potential of that area 610.
Figure 6A represents the region 605 before it has been surveyed. All areas 610 have a high potential, as the uncertainty in the number of targets located within each area is high, and the areas are therefore correspondingly denoted white. As the areas 610 have not been surveyed, no information is available regarding the contents of each area 610. Each area 610 may therefore contain a large number of targets, and so the initial potential of each area 610 is set to a high value. The sensor manager 155 of the system 105 preferentially surveys areas with the highest potential and so the surveillance system 105 will initially survey unsurveyed areas 610 over previously surveyed areas 610.
Figure 6B represents the region 605 after a first pass over the region 605 by one of the surveillance platforms 115 (e.g. an aerial vehicle). The aerial vehicle 115 moves across the region 605 from left to right. As the aerial vehicle 115 moves across the region 605, it first surveys areas 615 and, in this example, detects no targets. The reward for these areas 615 is therefore set to zero, and the potential of areas 615 is also set to zero, as the potential of finding any targets in those areas 615 at that time is now known to be zero.
As the aerial vehicle 115 continues to move across region 605, the aerial vehicle 115 then surveys areas 620 and, in this example, detects no targets. The potential of areas 620 is therefore set to zero. The situation awareness manager 140 is configured to increase potential with time since an area 610 was last surveyed such that, by this point in time, the potential of areas 615 has risen slightly, as some time has passed since areas 615 were last surveyed, and targets may have appeared in areas 615 since areas 615 were surveyed.
As the aerial vehicle 115 continues further across region 605, the aerial vehicle 115 surveys areas 625, and detects no targets. The potential of areas 625 is therefore set to zero. Again, as the situation awareness manager 140 is configured to increase potential with time since an area 610 was last surveyed, by this point in time, the potential of areas 620 has risen slightly, as some time has passed since areas 620 were surveyed, and the potential of areas 615 has risen further, as targets may have appeared in areas 615 and areas 620 since areas 615 and areas 620 were surveyed. Following the pass of the aerial vehicle 115 across the region, areas 625 are coloured black, to represent that they have recently been surveyed, and contain no targets; areas 620 are coloured dark grey, to represent that areas 620 were recently surveyed and contained no targets, but there is potential for areas 620 to now contain targets; and areas 615 are coloured light grey, to represent that areas 615 were surveyed some time ago, and the potential for areas 615 to now contain targets has risen further than the potential of areas 620.
On a second pass over the region 605, the aerial vehicle 115 continues to survey previously unsurveyed areas 610, as they have the highest potential. Once the aerial vehicle 115 has passed over the region 605 a sufficient number of times to survey the whole region 605, the aerial vehicle 115 surveys areas 610 which have not been surveyed in the longest time (starting with areas 615), or areas in which targets were detected, whichever has the highest potential.
Figures 7A-7D show a region 705 which has been discretised into non-overlapping, tessellating, regular hexagons. Each hexagon represents an area 710. The region 705 is being surveyed by a surveillance system.
In Figure 7A, the white areas 710 around the outside of the region have yet to be surveyed, and therefore have a high potential. The brighter an area 710 is coloured, the larger the potential of that area 710. The darker an area 710 is coloured, the less the potential of that area 710.
Areas 715 have been surveyed, no targets were detected, and the potential of each area 715 is set to zero. Areas 720 (marked with an 'A') have been selected by the operator of the surveillance system 105 as areas of interest, meaning that any targets found within the area are to receive an elevated reward. During the course of the surveillance, targets have in fact been found within areas 720 and so the situation awareness manager 140 associates those targets with an elevated reward. The situation awareness manager 140 sums the target rewards within the areas 720 to compute a total reward and uses that to update the reward distribution associated with the areas 720. The situation awareness manager 140 thereby increases the potential reward of the areas 720 and this is indicated in Figure 7A by the lighter colour of areas 720. The surveillance system will therefore preferentially survey areas 720 over other, previously surveyed areas, for which any detected targets attract little reward.
In Figure 7B, the surveillance of the region 705 has continued, and all areas 710 have now been surveyed. The areas 725 around the outside of the region 705 which were previously not surveyed have now been surveyed, and their potentials set accordingly. A target was located in area 730 (marked 'B'), and so the situation awareness manager 140 sets the potential of area 730 to be larger than the potential of the surrounding areas 725. A longer period of time has passed since areas 715 were surveyed, and so the situation awareness manager sets areas 715 to have a larger potential than areas 725, which were more recently surveyed (areas 715 are a lighter shade of grey than areas 725).
The target detection module 175 identifies a target in area 735. The system 105 decides that the target was one of those previously observed within areas 720, the situation awareness manager 140 making this decision on the basis of the motion of the target, and/or on the basis of its appearance. The situation awareness manager 140, when computing a total reward for area 735, does so on the basis that there is one target and it has an elevated reward. Next, the situation awareness manager 140 updates the reward distribution for area 735 on the basis of the summed reward for that area; since the reward is elevated, this leads to an increased value for the potential reward within that area.
Figure 7C represents a later stage in the survey of region 705. The target detection module determines that most areas 710 did not have any targets in them, and the surveillance system has learnt from this, and the potential of these areas 710 is therefore uniformly low (most areas 710 are the same shade of grey). Area 730 is surveyed again, and no targets are detected. The situation awareness manager 140 sets the reward distribution to a delta function with all probability mass located at zero, representing the fact that were the area to be surveyed again without waiting, a reward of zero would be expected; accordingly, the potential reward associated with area 730 is zero. As time passes, the situation awareness manager 140 adjusts the reward distribution so as to drift towards its steady state. However, since there has been a historic detection in area 730, the steady state mean is higher than in the surrounding areas 710, so the potential reward is also higher (and accordingly area 730 is illustrated as a slightly lighter shade of grey). This means that once the short-term effect of observing no targets has decayed, the surveillance system is biased towards surveying areas that have historically contained targets over areas that have not. From Figure 7C it is clear that the areas of largest potential are areas 720 and 735, which the surveillance system 105 will preferentially survey. This is an example of short-term memory giving way to long-term memory, discussed earlier.
Figure 7D shows the results of the surveillance of areas 720 and 735. Since areas 720 and 735 have the highest potential, they are surveyed much more than any other area 710. This means targets 740 detected in areas 720 and 735 can be tracked. Figure 7D shows the detected targets 740 and the path 745 along which the targets 740 have moved whilst they have been surveyed.
Figures 8A, 8B, 9, 10 and 11 illustrate the use of a particular model in order to select areas for surveying. In particular, Figures 8A and 8B illustrate the process of learning the steady state reward distribution; Figure 9 shows a state diagram for the birth-death model; Figure 10 shows how the potential of an area changes with time following a survey of that area; and Figure 11 is a flow chart showing a simplified overview of the process for determining which of the areas of Figures 6A and 6B to survey.
A simplified overview of the method is shown in Figure 11; the method can be performed by the system illustrated in Figures 1 to 5.
As per step 1105 of Figure 11, the parameters defining the reward distribution for each area 610, 710 are initialised. In particular, the shape parameter (alpha) and scale parameter (beta) described previously are initialised. They are initialised such that their product (alpha multiplied by beta), which is the expected value of the steady state mean reward, is high compared with the reward one expects to find by surveying an area; this encourages the exploration of un-explored areas as discussed previously.
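A minimal sketch of this initialisation step, under the assumption that the long-term state held for each area is simply its (alpha, beta) pair, might be as follows; the particular numbers and names are illustrative assumptions.

```python
def initialise_areas(area_ids, alpha0=25.0, beta0=1.0):
    """Optimistic initialisation (step 1105, sketch only): alpha0 * beta0 is
    the initial expected steady state mean reward, chosen high relative to
    the rewards expected in practice so that unexplored areas are surveyed
    first. The specific values here are illustrative assumptions.
    """
    return {area_id: {"alpha": alpha0, "beta": beta0} for area_id in area_ids}
```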
In step 1110, the areas currently surveyable by the sensor (i.e. the areas currently in range of the sensor) are identified and the potential reward of each such area is computed. The procedures for computing the potential rewards are described later.
In step 1115, as described above in relation to Figures 6A, 6B and 7A-7D, the area having the greatest potential reward, e.g. one of areas 720, is selected for surveying. In step 1120, the parameters that control the short-term reward distribution for the surveyed area are updated. Because the sensor has been tasked to observe an entire area, this will be an uncensored observation (discussed later) and the parameters that are stored comprise the reward value and the time of the survey. If objects are observed in one or more adjacent areas, as a result of the sensor footprint overlapping into the adjacent areas, then the short-term reward distribution for those areas will also be updated; this will be a censored observation (discussed later), requiring the storage of the reward value of the censored observation and the time of the survey. For censored observations, the first and second moments of the reward distribution, denoted m1 and m2 (see later), can usefully be computed and stored during this step, since these quantities will later be needed for computing the potential reward; and computing them during step 1120 means that they will not need to be re-computed each time the potential rewards are computed at step 1110. If m1 and m2 are stored at this stage, it is not necessary to also store the censored reward value, since m1 and m2 are sufficient for the purpose of computing the potential reward.
Step 1125 performs a test to determine whether to update the parameters of the steady state reward distribution. This step is beneficial because the update rule for updating the steady state reward distribution requires that the data used to perform the update be independent of any previous data used to perform such an update. The test comprises determining whether the time elapsed since the steady state reward distribution for the surveyed area was last updated has exceeded a fixed threshold. If sufficient time has elapsed then the algorithm moves to step 1130, otherwise it moves to step 1110.
Step 1130 updates the parameters of the steady state reward distribution for the surveyed area, comprising the shape parameter (alpha) and scale parameter (beta) of the Gamma distribution for the steady state mean reward. The equations governing this update will be described later. The effect of this update will be discussed below with respect to Figures 8A, 8B, 9 and 10.
Figure 9 shows the state transition diagram for the birth-death model. The birth-death model gives the system 105 short-term memory and the updating of the steady state reward distribution (i.e. the steady state mean) gives long-term memory. The appearance and disappearance of targets in each area is modelled using a birth-death model. Targets arrive in an area according to a given arrival function, and each target remains in the area for a lifetime governed by a given lifetime function.
For example, targets arrive in an area following a Poisson process, which is equivalent to saying that the time between target arrivals follows an Exponential distribution. Each target has a lifetime that also follows an Exponential distribution. The arrival process is parameterised by rate λ. The mean inter-arrival time of targets in each area is 1/λ and the probability density for the inter-arrival time (the time between consecutive targets appearing) is p(t) = λ exp(−λt). The departure process is parameterised by rate μ. The mean lifetime is 1/μ and the probability density for the lifetime of a target is p(t) = μ exp(−μt).
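This arrival and departure behaviour may be illustrated by a short simulation. The following sketch (with assumed names) draws Exponential inter-arrival times and Exponential lifetimes and counts the targets present at the end of a horizon; it is illustrative only.

```python
import random

def simulate_target_count(arrival_rate, mean_lifetime, horizon):
    """Simulate one area of the birth-death model: Poisson arrivals at rate
    lambda (arrival_rate) and Exponential lifetimes with mean 1/mu
    (mean_lifetime). Returns the number of targets present at time 'horizon'.
    For a large horizon the count is approximately Poisson with mean
    theta = arrival_rate * mean_lifetime.
    """
    t = 0.0
    departure_times = []
    while True:
        t += random.expovariate(arrival_rate)          # next arrival time
        if t >= horizon:
            break
        departure_times.append(t + random.expovariate(1.0 / mean_lifetime))
    return sum(1 for d in departure_times if d > horizon)
```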
The model thus accounts for the total number of targets visible in an area, depicted in Figure 9 by the number in the circles, where each circle represents a state of an area. The arrows between states show the rates at which transitions take place. The downward rate increases with the number of targets, because the more targets there are, the greater the chance of one departing.
The model does not attempt to account for where each target arose from or how it came about that they were no longer visible. The rationale is that in a built-up area, targets could arise from within buildings or vehicles, and could disappear the same way; they are not obliged to enter from a neighbouring area.
All targets are initially given a default reward rate of 1, so the total reward in an area is identical to the number of targets. Targets having an elevated reward may be represented in the birth-death model by multiple targets each having a reward rate of 1.
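By way of illustration only, the following sketch (not part of the claimed system; all names and rate values are ours) simulates the birth-death model for a single area and confirms that the long-run number of visible targets settles around λ/μ:

```python
# Illustrative sketch: simulate the birth-death model for one area.
# Targets arrive as a Poisson process of rate lam; each target then
# remains visible for an Exponential(mu) lifetime.
import random

def targets_visible(lam, mu, horizon, seed=0):
    """Number of targets still visible at time `horizon`."""
    rng = random.Random(seed)
    t, departures = 0.0, []
    while True:
        t += rng.expovariate(lam)                    # next arrival time
        if t > horizon:
            break
        departures.append(t + rng.expovariate(mu))   # arrival + lifetime
    # Targets still visible are those whose departure lies beyond the horizon
    return sum(1 for d in departures if d > horizon)

lam, mu = 1.0 / 3.0, 1.0 / 45.0                      # illustrative rates only
samples = [targets_visible(lam, mu, 1000.0, seed=s) for s in range(500)]
print(sum(samples) / len(samples), "should be close to", lam / mu)  # theta = 15
```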
It is possible to construct a state transition matrix for this model. Since the transitions occur at random times, the state transition matrix is a function of time.
Suppose that at time s an area is in state n, and we are interested in the distribution of possible states at a future time s + t. Let the probability of being in state k at time s + t be $P_{nk}(t)$. This quantity is an element of the transition matrix. As the system has the Markov property, the transition probabilities do not depend on s, and it may be set to zero for convenience. The distribution of rewards at time t is represented by the vector of values $v_n(t) = \left(P_{n0}(t), P_{n1}(t), P_{n2}(t), \ldots\right)$. An expression for the value of $P_{nk}(t)$ exists, but it is not computationally convenient. More useful is the moment generating function for $v_n(t)$:

$$M(x) = \mathrm{E}\left[e^{xN(t)}\right] = e^{-\theta\left(1-e^x\right)\left(1-e^{-\mu t}\right)}\left(1 - e^{-\mu t}\left(1-e^x\right)\right)^n$$

A person skilled in the art will know that a moment generating function is an alternative representation of a probability distribution. Here, n is the state observed at time s = 0, k is the state observed at time t, λ is the appearance rate of targets, μ is the reciprocal of the mean lifetime, and x is the argument of the moment generating function. The variable θ = λ/μ is introduced because it has an important meaning, being the mean of the steady state distribution, explained shortly.
It will be seen that as t tends towards infinity, representing the case where the area is not observed for a long period of time, the moment generating function for the birth-death process tends towards the steady state,

$$M_{\text{steady}}(x) = e^{-\theta\left(1-e^x\right)}$$

and a person skilled in the art will recognise this as the moment generating function of a Poisson distribution with mean θ. This illustrates that the steady state distribution of the birth-death process is a Poisson distribution with mean θ. Further, this steady state distribution is entirely described by its mean value, θ = λ/μ.
We now explain the process of updating the parameters of the steady state reward distribution in step 1130. The model for the steady state reward distribution is regarded as parameterised by the value of θ, wherein the value of λ is computed from the value of θ according to θ = λ/μ, and wherein μ is set to a constant value.
The system begins with an initial guess for the value of θ in each area, these being gradually improved as the surveillance proceeds. As an area is surveyed, the accuracy of the system's estimate for θ improves. To express this increase in confidence, θ is modelled as being a random variable drawn from a Gamma distribution. The Gamma distribution is constructed so that its variance is initially large, and the variance reduces as surveys are done.
The Gamma distribution was chosen for computational convenience, because it is the conjugate prior of the Poisson, and this leads to an efficient rule for updating parameters, described as follows. Consider the case where the situation awareness manager decides to update the distribution of θ for an area, based on the reward obtained by the nth observation of the area, which we denote $x_n$. So in this case, $x_n$ is a value for the reward obtained by observing the area. The value of θ is modelled as a Gamma distribution which, at the time of the update, is parameterised by shape parameter $\alpha_n$ and scale parameter $\beta_n$. Assuming enough time has passed since the previous update for the reward distribution to have converged on its steady state, the reward $x_n$ on surveying the area arises from a Poisson distribution parameterised by θ, wherein θ is drawn from a Gamma distribution having parameters $\alpha_n$ and $\beta_n$. Observation $x_n$ gives us new information, and this allows us to update $\alpha_n$ and $\beta_n$ to compute new values $\alpha_{n+1}$ and $\beta_{n+1}$. Applying Bayes' rule leads to the result that the posterior distribution for θ, when the new observation $x_n$ is taken into account, is a Gamma distribution parameterised by:

$$\alpha_{n+1} = \alpha_n + x_n, \qquad \beta_{n+1} = \frac{\beta_n}{1 + \beta_n}$$

This update process is computationally convenient because (1) the posterior distribution for θ comes from the same family of distributions as the prior, namely they are both Gamma distributions, and (2) the equations governing the updated parameters $\alpha_{n+1}$ and $\beta_{n+1}$ are simple to implement.
The update rule demands that the survey $x_n$ be uncorrelated with previous surveys. This is achieved by only updating the model on surveys that are separated in time by several multiples of 1/μ. The mean target lifetime for the model is 1/μ, so after several mean lifetimes have elapsed, the model is considered de-correlated.
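A minimal sketch of this update, assuming the standard Gamma-Poisson conjugate form reconstructed above (function and variable names are ours, not the patent's):

```python
# Sketch of the steady-state update: Gamma(shape alpha, scale beta) prior
# for theta, Poisson reward observation x, applied only when the survey is
# separated from the previous update by several mean lifetimes (1/mu).
def update_steady_state(alpha, beta, x, elapsed, mu, lifetimes=3.0):
    if elapsed < lifetimes / mu:
        return alpha, beta                 # too correlated: skip this survey
    return alpha + x, beta / (1.0 + beta)  # conjugate Gamma-Poisson update

alpha, beta = 10.0, 1.0                    # optimistic initialisation, E[theta] = 10
alpha, beta = update_steady_state(alpha, beta, x=10, elapsed=200.0, mu=1.0 / 45.0)
print(alpha, beta, "posterior mean:", alpha * beta)  # mean stays 10, variance shrinks
```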
These modelling decisions are illustrated in Figure 8. For the purpose of the illustration an initial guess is assumed for the steady state mean reward of E[θ] = 10. This is a fairly high value, relative to the default reward of 1 for an individual target, and is chosen to encourage the system to survey areas that have not been previously surveyed. This is known as 'optimistic initialisation'.
The Gamma distribution 805 that expresses the initial model for θ is shown in Figure 8A (α = 10, β = 1). After five updates of the Gamma distribution parameters, in which the observed rewards all had the value 10, we would arrive at the dotted curve 810 (α = 110, β = 1/11). The latter (dotted) distribution is significantly narrower than the former (solid) distribution. This illustrates that the model initially exhibits uncertainty about the value of the steady state mean, and this uncertainty diminishes as more surveys of the area are conducted.
Since the distribution of the steady state mean θ is described by a Gamma distribution, the steady state distribution of the reward in each area is described by a Gamma-Poisson mixture. That is, the uncertainty associated with the steady state mean θ modifies the distribution of reward under the birth-death model, replacing the Poisson steady state distribution that would pertain if θ were fixed with a Gamma-Poisson steady state distribution. Further, if the parameters of the Gamma distribution are repeatedly updated, as described above, then the uncertainty associated with the steady state mean tends to reduce, and the steady state distribution tends towards the Poisson. These effects are illustrated in Figure 8B.
Figure 8B shows the Gamma-Poisson distributions associated with the example of Figure 8A. The initial Gamma-Poisson distribution 815 is the steady state reward distribution corresponding to the initial distribution of θ, labelled 805 in Figure 8A. The distribution after five updates is shown at 820, corresponding to the distribution of θ labelled 810 in Figure 8A. Over a great many updates, the distribution tends to converge on the Poisson distribution 825. The latter distribution 825 is the Poisson distribution for a mean of 10, this being the reward distribution that would be obtained if the steady state mean reward were known to be exactly 10, rather than being drawn from a Gamma distribution.
These plots confirm that the model begins with a broad reward distribution 815, and this narrows 820 as surveys are made; and the reward distribution tends towards the Poisson 825 as the number of surveys increases.
The practical advantage of this is that the surveillance system is initially inclined to select areas that have not been explored, because those areas exhibit great uncertainty in their reward distribution and therefore they may contain large rewards.
But as an area is repeatedly observed, the system becomes more confident in its knowledge about what further reward might be obtained there, and so the surveillance system becomes less inclined to observe the area unless significant rewards have been observed there in the past.
The moment generating function for the composite Gamma-Poisson distribution may be derived:

$$M_c(x) = \left(1 - e^{-\mu t}\left[1 - e^x\right]\right)^{n}\left(1 + \beta\left(1 - e^{-\mu t}\right)\left[1 - e^x\right]\right)^{-\alpha}$$

wherein, as before, α and β are the shape and scale parameters respectively for the Gamma distribution of the steady state mean of the area; and n is the state (that is, the reward) observed at a previous time s; and t is the time that has elapsed since time s, such that the current time is s + t; and μ is the reciprocal of the mean lifetime as described above; and x is the argument of the moment generating function. From this we are able to compute the reward distribution for the area, also called a probability mass function, denoted $v_n(t)$. The procedure is to substitute $x = i\omega$ into the moment generating function $M_c(x)$ to obtain the characteristic function, sample the characteristic function, and compute the inverse FFT. An illustrative example of this is given below.
For this illustration, the model parameters are α = 10, β = 1. This indicates a steady state mean reward of 10, as an initial value not yet supported by surveys. μ = 1/45, indicating that targets have a half-life of 45 ln(2), which is about 30 seconds (where ln(...) represents the natural logarithm).
At time s = 0, the area (hexagon) is surveyed and no targets are found, i.e. n = 0. For the purpose of this illustration the parameters of the steady state mean distribution are not updated. Figure 10 shows how the reward distribution varies with time following this observation. At time t = 0, the model believes that if the area were to be surveyed again, the result would be the same, i.e. no targets would be found. This is represented algorithmically by a reward distribution 1005 that has a probability of 1 for a reward of zero (i.e. a reward of zero is certain), and a probability of 0 for every other value of reward (i.e. no other reward value is possible). As time passes, the situation awareness manager 140 becomes gradually less confident that there are no targets, since new targets may have arrived; and so the situation awareness manager 140 varies the reward distribution over time as illustrated by distributions 1010, 1015, 1020. The reward distribution converges over time on the steady-state distribution 1025, corresponding to t → ∞, which has mean 10.
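The characteristic-function procedure can be sketched as follows (a hedged illustration only, not the patent's code; `reward_pmf` and its parameters are our own names). It reproduces the Figure 10 scenario of α = 10, β = 1, μ = 1/45 and an observation of n = 0 at t = 0:

```python
# Sketch: recover the reward pmf v_n(t) by sampling the characteristic
# function M_c(i*omega) of the composite Gamma-Poisson distribution on a
# regular grid and taking an inverse FFT.
import numpy as np

def reward_pmf(n, t, alpha, beta, mu, K=256):
    """pmf over reward values 0..K-1 at time t after observing reward n."""
    omega = -2.0 * np.pi * np.arange(K) / K           # grid matched to np.fft.ifft
    z = 1.0 - np.exp(1j * omega)                      # recurring (1 - e^{ix}) term
    decay = np.exp(-mu * t)
    phi = (1.0 - decay * z) ** n \
        * (1.0 + beta * (1.0 - decay) * z) ** (-alpha)
    return np.clip(np.fft.ifft(phi).real, 0.0, None)  # clip tiny round-off

for t in (0.0, 30.0, 120.0, 1e6):                     # Figure 10 scenario, n = 0
    p = reward_pmf(0, t, alpha=10.0, beta=1.0, mu=1.0 / 45.0)
    mean = float(np.arange(p.size) @ p)
    print(f"t = {t:>8}: mean reward = {mean:.2f}")    # climbs from 0 towards 10
```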
The potential reward for each area is an approximate upper quantile for the reward. From Figure 10 it is clear that an upper quantile of the distribution would tend to rise following the survey, converging to a value that depends on the steady state mean. Therefore if the area is left unsurveyed, it will tend to become more interesting to the surveillance system over time, especially if the area has previously contained targets.
We will now describe how the situation awareness manager 140 computes the potential reward of an area, based on a previous survey of the area. Suppose a survey covering the whole of a certain area delivered an observation of reward value n at time s. The potential reward at a later time s + t is estimated based on the mean and variance of the reward distribution at time s + t. The mean and variance are conveniently available, because the moment generating function for the composite distribution provides a closed-form expression for the first and second moments of the distribution. The mean and variance at time s + t are given by:

$$\text{Mean} = e^{-\mu t}n + \left(1 - e^{-\mu t}\right)\alpha\beta$$

$$\text{Variance} = \left(1 - e^{-\mu t}\right)\left(e^{-\mu t}n + \alpha\beta\right) + \left(1 - e^{-\mu t}\right)^2\alpha\beta^2$$

The potential reward is estimated based on the mean and variance using:

$$\text{Potential Reward} = \text{Mean} + 3\sqrt{\text{Variance}}$$

The situation awareness manager may also be configured to compute the potential reward based on a partial survey of the area. Although not essential to the invention, this is a useful feature since a partial survey often arises in practice, such as where a sensor observing one area happens to observe targets in an adjacent area. A partial survey can also arise where a target is observed in a narrow field of view, so observing one target within an area, but not the whole area. Further, a complete surveillance system may contain multiple sensors of unrelated types that are capable of reporting targets, but not capable of surveying a complete area as known to the situation awareness manager. In the language of statistics, an observation of a whole area is described as 'uncensored', in contrast to an observation of part of an area, which is 'censored'.
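Before detailing the censored case, the uncensored computation just given can be sketched as follows (illustrative only; function names and example values are ours):

```python
# Sketch: potential reward of an area at time t after an uncensored
# observation of reward n, using the closed-form mean and variance above.
import math

def potential_reward_uncensored(n, t, alpha, beta, mu):
    decay = math.exp(-mu * t)
    mean = decay * n + (1.0 - decay) * alpha * beta
    var = ((1.0 - decay) * (decay * n + alpha * beta)
           + (1.0 - decay) ** 2 * alpha * beta ** 2)
    return mean + 3.0 * math.sqrt(var)

# Straight after a survey that found nothing the potential reward is zero;
# it then climbs back towards its steady-state value, as Figure 10 suggests.
for t in (0.0, 30.0, 120.0, 1e6):
    print(t, round(potential_reward_uncensored(0, t, 10.0, 1.0, 1.0 / 45.0), 2))
```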
The situation awareness manager computes the potential reward of an area at time s + t, based on the following: a censored reward of value c at time s (this tells us that the reward in the part of the hex that was observed was c, so the reward in the whole hex must have been at least that); and an uncensored reward of value $n_u$ at a previous time $u = s - \Delta$. If it happens that there has never been an uncensored observation of this hex, then $u = -\infty$ and $n_u$ is undefined; in this case, in the computation that follows, $n_u$ has no effect on the outcome because it is multiplied by zero wherever it appears; it is therefore convenient to set it to an arbitrary value such as 0.
A potential reward is calculated using the mean and variance of the reward distribution at time s + t, based on the moments of the distribution at that time. The expressions for the moments were derived from the moment generating function for the composite Gamma-Poisson distribution, and are summarised as follows.
Let $m^r_x$ denote the rth moment of the reward distribution at some time x. If there is an observation at time x, we denote the moment just prior to the arrival of the observation $m^r_{x-}$ and just after it $m^r_{x+}$. Let $N_x$ denote a random variable representing the reward of the area at some time x, and let $n_x$ denote a specific value for $N_x$. The situation awareness manager computes the following:

$$m^1_{u+} = n_u \qquad m^2_{u+} = (n_u)^2$$

$$m^1_\infty = \alpha\beta \qquad m^2_\infty = \alpha\beta\left(1 + \beta + \alpha\beta\right)$$

$$m^1_{s-} = e^{-\mu\Delta}\,m^1_{u+} + \left(1 - e^{-\mu\Delta}\right)m^1_\infty$$

$$m^2_{s-} = e^{-2\mu\Delta}\,m^2_{u+} + e^{-\mu\Delta}\left(1 - e^{-\mu\Delta}\right)\left(m^1_{u+} + 2\,m^1_{u+}m^1_\infty + m^1_\infty\right) + \left(1 - e^{-\mu\Delta}\right)^2 m^2_\infty$$

$$m^1_{s+} = \frac{1}{P(N_s \ge c \mid N_u = n_u)}\left(m^1_{s-} - \sum_{n_s=0}^{c-1} P(N_s = n_s \mid N_u = n_u)\,n_s\right)$$

$$m^2_{s+} = \frac{1}{P(N_s \ge c \mid N_u = n_u)}\left(m^2_{s-} - \sum_{n_s=0}^{c-1} P(N_s = n_s \mid N_u = n_u)\,n_s^2\right)$$

$$m^1_{s+t} = e^{-\mu t}\,m^1_{s+} + \left(1 - e^{-\mu t}\right)m^1_\infty$$

$$m^2_{s+t} = e^{-2\mu t}\,m^2_{s+} + e^{-\mu t}\left(1 - e^{-\mu t}\right)\left(m^1_{s+} + 2\,m^1_{s+}m^1_\infty + m^1_\infty\right) + \left(1 - e^{-\mu t}\right)^2 m^2_\infty$$

$$\text{Mean} = m^1_{s+t} \qquad \text{Variance} = m^2_{s+t} - \left(m^1_{s+t}\right)^2$$

$$\text{Potential Reward} = \text{Mean} + 3\sqrt{\text{Variance}}$$

To compute the probability mass function $P(N_s = n_s \mid N_u = n_u)$, the situation awareness manager uses the method described earlier, in which the characteristic function for the composite Gamma-Poisson distribution is sampled and an inverse FFT is computed. The potential reward can also be defined using Chernoff bounds.
Within the present embodiment, censored survey results are not used to update the steady state mean distribution, so they only ever have a short-term effect. This is still valuable, since a censored survey may cause the surveillance system to observe an area (hex) that contains an important target.
A single censored observation is accommodated per area. When a second censored observation arrives for an area that already has one, we only accept the new observation if it leads to an increased potential reward.
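Pulling the censored-observation equations together, the computation might be sketched as follows (our own code, reusing the `reward_pmf` helper from the earlier sketch; Δ is the gap between the uncensored and censored observations):

```python
# Sketch: potential reward at time s + t given a censored reward c at time s
# and an uncensored reward n_u at time u = s - delta.
import math

def potential_reward_censored(c, n_u, delta, t, alpha, beta, mu, K=256):
    m1_inf = alpha * beta
    m2_inf = alpha * beta * (1.0 + beta + alpha * beta)

    def propagate(m1, m2, dt):
        """Advance the first two moments of the reward distribution by dt."""
        q = math.exp(-mu * dt)
        return (q * m1 + (1.0 - q) * m1_inf,
                q * q * m2
                + q * (1.0 - q) * (m1 + 2.0 * m1 * m1_inf + m1_inf)
                + (1.0 - q) ** 2 * m2_inf)

    m1s, m2s = propagate(n_u, n_u ** 2, delta)        # moments just before s
    pmf = reward_pmf(n_u, delta, alpha, beta, mu, K)  # P(N_s = k | N_u = n_u)
    tail = 1.0 - float(sum(pmf[:c]))                  # P(N_s >= c | N_u = n_u)
    m1s = (m1s - sum(k * pmf[k] for k in range(c))) / tail
    m2s = (m2s - sum(k * k * pmf[k] for k in range(c))) / tail
    m1, m2 = propagate(m1s, m2s, t)                   # moments at s + t
    return m1 + 3.0 * math.sqrt(max(m2 - m1 * m1, 0.0))
```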
Discussed below is a second embodiment which builds upon the foregoing and incorporates many of the same principles and features, but in which the sensor manager is capable of supporting a wider range of behaviours. For this embodiment, some of the functionality of the sensor manager 155 may be similar to that of the sensor manager described in GB1806410.5, the contents of which are incorporated by reference in their entirety as if set out in full herein. For this embodiment the sensor manager 155 is illustrated in Figure 3.
For the second embodiment, the sensor controller 160 is configured to perform the following types of task:
1. tracking an individual target using a narrow field of view,
2. surveying an area of ground using a wide field of view,
3. powering down the sensor to conserve power,
and the sensor manager 155 is configured to decide which task to perform, and to command the sensor controller to perform the task. The second embodiment is an extension of the first, in that the task of surveying an area using a wide FOV, similar to that described above, is only one of the tasks that can be performed in the second embodiment.
For the second embodiment, each sensor manager 155 comprises a task list store 200 that administers lists of candidate tasks, storing tasks for the corresponding sensor 165 in two modules 205, 210 depending on time sensitivity, as explained below.
Similarly to the first embodiment, the sensor manager 155 is configured to send command messages, indicating where the sensor 165 will survey. The sensor manager 155 is also configured to receive potential reward values for areas from the SAM 140.
In addition, for the second embodiment, the sensor manager 155 is configured to receive target reports from the SAM, indicating the location of a target, the time at which it was detected, and the reward associated with the target; and the sensor manager 155 is configured to receive commands from the control station 110, for example, a ground control station (GCS), commanding the sensor manager to perform a particular task, or to cease performing a particular task, or to cancel either of the foregoing commands so that the sensor manager returns to its usual mode of autonomous operation.
The sensor manager 155 comprises a selector 215 that determines which candidate task, stored and managed by the task list store 200, should be selected for execution by the sensor controller 160. The selector 215 determines when a decision should be taken, and which task from the task list store 200 is selected. It conveys the decision to a switch 220 by passing an identifier of the selected task to the switch 220, which mediates between the selected task, stored in the task list store 200, and the sensor controller 160. The switch 220 is in communication with the sensor controller 160 that controls the sensor 165, in order to issue the control commands for controlling the sensor 165 to the sensor controller 160 via a communications channel 225. In other words, the switch 220 connects a selected task to the sensor controller 160. As noted above, the task list store 200 comprises two modules, each managing tasks of a specific type. A time-sensitive task module 205 maintains a list of a first class of tasks that have a time-sensitive acquisition process. A non-time-sensitive task module 210 maintains a list of a second class of tasks (that is, the remaining tasks). The tasks are partitioned in this way because the selector 215 makes use of a selection index, calculated for each task, to determine which task to select on the basis of the expected reward associated with performing each task. The manner in which the selection index is calculated for time-sensitive tasks is different from the manner in which it is calculated for non-time-sensitive tasks, as described in GB1806410.5. The two modules 205, 210 comprising the task list store 200 maintain a task reward description for each task, where the task reward description differs for the time-sensitive and non-time-sensitive tasks.
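Purely as a structural illustration (all names are ours, and the selection indices are stubbed out), the relationship between the task list store, selector and switch might be sketched as:

```python
# Structural sketch of the task list store / selector / switch arrangement.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Task:
    name: str
    selection_index: Callable[[], float]   # Gittins-style index, computed on demand

@dataclass
class TaskListStore:
    time_sensitive: List[Task] = field(default_factory=list)       # module 205
    non_time_sensitive: List[Task] = field(default_factory=list)   # module 210

def select_task(store: TaskListStore) -> Task:
    """Pick the candidate with the highest selection index. In the real
    system the index is computed differently for the two task classes."""
    return max(store.time_sensitive + store.non_time_sensitive,
               key=lambda task: task.selection_index())

store = TaskListStore(
    time_sensitive=[Task("track target 7", lambda: 4.2)],
    non_time_sensitive=[Task("wide-FOV search", lambda: 3.1),
                        Task("power down", lambda: 0.5)])
print(select_task(store).name)   # the switch would route this task to the controller
```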
The time sensitive task module 205, illustrated in Figure 4, stores a list of time sensitive candidate tasks that might be selected and put into effect by the sensor manager 155. One type of time-sensitive task that may be stored by the time-sensitive task module 205 is a tracking task. This type of task involves slewing the EO/IR sensor 165 towards a target on the ground and continuously tracking it with a narrow field of view as it moves. The process of slewing the sensor 165 towards the target, and finding the target in the resulting imagery, is called 'acquisition'. A reward is only obtained if the acquisition process succeeds in finding the correct target. Targets are often mobile objects such as ground vehicles. Accordingly, the confidence of acquiring the correct target depends on how long ago it was detected and localised. If the sensor 165 does not start executing the tracking task soon after the target was detected, it may fail to acquire the target, or acquire the wrong target. It is for this reason that tracking tasks are said to be 'time-sensitive'.
The non-time-sensitive task module 210, illustrated in Figure 5, stores a list of non-time-sensitive candidate tasks that might be selected and put into effect by the sensor manager 155. An example of a non-time-sensitive task is a search task that operates as described herein. The search task performs the steps described in Figure 11, with the exception that the search task only controls the sensor when authorised by the selector 215. When put into effect by the selector, the search task commands the sensor controller 160 to make observations of a series of areas of ground with a wide field of view (i.e. wide enough to encompass the whole area, and wider than that used for the target tracking described above); and a target detector 175, such as a VMTI algorithm or a target classifier, is used to detect objects of significance or interest within the images; and the system is configured to analyse the detected objects of interest using an exploration-exploitation algorithm to determine where to survey with the sensor, e.g. similarly to that described above.
A second example of a non-time-sensitive task is a 'power down' task which, when put into effect, commands the sensor controller 160 to temporarily power down the sensor 165 to conserve energy; this may be advantageous because the useful flight time of a surveillance platform 115 may be limited by the rate at which energy is used, especially where the equipment carried by the surveillance platform 115 is powered by a battery; and so a feature that powers down the sensor 165 at times when it is likely to deliver little reward, may prolong the useful flight time of the surveillance platform 115. It will be appreciated that other types of non-time-sensitive tasks could be stored by the non-time-sensitive task module 210; for example, a different type of search task could be included.
In applications, the notion that some tasks are non-time-sensitive is only an approximation; in reality all tasks are time sensitive to a greater or lesser degree. As described in GB1806410.5, alternative embodiments are possible wherein the 'time-sensitive' and 'non-time-sensitive' tasks might be replaced by 'first' and 'second' classes of tasks, wherein both are time sensitive and wherein the tasks of one class are more (or less) time sensitive than the tasks of the other. This alternative embodiment could comprise the search algorithm described herein.
We now describe how the search task described herein is integrated into the sensor manager 155 within the context of the second embodiment.
The search task, in common with all tasks held by the non-time-sensitive task module 210, is operable to compute on demand a 'second Gittins index' as defined in equations (10)-(12) of GB1806410.5 and explained in the text accompanying those equations. The selector 215 uses these values to select the best non-time-sensitive task as described in GB1806410.5. Computation of the second Gittins index also involves the computation of a stopping time; the stopping time is likewise sent to the selector and is used in the process of deciding when to schedule the next decision time.
The search task, in common with all tasks held by the non-time-sensitive task module 210, is operable to compute on demand a 'fourth Gittins index' as defined in equation (22) of GB1806410.5 and explained in the text accompanying that equation.
The selector 215 uses this value to choose between the best time-sensitive task and the best non-time-sensitive task, as described in GB1806410.5. Computation of the fourth Gittins index also involves the computation of a stopping time; the stopping time is likewise sent to the selector and is used in the process of deciding when to schedule the next decision time.
The surveillance system computes the second and fourth Gittins indices on the basis of the sequence of potential rewards that might be obtained should the search task be put into effect. That is, the search task obtains the sequence of potential rewards associated with tasking the sensor to observe the area having the highest potential reward, and then the second highest, and so on. The system also computes the timestamp for each observation on the basis of the length of time needed to perform each observation added to the length of time needed to slew the sensor from each area to the next.
The search task obtains the potential rewards from the SAM 140. For this purpose, the reward distribution for each area is held within the SAM 140 as described for the first embodiment. The search task causes the sensor manager 155 to send to the SAM 140 a request to compute the potential reward of one or more areas. The SAM 140 receives the request, computes the potential reward or rewards, and returns the potential reward or rewards to the search task within the non-time-sensitive task module 210. The SAM 140 updates its model for the reward distribution on the basis of observations made by the sensor 165, whether under the control of the search task or not; and each SAM 140 exchanges observation data with its peers on other surveillance platforms 115 as previously discussed.
For the purpose of computing the second and fourth Gittins indices, the potential rewards are multiplied by a scaling factor reflecting the value to the operator of searching for targets rather than tracking existing targets. Once the scaling factor is applied, the sequence is regarded as a discrete reward sequence as illustrated in Figure 5 of GB1806410.5. The discrete reward sequence may be readily manipulated to construct an accumulated reward sequence as illustrated in Figure 8 of GB1806410.5. To compute the second and fourth Gittins indices, equations (10)-(12) and (22) of GB1806410.5 are applied to the accumulated reward, denoted φ(s), wherein s is the process time, being the time for which the task has been in effect. Since equations (10)-(12) and (22) of GB1806410.5 are concerned only with increments of reward from the current time, it is acceptable to define a local time datum such that s = 0 represents the current time, and φ(s) represents the accumulated reward that will be accrued at future times.
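A sketch of how the search task's potential rewards might be turned into the accumulated reward sequence fed to the index computations (the dwell, slew and scale values are illustrative assumptions of ours, not figures from the patent or GB1806410.5):

```python
# Sketch: build the accumulated (scaled) reward sequence phi(s) from the
# potential rewards of areas ordered best-first, with s = 0 as 'now'.
def accumulated_reward(potential_rewards, dwell, slew, scale):
    """Return (s, cumulative scaled reward) pairs for the search task."""
    points, total, s = [], 0.0, 0.0
    for i, r in enumerate(potential_rewards):
        s += dwell + (slew if i > 0 else 0.0)   # observe, then slew onward
        total += scale * r
        points.append((s, total))
    return points

print(accumulated_reward([12.0, 9.5, 7.0], dwell=5.0, slew=2.0, scale=0.5))
# [(5.0, 6.0), (12.0, 10.75), (19.0, 14.25)]
```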
Beneficially, the surveillance system of the present disclosure can be incorporated into the system of GB patent application 1806410.5. The contents of GB patent application 1806410.5 are herein incorporated in their entirety by reference. For example, the sensor manager 155 of the present disclosure may comprise one or more features described in relation to the sensor manager / intelligent sensor management in GB1806410.5. However, the system of GB1806410.5 does not include the SAM 140 and does not use an exploration-exploitation algorithm in the manner described herein. The examples given herein relate to a search & track task, which is one of the tasks maintained by the non-time-sensitive task module described in GB1806410.5 (see also the description above relating to Figure 5). In this example, the search and track task uses the data supplied by the SAM 140, i.e. the potential reward values for each of the areas.
The above examples are provided by way of illustration only and a skilled person would appreciate that modifications to the above examples could be made. For example, although the above example uses a birth-death rate model, other models may be used to determine the potential number of targets in areas of a region for surveying.
Furthermore, although certain theoretical equations and approaches are described above, these are provided for understanding only and it will be appreciated that other equations or approaches may be used instead. For example, the upper quantile that serves as the 'potential reward' of an area may be calculated without approximation, by first computing the distribution of rewards in the form of a vector of probabilities.
One skilled in the art will understand how the methods described above may be adapted to survey different regions or types of region, or different areas or types of area.
One skilled in the art will understand that the functionality offered by the situation awareness manager (SAM) might be contained within a separate entity, or held within the sensor manager. The former arrangement offers the advantage that the data held within the SAM may be easier to share between sensor managers; the latter arrangement may be simpler to implement in the case where there is only one sensor to support on a platform. The scope of the invention covers both cases.
As such, the scope of the invention is not limited by the above examples but only by the claims.

Claims (24)

  1. A surveillance system that comprises a sensor manager, a target detection module, and a sensor; wherein: the target detection module receives imagery developed from the sensor and is configured to detect objects of interest within the imagery; and the sensor manager is configured to send command messages, indicating where the sensor will survey; and the system is configured to analyse the detected objects of interest using an exploration-exploitation algorithm to determine where to survey with the sensor.
  2. The surveillance system of claim 1, wherein each object of interest is associated with a reward; and the system is configured with a goal of maximising or optimising the reward for detectable objects of interest within areas that are surveyable or presently surveyable using the sensor.
  3. The surveillance system of claim 1 or claim 2, wherein the sensor manager is configured to determine where to survey with the sensor taking into account or balancing both searching for new objects of interest and observing already identified objects of interest.
  4. The surveillance system of any preceding claim, wherein the system is configured to statistically model a reward distribution of each area and determine the potential reward based on the modelled reward distribution.
  5. The surveillance system of claim 4, wherein the potential reward for an area comprises or is represented by an estimate of an upper quantile of the reward distribution for the area.
  6. The surveillance system of any of claims 4 or 5, configured to model the reward distribution of each area with a birth-death model which is updated based on the surveys of the respective area.
  7. The surveillance system of any preceding claim, wherein the analysis of the detected objects of interest using the exploration-exploitation algorithm comprises computing an estimate of potential reward associated with detecting targets in each of a plurality of areas; and the sensor manager is configured to identify and select an area having the greatest potential reward to be an area to be surveyed by the sensor, the potential reward being indicative of how large a reward for that area might potentially be.
  8. The surveillance system according to claim 7 configured to compute the potential reward for a given area and time, which comprises: upon surveying a given area, adjusting the potential reward for the given area according to the results of the survey; and between surveys, varying the potential reward associated with an area towards a steady state value for that area depending on the time elapsed since the respective area was last surveyed.
  9. The surveillance system according to claim 8, wherein adjusting the potential reward according to the results of the survey for the given area comprises setting the potential reward to be equal to or a function of the sum of the rewards for the objects of interest detected in the given area.
  10. The surveillance system according to any of claims 7 to 9, wherein the initial potential reward of each area before the respective area is surveyed is set to a default value, or to a user defined initial value.
  11. The surveillance system according to any of claims 7 to 10 that is configured to update the steady state potential reward of an area based on the sum of the rewards for the targets detected in the given area.
  12. The surveillance system according to any preceding claim, further comprising a sensor controller, wherein the sensor manager is configured to communicate with the sensor controller, and the sensor controller is configured to receive the command messages sent by the sensor manager and to control the sensor to perform the survey responsive to the command messages.
  13. The surveillance system according to any preceding claim, wherein the sensor is or comprises an electro-optical/infrared sensor (EO/IR sensor).
  14. A surveillance system according to claim 12 or claim 13, wherein the sensor manager, sensor controller, target detection module and sensor are comprised in a sensor module and the system comprises one or more of the sensor modules.
  15. The surveillance system according to claim 14, comprising one or more surveillance platforms wherein each sensor, being part of one of the sensor modules, is carried by one of the surveillance platforms.
  16. The surveillance system of claim 15 dependent on any of claims 7 to 11, wherein each of the surveillance platforms comprises a situation awareness manager that is configured to: receive target detection reports from each target detection module that is part of a sensor module having its sensor on the vehicle, the target detection reports being indicative of objects of interest identified by the respective target detection module; compute the values of the potential reward based on the target detection reports; and provide potential reward values to each sensor manager that is part of a sensor module having a sensor on the vehicle.
  17. The surveillance system of claim 16, wherein each situation awareness manager mounted on one of the surveillance platforms transmits target detection reports to the situation awareness managers mounted on other surveillance platforms by means of a communications subsystem, and each of the respective situation awareness managers mounted on respective surveillance platforms computes an estimate of the potential reward based on target detection reports received from situation awareness managers mounted on other surveillance platforms by means of a communications subsystem.
  18. The surveillance system of claim 17, further comprising a route manager wherein the situation awareness manager mounted on the surveillance platform provides the potential reward values to the route manager, and the route manager computes a route for the surveillance platform based on maximising the potential reward reported by the situation awareness manager, and the surveillance platform travels the route computed by the route manager.
  19. The surveillance system of claim 4 or any claim dependent thereon, wherein the analysis of the detected objects of interest using the exploration-exploitation algorithm, the statistical modelling of the reward distribution of each area and the determination of the potential reward based on the modelled reward distribution are performed at least in part by the situation awareness manager or sensor manager.
  20. The surveillance system of any preceding claim, wherein the situation awareness manager or sensor manager is configured to update the reward distribution of one or more areas by receiving information from a sensor and/or sensor module which has surveyed an area, or part of an area, of the plurality of different areas.
  21. The surveillance system of any preceding claim, configured to control the situation awareness managers and/or the sensor modules using a control station, such as a ground control station.
  22. A sensor manager comprising a processing system and configured to access a data store, the processing system configured to: access a task list stored in the data store, the task list comprising a list of candidate tasks that can be carried out by the sensor, wherein the data store is operable to store a description of each respective task, the description of each respective task defining an expected reward stream representing the expected rewards likely to be obtained as a function of time for which the task is carried out; select one of the candidate tasks on the basis of maximising the total expected discounted reward that would be accrued if the sensor module were to continue forever to select tasks from the list of candidate tasks currently stored in the task list store; and control the sensor module of any of the preceding claims to put the selected task into effect.
  23. A method of operating the surveillance system according to any of claims 1 to 21, the method comprising: receiving imagery, at the target detection module of the surveillance system, the imagery being developed from the sensor; detecting, using the target detection module, objects of interest within the imagery; and analysing the detected objects of interest using an exploration-exploitation algorithm to determine where to survey with the sensor.
  24. A computer program configured such that, when implemented on a processing system, it causes the processing system to perform the method of claim 23.
GB1908503.4A 2019-06-13 2019-06-13 Autonomous search and track using a wide FOV Active GB2584717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1908503.4A GB2584717B (en) 2019-06-13 2019-06-13 Autonomous search and track using a wide FOV

Publications (3)

Publication Number Publication Date
GB201908503D0 GB201908503D0 (en) 2019-07-31
GB2584717A true GB2584717A (en) 2020-12-16
GB2584717B GB2584717B (en) 2023-10-25

Family

ID=67432361

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1908503.4A Active GB2584717B (en) 2019-06-13 2019-06-13 Autonomous search and track using a wide FOV

Country Status (1)

Country Link
GB (1) GB2584717B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011004358A1 (en) * 2009-07-08 2011-01-13 Elbit Systems Ltd. Automatic video surveillance system and method
US20170301109A1 (en) * 2016-04-15 2017-10-19 Massachusetts Institute Of Technology Systems and methods for dynamic planning and operation of autonomous systems using image observation and information theory
CN109523011A (en) * 2018-11-06 2019-03-26 哈尔滨工业大学(深圳) A kind of multisensor adaptive management method towards multiple no-manned plane collaboration detection
US20190158755A1 (en) * 2017-11-20 2019-05-23 Chiun Mai Communication Systems, Inc. Aerial vehicle and target object tracking method
US20190161186A1 (en) * 2017-11-30 2019-05-30 Industrial Technology Research Institute Unmanned aerial vehicle, control system for unmanned aerial vehicle and control method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bai Li et al., An improved artificial bee colony algorithm based on balance-evolution strategy for unmanned combat aerial vehicle path planning, Scientific World Journal, 2014(4), March 2014 *

Also Published As

Publication number Publication date
GB201908503D0 (en) 2019-07-31
GB2584717B (en) 2023-10-25
