WO2017060083A1

WO2017060083A1 - Integrated lighting and people counting system

Info

Publication number: WO2017060083A1
Application number: PCT/EP2016/072371
Authority: WO
Inventors: Ashish Vijay Pandharipande; David Ricardo CAICEDO FERNANDEZ
Original assignee: Philips Lighting Holding B.V.
Priority date: 2015-10-06
Filing date: 2016-09-21
Publication date: 2017-04-13

Abstract

In a people counting system, a plurality of vision sensors is arranged to provide sensor coverage of an area. Each vision sensor is arranged to provide individual sensor coverage of a portion of the area within its field of view. Each of a plurality of local image processors is connected to a respective one of the vision sensors. Each of the local image processors is configured to apply a local person detection algorithm to at least one image captured by its respective vision sensor, thereby generating a local presence metric representative of a number of people detected in the at least one image. A central processor is configured to estimate the total number of people in the area covered by the vision sensors by applying an aggregation algorithm to the local presence metrics generated by the local image processors.

Description

INTEGRATED LIGHTING AND PEOPLE COUNTING SYSTEM

TECHNICAL FIELD

The present invention relates to a people counting system, for example a connected lighting system having people counting functionality.

BACKGROUND

A lighting system for illuminating an environment may comprise a plurality of luminaires, each of which, in turn, comprises a light source in the form of one or more lamps that emit configurable illumination into the environment. The lamps may for example be LED lamps, filament bulbs, gas discharge lamps etc.

The luminaires may be inter-connected so as to form a lighting network. In order to control the illumination, a gateway, such as a lighting bridge, may be connected to the network. The gateway can be used to communicate control signals via the network to each of the luminaires, for example under the control of a general-purpose computer device connected to the gateway.

The lighting network may have a mesh topology, whereby the luminaires themselves act as relays within the lighting network, relaying control signals between the gateway and other luminaires in the network. Alternatively, the network may have a star topology, whereby luminaires communicate with the gateway "directly" i.e. without relying on other luminaires to relay the control signals (though possibly via other dedicated network components). Generally, the network can have any suitable network topology e.g. based on a combination of star-like and mesh-like connections. The lighting network may for example operate in accordance with ZigBee protocols.

The luminaires, or more generally the lighting system, may also be equipped with sensor mechanisms. Historically, such sensor mechanisms have been relatively unsophisticated. For example, combinations of timers and motion sensors have been used to selectively activate luminaires in response to recently sensed movement in the environment. An example of such a motion sensor is a passive infra-red ("PIR") motion sensor, which uses infrared radiation emitted from moving bodies to detect their motion. More modern lighting systems can incorporate sensors into the lighting network, so as to allow the aggregation of sensor data from multiple sensors in the environment. Using suitable sensors, this allows the luminaires to share information on, say, occupancy, activity patterns, changes in temperature or humidity, daylight levels etc. This is sometimes referred to as connected lighting. Sensor signals may be communicated via the lighting network to the gateway, thereby making them available to (say) a general purpose computer device connected to the gateway.

SUMMARY

Existing people counting techniques have used a single camera by processing the entire image from that camera. Whilst such processing may be suitable in surveillance applications, it is not suitable for many applications, such as indoor applications where privacy aspects (both regulatory and perceived) are a concern. In other existing techniques, multiple cameras have been used, primarily in outdoor surveillance applications - again these techniques use entire images directly for people counting.

The present invention relates to automatic people counting, implemented by a central processor using data from a sensor system formed of multiple vision sensors.

Each vision sensor captures images of its local environment. However, the full images are not sent to the central controller: rather, each vision sensor only sends a limited amount of information to a central processor, in the form of a locally computed "presence metric". This information is extracted locally by local image processors, from at least one image captured by the vision sensor and is "limited" in that it is less information than the overall information content of those image(s). That is, information in the images that is superfluous in that it is not needed to estimate the people count is discarded, i.e. not included in the presence metrics.

Thus the count is estimated indirectly from the presence metrics, not from the images directly.

According to a first aspect of the present invention, a people counting system comprises: a plurality of vision sensors arranged to provide sensor coverage of an area, each arranged to provide individual sensor coverage of a portion of the area within its field of view (its "sensing area"); a plurality of local image processors, each connected to a respective one of the vision sensors; and a central processor; wherein each of the local image processors is configured to apply a local person detection algorithm to at least one image captured by its respective vision sensor, thereby generating a local presence metric representative of a number of people detected in the at least one image; and wherein the central processor is configured to estimate the total number of people in the area covered by the vision sensors by applying an aggregation algorithm to the local presence metrics generated by the local image processors.

This has a number of advantages. Firstly, the presence metric can be represented by significantly fewer bits than the image(s) from which it is computed, due to its lower information content. Thus communication between the local image processors and the central processor can be via a rate-limited communication channel.

Secondly, the information elements (i.e. presence metrics) sent by individual vision sensors conform to privacy constraints, whilst still allowing the central processor to determine the number of people over a desired area. Detected people need not be individually identifiable in the presence metrics (whereas they may be in the images themselves in some cases).

An indicator of the number of people may for example be stored in a memory accessible to the central processor and/or outputted to a user via a user interface (e.g. on a display) of the central processor. For example, the indicator may be a real-time indicator that is updated in real-time as the number of people in the area changes. Real-time means there is only a short delay (e.g. 10 seconds, or less) in updating the indicator in response to a change in the number of people in the area.

For example, in the described embodiments, only location information (e.g. in the form of a detection matrix) is transmitted to the central processor from the local image processors.

A further advantage is that the sensors are able to register the position and/or presence of a person at full resolution, but there is no need to communicate the full-resolution images to the central system. Hereby, the accuracy of the people counting system can be high while the data rate and privacy sensitivity are low.

In accordance with the present invention, images obtained from the vision sensors by selectively illuminating luminaires (e.g. LED groups) are used to determine the overlap in sensing areas across the various vision sensors. That is, the luminaire infrastructure of the lighting system is advantageously exploited to allow the amount of field of view (FoV) overlap between vision sensors to be determined automatically.

In preferred embodiments, each of the local image processors is collocated with (e.g. housed by) its respective vision sensor. In other cases, at least some of the vision sensors are not collocated with the luminaires, and are provided at some other location(s) from which the area is detectable.

Preferably the fields of view of at least two of the vision sensors overlap, wherein the central processor and/or the local image processor connected to one of those at least two vision sensors is configured to account for the overlap in applying its respective algorithm.

Such overlaps provide an effective way of eliminating sensor "blind spots" (i.e. gaps in sensor coverage within the system). The overlap is accounted for to prevent double counting of any people located in areas of sensor overlap.

Each of the local presence metrics may comprise:

one or more presence counts, each indicating a number of people detected in a respective image region of the at least one image (e.g. each may be a probability or a binary value), and/or

one or more presence scores, each indicating a likelihood that there is a person in a respective image region of the at least one image, and/or

a number of person location identifiers, each identifying a location of a person detected in the at least one image.

In a first of the described embodiments, the system operates as follows:

i. Each vision sensor communicates a block_pixel-by-block_pixel score matrix, along with its ID and a time stamp.

ii. The central processor uses knowledge of sensing region overlap of adjacent sensors to aggregate the scores reported by the vision sensors to count people, while avoiding double-counts, within a given time window.
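The two steps above can be illustrated with a minimal sketch. It is not the aggregation algorithm of the embodiment; it assumes that the overlap knowledge is available as a per-sensor offset into a shared floor grid of blocks, that each block holds at most one person, and that a block seen by several sensors should contribute only its highest score. The names aggregate_score_matrices, reports and offsets are hypothetical.

```python
from typing import Dict, List, Tuple

ScoreMatrix = List[List[float]]


def aggregate_score_matrices(
    reports: Dict[str, ScoreMatrix],
    offsets: Dict[str, Tuple[int, int]],
) -> float:
    """Estimate a people count from per-sensor block-pixel score matrices.

    reports maps a sensor ID to its score matrix. offsets maps a sensor ID
    to the (row, col) of its top-left block in a shared floor grid; this
    stands in for the stored overlap knowledge and is an assumption.
    """
    merged: Dict[Tuple[int, int], float] = {}
    for sensor_id, matrix in reports.items():
        row0, col0 = offsets[sensor_id]
        for r, row in enumerate(matrix):
            for c, score in enumerate(row):
                cell = (row0 + r, col0 + c)
                # A block covered by several sensors keeps only its highest
                # score, so a person in an overlap is not counted twice.
                merged[cell] = max(merged.get(cell, 0.0), score)
    return sum(merged.values())


# Sensor B's left column coincides with sensor A's right column.
reports = {"A": [[1, 0], [0, 1]], "B": [[0, 0], [1, 0]]}
offsets = {"A": (0, 0), "B": (0, 1)}
estimate = aggregate_score_matrices(reports, offsets)
```

Here the person in the shared block is reported by both sensors but contributes once to the estimate. Taking the maximum is only one possible combining rule; averaging the overlapping scores would be another.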

In a second of the described embodiments, the system operates as follows:

i. Each vision sensor communicates a location of people with respect to the sensor, along with its ID and a time stamp;

ii. The central processor uses knowledge of the locations of the vision sensors to aggregate the locations reported by the vision sensors to count people, while avoiding double-counts, within a given time window.

In a third of the described embodiments, the system operates as follows:

i. Each vision sensor communicates people count (scores) over non-overlapping regions as well as each sensing region that overlaps with another vision sensor, using knowledge of sensing overlap regions, along with its ID and a time stamp;

ii. The central processor uses knowledge of which vision sensors overlap, thus avoiding double-counts, to aggregate the scores reported by the vision sensors to count people within a given time window.
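As a rough illustration of the third embodiment, the sketch below assumes each sensor reports one count for its non-overlapping region and one count per overlap region, keyed by the neighbouring sensor's ID. The combining rule (keep the larger of the two counts reported for a shared region) and the names aggregate_region_counts, exclusive and overlap are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, Tuple


def aggregate_region_counts(
    exclusive: Dict[str, int],
    overlap: Dict[Tuple[str, str], int],
) -> int:
    """Sum per-region counts while counting each overlap region once.

    exclusive maps a sensor ID to the count in its non-overlapping region.
    overlap maps (reporting sensor, neighbouring sensor) to the count the
    reporting sensor saw in the region it shares with that neighbour.
    """
    total = sum(exclusive.values())
    per_pair: Dict[FrozenSet[str], int] = {}
    for (reporter, neighbour), count in overlap.items():
        pair = frozenset((reporter, neighbour))
        # Both sensors report the same shared region; keep one value
        # (here the larger one) instead of adding both.
        per_pair[pair] = max(per_pair.get(pair, 0), count)
    return total + sum(per_pair.values())


exclusive = {"A": 2, "B": 1}
overlap = {("A", "B"): 1, ("B", "A"): 1}
people = aggregate_region_counts(exclusive, overlap)
```

In this example the single person standing in the region shared by A and B is reported twice but added once.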

Accounting for the overlap may comprise detecting that respective person location identifiers generated by the local image processors connected to the two vision sensors correspond to substantially the same physical location.

For example, the people counting system may comprise a memory configured to store a plurality of sensor location identifiers, each identifying a location of a respective one of the vision sensors, wherein the sensor location identifiers may be used to detect that the respective person location identifiers correspond to substantially the same physical location.
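A minimal sketch of this location-based check follows. It assumes two-dimensional sensor location identifiers in metres, person locations reported as offsets from the reporting sensor along the same axes, and a merge radius below which two reports are treated as the same physical location. The function name, the 0.5 m radius and the greedy matching are illustrative choices, not taken from the described embodiments.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def count_people_from_locations(
    detections: Dict[str, List[Point]],
    sensor_positions: Dict[str, Point],
    merge_radius: float = 0.5,
) -> int:
    """Count distinct people from sensor-relative person locations.

    detections maps a sensor ID to (dx, dy) offsets of detected people
    relative to that sensor. sensor_positions maps a sensor ID to its
    stored (x, y) location identifier.
    """
    unique: List[Point] = []
    for sensor_id, points in detections.items():
        sx, sy = sensor_positions[sensor_id]
        for dx, dy in points:
            gx, gy = sx + dx, sy + dy  # convert to the shared floor frame
            # Treat reports within merge_radius of an already accepted
            # location as the same person seen by another sensor.
            already_seen = any(
                hypot(gx - ux, gy - uy) <= merge_radius for ux, uy in unique
            )
            if not already_seen:
                unique.append((gx, gy))
    return len(unique)


sensor_positions = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
detections = {"A": [(0.9, 0.0), (-0.5, 0.3)], "B": [(-1.0, 0.1)]}
people = count_people_from_locations(detections, sensor_positions)
```

The detection at (0.9, 0.0) from sensor A and the detection at (-1.0, 0.1) from sensor B map to nearly the same floor position, so they are counted as one person.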

Alternatively or in addition, the people counting system may comprise a memory configured to store an indication of the overlap, which is used to account for the overlap.

At least one of the local presence metrics may comprise a plurality of components, each representative of a number of people detected in a respective image region of the at least one image. For example, the components may constitute a detection matrix.

For example, the local presence metric generated by the local image processor connected to one of the two vision sensors may comprise a plurality of components, at least one of which is representative of a number of people in an image region substantially within the field of view of the other of the two vision sensors, and at least another of which is representative of a number of people in an image region substantially outside of the field of view of the other vision sensor.

Each local presence metric may be communicated to the central processor with an associated time stamp and/or an identifier of the respective vision sensor for use in applying the aggregation algorithm. The central processor may use the time stamp/identifier in applying the aggregation algorithm.
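One way the time stamp and identifier might be used is sketched below: for a given time window, the central processor keeps only the most recent report from each sensor before aggregating. The PresenceReport record, its fields and the half-open window convention are hypothetical and are not specified in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PresenceReport:
    sensor_id: str    # identifier of the reporting vision sensor
    timestamp: float  # time stamp in seconds (assumed unit)
    count: int        # simplified presence metric for this sketch


def latest_in_window(
    reports: List[PresenceReport], window_start: float, window_end: float
) -> Dict[str, PresenceReport]:
    """Return the newest report per sensor with start <= timestamp < end."""
    latest: Dict[str, PresenceReport] = {}
    for report in reports:
        if window_start <= report.timestamp < window_end:
            current = latest.get(report.sensor_id)
            if current is None or report.timestamp > current.timestamp:
                latest[report.sensor_id] = report
    return latest


reports = [
    PresenceReport("6a", 10.0, 1),
    PresenceReport("6a", 14.0, 2),
    PresenceReport("6b", 12.0, 0),
    PresenceReport("6b", 31.0, 3),  # outside the window below
]
selected = latest_in_window(reports, 10.0, 20.0)
```

Only the selected reports would then be passed to the aggregation step, so that stale or out-of-window metrics do not influence the count.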

Each local presence metric may be generated from a plurality of images captured by the respective vision sensor over an interval of time, so as to filter out movements above a speed threshold.
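The text does not say how the filtering is performed. One simple possibility, shown here purely as an assumption, is to keep a block as occupied only if a person was detected there in at least a given fraction of the frames captured during the interval; someone moving quickly occupies any one block for only a few frames and is therefore dropped. The name persistent_presence and the 0.5 default are hypothetical.

```python
from typing import List

BinaryMatrix = List[List[int]]


def persistent_presence(
    frames: List[BinaryMatrix], min_fraction: float = 0.5
) -> BinaryMatrix:
    """Keep only blocks that were occupied in enough of the frames.

    frames is a non-empty list of equally sized binary detection matrices
    (1 = person detected in that block, 0 = no detection).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    needed = min_fraction * len(frames)
    return [
        [
            1 if sum(frame[r][c] for frame in frames) >= needed else 0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Five frames of a 1x3 matrix: block 0 is always occupied, block 1 is
# occupied in a single frame (a passer-by), block 2 in three frames.
frames = [[[1, 1, 0]], [[1, 0, 1]], [[1, 0, 1]], [[1, 0, 0]], [[1, 0, 1]]]
filtered = persistent_presence(frames)
```

The effective speed threshold in such a scheme depends on the frame rate, the block size and min_fraction, all of which would need tuning for a real installation.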

Preferably the people counting system is a lighting system comprising a plurality of luminaires arranged to illuminate the area, wherein preferably each of the local image processors and its respective vision sensor are collocated with a respective one of the luminaires. In this case, the central processor may for example be a general purpose computer device, or it may itself be formed by a further luminaire and collocated local processor. In the latter case, the further luminaire may also have a vision sensor and perform the same type of local image processing, so that the further luminaire acts as both the central processor and a sensor node in the system.

Collocating the vision sensors and local image processors with the luminaires allows the vision sensors to utilize the same network and power infrastructure that is used to control and power the luminaires respectively.

The portion of the area covered by each vision sensor may be directly below its respective luminaire.

A second aspect of the present invention is directed to a computer-implemented method of estimating the total number of people in an area covered by a plurality of vision sensors, each providing individual sensor coverage of a portion of the area within its field of view, wherein each of a plurality of local image processors is connected to a respective one of the vision sensors, the method comprising: receiving from each of the local image processors a local presence metric, the local presence metric having been generated by that local image processor applying a local person detection algorithm to at least one image captured by its respective vision sensor, wherein the local presence metric is representative of a number of people detected in the at least one image; and estimating the total number of people in the area by applying an aggregation algorithm to the local presence metrics received from the local image processors.

Embodiments of the method of the second aspect may comprise implementing any of the system functionality or features of the first aspect.

A third aspect of the present invention is directed to a computer program product comprising executable code stored on a computer readable storage medium and configured when executed to implement any of the methods or system functionality disclosed herein.

BRIEF DESCRIPTION OF FIGURES

For a better understanding of the present invention, and to show how embodiments of the same may be carried into effect, reference is made to the following figures, in which:

Fig. 1 is a schematic illustration of a lighting system;

Fig. 2 is a schematic block diagram of a luminaire;

Fig. 3 is a perspective view of a pair of adjacent luminaires;

Fig. 3A is a plan view of part of a lighting system;

Fig. 4 is a schematic block diagram of a central processing unit for operating a lighting system;

Fig. 4A is a schematic block diagram illustrating an exemplary control architecture of a lighting system;

Figs. 5 and 5A illustrate how local image processors cooperate with a central processing unit to provide a people counting function;

Fig. 6 shows an exemplary image captured by a vision sensor in its processed form;

Figs. 7A-7C illustrate examples of different types of local presence metrics that can be computed by local image processors.

DETAILED DESCRIPTION OF EMBODIMENTS

Embodiments of the present invention are described below. These embodiments provide a vision sensing system for people counting in a connected lighting system.

A sensor system formed by multiple vision sensors in a connected system with a central processing unit offers data-enabled applications based on people counting.

Described below are (a) types of information element that can be communicated from the vision sensors to the central processing unit, (b) meta-data elements that are made available at the central processing unit (and vision sensors), and (c) associated methods to derive a people count based on the information elements. The system is a connected lighting system comprising multiple luminaires, with the vision sensors located at the luminaires and connected to the central processing unit in order to count people in a given region. The vision sensors are connected to the central processing unit via a bi-directional communication link.

In a number of applications, the count of people over a particular area is desirable. For space optimization, a count of people in (pseudo) real time is needed to identify temporal and spatial usage patterns. Moreover, information regarding the number of people in an area can also be used to optimize lighting settings depending on the usage of the area, or to optimize other building systems such as HVAC (heating, ventilation and air-conditioning), or for planning and maintenance. As another example, in marketing analysis, people count may be needed as one of the input data for analysis.

Traditional methods of counting people, for instance used in space management, often employ a human to manually count the number of people in different rooms of a building. However, this method is inaccurate since it provides only a snapshot in time, and is expensive due to the involved manpower. Other approaches utilize sensors, such as seat sensors to estimate the number of occupants seated, or infrared sensors to count the number of people moving across defined barrier regions. However, the use of such dedicated sensor modalities is expensive since extensive modification to the furniture or region is required. In the case of using infrared sensors, the system requires a narrow well-defined barrier. This approach is not suitable when there are groups of user movements, or in spaces like open offices where such barriers are difficult to define. Another approach to people counting is using video surveillance. However, such an approach is considered intrusive and unsuitable for certain applications.

Embodiments of the present invention provide a connected lighting system with a vision sensor in each luminaire. A vision sensor, optimized for cost and complexity, can still provide much richer data than a conventional PIR sensor that is commonly used in lighting systems. For privacy-preservation, each vision sensor does not provide entire images to a central processing unit when operating in real time, but only provides block pixel (soft/hard) presence decisions. Block pixel presence decisions can be provided in the form of block pixel by block pixel score matrices. A block pixel by block pixel score matrix may be an n by m matrix, wherein n and m are integers and wherein the dimensions of the matrix are smaller than the dimensions of the original image. The elements in the block pixel by block pixel score matrices can be either absolute presence scores (1's and 0's indicative of presence or no presence) or probabilistic presence scores indicative of the probability of presence or no presence. The vision sensors may have overlapping fields of view (FoVs) and sensing areas. The various information elements and associated methods to enable people counting in such a system are described below.
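To make the relationship between an image and its block pixel score matrix concrete, the sketch below reduces a per-pixel binary detection mask to an n by m matrix of soft scores and then to hard decisions. Using the fraction of detected pixels per block as the soft score, and a 0.5 threshold for the hard decision, are assumptions for illustration; the text does not prescribe how the scores are computed.

```python
from typing import List


def block_scores(mask: List[List[int]], block: int) -> List[List[float]]:
    """Reduce a binary pixel mask to per-block soft presence scores.

    mask is an H x W matrix of 0/1 pixel detections. block is the side
    length of each square block in pixels. Rows and columns that do not
    fill a whole block are ignored in this sketch.
    """
    n, m = len(mask) // block, len(mask[0]) // block
    scores = []
    for br in range(n):
        row = []
        for bc in range(m):
            detected = sum(
                mask[br * block + i][bc * block + j]
                for i in range(block)
                for j in range(block)
            )
            row.append(detected / (block * block))
        scores.append(row)
    return scores


def hard_decisions(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn soft scores into 1/0 presence decisions."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


mask = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
soft = block_scores(mask, block=2)
hard = hard_decisions(soft)
```

A 4 by 4 mask becomes a 2 by 2 matrix, which shows how the transmitted matrix is smaller than the image and no longer carries pixel-level detail.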

People counting information may then be used to enable applications such as space optimization, planning and maintenance, HVAC control, and data analytics driven marketing.

Figure 1 illustrates an exemplary lighting system 1, which comprises a plurality of luminaires 4 installed in an environment 2, arranged to emit light in order to illuminate that environment 2. A gateway 10 is shown, to which each of the luminaires 4 is connected. The gateway 10 effects control of the luminaires 4 within the lighting system 1, and is sometimes referred to as a lighting bridge. In this example, the environment 2 is an indoor space, such as one or more rooms and/or corridors (or part thereof), or a partially-covered space such as a stadium or gazebo (or part thereof). The luminaires 4 are ceiling-mounted, so as to be able to illuminate the ground (e.g. floor) below them. They are arranged in a grid along two mutually perpendicular directions in the plane of the ceiling, so as to form two substantially parallel rows of luminaires 4, each row being formed by multiple luminaires 4. The rows have an approximately equal spacing, as do the individual luminaires 4 within each row.

Multiple people 8 are shown in the environment, standing on the floor directly below the luminaires 4.

Figure 2 shows a block diagram of a luminaire 4, representing the individual configuration of each luminaire 4 in the lighting system 1. The luminaire 4 comprises at least one lamp 5 such as an LED-based lamp (one or more LEDs), gas-discharge lamp or filament bulb, plus any associated housing or support. The luminaire 4 also comprises a vision sensor 6 in the form of a camera, which is collocated with the lamp 5; a local processor (formed of one or more processing units, e.g. CPUs, GPUs etc.) 11; a network interface 7, and a local memory 13 (formed of one or more memory units, such as DMA and/or RAM units) connected to the local processor 11. The camera 6 is able to detect radiation from the luminaires 4 when illuminating the environment, and is preferably a visible light camera. However, the use of a thermal camera is not excluded.

The vision sensor 6 is connected to supply, to the local processor 11, raw image data captured by the vision sensor 6, to which a local person detection algorithm is applied by local image processing code 12a executed on the local processor 11. The local person detection algorithm can operate in a number of ways, examples of which are described in detail below. The local person detection algorithm generates "presence metrics", for use in determining a person count centrally.

The local processor 11 is connected to the lamp 5, to allow local control code 12b executed on the local processor 11 to control at least the level of illumination emitted by the lamp 5. Other illumination characteristic(s) such as color may also be controllable. Where the luminaire 4 comprises multiple lamps 5, these may be individually controllable by the local processor 11, at least to some extent. For example, different colored lamps 5 may be provided, so that the overall color balance can be controlled by separately controlling their individual illumination levels.

The network interface 7 may be a wireless (e.g. ZigBee, Wi-Fi, Bluetooth) or wired (e.g. Ethernet) network interface, and provides network connectivity, whereby the luminaires 4 in the lighting system 1 are able to form a lighting network and thereby connect to the gateway 10. The network can have any suitable network topology, for example a mesh topology, star topology or any other suitable topology that allows signals to be transmitted and received between each luminaire 4 and the gateway 10. The network interface 7 is connected to the local processor 11, so as to allow the local processor 11 to receive external control signals via the network. These control the operation of the local control code 12b, and thereby allow the illumination of the lamp 5 to be controlled externally. This connection also allows the local processor 11 to transmit images captured by the vision sensor 6, to which image quantization has been applied by the local image processing code 12a, to an external destination via the network. Image quantization is a compression technique wherein a range of values is compressed to a single quantum value.
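The quantization idea can be shown with a short sketch in which each pixel value is replaced by the index of the range it falls into. Uniform ranges, 8-bit input pixels and four output levels are assumptions chosen for the example; the function name quantize is hypothetical.

```python
from typing import List


def quantize(
    pixels: List[List[int]], levels: int = 4, max_value: int = 255
) -> List[List[int]]:
    """Map each pixel to the index of the uniform range that contains it.

    With levels=4 and max_value=255, the values 0..63 map to 0, 64..127
    to 1, 128..191 to 2 and 192..255 to 3.
    """
    step = (max_value + 1) / levels  # width of each range of input values
    return [
        [min(int(p // step), levels - 1) for p in row]
        for row in pixels
    ]


quantized = quantize([[0, 63, 64, 200, 255]])
```

Each output value needs only two bits here instead of eight, at the cost of discarding the detail within each range.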

Figure 3 shows a perspective view of a first and a second of the luminaires (4a, 4b), comprising first and second light sources 5a, 5b and first and second vision sensors 6a, 6b, as described above. The first and second luminaires 4a, 4b are neighboring luminaires i.e. adjacent one another in the grid, along one of the directions of the grid or along one of the diagonals of the grid.

The respective lamp 5a, 5b of each of the luminaires 4a, 4b is arranged to emit illumination towards a surface 29 (the floor in this example), thereby illuminating the surface 29 below the luminaires 4. As well as illuminating the environment, the illumination provided by the luminaires 4 renders the people 8 detectable by the vision sensors 6.

The respective vision sensor 6a, 6b of each luminaire 4a, 4b has a limited field of view. The field of view defines a volume of space, marked by dotted lines in figure 3, within which visible structure is detectable by that vision sensor 6a, 6b. Each vision sensor 6a, 6b is positioned to capture images of the respective portion (i.e. area) 30a, 30b of the surface 29 that is within its field of view ("sensing area"), directly below its respective luminaire 4a, 4b. As can be seen in figure 3, the fields of view of the first and second vision sensors 6a, 6b overlap in the sense that there is a region of space within which structure is detectable by both vision sensors 6a, 6b. As a result, one of the borders 30R of the sensing area 30a of the first sensor 6a is within the sensor area 30b of the second sensor 6b ("second sensing area"). Likewise, one of the borders 30L of the sensor area 30b of the second sensor 6b is within the sensor area 30a of the first sensor 6a ("first sensing area"). An area A is shown, which is the intersection of the first and second sensor areas 30a, 30b. The area A is the part of the surface 29 that is visible to both of the first and second sensors 6a, 6b ("sensor overlap").

Figure 3A shows a plan view of a part of the lighting system 1, in which a 3x3 grid of nine luminaires 4a,...,4h is shown, each having a respective sensor area 30a,...,30h, which is the sensor area of its respective vision sensor as described above. The sensing area of each luminaire overlaps with that of each of its neighboring luminaires, in both directions along the grid and both directions diagonal to the grid, as shown. Thus every pair of neighboring luminaires (4a, 4b), (4a, 4c), (4a, 4d), (4b, 4c), ... has an overlapping sensor area. The overlapping FoVs/sensing areas of the vision sensors ensure that there are no dead sensing regions.

Although nine luminaires are shown in figure 3A, the present techniques can be applied to lighting systems with fewer or more luminaires.

Figure 4 shows a block diagram of a central processing unit 20. The central processing unit is a computer device 20, such as a server, for operating the lighting system 1. The central processing unit 20 comprises a processor 21 (central processor), formed of e.g. one or more CPUs; and a network interface 23. The network interface 23 is connected to the central processor 21. The central processor 21 has access to a memory 22, formed of one or more memory devices, such as DMA and/or RAM devices. The memory 22 may be external or internal to the computer 20, or a combination of both (i.e. the memory 22 can, in some cases, denote a combination of internal and external memory devices), and in the latter case may be local or remote (i.e. accessed via a network). The processor 21 is connected to a display 25, which may for example be integrated in the computer device 20 or an external display.

The processor 21 is shown executing lighting system management code 24. Among other things, the lighting system management code 24 applies an aggregation algorithm, to aggregate multiple local presence metrics received from different luminaires 4 so as to generate an estimate of the number of people 8 in the environment.

The network interface 23 can be a wired (e.g. Ethernet, USB, FireWire) or wireless (e.g. Wi-Fi, Bluetooth) network interface, and allows the central processing unit 20 to connect to the gateway 10 of the lighting system 1. The gateway 10 operates as an interface between the central processing unit 20 and the lighting network, and thus allows the central processing unit 20 to communicate with each of the luminaires 4 via the lighting network. In particular, this allows the central processing unit 20 to transmit control signals (instigated by the controller 26) to each of the luminaires 4 and receive images from each of the luminaires 4 (used variously by the controller 26 and detector 25). The gateway 10 provides any necessary protocol conversion to allow communication between the central processing unit 20 and the lighting network.

Note that figures 2 and 4 are both highly schematic. In particular, the arrows denote high-level interactions between components of the luminaire 4 and central computer 20 and do not denote any specific configuration of local or physical connections.

Figure 4A shows an exemplary lighting system control architecture, in which the central processing unit 20 is connected to the gateway 10 via a packet based network 42, which is a TCP/IP network in this example. The central processing unit 20 communicates with the gateway 10 via the packet based network 42 using TCP/IP protocols, which may for example be effected at the link layer using Ethernet protocols, Wi-Fi protocols, or a combination of both. The network 42 may for example be a local area network (business or home network), the Internet, or simply a direct wired (e.g. Ethernet) or wireless (e.g. Wi-Fi) connection between the central processing unit 20 and the gateway 10. The lighting network 44 is a ZigBee network in this example, in which the luminaires 4a, 4b, 4c,... communicate with the gateway 10 using ZigBee protocols. The gateway 10 performs protocol conversion between TCP/IP and ZigBee protocols, so that the central computer 20 can communicate with the luminaires 4a, 4b, 4c via the packet based network 42, the gateway 10 and the lighting network 44.

Note that this is exemplary, and there are many ways of effecting communication between the central computer 20 and the luminaires 4. For example, communication between the computer 20 and gateway 10 may be via some other protocol, such as Bluetooth, or via some other direct connection such as USB, FireWire or a bespoke connection.

The memory 22 stores a database 22a. The database 22a contains a respective identifier (ID) of each vision sensor 6 (or each luminaire 4) in the lighting system 1, which uniquely identifies that vision sensor 6 within the lighting system 1, and an associated location identifier of that vision sensor 6; for example, a two dimensional (x,y) or three dimensional location identifier (x,y,z) (e.g. if the vision sensors are installed at different heights). The location identifier may convey only relatively basic location information, such as a grid reference denoting the position of the corresponding luminaire in the grid - e.g. (m,n) for the mth luminaire in the nth row, or it may convey a more accurate location of the vision sensor 6 (or luminaire 4) itself, e.g. in meters or feet to any desired accuracy. The IDs of luminaires/vision sensors, and their locations, are thus known to the central processing unit 20. The memory 22 may also store additional metadata, such as an indication of the sensor overlap A, and any other sensor overlaps in the system. Alternatively or in addition some or all of the metadata 22b may be stored locally at the luminaires 4, as shown in figure 2. In this case, each luminaire 4 may only store part of the metadata that applies to that luminaire and its neighbors.
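As an illustration of how grid-reference location identifiers could yield overlap metadata, the sketch below derives the set of overlapping sensor pairs from (m,n) grid references. It assumes, as in the layout of figure 3A, that every pair of luminaires adjacent along the grid or along a diagonal has overlapping sensing areas; other installations may not satisfy this. The function name and the dictionary layout are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def overlapping_pairs(
    grid_refs: Dict[str, Tuple[int, int]]
) -> Set[FrozenSet[str]]:
    """Return the ID pairs whose grid references are direct neighbours.

    grid_refs maps a sensor (or luminaire) ID to its (m, n) grid
    reference. Two sensors are neighbours when they differ by at most one
    step in each grid direction, which includes diagonal neighbours.
    """
    pairs: Set[FrozenSet[str]] = set()
    for (id_a, (ma, na)), (id_b, (mb, nb)) in combinations(grid_refs.items(), 2):
        if max(abs(ma - mb), abs(na - nb)) == 1:
            pairs.add(frozenset((id_a, id_b)))
    return pairs


row_of_three = {"4a": (0, 0), "4b": (1, 0), "4c": (2, 0)}
pairs = overlapping_pairs(row_of_three)
```

For three luminaires in a row, only the two adjacent pairs are returned; the two end luminaires are not treated as overlapping.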

Figures 5 and 5A illustrate how the central processor 20 and the luminaires 4 cooperate within the system 1. First, second and third luminaires 4a, 4b, 4c are shown, though this is purely exemplary.

The vision sensor 6a, 6b, 6c of each luminaire 4a, 4b, 4c captures at least one image of its respective sensing area. The local processor 11a, 11b, 11c of that luminaire applies the local person detection algorithm to that image(s). That is, the local person detection algorithm is applied separately at each of the luminaires 4a, 4b, 4c, in parallel, to generate a respective local presence metric 62a, 62b, 62c. Each of the local presence metrics 62a, 62b, 62c is transmitted to the central processing unit 20 via the networks 42, 44 and gateway 10. The images themselves are not transmitted to the central processing unit 20. In some cases, sensor overlap metadata 22b is used locally at the luminaires 4a, 4b, 4c to generate the local presence metrics.

The central processing unit 20 applies the aggregation algorithm to the presence metrics 62a, 62b, 62c in order to estimate the number of people 8 in the environment. The aggregation algorithm generates an indicator of this number (people count) 64, which is outputted on the display 25 to a user of the central processing unit 20 and/or stored in the memory 22 for later use.

The process may be real-time, in the sense that each local processor 11a, 11b, 11c repeatedly generates and transmits local presence metrics as new images are captured. The people count 64 is updated as the new presence metrics are received, for example once every few (e.g. 10 or fewer) seconds. Alternatively, the process may be pseudo-real-time, e.g. such that the people count 64 is updated every minute or every few minutes, or every hour (for example), or it may be pseudo-static, e.g. a "one-time" people count may be obtained in response to a count instruction from the user of the computer device 20, to obtain a snapshot of current occupancy levels manually. That is, each count may be instructed manually.

Each presence metric 62 may be generated over a time window, i.e. based on multiple images within that time window. This allows movements above a certain speed, i.e. fast enough that the moving object does not appear in all of those images, to be filtered out, so that they do not affect the people count 64.

Figure 5A shows an exemplary image captured by the vision sensor 6a of the first luminaire 4a after the image has been processed. In this exemplary image, the magnitude of the difference between a current image and a background image is shown: the darker regions indicate a large magnitude of difference, usually due to movement, and the less dark regions indicate a low magnitude of difference, i.e. no changes in the image. A larger version of the processed image 60 is shown in figure 6.

In this example, a single person 61 is detectable in the processed image 60. As discussed, the vision sensor 6a captures images of the part of the surface 29 directly below it, so the processed image 60 is a top-down view of the person 61, whereby the top of their head and shoulders are visible. Note that, in the case that the person 61 is in the sensor overlap area A, they would be similarly detectable in an image captured by the second luminaire 4b. That is, the same person 61 would be simultaneously visible in images from both the first and second luminaires 4a, 4b, at different respective locations in those images.

In a first embodiment of the present invention, each vision sensor 6 (or rather the local image processor connected to that vision sensor 6) communicates a presence metric in the form of a block_pixel-by-block_pixel score matrix, along with its ID and a time stamp, to the central processing unit 20.

The block_pixel-by-block_pixel score matrix may for instance be a 10 by 10 matrix of binary values e.g. with each element a "1" or "0", indicative of presence or no presence; this choice ensures that the communication from the vision sensors to the central processing unit is low rate.

An example of this is illustrated in figure 7A, for a 4x4 score matrix 62(i). The processed image 60 is shown divided into 16 regions, which are blocks in a 4x4 grid. Each component (m,n) of a matrix corresponds to the mth portion in the nth row of the grid imposed on the processed image 60, as illustrated for components (2,4) and (4,2) of the matrix 62(i). In this case, the single person 61 visible in the processed image 60 is contained mostly within the second image region in the fourth row, thus the component (4,2) of the matrix 62(i) is 1 and all other components are zero. Where a person overlaps multiple regions, the score of 1 may for example be assigned to the image region of which they occupy the greatest portion (and zero to the others), and/or to the region on which they are geometrically centered as determined by averaging pixel values.
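The assignment of a detected person to a single block can be sketched as follows, using the option in which the score of 1 goes to the block on which the person is geometrically centered. This is a minimal sketch under stated assumptions: the function name `score_matrix`, the pixel coordinate convention (x to the right, y downwards, origin at the top-left corner) and the zero-based `[row][column]` indexing of the returned list are choices made for the example, not part of the disclosure.

```python
from typing import List, Sequence, Tuple


def score_matrix(
    centroids: Sequence[Tuple[float, float]],
    image_width: int,
    image_height: int,
    blocks: int = 4,
) -> List[List[int]]:
    """Build a blocks-by-blocks binary score matrix from person centroids.

    Each centroid (x, y) is given in pixels. The block containing the
    centroid receives a score of 1; all other blocks stay at 0.
    """
    matrix = [[0] * blocks for _ in range(blocks)]
    for x, y in centroids:
        if not (0 <= x <= image_width and 0 <= y <= image_height):
            raise ValueError("centroid lies outside the image")
        # Clamp so that a centroid on the right or bottom edge falls in the last block.
        col = min(int(x * blocks / image_width), blocks - 1)
        row = min(int(y * blocks / image_height), blocks - 1)
        matrix[row][col] = 1
    return matrix
```

With a 400 by 400 pixel image and a single centroid at (150, 350), the only non-zero entry is the second block of the fourth row, matching the situation described for figure 7A.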

In general, each component (m,n) may be a probability score indicative of the probability that an occupant exists in a block. The probability score may be computed over a time window, thus filtering out movements above a certain speed. The advantage of probability scores over absolute scores is that the full-resolution for determining presence or position is available at the central controller.
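One simple way to obtain such a score over a time window is to take, for each block, the fraction of images in the window in which a detection occurred. The disclosure does not specify how the probability is computed, so the estimator below, together with the names `windowed_scores` and `threshold`, is an assumption made for illustration.

```python
from typing import List

BinaryFrame = List[List[int]]  # per-block detections (0 or 1) for one image


def windowed_scores(frames: List[BinaryFrame]) -> List[List[float]]:
    """Per-block fraction of images in the window that contain a detection."""
    if not frames:
        raise ValueError("at least one image is required")
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(cols)]
        for r in range(rows)
    ]


def threshold(scores: List[List[float]], min_score: float = 1.0) -> BinaryFrame:
    """Keep a block only if its score reaches min_score.

    With min_score = 1.0 a block counts as occupied only if the detection
    appears in every image of the window, so an object that moves between
    blocks during the window is filtered out.
    """
    return [[1 if s >= min_score else 0 for s in row] for row in scores]
```

For three 2x2 frames in which one block is occupied throughout and two other blocks are each occupied in a single frame, only the persistently occupied block survives thresholding at 1.0.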

The central processing unit 20 collects such matrices 62(i) from all vision sensors associated with a region over which people count is of interest (e.g. all or part of the surface 29). Additionally, the central processing unit 20 has knowledge of the sensing region overlap of the vision sensors, from the metadata 22b. It aggregates the individual vision sensor counts while avoiding double-counts over overlapping regions within a given time window.

The aggregation over overlapping sensing regions is performed by applying the aggregation algorithm to the presence metrics 62(i), to implement a suitable fusion rule.

As an example, when soft probability scores are reported, the presence (local count) of a person in an overlapping block pixel from multiple vision sensors may be estimated using known statistical methods, e.g. maximum a posteriori (MAP).
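A MAP decision for a single overlapping block can be sketched as below. The sketch rests on assumptions that the disclosure does not state: each reported score is treated as that sensor's posterior probability of occupancy computed under a common prior, and the sensors' observations are treated as conditionally independent given the true occupancy. Under those assumptions the fused posterior odds equal the prior odds multiplied by the ratio of each sensor's odds to the prior odds. The function name `map_fuse` and the clipping constant are likewise hypothetical.

```python
import math
from typing import Sequence, Tuple


def map_fuse(
    scores: Sequence[float], prior: float = 0.5, eps: float = 1e-9
) -> Tuple[float, bool]:
    """Fuse per-sensor occupancy scores for one overlapping block.

    Returns the fused posterior probability of occupancy and the MAP
    decision (True if 'occupied' is the more probable hypothesis).
    """

    def logit(p: float) -> float:
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) for scores of 0 or 1
        return math.log(p / (1.0 - p))

    prior_logit = logit(prior)
    # Each sensor contributes its log-odds relative to the shared prior.
    log_odds = prior_logit + sum(logit(p) - prior_logit for p in scores)
    posterior = 1.0 / (1.0 + math.exp(-log_odds))
    return posterior, posterior > 0.5
```

With a uniform prior, two sensors reporting 0.8 and 0.7 for the same block yield a fused posterior of 28/31, so the block is counted as occupied once; reports of 0.6 and 0.2 yield 3/11, so it is counted as empty.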

In a second embodiment, each vision sensor 6 communicates the location of each occupant relative to that vision sensor, along with its ID and a time stamp, to the central processing unit 20, for example as a presence metric in the form of a location vector.

An example is illustrated in figure 7B, which shows how a single location vector 62(ii) is generated denoting the location of the single person 61 relative to the first vision sensor 6a that captures the processed image 60.

Additionally, a score associated with the occupant may be transmitted in association with each location vector 62(ii). This can be a binary value or, more generally, a probability score indicative of the probability that an occupant exists at said location. The probability score may be computed over a time window, thus filtering out movements above a certain speed.

The central processing unit 20 collects such location vectors from all vision sensors associated with the region over which people count is of interest. The central processing unit 20 has knowledge of the location of each vision sensor from the database 22a, and can therefore translate the location of each occupant relative to its vision sensor into a more general location, e.g. relative to the central processing unit or to some other common spatial origin.

The central processing unit 20 then aggregates the individual vision sensor counts while avoiding double-counts over overlapping regions within a given time window. For any people in sensor overlap areas, two or more of the local location vectors will correspond to the same location relative to the common origin (to within a radius threshold). This is detected by the central processing unit 20, such that multiple location vectors from different sensors 6 corresponding to the same physical location are counted once only. Note that it is also possible for the central processing unit 20 to avoid double-counts based on knowledge of the regions of overlap for the individual sensors. The central processing unit then needs to know which pixels or block pixels of a first vision sensor overlap with which pixels or block pixels of a second vision sensor in order to avoid double counts accurately.
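The translation to a common origin and the radius-based de-duplication can be sketched as follows. This is a minimal sketch, assuming two dimensional locations, sensor axes aligned with the common coordinate frame (no rotation), and a simple greedy matching rule; the function name `count_people` and the default radius of 0.5 are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def count_people(
    sensor_locations: Dict[str, Point],
    location_vectors: Dict[str, List[Point]],
    radius: float = 0.5,
) -> int:
    """Count distinct people from per-sensor relative location vectors.

    sensor_locations maps sensor ID -> sensor position in the common frame.
    location_vectors maps sensor ID -> occupant positions relative to that sensor.
    Detections from different sensors that land within `radius` of each other
    in the common frame are treated as the same person.
    """
    people: List[Tuple[Point, Set[str]]] = []  # (global position, sensors that saw it)
    for sensor_id, vectors in location_vectors.items():
        sx, sy = sensor_locations[sensor_id]
        for dx, dy in vectors:
            gx, gy = sx + dx, sy + dy  # translate to the common origin
            for (px, py), seen_by in people:
                # Two nearby detections from the same sensor are kept as two people.
                if sensor_id not in seen_by and math.hypot(gx - px, gy - py) <= radius:
                    seen_by.add(sensor_id)
                    break
            else:
                people.append(((gx, gy), {sensor_id}))
    return len(people)
```

For example, if a sensor at (0, 0) reports occupants at (1.4, 0.2) and (-1.0, 0.5), and a sensor at (3, 0) reports occupants at (-1.5, 0.1) and (1.0, 1.0), the first and third detections fall about 0.14 apart in the common frame and are counted once, giving a total of three people.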

The aggregation over overlapping sensing regions is done by adopting a suitable fusion rule over occupants that are in close proximity. As an example, when soft probability scores are reported, the presence (local count) of a person in an overlapping region from multiple vision sensors may be estimated using known statistical methods, e.g. MAP.

In a third embodiment, each vision sensor 6 has knowledge of the sensing area overlap with its neighboring vision sensors, from the locally stored metadata 22b. That is, overlap is accounted for locally by the local image processors 11 themselves to some extent.

For each non-overlapping and overlapping area, each vision sensor 6 determines a separate number of people in that area (only) and communicates this information, along with its ID and a time stamp, to the central processing unit.

An example is shown in figure 7C, in which a presence metric 62(iii) comprises separate counts for the sensor overlap region A between the first luminaire 4a and the second luminaire 4b and for the remaining non-overlapping region. Note that this is a simplified example, which considers just two sensors - in general the sensing area of the first luminaire 4a can overlap with that of multiple other luminaires, and a separate count may be determined for each of these overlapping areas and for the remaining non-overlapping region. As in the first embodiment, where a person overlaps multiple sensor regions, they may for example only contribute to the count for the region of which they occupy the greatest portion.

Additionally, a score associated with the number of people in overlapping areas may be transmitted. This can be a binary value or, more generally, a probability score indicative of the probability that an occupant exists in said area. The probability score may be computed over a time window, thus filtering out movements above a certain speed.

The central processing unit 20 collects sensing results 62(iii) from all vision sensors 6 associated with the region over which people count is of interest. Additionally, the central processing unit 20 has knowledge of sensing region overlaps of the vision sensors, such as the overlap A between the first and second luminaires 4a, 4b, from the metadata 22b. It then aggregates the individual vision sensor counts while avoiding double-counts over overlapping regions within a given time window. The aggregation over overlapping sensing regions is performed using a suitable fusion rule. As an example, when soft probability scores are reported, the presence (local count) of a person in an overlapping block pixel from multiple vision sensors may be estimated using known statistical methods, e.g. MAP.
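Aggregation of such per-region counts might look like the following minimal sketch. The report layout (a region key "own" for the non-overlapping region and a shared label such as "A" for each overlap area), the function name `aggregate_counts`, and the choice of taking the maximum reported count as the fusion rule for an overlap area are all assumptions made for illustration; the disclosure leaves the fusion rule open.

```python
from typing import Dict


def aggregate_counts(reports: Dict[str, Dict[str, int]]) -> int:
    """Estimate the total people count from per-sensor, per-region counts.

    reports[sensor_id] maps a region label to the number of people that
    sensor detected there. The label "own" denotes the sensor's
    non-overlapping region; any other label names an overlap area and is
    shared by all sensors that cover it.
    """
    total = 0
    overlap_counts: Dict[str, int] = {}
    for regions in reports.values():
        for region, count in regions.items():
            if region == "own":
                total += count  # seen by this sensor only, so simply add
            else:
                # Assumed fusion rule: trust the largest count reported for the area.
                overlap_counts[region] = max(overlap_counts.get(region, 0), count)
    return total + sum(overlap_counts.values())
```

If the first sensor reports two people in its own region and one in overlap A, and the second reports one in its own region and one in overlap A, the estimate is four rather than the five obtained by naive summation.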

Whilst the above has been described with reference to an indoor lighting system, with ceiling mounted luminaires arranged in a grid, as will be apparent the techniques can be applied in general to any lighting system (indoor, outdoor or a combination of both) in which vision sensors are deployed, for example in an outdoor space such as a park or garden. Whilst it can be convenient to collocate the sensors with the luminaires for reasons discussed, this is by no means essential, nor is there any need to have the same number of luminaires and sensors. Moreover, the techniques need not be applied in a lighting system at all.

Moreover, it should be noted for the avoidance of doubt that the above-described architecture is exemplary. For example, the techniques of this disclosure can be implemented in a more distributed fashion, e.g. without the gateway 10 or central processing unit 20. In this case, the functionality of the central processing unit 20 as described above may be implemented by the local processor 11 attached to one of the vision sensors 6 (which may or may not be collocated with a luminaire 4 in general), or distributed across multiple local processors 11.

Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Claims

CLAIMS:

1. A people counting system (1), which is a lighting system comprising:

a plurality of vision sensors (6) arranged to provide sensor coverage of an area, each arranged to provide individual sensor coverage of a portion of the area within its field of view;

a plurality of luminaires (4) arranged to illuminate the area;

a plurality of local image processors (11), each connected to a respective one of the vision sensors; and

a central processor (21);

wherein each of the local image processors is configured to apply a local person detection algorithm to at least one image (60) captured by its respective vision sensor, thereby generating a local presence metric (62) representative of a number of people detected in the at least one image;

wherein the central processor is configured to estimate the total number of people (8) in the area covered by the vision sensors by applying an aggregation algorithm to the local presence metrics generated by the local image processors; and

wherein each of the local image processors and its respective vision sensor are collocated with a respective one of the plurality of luminaires.

2. A people counting system according to claim 1, wherein the fields of view of two of the vision sensors overlap, wherein the central processor and/or the local image processor connected to one of those two vision sensors is configured to account for the overlap in applying its respective algorithm.

3. A people counting system according to any preceding claim, wherein each of the local presence metrics comprises:

one or more presence counts (62(i),62(iii)), each indicating a number of people detected in a respective image region of the at least one image, and/or

one or more presence scores, each indicating a likelihood that there is a person in a respective image region of the at least one image, and/or a number of person location identifiers (62(ii)), each identifying a location of a person detected in the at least one image.

4. A people counting system according to claims 2 and 3, wherein accounting for the overlap comprises detecting that respective person location identifiers generated by the local image processors connected to the two vision sensors correspond to substantially the same physical location.

5. A people counting system according to claim 4, comprising a memory (13, 22) configured to store a plurality of sensor location identifiers (22a), each identifying a location of a respective one of the vision sensors, wherein the sensor location identifiers are used to detect that the respective person location identifiers correspond to substantially the same physical location.

6. A people counting system according to claim 2 or any claim dependent thereon, comprising a memory (13, 22) configured to store an indication of the overlap (22b), which is used to account for the overlap.

7. A people counting system according to any preceding claim, wherein at least one of the local presence metrics comprises a plurality of components, each representative of a number of people detected in a respective image region of the at least one image.

8. A people counting system according to claim 2 or any claim dependent thereon, wherein the local presence metric (62(iii)) generated by the local image processor connected to one of the two vision sensors comprises a plurality of components, at least one of which is representative of a number of people in an image region substantially within the field of view of the other of the two vision sensors, at least another of which is representative of a number of people in an image region substantially outside of the field of view of the other vision sensor.

9. A people counting system according to claim 3, wherein each presence score is a probability or a binary value.

10. A people counting system according to any preceding claim, wherein each local presence metric is communicated to the central processor with an associated time stamp and/or an identifier of the respective vision sensor for use in applying the aggregation algorithm.

11. A people counting system according to any preceding claim, wherein each local presence metric is generated from a plurality of images captured by the respective vision sensor over an interval of time, so as to filter out movements above a speed threshold.

12. A people counting system according to any preceding claim, wherein each of the local processors is configured to control the output of the respective one of the luminaires.

13. A people counting system according to any preceding claim, wherein the portion of the area (30a, 30b) covered by each vision sensor is directly below its respective luminaire.

14. A computer-implemented method of estimating the total number of people (8) in an area covered by a plurality of vision sensors (6) and illuminated by a plurality of luminaires (4), each providing individual sensor coverage of a portion of the area within its field of view, wherein each of a plurality of local image processors (11) is connected to a respective one of the vision sensors and each of the local image processors and its respective vision sensor is collocated with a respective one of the luminaires, the method comprising:

receiving from each of the local image processors collocated with the respective luminaire a local presence metric (62), the local presence metric having been generated by that local image processor applying a local person detection algorithm to at least one image (60) captured by its respective vision sensor collocated with that luminaire, wherein the local presence metric is representative of a number of people detected in the at least one image; and

estimating the total number of people in the area by applying an aggregation algorithm to the local presence metrics received from the local image processors collocated with the luminaires.

15. A computer program product comprising executable code stored on a computer readable storage medium and configured when executed on a computer to implement the method of claim 14.