WO2020234757A1

WO2020234757A1 - System for detecting interactions with a surface

Info

Publication number: WO2020234757A1
Application number: PCT/IB2020/054720
Authority: WO
Inventors: Gavino PADDEU; Samuel Aldo IACOLINA; Alessandro SORO; Massimo DERIU; Carlino CASARI; Pietro ZANARINI
Original assignee: Centro Di Ricerca, Sviluppo E Studi Superiori In Sardegna Crs4 Srl Uninominale
Priority date: 2019-05-21
Filing date: 2020-05-19
Publication date: 2020-11-26
Also published as: EP3973376A1; IT201900007040A1

Abstract

System for detecting interactions with a surface, said surface being substantially flat, said interactions involving contact with or vicinity to the surface by one or more objects, said system comprising: - a number of optical sensors (C1,.....C5) aligned in front of one side of said surface (S), capable of generating view cones (FW) that are partially overlapped, independent and provided with computational capacity, generating a continuous detection zone (Z) in front of the surface, which is adapted to effect said interaction detection; - means in each independent sensor, adapted for analysing the signals coming from said optical sensors and configured for executing successive detection, recognition and event generation operations, so that said sensors can be grouped into independent modules adapted to be applied to the surface to be made interactive; - said detection operations comprising: pre-filtering, convolution and "feature- based" algorithms adapted to determine the position of said one or more objects within said continuous detection zone (Z) and to determine the type thereof by discerning among hands, fingers and objects entering a field of view of the sensors; - said recognition operations comprising: triangulation with windowing for computing the positions of the interactions among said positions, hierarchical clustering for determining the interaction type, tracking of said positions to define the variations of said positions within a period of time; - said event generation operations comprising: transformation of said positions and time variations into displayed events, thereby detecting said interactions.

Description

TITLE

“System for detecting interactions with a surface”

DESCRIPTION

Field of the invention

The present invention relates to a system for detecting interactions with a surface.

Background art

The object of the present invention concerns a system and a technology which permits making any surface become interactive and which has at least the following characteristics:

1) it permits making a surface become interactive;

2) it detects any object or body part in contact with or close to the surface, even a covered body part (e.g. a user may wear a glove);

3) it is scalable and allows covering large surfaces at will without any interruptions perceivable by touch or sight (edges, frames, or the like);

4) the topology of the system components is designed to obtain the least number of occlusions;

5) it performs sufficiently well to detect any objects being thrown against the interactive surface.

Several techniques exist which allow making a surface become interactive, and most of them are classified as touch or multi-touch techniques.

Capacitive surfaces permit detecting the touch of a finger due to a surface which is sensitive to variations in the electric field induced by current passing between the finger and the surface at the contact point. Because of the loss of signal along the electric paths, capacitive surfaces have a maximum size of approximately 2.5 metres per side.

Other technologies combine the use of multi -touch screens with large-scale projections to create big interactive screens in shared spaces, as described in the article by S. Iacolina et al.“Sinnova Social Wall: a low-cost multi-touch wall supporting visitors in a trade fair” (CHITALY 2015). Interaction is recognised by optical recognition, through a set of infrared-sensitive video cameras positioned at the bottom of the interactive surface and connected to a single controller that, through computer vision algorithms, determines the position of the fingers that come in contact with the surface. Recognition of interactions on the surface is achieved by defining a static horizon, which defines the interaction plane, for each optical sensor. Interaction is detected when an object of adequate size (greater than 2 cm) interrupts such horizon; therefore, the system can only perceive objects that cross or touch such horizon. The static horizon is positioned manually during installation, generally such that it corresponds to the half-plane near the surface to be made interactive. Recognition occurs by means of background subtraction operations: for each pixel, the corresponding horizon intensity value captured in the absence of any objects (referred to as background) is subtracted from the horizon intensity value captured by the sensors. After this subtraction operation on the linear horizon, those pixels are considered as foreground pixels (corresponding to the object that has crossed the horizon) whose subtraction exceeds a given static threshold value. For this reason, this solution is sensitive and reactive only if there is a strong contrast in the horizon between the object and the framed background, and therefore only if the intensity of the pixels corresponding to the object exceeds such given threshold. For better performance in environments without controlled brightness (poorly lit environments, variable ambient light), the system includes an infrared-light illuminator having a linear shape, positioned at the opposite edge, in front of the sensors. In this case, with the illuminator turned on, the horizon must be positioned manually above the linear segment corresponding to the lighted illuminator in the image of the video camera. DBscan is used as a clustering algorithm.

Other techniques envisage the use of an interactive tabletop integrated with a video camera and a user tracking process, in order to create interactive installations in museums and multi-user environments, as described in: Storz, M., Kanellopoulos, K., Fraas, C., and Eibl, M. Comfortable: “A tabletop for relaxed and playful interactions in museums”, in Proceedings of the Ninth ACM International Conference on Interactive Tabletops and Surfaces, ITS Ί4, ACM ( New York, NY, USA, 2014), 447-450.

A further optical approach defined as zerotouch builds a multi-touch frame by positioning infrared sensors in a rectangular frame, which can perceive the interactions occurring within.

The above-mentioned techniques, although falling within the multi-touch screen categories, cannot comply with the above-listed characteristics.

Capacitive surfaces do not fulfil requirements 1) and 2) because, in order to detect interaction, they need a current flow and direct contact between the surface and the contact part (finger or hand), so that the user may not wear gloves or any other insulating garments; for this reason, they cannot detect single ungrounded objects thrown towards the wall; moreover, their dimensions cannot exceed a certain limit because of problems in terms of signal propagation along the electric paths.

The technology described in CHITALY 2015 offers poor performance in terms of:

• configuration: since the definition of the interaction plane (horizon) is static, the topology definition method requires a calibration for each sensor and repeated calibration sessions when the sensor system becomes even only slightly misaligned or its topology changes.

• sensitivity: the DBscan clustering algorithm allows for object distinction only by analysing the dimension of the interrupted horizon, and does not permit a hierarchical classification and hence a distinction among fingers, hands and objects, which could be obtained by analysing other factors (e.g. shape, input and output speed, colour when using RGB sensors, etc.).

• modularity and scalability: the sensors are managed by a single controller, and cannot be scaled up beyond a certain limit (due to passband problems, there is a maximum limit as regards the number of sensors that can be connected to a single controller).

The zerotouch approach, while still being an optical system, does not comply with requirement 3) because, due to signal propagation problems, it is not possible to build frames exceeding approximately 3 metres in length.

Interactive tabletops and other optical systems using rear projection or a rear camera do not comply with requirement 1) because they employ sensors behind the surface, so that it is not possible to use such technologies to make an existing surface become interactive without modifying the original environment.

For better reliability, optical sensors preferably emit and receive at one frequency only, which is the same as that of the illuminator (in the infrared range...), so as to be independent of the type of environmental illumination and from environmental noise.

Each sensor sees the object from its own point of view, identifying an imaginary line running from the sensor position up to the point where the horizon has been interrupted. Repeating the process for all sensors, it is possible to obtain the positions of the objects by executing a triangulation operation (which will be described below) on the various straight lines passing through the contact point.

Summary of the invention

The present invention aims, therefore, to propose a system for detecting interactions with a surface, which can overcome all of the above-mentioned drawbacks.

In the present context, the term“surface” refers to a surface of any kind, even a large one, substantially flat, smooth or characterised by a certain roughness, whether of the active type, such as, for example, a display panel capable of autonomously projecting still or moving images, or of the passive type, i.e. capable of reflecting still or moving images projected from the outside, whether from the rear or from the front, with respect to the system installation position, or provided with an image or a pattern impressed thereon. Its orientation is preferably vertical, but other arrangements are possible as well, e.g. horizontal or oblique.

In the present context, the term“interaction” refers to contact with or vicinity to the surface by one or more objects, typically one or more fingers of one hand or of different hands of persons approaching the surface, but also different types of objects, normally having dimensions comparable with those of human fingers.

The system of the invention is based on a low-cost technology for making large interactive surfaces. Such technology, which is based on an optical approach, allows detecting the presence of multiple entities (objects or parts of the human body) near the surface and determining the position thereof, which may also be variable over time, with respect to the surface.

The system essentially comprises a series of optical sensors positioned in front of the surface plane, suitably aligned and oriented. In order to improve the sensitivity of the system, broader-band sensors are used, which operate in both the visible and the infrared ranges. In case of poor ambient lighting, it is possible to use two types of infrared-light illuminators:

• illuminator positioned on the opposite side of the surface with respect to the sensors. By way of example, the sensors are positioned in the top edge of the surface and oriented downwards, and the illuminator lies on the bottom edge; nevertheless, the opposite arrangement is possible as well. In this way, the horizon will appear bright, whereas the objects will appear dark (because they will create a shadow in the light coming from the illuminator and directed towards the video camera).

• infrared-light laser illuminator positioned in the same plane as the sensors and oriented, like the sensors, towards the surface to be made interactive. In this way, the objects will be bright, in that they will reflect the light of the illuminator, while the background will be dark.

In operation, the system detects an object (or a person’s fingers) that is touching or approaching the surface by analysing the images coming from each sensor and discerning between the outline of the object and the background. The contact-point detection procedure is thus facilitated by the high contrast between the object and the b ackground/horizon .

The sensors (and the illuminator, if any) create a thin, continuous detection zone (or beam) in front of the surface.

Detection of the contact point occurs via several image processing operations that allow:

• detecting when an object approaches the surface (as soon as it enters the field of view of the sensors)

• recognising when such object touches the surface and then crosses a linear horizon (recognised automatically), which identifies, in perspective, the surface plane.

The main goals of the invention are the following:

• offering a multi-touch, multi-user environment useful for building interactive surfaces (walls, floors and tabletops);

• detecting objects touching the surface and objects approaching the surface;

• recognising and distinguishing objects interacting with the surface, discerning among fingers, hands and objects;

• allowing the creation of large interactive surfaces by means of a modular architecture;

• offering a simple installation procedure via independent modules to be arranged on one side of the surface;

• minimizing installation costs through the use of easily available, small hardware (e.g. video cameras, microcontrollers and microcomputers).

The following will briefly describe the main innovative differences in comparison with prior-art systems described in some patents taken into consideration.

EP2122416 describes an optical sensor realized by means of an algorithmic image- processing approach to interaction computation.

In the detecting phase, sensing sub-phase, the interactive-component computation algorithm of the present invention employs a “background subtraction” algorithm for determining those image portions which correspond to objects considered to be interactive. Each sensor determines a distinction between “ foreground” and “background” by computing the image difference between a background image calculated as the average of the last n-frames and the current image.

The above-mentioned patent uses background produced by an atomic image obtained at a given moment (with the illuminator off). The present invention uses an adaptive background produced by an average of images, while the illuminator, if present, is always on.

The system of the above-mentioned patent employs two optical sensors to be used at distinct times: one sensor is used while the illuminator is on (the contact points are lit), whereas the other sensor is used while the illuminator is off (the scene framed with the illuminator off represents the background to be subtracted from the image coming from the first sensor). The difference between these two images permits computing the background. On the other hand, in the system of the present invention:

- each sensor is independent; one distinct image per sensor (and hence one background per sensor) are considered;

- there are multiple sensors positioned differently; in fact, there are N sensors located at one edge of the surface;

- there are no light/darkness phases caused by the illuminator being switched on/off: the latter, if present, is always on.

EP2443481 describes an array of infrared-light emitters and an array of sensors (sensitive to infrared light) which are used for computing the position of objects detected as interactive, with an optical approach.

The system described in such patent utilizes an illuminator only for computing the occlusions and producing a final image where the shadow point is considered as an interaction point. The system of the present invention considers the whole scene framed by the video cameras.

The system described in the above-mentioned patent uses a synchronism between the illuminator and the sensor to produce the image containing the light occlusions (interaction points). In the system of the present invention there is no synchronism, and background computation is entrusted to an adaptive“background subtraction” algorithm.

The system described in the above-mentioned patent uses optical sensors and emitters arranged along all four edges of the surface, operating in a synchronized manner: one sensor is turned on while all the other sensors are off. Conversely, in the system of the present invention:

The sensors are arranged along one edge of the surface;

The infrared-ray emitter, if present, can be positioned at will, whether on the surface side opposite to the sensors or on the same side as the sensors;

Each sensor is independent; one distinct image per sensor (and hence one background per sensor) are considered;

There are no sensor on/off phases since the sensors are always on.

EP2443472 describes a system for sensing the direction of a light source within a sensing region.

The system described in the above patent determines the position of light sources, whereas the system of the present invention detects the interaction of objects which are not light sources (i.e. they do not project self-generated light, but can reflect or block a luminous radiation);

The system described in such patent uses two distinct sensor arrays, whereas the system of the present invention uses modules containing only one sensor type (video camera sensitive to visible and infrared light).

The system described in such patent uses two distinct sensor arrays and interactions between them for calculating the position of the light source, whereas the system of the present invention uses a triangulation algorithm and the resulting hierarchical clustering for calculating the position of the object that has initiated the interaction.

EP2487624 generally describes a system for detecting the position of objects that reflect a luminous radiation, which are recognised among a predefined set of objects.

The system of the present invention envisages to recognise objects touching the surface (touch), pre-touching the surface (pre-touch) and also to recognise certain types of gestures by means of a system for recognising movements over time (touch-start, touch- end, touch-move).

The system described in the above patent employs a training algorithm for the recognition of objects, which uses a polynomial model representing a set of training points in a multi-dimensional space, whereas the system of the present invention uses a recognition algorithm based on a state machine, feature-based algorithms and a threshold system.

The present invention relates to a system for detecting interactions with a surface, said surface being substantially flat, said interactions involving contact with or vicinity to the surface by one or more objects, said system comprising:

- a number of optical sensors (Cl, . C5) aligned in front of one side of said surface

(S), capable of generating view cones (FW) that are partially overlapped, independent and provided with computational capacity, generating a continuous detection zone (Z) in front of the surface, which is adapted to effect said interaction detection;

- means in each independent sensor, adapted for analysing the signals coming from said optical sensors and configured for executing successive detection, recognition and event generation operations, so that said sensors can be grouped into independent modules adapted to be applied to the surface to be made interactive;

- said detection operations comprising: pre-filtering, convolution and “feature- based” algorithms adapted to determine the position of said one or more objects within said continuous detection zone (Z) and to determine the type thereof by discerning among hands, fingers and objects entering a field of view of the sensors;

- said recognition operations comprising: triangulation with windowing for computing the positions of the interactions among said positions, hierarchical clustering for determining the interaction type, tracking of said positions to define the variations of said positions within a period of time;

- said event generation operations comprising: transformation of said positions and time variations into displayed events, thereby detecting said interactions.

It is a particular object of the present invention to provide a system for detecting interactions with a surface as set out in the claims, which are an integral part of the present description. Brief description of the drawings

Further objects and advantages of the present invention will become apparent from the following detailed description of a preferred embodiment (and variants) thereof and from the annexed drawings, which are supplied merely by way of non-limiting example, the annexed figures highlighting some illustrative embodiments of the system of the invention.

In the drawings, the same reference numerals and letters identify the same items or components.

Detailed description of some embodiments of the invention

The system of the present invention is based on a technology conceived for making a surface become interactive. Such technology consists of a subsystem of optical sensors for interaction detection (input), a multi -projection subsystem for image visualisation (output), and a subsystem for managing interactive applications.

The system may essentially comprise the following parts:

• Multi-Sensor Subsystem

• Multi -Proj ector Sub sy stem

• Manager Subsystem

Multi-Sensor Subsystem

This subsystem uses optical sensors (Cl, . C5, Figure 1) arranged along one edge of the surface S, adjacent to and oriented towards such surface.

Each sensor C comprises an infrared-sensitive video camera VC and a microcomputer mR which, by analysing the images, can determine the position of objects and body parts (typically fingers) approaching and/or touching the surface (detecting) (see also Figure 4).

In order to increase the system’s performance, in case of strong ambient lighting it is preferable to use an infrared illuminator I oriented towards the centre of the surface and located on the side opposite to the sensors with respect to the surface (Figure 3).

Each sensor sends information about what has been detected to another microcomputer dealing with the recognition phase (recognition): a hierarchical clustering algorithm (which will be described below) identifies the real interactions by discerning them from noise and recognises the interaction events. The interaction events that can be recognised are the following: touch (touch-start, touch-move, touch-end), hand, object-hit, pre-touch. Such events are subsequently sent to the driver, which, within the kernel space of the operating system, takes care of generating the events and entering them into the event queue.

The following will describe an example of the components in use and of the phases that define the behaviour of the multi-sensor subsystem (Figures 2 and 3).

The components of the subsystem are the following:

• n optical sensors: each sensor (Cl, . C5) is composed of an infrared-sensitive video camera and a microcomputer mR that analyses the images, determining the positions where interaction has occurred and sending such information to the next component;

• 1 microcomputer (mR): it analyses what has been detected by the sensors and recognises events;

• n network devices: they provide component interconnection;

• n power supply units: they supply power to the devices;

• 1 computer: computer/workstation whereon the driver and the interactive software applications have been installed;

• power cables: they provide the power connections to the components;

• network cables: for interconnecting and exchanging data among the components;

The layout of the components is the following:

• modules: the components are organized into modules M (Figure 2), each one containing: 4 sensors, a network device NS (switch), a 5V power supply unit P (for supplying power to the 4 sensors and the switch), network and power connections;

4 optical sensors: the sensors C are arranged along one edge of the surface S, directed towards the centre, with the axis of the view cone FoV (Field of View) (Figures 1, 3) perpendicular to the surface edge, and creating a thin and continuous detection zone (or beam) FW in front of the surface.

1 network device: inside the module box, on the side opposite to the sensor with respect to the surface;

• 1 power supply unit: inside the module box, on the side opposite to the sensor with respect to the surface; • power cable:

• network cable: computer/workstation whereon the driver and the interactive software applications have been installed;

In the system, the number of optical sensors employed depends on the chosen field of view FW (aperture angle of the video camera). In order to optimize the multi-finger or multi-object discrimination power, it is preferable that the field of view FW be limited. When applying the method of the invention, in order to recognise N fingers N+l optical sensors are de facto necessary. For example, if one wants to be able to identify 100 fingers, 101 optical sensors must be included, depending on the size of the wall to be used (5 optical sensors per metre will be optimal). This will overcome the limitations of capacitive surfaces, where, as aforementioned, the signal cannot propagate over long distances.

The phases (and subphases) of the subsystem are the following:

• detecting: the sensors detect the interactions occurring on the surface;

o pre-processing: pre-filtering algorithms are applied to provide image processing for noise reduction and detail enhancement, by applying a convolution mask NxN with a Gaussian kernel (as will be described below); o sensing: a convolution is applied which determines the position where an object or a body part enters the field of view of the sensor; the information (size, position, shape) detected in this phase, which is still raw, will be subsequently cleaned and analysed in the next phase (recognition); o network: the information is sent to the component that will carry out the recognition;

• recognition: the sensors recognise the interactions occurring on the surface;

o triangulation: a triangulation algorithm (described above) calculates the positions of the interactions, transforming the information from the reference space of the sensor (camera) to the reference space of the surface;

o hierarchical clustering and sensing: a hierarchical clustering algorithm is applied (which will be described below) in order to determine the type of interaction, classifying it on the basis of several parameters (size, interaction time, shape, position taken over time); the triangulation and clustering operations are carried out by considering a window of n sensors, predefined during the calibration phase according to the topology of the subsystem; this approach makes it possible to speed up the calculations, with each window being processed in parallel in an independent manner;

o tracking: a tracking algorithm analyses the variations occurring in the position of each recognised object over the interaction time, thereby defining the events; o network: the information is sent to the component that will generate the events; • event generation: the computer, whereon the interactive applications are in execution, receives the events from the network, translates such information into events of the operating system, and enters them into the event queue;

o driver: the events are generated and entered into the event queue of the operating system;

o network: the events that the operating system cannot handle are sent to the interactive application via web socket;

The events that can be recognised (atomic interactions recognised by the system) are the following:

• standard Events: events compatible with acknowledged standards;

o touch-start: a finger begins interacting with the surface;

o touch-move: the finger moves and continues the interaction while remaining in contact with the surface;

o touch-end: the finger is no longer in contact with the surface;

• custom Events: events not compatible with the standards:

o hand-start: a hand begins interacting with the surface;

o hand-move: the hand moves and continues the interaction while remaining in contact with the surface;

o hand-end: the hand is no longer in contact with the surface;

o object-hit: an object hits the surface (atomic, not detected for a time interval);

o pre-touch: a finger approaches the surface.

Multi-Projector Subsystem

In a variant embodiment, as highlighted in Figure 7, image visualization is entrusted to a subsystem capable of projecting the images of n projectors PR1....PRn, calibrated and rectified, on a wall. Other on-surface visualization methods are also possible, such as, for example, self-projecting display or fixed pattern. The goal is to produce a series of images from the multi -projection system, generating a resulting image which will lie as much as possible within a perfect rectangle on the total interactive surface and which will show no visible discontinuity between adjacent projectors.

The components of the subsystem are the following:

• n projectors: each projector controls a portion of the interactive surface;

• 1 workstation: the workstation, whereon the interactive applications reside, takes care of dividing the total image, sending it to each projector, and controlling the deformation (warping) and the overlap of two adjacent projections (blending);

The phases of the subsystem are the following:

• blending: the workstation that handles the multi -projection process computes the image that will have to be displayed by each projector, calculating the shape of the edges and the colour intensity of the pixels in the overlapping zone; the system unites the projectors’ images through a blending algorithm that overlaps the lateral strips of adjacent projectors by approximately 20% and executes an interpolation in the overlapping zone according to the formula:

a * pow((2 * x), p) where

■ x is the normalized pixel value from 0 to 1

• 0 zone of total shadow

• 1 zone of total light

■ a (from 0 to 1) is the total blending contribution

_■ p is the (exponential) contribution of the intensity scale to the formula

The blending operates by colour band (red, green, blue) for the purpose of obtaining a better image efficiency. As a consequence, the parameters a and p will be different according to the colour and will be calibrated according to the projector type, since projectors of different brands will have different efficiencies.

• warping: for each projector, the workstation computes a transformation grid for rectifying the image that will have to be projected on the surface by using a triangle tessellation pattern. Such grid will be used for deforming the images coming from the operating system in order to solve the warping and for computing the images that, once projected, will be rectified. Manager Subsystem

This is the subsystem G (Figure 2) that must manage the events, manage the audio peripherals, allow the software applications to receive such events and, in general, interact with the application layer.

The components of the subsystem are the following:

• workstation: it handles the events that are generated by the driver and the software applications;

• multi-channel audio: a multi-channel audio system ensures that multiple users can make use of the information at the same time.

The phases of the subsystem are the following:

• event-handling: the workstation handles the events and sends them to the interactive applications. Two different behaviours are envisaged for handling two event types;

o standard event: events that comply with the touch standard (touch start, touch end, touch move) are managed natively by the operating system. Therefore, such events are directly entered into the event queue;

o non-standard event: events that cannot be managed by the operating system (hand, object hit, pre-touch) are sent to the application via web socket;

multi-user spatial audio: provided through a special API that handles audio contents (audio files, video files). For each user, the volume percentage of each loudspeaker ARI . .ARh (Figure 8) is computed on the basis of the position of the user on the surface. For example, if a user displays a video in the centre of the surface, the audio will be reproduced at the highest volume by the loudspeaker closest to the user, while the lateral loudspeaker will be muted. Vice versa, as shown in Figure 8, when two users are interacting with the system, the active loudspeakers will be those in proximity to the users, so that both users will be able to display different contents with no sound interference.

The following will describe more in detail the innovative features of the present invention.

The operations that illustrate the functioning of the system are essentially the following:

- Detection of the discontinuities (shadows) caused by the objects on the pixels that identify the interaction parts on the horizon.

- Calculation of the intersections between lines running from the position of each camera towards the points where the horizon is interrupted.

- Hierarchical clustering of the intersections to identify the position of the contact points, discerning movements made by the whole hand from those made by individual fingers.

- Tracking, reconstruction and smoothing of the trajectories followed by the objects over time.

A calibration procedure is carried out in order to determine all those system data which are still unknown, such as, for example, the position of each sensor in the surface plane during the first installation, and to update the data whenever the system is altered by external perturbations. Some examples of perturbations are micro-variations in the position of the sensors or in the position of the illuminator. The calibration procedure comprises the following steps:

Analysis of the image of each sensor in order to determine the horizon.

- Detection of the position of the sensors.

In brief, the innovative aspects of the invention comprise those described below.

Modular and scalable architecture.

The architecture of the system follows a modular approach. The system is divided into modules that can be arranged side by side to cover surfaces of any size.

Dots per centimetre / DPI (dots per inch).

In order to make the geometric calibration calculations and determine some parameters useful for parallel calculation and triangulation, it is necessary to know how many pixels fall within one centimetre (or, alternatively, one inch, DPI). In the case of projected surfaces, this value is obtained by dividing the resolution in x at the projector’s output (e.g. 1920 for a FullHD projector) by the centimetres of projection width (e.g. 220 cm).

Hierarchical clustering and sensing

The numerous intersections produced by the triangulation step are processed by a hierarchical clustering algorithm, which groups the contact points that fall within a near area. See, for example, Figure 5. The algorithm continues to group by increasingly large areas, discriminating between smaller and bigger objects, distinguishing the fingers of a hand and identifying the exact position thereof. This procedure is also useful for determining any false contact points due to the triangulation procedure. The points 51 falling within poorly populated clusters are, in fact, labelled as noise and automatically discarded.

1. The clustering algorithm is of the bottom-up type.

2. Each recognised contact point is classified as belonging to the cluster 52.

3. For each cluster (A), the cluster is analysed by repeatedly comparing it with all the others (B) for the purpose of grouping the points and classifying the group according to a link criterion; for example, the one used herein is Average Linkage

where the distance d(a,b) is calculated on the basis of the system’s topology:

- if cluster B fulfils the criterion of vicinity to A, then B is entered into the cluster of A and the centroid 54 is updated (Cartesian barycentre of all points belonging to the cluster);

- if the cluster does not fulfil the vicinity criterion, then the procedure goes on.

For performance maximisation, a parallel calculation is adopted by spatially dividing the contact points along x (the abscissa in pixels) into“windows”, since far points in x will belong to different clusters. The window separation threshold for parallel calculation is dependent on the distance in cm between the points: e.g. 50cm. In order to execute this step, it is therefore necessary to know first how many pixels are contained in one centimetre, and then make the division of pixels by dots per centimetres.

Thanks to this spatial subdivision, handled by the algorithm via a kd-tree, the complexity remains within the value 0(n x log(n)).

4. All clusters are analysed linearly:

a. if the cluster is formed by at least 4 points and complies with the system’s topology, it is classified as a hand;

b. if the cluster is formed by just one point:

i. if it is within a noise zone 51, then it is discarded;

ii. otherwise, it is considered as a finger 53.

By analysing the information received from the various sensors Cl ...Cn, (shape, size), the data are correlated in order to detect the type of the element coming in contact with the surface, detecting whether such element is an object or a hand. Each sensor, through the use of the already known feature-based SIFT (Scale Invariant Feature Transform) algorithm, analyses the images and computes the features for each object detected. These features are then correlated by the clustering algorithm, which, upon receiving the features of the objects detected by the sensors, will associate the type with the detected cluster, distinguishing between two different cluster sets: object cluster and hand cluster.

Automatic horizon determination.

One of the calibration goals is to determine the luminous horizon in the image coming from the sensors. The standard procedure still requires manual intervention: the user must manually provide two points in the form of pixel coordinates x and y, corresponding to the start and end points of the horizon, for the image of each sensor. The novelty introduced by the present invention lies in the fact that this step is automated, assuming that both the surface and the horizon have a regular and linear shape in the sensor’s field of view. The automatic recognition of the horizon occurs at preset periodic intervals, typically every 30 seconds, by means of smoothing filters and the“HoughLines” algorithm, which is per se known, e.g. as described in (http://www.ai.sri.com/pubs/files/tn036-duda71.pdf).

For each colour band (R,G,B):

1. A Gaussian noise reduction filter is applied.

2. An adaptive binary threshold filter is applied by means of a convolution mask: dst(x,y)=

a. max Value if src(x,y)>T(x,y)

b. 0 otherwise

The threshold value T(x,y) used for computing the threshold is a weighted sum of a 7x7 size matrix (kernel or matrix) observing a Gaussian distribution G (Gaussian window).

3. The pixels forming the image of the threshold filter are considered as a binary mask: anything above the threshold is considered as foreground.

4. The HoughLines algorithm is applied in order to determine the lines in the images, the longest one parallel to the image base being considered as the horizon (corresponding to the wall as viewed from the camera’s perspective).

The horizon determination procedure is an automatic procedure which is necessary upon the first configuration and whenever the system’s topology changes. Contact-point detection.

During the detecting phase, anything that crosses the horizon is considered as a contact point. The crossing of the horizon is computed by means of an adaptive background subtraction algorithm:

1. It is necessary to know the pixels that form the horizon, as computed in the previous step.

2. A horizon average is computed in a predefined time window of 5 seconds (i.e. approx. 500 frames, considering 100 frames per second, according to the selected specifications of the optical sensors and video cameras). Each pixel of a new frame will update the average:

a. if the pixel differs from the average value within a normalized value of 0.1, then it will be considered as an ambient brightness variation and will update the average by a weight of (1/500).

b. if the pixel differs from the average value in excess of a normalized value of 0.1, then the neighbourhood will be analysed:

i. if less than 20 adjacent pixels differ from the average, then it will be considered as an intersection of an object with the horizon.

ii. if more than 20 adjacent pixels differ from the average, then it will be considered as an ambient brightness variation (e.g. light turned on) and will update the average by a factor of (400/500).

Tracking.

During the tracking phase, certain information is correlated over time in order to identify the clusters over time by associating an identifier (id) with each cluster. In other terms, the trajectory of each cluster is identified by analysing the following properties of the consecutive frames:

• position

• speed

• acceleration

• number of near clusters (within a distance threshold d)

• vicinity of near clusters (mean cluster distance d-mean)

The identifier associated with each cluster is a number that is incremented every time a cluster ends its trajectory (disappears) and a next cluster is identified (appears). For each frame, the following sets are considered:

• D: set of clusters identified by the clustering algorithm

• T: set of tracked clusters (with which an id has been associated)

For each frame, the tracking algorithm must analyse the information in order to modify the set T by:

• Updating the information of the clusters that have moved

• Removing from the set those clusters which have completed their trajectory and are no longer present (when the hand is no longer in contact with the surface).

• Adding new clusters to the set (when the hand initially touches the surface).

The following steps are considered:

At frame 0, set D and set T are empty. At frame t, the clustering algorithm has identified n clusters and the tracking algorithm must analyse the n clusters to update set T, associating each cluster of set D with the clusters of set T of frame t-1. For each cluster A of set D, the following threshold formula (with weights) is applied, comparing such cluster A with each cluster B of set T at frame t-1

0.8 * diffDist * 0.5 * diffVel * 0.2 * diffAcc * 0.3 diffNCluster * 0.2 * diffDistCluster where diffDist, diffVel, diffAcc, diffNCluster, diffDistCluster are, respectively, the differences between the distance, speed, acceleration, number of near clusters and mean distance of the two clusters being compared. If the application of this formula returns a value greater than a given threshold (typically 18.456), then cluster A will be considered as not belonging to the trajectory of cluster B. The threshold value can be modified during the calibration phase. Conversely, if the comparison value is below the threshold, then cluster A will be considered as the next position of cluster B. Therefore, as aforementioned, three different cases may occur:

• Cluster A is entered into set T as the new position of cluster B, and the information (position, speed, acceleration, etc.) is updated, while the identifier id remains unchanged (the identifier of cluster B is kept, in that it has been detected that the same cluster has taken a new position over time).

• After the comparison, cluster A is not associated with any cluster of set T. In this case, it is considered as a new cluster and entered into set T with a new identifier, incrementing by one unit the last assigned identifier. This case occurs when the hand initially touches the surface. • After the comparison, cluster B is not assigned to any cluster of set D. In this case, cluster B is removed from set T, ending its trajectory. This case occurs when the hand is no longer in contact with the surface.

In the above-mentioned cases, the clustering algorithm associates three event types, respectively: touch-move, touch-start, touch-end. In addition, a further event called finger hold is considered, which is generated when a hand cluster remains in the same position for longer than two seconds.

Knowing, for each cluster, information about the type of element in contact with the surface (object or hand), as analysed by the sensing algorithm, the tracking algorithm applies the same calculations while keeping the object cluster set and hand cluster set separate, so that in addition to the above-mentioned events there will also be the following event types: object-move, object-start, object-end, object-hold.

Due to storage in the kd-tree structure, the comparison between set D and set T is made with a complexity of 0(N*log(N)).

Guided calibration.

The purpose of the calibration is to compute the position of all sensors Cl ...Cn (Figure 6) by analysing the frames belonging to the various sensors and determining the global reference system in pixel coordinates (x, y) of the screen starting from the reference system of the individual sensors in pixel coordinates of the image coming from the individual camera. The guided procedure envisages, for example, a single manual intervention for calibrating the entire system. Subsequent interventions may be required in special cases only. The single intervention consists of touching with a finger some points displayed on the screen (two per sensor) to allow the system to automatically determine the position of the sensors.

1. A series of points RI . .Rh arranged in a grid pattern are shown on the screen S (projection), offset from the edge by approx. 10 cm. The distance between one point and another in x and y is, for example, 50 cm.

2. For each point in the grid, starting from the Top-Left point to the Bottom-Right point:

a. the point is highlighted with respect to all the other ones (by changing the colour of the point on the interface), and the user is asked to place a finger on the centre of that point for 3 seconds. b. after 3 seconds, the point disappears and the next one appears.

3. When all points of the grid have been processed, the algorithm:

a. computes the angles, starting from the positions of the contact points on the horizon of the camera, in order to determine the orientation of the same video camera.

b. during the contact-point detecting phase, this information is used in order to compute the imaginary line running from the video camera up to the contact point. The imaginary line will subsequently be used by the calibration algorithm.

In summary, the object of the invention consists of a multi -touch interactive system using a multi -projection system combined with optical sensors to create a large, scalable, multi-user interactive surface.

In particular, the following innovative features must be highlighted:

• layout using a distributed system of sensors with computational capacity grouped into a sensor array;

• triangulation and hierarchical clustering algorithm for the recognition of finger and hand on a surface with the described topology: intersections are computed by positioning the sensors along one edge of the surface, oriented towards the surface that must be made interactive;

• object recognition: the system can recognise objects positioned on the interactive surface and distinguish them based on different characteristics thereof (colour, shape, size);

• definition of a multi-user environment through user events associated with specific users: by recognising and discerning between two or more users, the system allows multiple users to work independently;

• definition of a seamless, large-scale, scalable, interactive system through the application of a sensor array to a multi -projection blending system:

o large dimensions: over 6 metres, with no upper limit;

o scalable: a distributed computing system operating by exploiting the sensors’ computational capabilities makes it possible to recognise interactions and users through a real-time analysis of the large amount of data produced by the sensors; • multi-user audio:

o the information about the position of the user during the use of the interactive surface is used for calculating the volume levels of the loudspeakers positioned on the same interactive surface, providing sound spatiality.

• Tilt sensor: the system incorporates an acceleration sensor (accelerometer) to be installed on the surface near the modules, which can detect the vibrations caused by the users on the surface during use. This technology is employed for detecting shocks on the surface, generating a tilt event when the forces applied by the users are too strong. The event, captured by the software application, is used in order to freeze the interface while issuing a warning (when displays - monitors or projectors - are used) to prompt the user to apply a lighter pressure; The“tilt” event is added to the event set.

• Proximity detection: through the adoption of a depth sensor (depth camera) oriented towards the user, the system can detect the presence of proximal users (up to 5 metres away from the surface). This functionality is used in order to freeze the interface when no user is present (screensaver) and release the interface when users are detected in front of the surface. Two events,“user-detected” and“no-user”, are added to the event set.

The present invention can advantageously be implemented by means of a computer program, which comprises coding means for implementing one or more steps of the method when said program is executed by a computer. It is understood, therefore, that the protection scope extends to said computer program and also to computer-readable means that comprise a recorded message, said computer-readable means comprising program coding means for implementing one or more steps of the method when said program is executed by a computer.

The above-described example of embodiment may be subject to variations without departing from the protection scope of the present invention, including all equivalent designs known to a man skilled in the art.

The elements and features shown in the various preferred embodiments may be combined together without however departing from the protection scope of the present invention. From the above description, those skilled in the art will be able to produce the object of the invention without introducing any further construction details.

Claims

1. System for detecting interactions with a surface, said surface being substantially flat, said interactions involving contact with or vicinity to the surface by one or more objects, said system comprising:

2. System for detecting interactions with a surface as in claim 1, comprising one or more infrared illuminators capable of generating an infrared light beam within the reception frequency range of said sensors, said illuminators being:

• positioned on the surface, on that side of said surface which is opposite to the side whereon said sensors are positioned, said system detecting shadows; and/or

• positioned on the same side as said sensors, so as to illuminate an object, said system detecting the light reflections on objects close to the surface.

3. Electronic system for detecting interactions with a surface as in claim 1, comprising a system of projectors (PRl ....PRn) configured for determining said displayed events, so as to project calibrated and rectified images on said surface (S).

4. Electronic system for detecting interactions with a surface as in claim 3, wherein said system of projectors (PRl ....PRn) is configured for executing the following operations:

• blending: computation of an image that will have to be displayed by each projector, by computing the shape of the edges and the colour intensity of the pixels in the overlapping zone; union of the images of said projectors through a blending algorithm that overlaps the lateral strips of adjacent projectors and executes an interpolation in the overlapping zone according to the formula:

a * pow((2 * x), p) where

■ x is the normalized pixel value from 0 to 1

• 0 zone of total shadow

• 1 zone of total light

■ a (from 0 to 1) is the total blending contribution

_■ p is the (exponential) contribution of the intensity scale to the formula

said blending operating by colour band (red, green, blue), said parameters a and p being different according to the colour and being calibrated according to the projector type;

• warping: for each projector, computation of a transformation grid for rectifying the image that will have to be projected on said surface by using a triangle tessellation pattern, thereby obtaining a deformation of the images to obtain said calibrated and rectified images.

5. Electronic system for detecting interactions with a surface as in any one of the preceding claims, comprising calibration means configured for executing operations in order to determine initial or update data, comprising:

analysis of the image of each one of said sensors for automatically determining the horizon as a segment corresponding to the surface made interactive, viewed from the sensors’ perspective;

detection of the position of said sensors.

6. Electronic system for detecting interactions with a surface as in claim 5, wherein said automatic determination of the horizon occurs at periodic intervals set by means of smoothing filters and the“HoughLines” algorithm, such that, for each colour band R,G,B of the image coming from the sensors:

• a Gaussian noise reduction filter is applied;

• an adaptive binary threshold filter is applied by means of a convolution mask: dst(x,y)=

o max Value if src(x,y)>T(x,y)

o 0 otherwise

o The threshold value T(x,y) used for computing the threshold is a weighted sum of a 7x7 size matrix (kernel or matrix) observing a Gaussian distribution G (Gaussian window);

• the pixels forming the image of the threshold filter are considered as a binary mask: anything above the threshold is considered as foreground;

• said HoughLines algorithm is applied in order to determine the lines in the images, the longest one parallel to the image base being considered as the horizon.

7. Electronic system for detecting interactions with a surface as in claim 1, comprising a multi-user spatial audio system, comprising means for calculating the volume percentage of each loudspeaker (API ....APn) of said multi-user spatial audio system based on the position of the user on said surface, reproducing the audio at the highest volume from the loudspeaker that is closest to the user and muting the loudspeakers that are distant from the users.

8. Electronic system for detecting interactions with a surface as in claim 1, wherein said recognition operations comprise the following operations:_.

- the intersections produced by said triangulation are processed by a hierarchical clustering algorithm, which groups the contact points that fall within a near area on said surface, continuing to group by increasingly large areas, and discriminating between smaller and bigger objects, distinguishing the fingers of a hand and identifying the exact position thereof, while also determining any false contact points due to the triangulation procedure;

- said clustering algorithm being of the bottom -up type:

- each recognised contact point is classified as belonging to a cluster (52);

- for each cluster (A), the cluster is analysed by repeatedly comparing it with all the others (B) for the purpose of grouping the points and classifying the group according to a link criterion; - if the cluster (B) fulfils the criterion of vicinity to A, then B is entered into the cluster of A and a centroid (54) is updated as a Cartesian barycentre of all points belonging to the cluster;

- if the cluster does not fulfil the vicinity criterion, then the procedure goes on;

- all clusters are analysed linearly:

if the cluster is formed by at least 4 points and complies with the system’s topology, it is classified as a hand;

if the cluster is formed by just one point:

if it is within a noise zone (51), then it is discarded;

otherwise, it is considered as a finger (53).

9. Electronic system for detecting interactions with a surface as in claim 8, wherein said tracking operations comprise the correlation of information over time in order to identify said clusters over time by associating an identifier (id) with each cluster, identifying the trajectory of each cluster by analysing the following properties in consecutive frames:

• position

• speed

• acceleration

• number of near clusters (within a distance threshold d)

• vicinity of near clusters (mean cluster distance d-mean);

said identifier (id) associated with each cluster being a number that is incremented every time a cluster ends its trajectory and disappears, and a next cluster is identified and appears.