WO2010038218A9

WO2010038218A9 - Method and system of interaction between actors and surfaces through motion detection

Info

Publication number: WO2010038218A9
Application number: PCT/IB2009/054326
Authority: WO
Inventors: Duarte Felipe Oliveira Duque
Original assignee: Exva - Experts In Video Analisys, Lda
Priority date: 2008-10-03
Filing date: 2009-10-02
Publication date: 2011-03-17
Also published as: PT104205A; WO2010038218A1

Abstract

The present invention comprises a system (5) of interaction between people (9) and any device on which multimedia content (sound (7) and video (6)) can be displayed (8), the content reacting according to the intensity and direction angle of the motion monitored by video cameras. The interaction between actors and the multimedia content is made possible by techniques for detecting and tracking moving objects observed through one or more video cameras (1).

Description

Title of Invention: METHOD AND SYSTEM OF INTERACTION BETWEEN ACTORS AND SURFACES BY THE DETECTION OF MOVEMENT

Field of the Invention:

[1] The present invention relates to an image processing and analysis system.

The invention consists of a system of interaction between actors (people) and a device, set of devices, or surface where it is possible to display multimedia content.

Background of the Invention:

[2] The interaction between an individual and any electronic device, also known as human-machine interaction, is a research area under intense development that has produced numerous technologies over the years. More recent examples include touch screens, control gloves and motion sensors.

[3] Despite significant progress in this area, there is still a need today to use devices connected to or handled by individuals as the sole means of measuring their actions and generating control commands for the equipment being interacted with.

[4] A large number of devices that allow this interaction to occur naturally (e.g. through body movement) have already been presented, but such systems do not yet allow full and free interaction without the help of some electronic or mechanical element connected to the actors. Some solutions are based on sensors, such as accelerometers or piezoelectric sensors, actuated by direct physical contact between them and the actors.

[5] A new stream of research has focused its resources on developing solutions that enable free human-machine interaction, i.e. without any device connected to the user.

[6] Such an approach takes advantage of information obtained through camcorders: using image processing and analysis techniques, it is possible to extract descriptive information about the observed motion from a sequence of images.

[7] There are already several patent documents offering new solutions. WO 02/100094 describes a system in which only one user interacts with the surface. The system uses lighting and shadows (infrared cameras and infrared lighting), and the technique used to segment the individual from the background is a light source with a wavelength in the infrared or ultraviolet range, i.e. outside the visible spectrum.

[8] The camcorder filters out the visible spectrum, passing only the infrared and ultraviolet ranges.

[9] US 7,274,803 refers to a system based on the detection of patterns (not just movements) that act as a model of interaction between an object or actor and a surface. To control the system, the actor must use a body part on which skin can be detected, so that it is recognized by the camera and can interact with the screen (through an existing cursor). That system functions only through skin detection technology, unlike the present invention, which does not use skin detection to bring about interaction between the actors and the surfaces.

[10] In addition to this limitation, although the system allows other people to be in front of the screen, only a single person can control the cursor.

[11] WO 99/07153 describes a system and method for controlling software through video analysis and interpretation, and three different configurations are described.

[12] In the first embodiment of the invention, the system user monitored by a camcorder can interact with a computer generated object. The interaction between the virtual object and the user occurs whenever there is a collision between the region of the computer generated object and a region where the user's movement has been detected, information obtained by calculating the difference between images. It is therefore a simple interaction without any direction of movement information.

[13] In the second configuration, an object having a distinctive feature, such as a predefined color, is used. In this mode, by identifying the color of the object and calculating its contours, it is possible to identify the region occupied by the object as well as its orientation, which may trigger an action (e.g. firing a virtual weapon).

[14] The third configuration concerns an interactive kiosk where the user can select an option from the on-screen menu. Selection is made by occluding the area occupied by the menu option with the area of the user's hand. Again, there is no information about the direction and range of the motion performed by the user, which distinguishes it from the proposed invention.

[15] US 4,843,568 discloses a system in which the human body is segmented by one of two techniques that rely on a neutral background. The first technique generates the neutral background using an illuminated wall placed behind a person, so that the camera can distinguish the contours of the person by subtracting the known background.

[16] The second technique is to place a surface of a known, uniform color on a table, thereby distinguishing the user's hands from the known background.

[17] Finally, WO 02/43352 relates to a system and method for identifying an object and characterizing its behavior using images from a camcorder or storage unit. This system identifies, segments and classifies the behaviors of a single object. It uses an object location and feature extraction technique in which the object region is identified by selecting the region of largest area resulting from the background subtraction process. This system requires manual training as well as a set of information about the object to be monitored.

[18] The present invention differs from the documents found in several aspects, as follows:

it allows interaction with more than one user;

it does not use devices to connect users to the system, nor does it rely on any known background or any source of radiation outside the visible spectrum, such as infrared or ultraviolet;

the camcorders do not require any filter to select the spectrum to be monitored.

General description:

[19] The present invention is a system that allows the creation of surfaces that are reactive and interactive with users, transforming the environment, resizing it and giving digital depth to spaces, thus making them interesting, reactive and interactive, for use in recreational, advertising or informational activities.

[20] The movement performed by the actors is observed by one or more camcorders (1), color or black-and-white, up to sixteen cameras in a preferred embodiment. The images captured by the cameras are received by a video digitization unit (2) so that they can be used by a software component, called the motion detection and recognition module (3), which runs on a computer (5).

[21] This module makes use of a moving object segmentation and tracking algorithm that analyzes sequences of video images. The motion analysis technique evaluates each actor's range of motion as well as its direction and then averages these over all the participants. As a result, the motion detection and recognition module generates, for each newly acquired image, a displacement vector containing the direction and intensity of movement of the observed set of actors.
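The averaging step described in [21] can be illustrated with a minimal sketch. It assumes that each actor's motion is already available as a hypothetical (dx, dy) pair in pixels; the function name and the degree-based angle are illustrative choices, not taken from this document.

```python
import math


def average_displacement(actor_vectors):
    """Average per-actor (dx, dy) vectors into one (amplitude, angle) pair.

    Assumption: each actor's motion is already summarized as a (dx, dy)
    pair in pixels. The angle is returned in degrees.
    """
    if not actor_vectors:
        return 0.0, 0.0
    count = len(actor_vectors)
    mean_x = sum(v[0] for v in actor_vectors) / count
    mean_y = sum(v[1] for v in actor_vectors) / count
    amplitude = math.hypot(mean_x, mean_y)
    angle = math.degrees(math.atan2(mean_y, mean_x))
    return amplitude, angle


# Three hypothetical actors; their mean displacement is (2, 2).
amplitude, angle = average_displacement([(4, 0), (2, 0), (0, 6)])
```

An empty list of actors yields a zero vector, so a frame without detected movement would leave the multimedia content unchanged.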

[22] The displacement vector is transmitted to another software component, the multimedia module (4), which may run on the same computer or on another computer networked with the former. The multimedia module generates and acts on virtual objects, causing the environment to react.

[23] Depending on the nature of the application, for example a game or a directory, the multimedia module causes an action on the multimedia content that will be displayed to the set of actors interacting with the system. As an example, if the application consists of an audience game, the displacement vector may cause a virtual object to move, such as a basket, platform, car, animated character, or avatar. If it is a directory, the displacement vector will move a cursor to select items from the set of displayed options.
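As an illustration of the two application types in [23], the following sketch maps a displacement vector onto a game object and onto a menu cursor. The function names, the gain factor, the dead zone and the screen size are all hypothetical; the document does not specify how the multimedia module scales or filters the vector.

```python
def move_object(position, vector, gain, width, height):
    """Shift a 2-D position by gain * vector and clamp it to the display area."""
    x = min(max(position[0] + gain * vector[0], 0), width)
    y = min(max(position[1] + gain * vector[1], 0), height)
    return (x, y)


def select_item(index, vector, item_count, dead_zone=1.0):
    """Step a menu cursor when the horizontal component exceeds a dead zone."""
    if vector[0] > dead_zone:
        index += 1
    elif vector[0] < -dead_zone:
        index -= 1
    # Keep the cursor inside the list of displayed options.
    return min(max(index, 0), item_count - 1)


# Game case: a basket at (100, 50) pushed by the vector (4, -2) with gain 10.
basket = move_object((100, 50), (4, -2), gain=10, width=640, height=480)
# Directory case: the same vector moves the cursor from item 2 to item 3.
cursor = select_item(2, (4, -2), item_count=5)
```

Clamping keeps the object on screen and the cursor within the option list, which seems a reasonable choice when a whole audience drives a single control.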

[24] The interactive experiences produced (multimedia content) are projected onto a surface of choice (movie screen, storefront, table, floor, wall, glass or acrylic) or displayed on a screen such as CRT, TFT, LCD, LED or OLED. The connection between the computer running the multimedia module (4) and the projection/display equipment (6) can be, for example, via VGA, S-VIDEO, USB, HDMI or DVI. This computer (5) may also control a set of actuators, for example for lighting, temperature or sound control, allowing them to act on the environment in which the installation is located. The connection between the computer and the control actuators may be, for example, via a TCP/IP computer network, RS-232, RS-485, or parallel port (LPT). The sound equipment (7) is connected to the processing unit by a monaural TS connector or by a stereo or monaural TRS connector.

Brief Description of the Figures:

[25] For an easier understanding of the invention, attached are figures which represent preferred embodiments of the invention which, however, are not intended to limit the scope of the present invention.

[26] FIGURE 1. Block diagram in which (1) corresponds to a camcorder whose captured images are sent to a computer (5). This computer comprises a video digitization unit (2) which digitizes the images, which are then sent to a motion detection and recognition module (3) that analyzes the intensity and direction of movement and transmits the result to the multimedia module (4). Sound output (7) and video projection (6) devices are also connected to the computer (5).

[27] FIGURE 2. Diagram of the system placed in a movie theater, where the camcorders (1) monitoring the actor group (9) are connected to a computer (5) running the motion detection and recognition module. The computer (5) is connected via a computer network, wired or wireless, to another computer (5) running the multimedia module, which transmits the multimedia content via a projector (6) and the speakers (7). Projected images are displayed on the screen (8).

[28] FIGURE 3. System diagram in which a group of actors (9) is monitored by a single camcorder (1) connected to the computer (5). This computer runs the motion detection and recognition module and the multimedia module. The multimedia contents generated by the computer are displayed by a screen (10) with the aid of loudspeakers (7).

[29] FIGURE 4. Example of using the invention for multi-user gaming with simultaneous monitoring of two groups of actors (9), so that interaction between the two groups on the same multimedia content is possible. In this configuration, a camcorder (1) monitors an actor group (9), which views the multimedia content displayed by a projector (6) on the projection screen (8) and receives the sound through a speaker (7). In turn, another camcorder (1) monitors a second group (9), which views the multimedia content displayed by a projector (6) on the screen (8) and likewise receives the sound through a speaker (7).

[30] FIGURE 5. Representation of the motion detection and recognition module, wherein the motion detector (14) uses the current image (12) and the image acquired at the previous time instant (11) to calculate the motion mask (16). The contour detector (13) receives the previous image (11), generating a contour mask (15). The contour mask (15) and the motion mask (16) are processed by the displacement detector (17), which calculates the displacement vector of the motion observed between two consecutive images.

[31] FIGURE 6. Exemplification of the displacement vector calculation with motion pixel representation (18), source pixels (19), and partial displacement vectors (20).

Detailed Description of the Invention:

[32] The present invention will now be described in detail using the figures set forth herein. The components constituting the invention are identified by numbers in the respective figures.

[33] FIGURE 1 presents the block diagram identifying the hardware (1) (2) (5) (6) (7) and software (3) (4) components of the present invention.

Note that all hardware components used are standard equipment, that is, they are not modified or manufactured specifically for use in the present invention.

[34] The system object of the invention is characterized in that it contains one or more color or black-and-white camcorders (1), which are connected to a video digitization unit (2) coupled to a computer (5). The acquired images are supplied to the motion detection and recognition module (3), which analyzes, with each new image, the movement carried out by the actors, generating a displacement vector containing the amplitude and direction of this displacement. This vector is transmitted to the multimedia module (4), which acts on computer-generated virtual objects. Multimedia content is displayed to the actors via video projection equipment (6) and sound equipment (7) connected to the computer (5).

[35] As an example of a possible hardware configuration, an analog camcorder (1) can be connected to one of the inputs of a video acquisition card (2) via an RG-59 cable with BNC connectors on both ends. The video acquisition card, which converts analog video to digital format (digitization), is attached via a PCI connector to a motherboard equipped with RAM, a processor, and a hard drive where the system resides, as well as the two software modules (3) (4) proposed in this invention. The computer (5) further comprises a graphics processing unit, included on the motherboard itself or implemented on a graphics card, that has at least one of the following video outputs: VGA, HDMI, DVI or S-VIDEO. The graphics card is attached to the motherboard via a PCI-E or AGP connector. The video display equipment may consist of a projector (6) connected to the computer (5) via a VGA extension cable. The projector (6) should be oriented to project images onto a light-toned wall with a smooth and even surface. The camcorder (1) should be positioned close to the projection surface, facing away from it, i.e. pointing from the wall towards the video projector, so that the people viewing the projected images can be observed from the front. The sound equipment (7) may consist of speakers, which are connected to the computer (5) through an audio cable with a stereo TRS connector.

[36] Having identified the hardware components and defined the connections between the various devices, a detailed description of the software modules follows.

[37] In this document, $I_t$ is defined as the digital image obtained at time $t$.

It is further understood that a digital image is a two-dimensional representation of an image as a finite set of elements that take discrete values, organized in an array of $M$ by $N$ elements. Each of these elements, which stores the value of the light intensity and color characteristics of the image at that coordinate, is called a pixel. Thus, in the case of a greyscale image (obtained for example by a black-and-white camcorder), the intensity of a pixel at the coordinates $x$ and $y$ of the image is defined by $I_t(x, y)$, where $0 < x \le M$, $0 < y \le N$ and $0 \le I_t(x, y) \le 255$. The limit of 255 refers to a preferred 8-bit embodiment. A color image is defined by the set of its color components, such that an image in the RGB color space is defined by $I_t = \{IR_t, IG_t, IB_t\}$, where $IR_t(x, y)$, $IG_t(x, y)$ and $IB_t(x, y)$ represent respectively the intensity of the red, green and blue components of the pixel at the coordinates $(x, y)$ of the digital image acquired at time $t$. The present invention is equally applicable in this situation, with the necessary adaptations, obvious to one skilled in the art.
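The image definitions in [37] can be made concrete with a short sketch. It assumes a greyscale image stored as a list of rows and an RGB image stored as three such planes; both layouts and the function names are illustrative choices rather than anything prescribed by this document.

```python
def max_intensity(bits=8):
    """Largest value an unsigned pixel of the given bit depth can hold."""
    return 2 ** bits - 1


def is_valid_grey_image(image, bits=8):
    """Check that the image is rectangular with integer values in 0..max."""
    limit = max_intensity(bits)
    width = len(image[0])
    return all(
        len(row) == width
        and all(isinstance(value, int) and 0 <= value <= limit for value in row)
        for row in image
    )


# Greyscale image with 2 rows and 3 columns (assumed layout: image[row][column]).
grey = [[0, 128, 255], [12, 40, 200]]

# RGB image stored as one plane per component, mirroring {IR_t, IG_t, IB_t}.
rgb = {"R": [[255, 0]], "G": [[0, 255]], "B": [[0, 0]]}

limit = max_intensity()
grey_ok = is_valid_grey_image(grey)
rgb_ok = all(is_valid_grey_image(plane) for plane in rgb.values())
```

For the preferred 8-bit embodiment the limit evaluates to 255, matching the bound stated in the text.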

[38] FIGURE 5 shows the motion detection and recognition module (3).

This module receives a sequence of digital images spaced by fixed and predefined time intervals, for example 40 ms for the PAL format. At each new capture, the current image $I_t$ (12) is recorded in memory and the image captured at the previous time instant, $I_{t-1}$ (11), is still kept in memory. The image $I_{t-1}$ is transferred to the contour detector (13), which applies, for each pixel of the image, the algorithm defined by the following equation:

\[
\mathrm{CONTOUR}(I_{t-1}(x, y)) =
\begin{cases}
1, & \text{if } |I_{t-1}(x, y) - I_{t-1}(x-1, y-1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x-1, y+1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x+1, y-1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x+1, y+1)| > T, \\
0, & \text{otherwise.}
\end{cases}
\]

[39] where $T$ can take any value between 0 and 255.

[40] Applying the contour detector (13) results in a contour mask (15), which has a value of 1 (one) at each image pixel where a contour is detected and a value of 0 (zero) otherwise.
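The contour test can be sketched as follows: a pixel is marked when it differs by more than T from all four of its diagonal neighbours. The sketch assumes a greyscale image stored as a list of rows; leaving border pixels at 0 is an assumption of this example, since the text does not say how pixels without four diagonal neighbours are handled.

```python
def contour_mask(previous, threshold):
    """Return a 0/1 mask marking pixels that differ from all four diagonal
    neighbours by more than the threshold.

    previous is the earlier frame as a list of rows of grey levels.
    Border pixels are left at 0 (assumption of this sketch).
    """
    rows, cols = len(previous), len(previous[0])
    mask = [[0] * cols for _ in range(rows)]
    diagonals = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            centre = previous[y][x]
            if all(
                abs(centre - previous[y + dy][x + dx]) > threshold
                for dx, dy in diagonals
            ):
                mask[y][x] = 1
    return mask


# A single bright pixel on a dark background is flagged as a contour.
frame = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
mask = contour_mask(frame, threshold=50)
```

Raising the threshold to 200 would clear the mask here, because the difference of 190 no longer exceeds it.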

[41] In parallel with the contour calculation task, the motion detector (14) is executed. The motion detector uses the current image $I_t$ (12) and the previous image $I_{t-1}$ (11) to produce a motion mask (16) that identifies pixels where a significant difference in intensity or color occurs between two consecutive images. Motion is defined to exist in a given pixel if:

[42] with T taking a value between 0 and 255 in a preferred embodiment.
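The motion equation itself does not appear in this text. The sketch below therefore assumes the most direct reading of [41]: a pixel is in motion when the absolute difference between its intensity in the current and previous frames exceeds T. This is an assumption, not a reproduction of the original formula.

```python
def motion_mask(current, previous, threshold):
    """Return a 0/1 mask of pixels whose intensity changed by more than the
    threshold between two consecutive greyscale frames.

    Assumed rule: motion exists where |I_t(x, y) - I_{t-1}(x, y)| > T.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 90, 10], [10, 10, 12]]
mask = motion_mask(current, previous, threshold=20)
```

The small change from 10 to 12 stays below the threshold, which illustrates how T suppresses sensor noise while keeping larger changes.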

[43] Motion (16) and contour (15) masks are subsequently used by the displacement detector (17) to determine the vector specifying the amplitude and the direction angle of the observed movement.

[43] The displacement detector (17) initially overlays the motion mask on the contour mask. An example of such an operation is shown in FIGURE 6. After overlaying both masks, the displacement detector identifies the regions designated as source pixels (19), which are generated whenever motion and contour are identified simultaneously at the same pixel.
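Overlaying the two masks amounts to a pixel-wise logical AND. A minimal sketch, assuming both masks are lists of rows holding 0 or 1 and that source pixels are reported as (x, y) pairs:

```python
def source_pixels(motion, contour):
    """Return the (x, y) coordinates where both masks are 1."""
    return [
        (x, y)
        for y, (motion_row, contour_row) in enumerate(zip(motion, contour))
        for x, (m, c) in enumerate(zip(motion_row, contour_row))
        if m == 1 and c == 1
    ]


motion = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
contour = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
sources = source_pixels(motion, contour)
```

The contour pixel at (3, 1) is not a source pixel because no motion was detected there.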

[44] In the next step, for each source pixel (19), the partial horizontal and vertical displacement vectors (20) are calculated. The partial horizontal displacement vector measures the distance, for a fixed value of y, defined by the number of contiguous motion pixels (18) delimited between source pixels or between a source pixel and a non-moving pixel. The partial vertical displacement vector is calculated similarly, keeping the value of x fixed.

[45] Finally, all the partial displacement vectors are summed, yielding the displacement vector, which consists of the amplitude (in pixels) and the direction of the total displacement observed.
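The calculation in [44] and [45] can be sketched under one stated assumption: the text does not define the sign of a partial vector, so this example counts the run of motion pixels on each side of a source pixel and takes the difference (right minus left, down minus up). The function names are hypothetical, and the y axis points downwards as in image coordinates.

```python
import math


def run_length(motion, sources, x, y, dx, dy):
    """Count contiguous motion pixels starting next to (x, y) along (dx, dy),
    stopping at a non-moving pixel, another source pixel, or the border."""
    rows, cols = len(motion), len(motion[0])
    count = 0
    x, y = x + dx, y + dy
    while (0 <= x < cols and 0 <= y < rows
           and motion[y][x] == 1 and (x, y) not in sources):
        count += 1
        x, y = x + dx, y + dy
    return count


def displacement_vector(motion, contour):
    """Sum the partial vectors of all source pixels.

    Returns (dx, dy, amplitude, angle in degrees). The sign convention is an
    assumption of this sketch, not something stated in the text.
    """
    rows, cols = len(motion), len(motion[0])
    sources = {(x, y) for y in range(rows) for x in range(cols)
               if motion[y][x] == 1 and contour[y][x] == 1}
    total_x = total_y = 0
    for x, y in sources:
        total_x += (run_length(motion, sources, x, y, 1, 0)
                    - run_length(motion, sources, x, y, -1, 0))
        total_y += (run_length(motion, sources, x, y, 0, 1)
                    - run_length(motion, sources, x, y, 0, -1))
    amplitude = math.hypot(total_x, total_y)
    angle = math.degrees(math.atan2(total_y, total_x))
    return total_x, total_y, amplitude, angle


# An edge at x = 1 in the previous frame with motion extending to its right.
motion = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
dx, dy, amplitude, angle = displacement_vector(motion, contour)
```

Each of the two source pixels contributes a horizontal run of two motion pixels and no vertical run, so the summed vector points right with an amplitude of 4 pixels.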

[46] The multimedia module (4), which receives the displacement vector, is a software component that can be implemented with any programming language or technology oriented to games or three-dimensional animation, such as XNA, DirectX, Direct3D, Flash, or OpenGL. The purpose of this module is to provide system users with two-dimensional or three-dimensional computer-generated animation, as well as sound and light effects that react according to the type of movement performed by the actors.

[47] In order to facilitate the understanding of the present invention, three figures illustrating different modes of application are presented herein.

[48] FIGURE 2 demonstrates a possible embodiment of the invention which could be implemented in a movie theater, where the actors (9) are monitored by two camcorders (1) connected to a computer (5) in charge of executing the motion detection and recognition module. The result of the motion analysis is transmitted to another computer (5) via a computer network (wired or wireless). This computer (5) runs the multimedia module, which may contain a game that acts on the movement of a virtual object, e.g. an avatar, a basket or a car, which is displayed on the screen surface (8) through the video projector (6) connected to the computer (5). By moving their arms left, right, up and down, the actors control the movement of the virtual object. The loudspeakers (7) connected to the computer (5) are intended to provide the actor group (9) with a set of sound effects which also react in accordance with their movements.

[49] FIGURE 3 presents another embodiment of the present invention wherein the actors (9) are monitored by only one camcorder (1), connected to a computer (5) which is responsible for executing both the motion detection and recognition module and the multimedia module. Through body movements (up, down, left and right) the actors act on the multimedia content; the images generated by the multimedia module are transmitted to a screen (10), which shows them to the actors (9). A set of loudspeakers (7) is also connected to the computer (5) for the purpose of transmitting the sound effects generated by the system.

[50] FIGURE 4 shows a use of the invention where it is possible to simultaneously monitor two groups of actors (9), observed by the camcorders (1), so that interaction between the two groups on the same multimedia content is possible, for example in a multi-group game. Such a system configuration could be used, for example, in a football stadium where one group (9) would consist of supporters of one team and the other group (9) of supporters of the opposing team. In this configuration, each group of actors (9) is monitored by a camcorder (1). Both cameras are connected to the computer (5) running the motion detection and recognition module and the multimedia module. Connected to this computer (5) are loudspeakers (7), where one loudspeaker (7) emits sound effects to one of the groups (9) and the other loudspeaker (7) emits sound effects to the other group (9). The video projectors (6) are connected to the computer (5), projecting the images generated by the multimedia module onto the screens (8). The content of the images projected by the projectors (6) may differ, so that each group of actors has its own perspective on the virtual object it controls.

Claims

1. Method of interaction between actors and surfaces by motion detection comprising the following steps:

a. sequentially capturing images of said actors (9) through one or more video cameras (1);

b. digitizing said images (2), unless the camcorders (1) have already done so;

c. detecting and identifying (3) the direction of movement by generating displacement vectors from each of the observed actors;

d. acting (4) on virtual objects, or initiating predefined actions, via said displacement vectors.

2. Method according to the preceding claim, characterized in that said movement detection and identification (3) further comprises the steps of:

a. calculating (14) a motion mask (16) from the current image (12) compared to the previous image (11);

b. calculating (13) a contour mask (15) from the previous image (11);

c. calculating (17) one or more displacement vectors from said

3. Method according to the preceding claim, characterized in that said displacement vector calculation (17) further comprises the steps of:

a. overlaying the motion mask (16) and the contour mask (15);

b. identifying source pixels (19) whenever motion and contour are both identified at the same pixel;

c. calculating, for each source pixel (19), the partial horizontal displacement vector (20) and the partial vertical displacement vector from the horizontal or vertical distance, respectively, defined by the number of contiguous motion pixels (18) delimited between source pixels or between a source pixel and a non-moving pixel;

d. summing all the partial displacement vectors, resulting in the displacement vector, consisting of the amplitude, in pixels, and the direction of the total displacement observed.

4. Method according to claim 2 or 3, characterized in that the contour mask (15) is calculated by means of equation (I):

\[
\mathrm{CONTOUR}(I_{t-1}(x, y)) =
\begin{cases}
1, & \text{if } |I_{t-1}(x, y) - I_{t-1}(x-1, y-1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x-1, y+1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x+1, y-1)| > T \\
   & \text{and } |I_{t-1}(x, y) - I_{t-1}(x+1, y+1)| > T, \\
0, & \text{otherwise;}
\end{cases}
\qquad \text{(I)}
\]

wherein $T$ can take any value between 0 and the maximum intensity of a pixel, and in which the current image is represented by $I_t$ (12) and the image captured at the previous time instant is represented by $I_{t-1}$ (11).

5. Method according to either claim 2 or claim 3, characterized in that the motion mask (16) is calculated by equation (II):

wherein $T$ may take any value between 0 and the maximum intensity of a pixel, and in which the current image is represented by $I_t$ (12) and the image captured at the previous time instant is represented by $I_{t-1}$ (11).

6. System of interaction between actors and surfaces by motion detection, characterized by performing the method referred to in claim 1 and comprising:

a. one or more video cameras (1);

b. optionally a digitizer module (2) of the images;

c. a detector and identifier module (3) of displacement vectors of each of the observed actors;

d. an actuator module (4) acting on virtual objects or triggering predefined actions.

7. System according to the preceding claim, comprising: one or more computers (5), a screen (9) and one or more camcorders (1).

8. System according to either of the preceding claims, characterized in that said motion detection and recognition module (3) calculates the displacement vector of the movement carried out by the actors, generating a value of displacement amplitude and direction angle.

9. System according to any one of the preceding three claims, characterized in that said actuator module (4), which receives a displacement vector containing the amplitude and direction angle of the movement, performs, according to said information, a particular action on multimedia content.

10. System according to the preceding claim, characterized in that it changes the path of computer-generated objects or selects multimedia content.

11. System according to the preceding claim, characterized in that it acts on virtual objects by rotating them, by translating them, or by both actions.

12. System according to any one of claims 6 to 11, characterized in that it has video projection equipment or a screen connected to the computer running the multimedia module, on which the generated multimedia contents are displayed to the actors.

13. System according to any one of claims 6 to 11, characterized in that it comprises an on-off control module, connected to the computer running the multimedia module, which acts on the lighting.