WO2000030023A1 - Stereo-vision for gesture recognition - Google Patents

Stereo-vision for gesture recognition

Info

Publication number
WO2000030023A1
Authority
WO
WIPO (PCT)
Prior art keywords
subject
interest
volume
images
sensor
Prior art date
Application number
PCT/US1999/027372
Other languages
French (fr)
Other versions
WO2000030023A9 (en)
Inventor
Allen Pu
Yong Qiao
Nelson Escobar
Michael Ichiriu
Original Assignee
Holoplex, Inc.
Priority date
Filing date
Publication date
Application filed by Holoplex, Inc. filed Critical Holoplex, Inc.
Priority to AU19161/00A priority Critical patent/AU1916100A/en
Publication of WO2000030023A1 publication Critical patent/WO2000030023A1/en
Publication of WO2000030023A9 publication Critical patent/WO2000030023A9/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm


Abstract

A method and an apparatus to identify a gesture of a subject without the need of a fixed background. The apparatus includes a sensor and a computing engine. The sensor captures images of the subject. The computing engine analyzes the captured images to determine 3-D profiles of the subject, and the gestures of the subject. Information in the images not within a volume of interest is ignored in identifying the gesture of the subject.

Description

STEREO-VISION FOR GESTURE RECOGNITION
The present invention relates generally to gesture recognition, and more specifically to using stereo-vision for gesture recognition.
To identify the gestures of a subject, typically, the subject's background should be removed. One way to remove the background is to erect a wall behind the subject. After images of the subject are captured, the fixed background—the wall—is removed from the images before the gestures are identified.
It should be apparent from the foregoing that the wall increases the cost of the setup and the complexity of identifying the gestures of the subject.
SUMMARY OF THE INVENTION
The present invention identifies gestures of a subject without the need of a fixed background. One embodiment is through stereo-vision with a sensor capturing the images of the subject. Based on the images, and through ignoring information outside a volume of interest, a computing engine analyzes the images to construct 3-D profiles of the subject, and then identifies the gestures of the subject through the profiles. The volume of interest may be pre-defined, or may be defined by identifying a location related to the subject.
Only one sensor may be required. The sensor can capture the images through scanning, with the position of the sensor changed to capture each of the images. In analyzing the images, the positions of the sensor in capturing the images are taken into account. In another approach, the subject is illuminated by a source that generates a specific pattern. The images are then analyzed considering the amount of distortion in the pattern caused by the subject. In one embodiment, the images are captured simultaneously by more than one sensor, with the position of at least one sensor relative to one other sensor being known.
In another embodiment, the subject has at least one foot, with the position of the at least one foot determined by a pressure-sensitive floor mat to help identify the subject's gesture.
The subject can be illuminated by infrared radiation, with the sensor being an infrared detector. The sensor can include a filter that passes the radiation.
In one embodiment, the volume of interest includes at least one region of interest, which, with the subject, includes a plurality of pixels. In analyzing the images to identify the gesture of the subject, the computing engine calculates the number of pixels of the subject overlapping the pixels of the at least one region of interest. In another embodiment, the position and size of the at least one region of interest depend on a dimension of the subject, or a location of the subject.
In yet another embodiment, the present gesture of the subject depends on its prior gesture.
Other aspects and advantages of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the accompanying drawings, illustrates by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows one embodiment illustrating a set of steps to implement the present invention.
FIG. 2 illustrates one embodiment of an apparatus of the present invention capturing an image of a subject.
FIG. 3 shows different embodiments of the present invention in capturing the images of the subject.
FIGS. 4A-C show one embodiment of the present invention based on the distortion of the specific pattern of a source.
FIG. 5 illustrates one embodiment of the present invention based on infrared radiation.
FIG. 6 shows different embodiments of the present invention in analyzing the captured images.
FIG. 7 shows one embodiment of a pressure-sensitive mat for the present invention.
FIG. 8 shows another embodiment of an apparatus to implement the present invention.
Same numerals in FIGS. 1-8 are assigned to similar elements in all the figures. Embodiments of the invention are discussed below with reference to FIGS. 1-8. However, those skilled in the art will readily appreciate that the detailed description given herein with respect to these figures is for explanatory purposes, as the invention extends beyond these limited embodiments.
DETAILED DESCRIPTION
In one embodiment, the present invention isolates a subject from a background without depending on erecting a known background behind the subject.
A three dimensional (3-D) profile of the subject is generated with the subject's gesture identified. The embodiment ignores information not within a volume of interest, where the subject probably is moving inside.
FIG. 1 shows one approach 100 of using an apparatus 125 shown in FIG. 2 to identify the gestures of the subject 110. At least one sensor 116, such as a video camera, captures (step 102) a number of images of the subject for a computing engine 118 to analyze (step 104) so as to identify gestures of the subject.
In one embodiment, the computing engine 118 does not take into consideration information in the images outside a volume of interest 112. For example, information in the images too far to the sides or too high can be ignored, which means that certain information is removed as a function of distance away from the sensor. Based on the volume of interest 112, the subject is isolated from its background. The gesture of the subject can be identified without the need of a fixed background.
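To make this filtering step concrete, the following is a minimal sketch of discarding reconstructed 3-D points that lie outside an axis-aligned volume of interest; the function name, the NumPy point-cloud representation, and the box parameterization are illustrative assumptions rather than details from the patent.

```python
import numpy as np

def filter_to_volume_of_interest(points_xyz, box_center, box_size):
    """Keep only 3-D points that fall inside an axis-aligned volume of interest.

    points_xyz : (N, 3) array of reconstructed points in the sensor frame.
    box_center : (3,) center of the volume of interest.
    box_size   : (3,) extent of the volume along each axis.
    """
    points = np.asarray(points_xyz, dtype=float)
    center = np.asarray(box_center, dtype=float)
    half = np.asarray(box_size, dtype=float) / 2.0
    inside = np.all(np.abs(points - center) <= half, axis=1)
    return points[inside]
```

Under this sketch, points from walls, furniture, or bystanders outside the box never reach the profile-construction and gesture-identification steps.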
FIG. 3 shows different embodiments of the present invention in capturing the images of the subject. One embodiment depends on using more than one sensor (step 150) to capture images simultaneously. In this embodiment, the position of at least one sensor relative to one other sensor is known. The position includes the orientation of the sensors; for example, one sensor points in a certain direction, and another points in another direction. Based on the images captured, the computing engine 118, using standard stereo-vision algorithms, analyzes the captured images to isolate the subject and to generate its 3-D profile. This can be done, for example, by comparing the disparity between the images captured simultaneously, in a manner similar to the human visual system. The stereo-vision algorithm can compute 3-D information, such as the depth, or a distance away from a sensor, or a location related to the subject. That location can be the center of the subject. Information in the images too far to the sides or too high from the location can be ignored, which means that certain information is removed as a function of distance away from the sensors. In this way, the depth information can help to set the volume of interest, with information outside the volume not considered in subsequent computation. Based on the volume of interest, the subject can be isolated from its background.
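The depth computation such a stereo algorithm relies on reduces, for rectified cameras, to the standard relation depth = focal length × baseline / disparity. A sketch under that assumption (the patent does not spell out the formula):

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Rectified-stereo depth: depth = f * B / d.

    disparity_px    : horizontal pixel offset of a matched feature between the two images.
    focal_length_px : focal length expressed in pixels.
    baseline        : distance between the two sensors, in the unit the depth should
                      come out in (e.g. roughly 0.1 m for the four-inch spacing
                      mentioned for the FIG. 8 embodiment).
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable disparity: treat the point as very far away
    return focal_length_px * baseline / disparity_px
```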
In another embodiment, only one sensor is necessary. In one approach, the sensor captures more than one image at more than one position (step 152). For example, the sensor is a radar or a lidar, which measures returns. The radar can capture more than one image of the subject through scanning. This can be done by rotating or moving the radar to capture an image of the subject at each position of the radar. In this embodiment, to generate the 3-D profile of the subject, the computing engine 118 takes into consideration the position of the sensor when it captures each image. Before the subject has substantially changed his gesture, the sensor would have changed its position and captured another image. Based on the images, the 3-D profile of the subject is constructed. The construction process should be obvious to those skilled in the art. In one embodiment, the process is similar to those used in the synthetic aperture radar field.
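As a rough illustration of combining returns captured at several known sensor positions into a single profile (a deliberate simplification; real synthetic-aperture-style processing is considerably more involved, and the pose representation here is an assumption):

```python
import numpy as np

def merge_scans(scans):
    """Bring returns captured at several sensor poses into one common frame.

    scans : iterable of (R, t, points) tuples, where R is a 3x3 rotation and t a
            (3,) translation describing the sensor pose at capture time, and
            points is an (N, 3) array of returns in that sensor's own frame.
    """
    merged = [points @ R.T + t for R, t, points in scans]
    return np.vstack(merged)
```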
In another embodiment, the image captured to generate the profile of the subject depends on illuminating the subject by a source 114 that generates a specific pattern (step 154). For example, the light source can project lines or a grid of points. In analyzing the images, the computing engine considers the distortion of the pattern by the subject. FIGS. 4A-C show one embodiment of the present invention depending on the distortion of the specific pattern of a source. FIG. 4A shows a light pattern of parallel lines, with the same spacing between lines, generated by a light source. As the distance from the light source increases, the spacing also increases. FIG. 4B shows a ball as an example of a 3-D object. FIG. 4C shows an example of the sensor measurement of the light pattern projected onto the ball. The distance of points on the ball from the sensor can be determined by the spacing of the projected lines around that point. A point in the vicinity of a smaller spacing is a point closer to the sensor.
In another embodiment, to enhance the ability to isolate the subject from the unknown background, the source 114 illuminates (step 160 in FIG. 5) the subject with infrared radiation, and the sensor 116 is an infrared sensor. The sensor may also include a filter that passes the radiation. For example, the 3 dB bandwidth of the filter covers all of the frequencies of the source. With the infrared sensor, the effect of background noise, such as sunlight, is significantly diminished, increasing the signal-to-noise ratio.
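For the projected-line pattern of FIGS. 4A-C, where the spacing between lines grows with distance from the source, the local spacing observed around a point indicates how far away that point is. A minimal sketch, assuming the spacing grows linearly with distance and that a reference spacing at a known distance has been calibrated (neither assumption comes from the patent text):

```python
def distance_from_line_spacing(observed_spacing_px, reference_spacing_px, reference_distance):
    """Estimate a point's distance from the spacing of the projected lines near it.

    A point surrounded by tighter line spacing is closer to the sensor; a simple
    linear spacing-versus-distance model is assumed here for illustration.
    """
    return reference_distance * observed_spacing_px / reference_spacing_px
```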
FIG. 6 shows different embodiments of the present invention in analyzing the captured images. In one embodiment, the volume of interest 112 is predefined, 170. In other words, independent of the images captured, the computing engine 118 always ignores information in the captured images outside the same volume of interest to construct a 3-D profile of the subject.
After constructing the profiles of the subject, the computing engine 118 can determine the subject's gestures through a number of image-recognition techniques. In one embodiment, the subject's gestures can be determined by the distance between a certain part of the body and the sensors. For example, if the sensors are in front of the subject, a punch would be a gesture from the upper part of the body, which extends closer to the sensors than the center of the body. Similarly, a kick would be a gesture from the lower part of the body that extends closer to the sensors.
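A sketch of that distance-based rule, assuming a coordinate frame in which y points up, z is the distance from the sensors, and a hypothetical margin defines "clearly closer than the body center":

```python
def gestures_from_depth(points_xyz, body_center, forward_margin=0.3):
    """Flag punch/kick candidates from profile points that are markedly closer
    to the sensors than the body center (units and margin are assumptions)."""
    cx, cy, cz = body_center
    gestures = set()
    for x, y, z in points_xyz:
        if z < cz - forward_margin:                      # extended toward the sensors
            gestures.add("punch" if y > cy else "kick")  # upper body -> punch, lower body -> kick
    return gestures
```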
In one embodiment, the volume of interest 112 includes more than one region of interest, 120. Each region of interest occupies a specific 3-D volume of space. In one approach, the computing engine 118 determines the gesture of the subject based on the regions of interest occupied by the 3-D profile of the subject.
Each region of interest can be for designating a gesture. For example, one region can be located in front of the right-hand side of the subject's upper body. A part of the 3-D profile of the subject occupying that region implies the gesture of a right punch by the subject.
One embodiment to determine whether a region of interest has been occupied by the subject is based on pixels. The subject and the regions of interest can be represented by pixels distributed three dimensionally. The computing engine 118 determines whether a region is occupied by calculating the number of pixels of the subject overlapping the pixels of a region of interest. Overlapping can be calculated by counting or by dot products. When a significant number of pixels of a region is overlapped, such as more than 20%, that region is considered occupied.
In another embodiment, the gesture of the subject is identified through edge detection, 173.
The edges of the 3-D profile of the subject are tracked. When an edge of the subject falls onto a region of interest, that region of interest is occupied. Edge detection techniques should be obvious to those skilled in the art, and will not be further described in this application.
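Returning to the pixel-overlap embodiment, the following is a minimal sketch of the counting-based occupancy test, with the subject and a region of interest represented as sets of discrete 3-D cells (the set representation and the exact threshold handling are assumptions):

```python
def region_occupied(subject_cells, region_cells, min_fraction=0.2):
    """Treat a region of interest as occupied when more than min_fraction of its
    cells (e.g. the 20% figure used as an example above) overlap the subject.

    subject_cells, region_cells : sets of integer (x, y, z) cell coordinates.
    """
    if not region_cells:
        return False
    overlap = len(subject_cells & region_cells)  # counting; a dot product of 0/1 occupancy vectors is equivalent
    return overlap / len(region_cells) > min_fraction
```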
One embodiment uses information on at least one dimension of the subject, such as its height or size, to determine the position and size of at least one region of interest. For example, a child's arms and legs are typically shorter than an adult's. The regions of interest for punches and kicks should be smaller and closer to a child's body than to an adult's body. By scaling the regions of interest and by setting the position of the regions of interest, based on, for example, the height of the subject, this embodiment is able to more accurately recognize the gestures of the subject.
This technique of modifying the regions of interest based on at least one dimension of the subject is not limited to three dimensional imaging. The technique can be applied, for example, to identify the gestures of a subject in two dimensional images. The idea is that after the 2-D profile of a subject is found from the captured images, the positions and sizes of two dimensional regions of interest can be modified based on, for example, the height of the profile.
Another embodiment sets the location of at least one region of interest based on tracking the position, such as the center, of the subject. This embodiment can more accurately identify the subject's gesture while the subject is moving. For example, when the computing engine has detected that the subject has moved to a forward position, the computing engine will move the region of interest for the kick gesture in the same direction. This, for example, reduces the possibility of incorrectly identifying a kick gesture when the body of the subject, rather than a foot or a leg, falls into the region of interest for the kick gesture. Identification of the movement of the subject can be through identifying the position of the center of the subject.
This technique of tracking the position of the subject to improve the accuracy in gesture recognition is also not limited to three dimensional imaging. The technique can be applied, for example, to identify the gestures of a subject in two dimensional images. The idea again is that after the 2-D profile of a subject is found from the captured images, the positions of two dimensional regions of interest can be modified based on, for example, the center of the profile.
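One way to express both ideas, scaling a region of interest by a dimension of the subject and anchoring it to the subject's tracked center, is sketched below; the reference height, the template geometry, and the uniform scaling rule are illustrative assumptions:

```python
def place_region(template_offset, template_size, subject_center, subject_height,
                 reference_height=1.75):
    """Scale a region of interest defined for a reference-height subject and
    anchor it to the subject's tracked center (all lengths in the same units)."""
    scale = subject_height / reference_height
    center = tuple(c + scale * o for c, o in zip(subject_center, template_offset))
    size = tuple(scale * s for s in template_size)
    return center, size

# For example, a punch region defined 0.5 ahead of and 0.4 above the center of a
# reference-height adult shrinks and moves closer to the body for a shorter child.
```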
In yet another embodiment, the computing engine takes into consideration a prior gesture of the subject to determine its present gesture. Remembering the temporal characteristics of the gestures can improve the accuracy of gesture recognition. For example, a punch gesture may be detected when a certain part of the subject is determined to be located in the region of interest for a punch. However, if the subject kicks really high, the subject's leg might get into the region of interest for a punch. The computing engine may identify the gesture of a punch incorrectly. Such confusion may be alleviated if the computing engine also considers the temporal characteristics of gestures. For example, a gesture is identified as a punch only if the upper part of the subject extends into the region of interest for a punch. By tracking the prior position of body parts over a period of time, the computing engine enhances the accuracy of gesture recognition.
This technique of considering prior gestures to identify the current gesture of a subject again is not limited to 3-D imaging. For example, after the 2-D profile of a subject is found from the captured images, the computing engine identifies the current gesture depending on the prior 2-D gesture of the subject.
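A sketch of such a temporal check: a gesture is confirmed only when the body part occupying a region is the part expected for that gesture and the occupancy persists over a few frames, so a high kick passing through the punch region is not mislabelled. The labels, window length, and persistence rule are illustrative assumptions:

```python
from collections import deque

class TemporalGestureFilter:
    """Confirm gestures using recent history rather than a single frame."""

    EXPECTED_PART = {"punch": "upper_body", "kick": "lower_body"}

    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def update(self, region_name, occupying_part):
        """Record which part occupies which region this frame; return a confirmed
        gesture name, or None if nothing can be confirmed yet."""
        self.history.append((region_name, occupying_part))
        expected = self.EXPECTED_PART.get(region_name)
        if expected is None or occupying_part != expected:
            return None  # e.g. a leg entering the punch region is not a punch
        if (len(self.history) == self.history.maxlen
                and all(entry == (region_name, expected) for entry in self.history)):
            return region_name
        return None
```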
FIG. 7 shows one embodiment of a pressure-sensitive floor mat 190 for the present invention. The floor mat further enhances the accuracy of identifying the subject's gesture based on foot placement. In the above embodiments, the sensor 116 can identify the gestures. However, in situations where, from the perspective of the sensor, one part of the subject occludes another part of the subject, there might be false identification. For example, if there is only one sensing element and the subject is standing directly in front of it with one leg directly behind the other leg, it might be difficult for the computing engine to identify the gesture of the other leg stepping backwards. The pressure-sensitive mat 190 embedded in the floor of the embodiment 125 solves this potential problem.
In FIG. 7, the pressure sensitive mat is divided into nine areas, with a center floor-region (Mat A) surrounded by eight peripheral floor-regions (Mat B) in four prime directions and the four diagonal directions. In this embodiment, the location of the foot does not have to be identified very precisely. When a floor-region is stepped on, a circuit is closed, providing an indication to the computing engine that a foot is in that region. In one embodiment, stepping on a specific floor-region can provide a signal to trigger a certain event.
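A sketch of how the mat signals might be consumed; the region names below are illustrative stand-ins for the Mat A / Mat B layout, and the dictionary interface is an assumption:

```python
MAT_REGIONS = ["center", "front", "back", "left", "right",
               "front_left", "front_right", "back_left", "back_right"]

def occupied_floor_regions(circuit_closed):
    """Return the floor-regions whose circuits are closed, i.e. being stepped on.

    circuit_closed : mapping from region name to a boolean; a step onto a specific
    region could equally be used directly to trigger an event.
    """
    return [name for name in MAT_REGIONS if circuit_closed.get(name, False)]
```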
In one embodiment, the volume of interest 112 is not predefined. The computing engine 118 analyzes the captured images to construct 3-D profiles of the subject and its environment. For example, the environment can include chairs and tables. Then, based on information regarding the characteristics of the profile of the subject, such as that the subject should have an upright body with two arms and two legs, the computing engine identifies the subject from its environment. From the profile, the engine identifies a location related to the subject, such as the center of the subject. Based on the location, the computing engine defines the volume of interest, 172. Everything outside the volume is ignored in subsequent computation. For example, information regarding the chairs and tables will not be included in subsequent computation.
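A sketch of deriving the volume of interest from that identified location, which could then drive a filter like the one sketched earlier; the default extent is an assumption, not a value from the patent:

```python
def volume_of_interest_from_location(location, extent=(1.2, 2.1, 2.4)):
    """Center an axis-aligned volume of interest on a location related to the
    subject (for example, the center of its 3-D profile); returns the box as
    (lower corner, upper corner) tuples."""
    half = tuple(e / 2.0 for e in extent)
    lower = tuple(c - h for c, h in zip(location, half))
    upper = tuple(c + h for c, h in zip(location, half))
    return lower, upper
```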
FIG. 8 shows one embodiment 200 of an apparatus to implement the present invention. More than one infrared sensor 202 simultaneously captures images of the subject, illuminated by infrared sources 204. The infrared sensors have pre-installed infrared filters. After images have been captured, a computing engine 208 analyzes them to identify the subject's gestures. Different types of movements by the subject can be recognized, including body movements such as jumping, crouching, and leaning forward and backward; arm movements such as punching, climbing, and hand motions; and foot movements such as kicking and moving forward and backward. Then, the gestures of the subject can be reproduced as the gestures of a video game figure shown on the screen of a monitor 206. In one embodiment, the screen of the monitor 206 shown in FIG. 8 is 50 inches in diameter. Both sensors are at the same height from the ground and are four inches apart horizontally. In this embodiment, a pre-defined volume of interest is 4 feet wide, 7 feet long and 8 feet high, with the center of the volume located 3.5 feet in front of the center of the sensors and 4 feet above the floor.
The present invention can be extended to identify the gestures of more than one subject. In one embodiment, there are two subjects, and they are spaced apart. Each has its own volume of interest, and the two volumes of interest do not intersect. The two subjects may play a game using an embodiment similar to the one shown in FIG. 8. As each subject moves, its gesture is recognized and reproduced as the gesture of a video game figure shown on the screen of the monitor 206. The two video game figures can interact in the game, controlled by the gestures of the subjects. Techniques using sensors such as radar, lidar, and cameras have been described.
Other techniques may be used to measure depth information, which in turn can determine the volume of interest. Such techniques include using an array of ultrasonic distance measurement devices, and an array of infrared LEDs or laser diodes and detectors. Other embodiments of the invention will be apparent to those skilled in the art from a consideration of this specification or practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.

Claims

CLAIMS We claim:
1. A method for obtaining information regarding a subject (110) without the need of a fixed background, the method comprising the steps of: capturing (102) images of the subject; and analyzing (104) the captured images without considering information in the images outside a volume of interest (112) for obtaining information regarding the subject.
2. A method as recited in claim 1 wherein: the method is for identifying at least one gesture of the subject (110); the step of analyzing is for identifying at least one gesture of the subject (110); and the images are captured simultaneously by more than one sensor (116), with the position of at least one sensor relative to one other sensor being known.
3. A method as claimed in any preceding claim wherein the volume of interest (112) is pre-defined.
4. A method as claimed in any preceding claims wherein: at least a part of the volume of interest (112) and at least a part of the subject (110) are represented by a plurality of pixels; and the step of analyzing includes the step of calculating the number of pixels of the subject (110) overlapping the pixels of the volume of interest (112) to obtain information regarding the subject (110).
5. A method as recited in claims 1, 2 or 3, wherein the step of analyzing includes the steps of: identifying the profile of the subject (110) based on the images; and determining whether an edge of the profile of the subject (110) is within the volume of interest (112).
6. A method as recited in any preceding claims wherein the position of at least a part of the volume of interest (112) depends on at least one dimension of the subject (110).
7. A method as recited in any preceding claims wherein the size of at least a part of the volume of interest (112) is scaled based on at least one dimension of the subject (110).
8. A method as recited in any preceding claims wherein at least one position of the volume of interest (112) depends on one position of the subject (110).
9. An apparatus (125) for obtaining information regarding a subject (110) without the need of a fixed background, the apparatus (125) comprising: a sensor (116) configured to capture images of the subject; and a computing engine (118) configured to analyze the captured images without considering information in the images outside a volume of interest for obtaining information regarding the subject.
10. An apparatus (125) as recited in claim 9, wherein the apparatus is configured for identifying at least one gesture of the subject (110); the computing engine is configured to analyze the captured image for identifying at least one gesture of the subject (110); and the apparatus further comprises at least one additional sensor to simultaneously capture images of the subject, with the position of at least one sensor relative to one other sensor being known.
11. An apparatus (125) as recited in claims 9 or 10 wherein the volume of interest is pre-defined.
12. An apparatus (125) as recited in claims 9, 10 or 11 wherein: at least a part of the volume of interest (112) and at least a part of the subject (110) are represented by a plurality of pixels; and the computing engine (125) is configured to calculate the number of pixels of the subject overlapping the pixels of the volume of interest (112) to obtain information regarding the subject (110).
13. An apparatus (125) as recited in claims 9, 10 or 11 wherein the computing engine (125) is configured to identify the profile of the subject (110) based on the images; and determine whether an edge of the profile of the subject (110) is within the volume of interest (112).
14. An apparatus (125) as recited in claims 9, 10, 11, 12 or 13 wherein the position of at least a part of the volume of interest (112) depends on at least one dimension of the subject (110).
15. An apparatus (125) as recited in claims 9, 10, 11, 12, 13 or 14 wherein the size of at least a part of the volume of interest (112) is scaled based on at least one dimension of the subject (110).
16. An apparatus (125) as recited in claims 9, 10, 11, 12, 13, 14 or 15 wherein at least one position of the volume of interest (112) depends on one position of the subject (110).
PCT/US1999/027372 1998-11-17 1999-11-17 Stereo-vision for gesture recognition WO2000030023A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU19161/00A AU1916100A (en) 1998-11-17 1999-11-17 Stereo-vision for gesture recognition

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US19536198A 1998-11-17 1998-11-17
US09/195,361 1998-11-17
US26492199A 1999-03-09 1999-03-09
US09/264,921 1999-03-09

Publications (2)

Publication Number Publication Date
WO2000030023A1 true WO2000030023A1 (en) 2000-05-25
WO2000030023A9 WO2000030023A9 (en) 2002-08-29

Family

ID=26890921

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/027372 WO2000030023A1 (en) 1998-11-17 1999-11-17 Stereo-vision for gesture recognition

Country Status (2)

Country Link
AU (1) AU1916100A (en)
WO (1) WO2000030023A1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002007839A2 (en) * 2000-07-24 2002-01-31 Jestertek, Inc. Video-based image control system
WO2003071410A2 (en) * 2002-02-15 2003-08-28 Canesta, Inc. Gesture recognition system using depth perceptive sensors
EP1833002A1 (en) * 2006-03-08 2007-09-12 ID-Development AG Method and apparatus for obtaining biometric image samples or biometric data
US7308112B2 (en) 2004-05-14 2007-12-11 Honda Motor Co., Ltd. Sign based human-machine interaction
US7372977B2 (en) 2003-05-29 2008-05-13 Honda Motor Co., Ltd. Visual tracking using depth data
WO2008128568A1 (en) 2007-04-20 2008-10-30 Softkinetic S.A. Volume recognition method and system
US7620202B2 (en) 2003-06-12 2009-11-17 Honda Motor Co., Ltd. Target orientation estimation using depth sensing
EP2249230A1 (en) 2009-05-04 2010-11-10 Topseed Technology Corp. Non-contact touchpad apparatus and method for operating the same
WO2011070313A1 (en) * 2009-12-08 2011-06-16 Qinetiq Limited Range based sensing
WO2011085815A1 (en) * 2010-01-14 2011-07-21 Brainlab Ag Controlling a surgical navigation system
US8005263B2 (en) 2007-10-26 2011-08-23 Honda Motor Co., Ltd. Hand sign recognition using label assignment
WO2012156159A1 (en) * 2011-05-16 2012-11-22 Siemens Aktiengesellschaft Evaluation method for a sequence of chronologically sequential depth images
WO2013153264A1 (en) * 2012-04-13 2013-10-17 Nokia Corporation Free hand gesture control of automotive user interface
EP2979155A4 (en) * 2013-07-10 2017-06-14 Hewlett-Packard Development Company, L.P. Sensor and tag to determine a relative position
TWI663377B (en) * 2015-05-15 2019-06-21 高準精密工業股份有限公司 Optical device and light emitting device thereof
WO2021236100A1 (en) * 2020-05-22 2021-11-25 Hewlett-Packard Development Company, L.P. Gesture areas

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DARRELL T ET AL: "A virtual mirror interface using real-time robust face tracking", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION,XX,XX, 14 April 1998 (1998-04-14), pages 616 - 621, XP002083775 *
HUBER E: "3-D real-time gesture recognition using proximity spaces", PROCEEDING. THIRD IEEE WORKSHOP ON APPLICATIONS OF COMPUTER VISION. WACV'96 (CAT. NO.96TB100084), PROCEEDINGS THIRD IEEE WORKSHOP ON APPLICATIONS OF COMPUTER VISION. WACV'96, SARASOTA, FL, USA, 2-4 DEC. 1996, 1996, Los Alamitos, CA, USA, IEEE Comput. Soc. Press, USA, pages 136 - 141, XP002135707, ISBN: 0-8186-7620-5 *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8963963B2 (en) 2000-07-24 2015-02-24 Qualcomm Incorporated Video-based image control system
WO2002007839A3 (en) * 2000-07-24 2003-07-10 Jestertek Inc Video-based image control system
US7898522B2 (en) 2000-07-24 2011-03-01 Gesturetek, Inc. Video-based image control system
WO2002007839A2 (en) * 2000-07-24 2002-01-31 Jestertek, Inc. Video-based image control system
US7227526B2 (en) 2000-07-24 2007-06-05 Gesturetek, Inc. Video-based image control system
US8624932B2 (en) 2000-07-24 2014-01-07 Qualcomm Incorporated Video-based image control system
US8274535B2 (en) 2000-07-24 2012-09-25 Qualcomm Incorporated Video-based image control system
US20080018595A1 (en) * 2000-07-24 2008-01-24 Gesturetek, Inc. Video-based image control system
EP1967941A3 (en) * 2000-07-24 2008-11-19 GestureTek, Inc. Video-based image control system
WO2003071410A3 (en) * 2002-02-15 2004-03-18 Canesta Inc Gesture recognition system using depth perceptive sensors
WO2003071410A2 (en) * 2002-02-15 2003-08-28 Canesta, Inc. Gesture recognition system using depth perceptive sensors
US7372977B2 (en) 2003-05-29 2008-05-13 Honda Motor Co., Ltd. Visual tracking using depth data
US7590262B2 (en) 2003-05-29 2009-09-15 Honda Motor Co., Ltd. Visual tracking using depth data
US7620202B2 (en) 2003-06-12 2009-11-17 Honda Motor Co., Ltd. Target orientation estimation using depth sensing
US7308112B2 (en) 2004-05-14 2007-12-11 Honda Motor Co., Ltd. Sign based human-machine interaction
EP1833002A1 (en) * 2006-03-08 2007-09-12 ID-Development AG Method and apparatus for obtaining biometric image samples or biometric data
WO2008128568A1 (en) 2007-04-20 2008-10-30 Softkinetic S.A. Volume recognition method and system
US8005263B2 (en) 2007-10-26 2011-08-23 Honda Motor Co., Ltd. Hand sign recognition using label assignment
EP2249230A1 (en) 2009-05-04 2010-11-10 Topseed Technology Corp. Non-contact touchpad apparatus and method for operating the same
WO2011070313A1 (en) * 2009-12-08 2011-06-16 Qinetiq Limited Range based sensing
CN102640087A (en) * 2009-12-08 2012-08-15 秦内蒂克有限公司 Range based sensing
WO2011085815A1 (en) * 2010-01-14 2011-07-21 Brainlab Ag Controlling a surgical navigation system
EP2642371A1 (en) * 2010-01-14 2013-09-25 BrainLAB AG Controlling a surgical navigation system
US9542001B2 (en) 2010-01-14 2017-01-10 Brainlab Ag Controlling a surgical navigation system
US10064693B2 (en) 2010-01-14 2018-09-04 Brainlab Ag Controlling a surgical navigation system
WO2012156159A1 (en) * 2011-05-16 2012-11-22 Siemens Aktiengesellschaft Evaluation method for a sequence of chronologically sequential depth images
WO2013153264A1 (en) * 2012-04-13 2013-10-17 Nokia Corporation Free hand gesture control of automotive user interface
CN104364735A (en) * 2012-04-13 2015-02-18 诺基亚公司 Free hand gesture control of automotive user interface
US9239624B2 (en) 2012-04-13 2016-01-19 Nokia Technologies Oy Free hand gesture control of automotive user interface
EP2979155A4 (en) * 2013-07-10 2017-06-14 Hewlett-Packard Development Company, L.P. Sensor and tag to determine a relative position
US9990042B2 (en) 2013-07-10 2018-06-05 Hewlett-Packard Development Company, L.P. Sensor and tag to determine a relative position
TWI663377B (en) * 2015-05-15 2019-06-21 高準精密工業股份有限公司 Optical device and light emitting device thereof
WO2021236100A1 (en) * 2020-05-22 2021-11-25 Hewlett-Packard Development Company, L.P. Gesture areas

Also Published As

Publication number Publication date
WO2000030023A9 (en) 2002-08-29
AU1916100A (en) 2000-06-05

Similar Documents

Publication Publication Date Title
US9632505B2 (en) Methods and systems for obstacle detection using structured light
WO2000030023A1 (en) Stereo-vision for gesture recognition
CN104902246B (en) Video monitoring method and device
US10212324B2 (en) Position detection device, position detection method, and storage medium
Rougier et al. Monocular 3D head tracking to detect falls of elderly people
US7899211B2 (en) Object detecting system and object detecting method
US8373751B2 (en) Apparatus and method for measuring location and distance of object by using camera
KR101118654B1 (en) rehabilitation device using motion analysis based on motion capture and method thereof
CN108156450B (en) Method for calibrating a camera, calibration device, calibration system and machine-readable storage medium
CN104966062B (en) Video monitoring method and device
JP2010204805A (en) Periphery-monitoring device and method
Snidaro et al. Automatic camera selection and fusion for outdoor surveillance under changing weather conditions
JP2001056853A (en) Behavior detecting device and kind discriminating device, behavior detecting method, and recording medium where behavior detecting program is recorded
JP2006236184A (en) Human body detection method by image processing
JP2003196656A (en) Distance image processing device
CN109886064A (en) Determination can driving space boundary method
KR20120026956A (en) Method and apparatus for motion recognition
JP2011209794A (en) Object recognition system, monitoring system using the same, and watching system
KR101961266B1 (en) Gaze Tracking Apparatus and Method
JP2004042777A (en) Obstacle detector
Wang et al. Gait analysis and validation using voxel data
JP4628910B2 (en) Length measuring device and height measuring device
Hadi et al. Fusion of thermal and depth images for occlusion handling for human detection from mobile robot
JP5785515B2 (en) Pedestrian detection device and method, and vehicle collision determination device
CN107274396B (en) Device for counting number of people

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
AK Designated states

Kind code of ref document: C2

Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

COP Corrected version of pamphlet

Free format text: PAGES 1/8-8/8, DRAWINGS, REPLACED BY NEW PAGES 1/8-8/8; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE