WO1998059312A1 - Methods and apparatus for gesture recognition
- Publication number
- WO1998059312A1 (PCT/US1998/012169)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- player
- gesture
- template
- game
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/20—Input arrangements for video game devices
- A63F13/21—Input arrangements for video game devices characterised by their sensors, purposes or types
- A63F13/213—Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/40—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
- A63F13/42—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
- A63F13/428—Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/54—Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1087—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/10—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals
- A63F2300/1087—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera
- A63F2300/1093—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by input arrangements for converting player-generated signals into game device control signals comprising photodetecting means, e.g. a camera using visible light
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6063—Methods for processing data by generating or executing the game program for sound processing
- A63F2300/6081—Methods for processing data by generating or executing the game program for sound processing generating an output signal, e.g. under timing constraints, for spatialization
Definitions
- the present invention relates generally to gesture recognition, and more particularly to methods and apparatus for gesture recognition to be used in electronic games and other human-machine interface applications.
- a player guides an animated character or a vehicle with an input device, such as a low-key-count keyboard, a joystick or an electronic pointing device, like a mouse.
- the character might have to avoid traps, to grab an object, or to fight an opponent.
- the player uses a joystick to control the direction of movement of the character, and buttons to activate preset actions, such as jumping and hitting. Though one can learn to manipulate such input devices, it is not natural to act through a joystick or a keyboard.
- An alternative method to control animated characters in electronic games is through gesture recognition.
- the player's gesture controls the character's motion.
- the player's gesture is captured by an optical or an infrared detector array, such as a video camera.
- the processor analyzes the data from the array to identify the gesture, which is then used to control the character's motion.
- a detector array captures the image of the kick and relays it to a processor, which analyzes the image and makes the character kick.
- This is a more intuitive way to play electronic games. It is much more direct and natural: when the player jumps, the character jumps accordingly.
- prior art electronic game systems implementing such image recognition techniques are relatively slow.
- One such system uses "dynamic" motion detection to recognize the player's images.
- a detector captures consecutive images, and a processor analyzes their differences. For example, as the player punches, the processor constantly compares consecutive images to find the path direction of the punch.
- the processor typically calculates the derivatives of the values of the pixels in the images.
- the processor extrapolates the direction of motion to identify future positions. Since the processor usually analyzes the images in greyscale, the "dynamic" motion detection technique requires a significant amount of floating-point multiplication. With a significant number of images captured and intensive computation, such methods can accurately recognize gestures. However, this type of system can be slow.
- the present invention is directed to apparatus and methods that can efficiently and accurately recognize the gesture of the image of a player.
- the invented techniques are much faster than many prior art techniques.
- the invention includes many pre-defined gestures. Based on invented mapping techniques, one of the gestures is identified as the gesture of the image of the player.
- the invention includes a pre-processor, a template-matcher and a post-processor.
- a detector captures the image of a background. Then the player stands in front of the background and starts playing. The detector continues to capture the images of the player in front of the background. Note that the background image does not include the image of the player, but the image of the player in front of the background contains at least a portion of the background image. In this invention, the images of the player in front of the background are known as current images.
- the pre-processor removes at least a portion of the background image from a current image to generate a player's image. Then, the template-matcher directly maps the player's image to a number of templates to generate a number of template outputs. Based on these outputs, the post-processor identifies one pre-defined gesture from the set of pre-defined gestures. That pre-defined gesture corresponds to the gesture in the player's image, and is used in the game.
- the background-removal process uses a threshold value and an upper limit value.
- the process to extract the player's image from the background image poses a number of challenges. They include the player casting shadows, and the colors in the player's clothes substantially matching the background colors. In these situations, to create a more accurate player's image, the pre-processor uses a thresholding technique.
- the thresholding technique operates on each pixel.
- the pre-processor generates the difference between the value of each pixel in the background image and the value of the corresponding pixel in the current image. Then the pre-processor compares the magnitude of each difference value with the threshold value. Based on the comparison, the pre-processor generates an energy level, which reflects, for example, the size of the player's image. If the energy level is higher than the upper limit value, the pre-processor will change the threshold value and perform the comparison again. A higher energy level can imply that the size of the player's image is too big. If the energy level is not higher than the upper limit value, the pre-processor sets the player's image based on the background image, the current image and the threshold value.
- the thresholding process also includes a lower limit value. If the energy level is lower than the lower limit value, the pre-processor will change the threshold value and perform the comparison again. A lower energy level can imply that the size of the player is too small. In this case, the pre-processor sets the player's image only if the energy level falls between the upper limit value and the lower limit value.
- the template-matcher maps the player's image with a number of templates.
- each template has a bar of pixels whose values are one, and the templates are categorized into a number of sets. The bars in the templates within each set are substantially parallel, while the orientations of the bars in different sets are different. Also, combining the bars within each set substantially covers the player's image.
- each template characterizes at least part of a pre-defined gesture, and every gesture can be characterized by one or more templates.
- the template-matcher performs the mapping process.
- each template is represented by a hologram.
- the template-matcher optically maps the player's image directly with the holograms to generate a number of template outputs. This can be done in parallel. For example, with one hundred templates, all of the one hundred template outputs can be generated simultaneously.
- template-matching to generate the template outputs is done by digital electronics. There can be many template outputs. One way to reduce the number can be based on a position on the player. In one
- the height information is generated from a set of templates with horizontal bars, while the center information is generated from a set of templates with vertical bars.
- the post-processor can delete one or more template outputs from all of the outputs.
- a number of templates are not mapped with the player's image. Those template outputs will not be formed.
- the system can regularly identify the center and the height of the player's image. This helps the present invention to analyze players of different sizes, and players shifting around in front of the background.
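The text says only that height comes from the horizontal-bar templates and center from the vertical-bar templates; it does not say how. The following is a minimal sketch of one possible reading, in which height is the span of horizontal bars with a non-zero output and center is the output-weighted mean position of the vertical bars. The function name, the `bar_thickness` parameter and both formulas are assumptions made for illustration, not the method of the patent.

```python
def estimate_height_and_center(horizontal_outputs, vertical_outputs, bar_thickness):
    """Estimate player height and horizontal center from bar-template outputs.

    horizontal_outputs: outputs of horizontal-bar templates, ordered top to bottom.
    vertical_outputs:   outputs of vertical-bar templates, ordered left to right.
    Both formulas below are illustrative assumptions.
    """
    occupied = [i for i, out in enumerate(horizontal_outputs) if out > 0]
    total = sum(vertical_outputs)
    if not occupied or total == 0:
        return None  # no player pixels were matched
    # Height: number of occupied horizontal bars times the bar thickness.
    height = (occupied[-1] - occupied[0] + 1) * bar_thickness
    # Center: output-weighted mean of the vertical bar mid-positions.
    center = sum((i + 0.5) * bar_thickness * out
                 for i, out in enumerate(vertical_outputs)) / total
    return height, center


result = estimate_height_and_center([0, 3, 5, 4, 0], [0, 2, 8, 2, 0], bar_thickness=4)
```

With estimates like these, a system could rescale or shift its interpretation of the other template outputs for players of different sizes or positions.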
- the post-processor analyzes the template outputs.
- the pre-defined gestures are separated into pre-defined general gestures, and pre-defined specific gestures.
- Each general gesture includes at least one specific gesture.
- the general gestures can identify the player's body positions, while the specific gestures can identify the player's limb positions and their orientation at certain body positions. For example, one general gesture is standing. Under this general standing gesture, there can be many specific gestures, such as punching and kicking while standing.
- the post-processor analyzes some of the template outputs to first identify at least one general gesture. Then, the post-processor analyzes some of the template outputs to identify one specific gesture that corresponds to the gesture in the image.
- the post-processor identifies the at least one general gesture by a first neural network. Then, based on at least the outputs of the first neural network, the post-processor identifies the specific pre-defined gesture by a second neural network. In another embodiment, the post-processor identifies the gesture based on a set of rules.
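The two-stage identification can be illustrated with the rule-based variant. The sketch below assumes the template outputs are first reduced to 0/1 flags by a cutoff and then looked up in two small tables, one for general gestures and one for specific gestures. The table contents, the cutoff and the split into "body" and "limb" outputs are hypothetical; the neural-network embodiment is not shown.

```python
def quantize(outputs, cutoff):
    """Reduce raw template outputs to a tuple of 0/1 flags (assumed encoding)."""
    return tuple(1 if out >= cutoff else 0 for out in outputs)


# Hypothetical rule tables; a real system would derive these from its templates.
GENERAL_TABLE = {
    (1, 1, 1): "standing",
    (0, 1, 1): "crouching",
    (1, 1, 0): "jumping",
}
SPECIFIC_TABLE = {
    ("standing", (1, 0)): "standing and punching forward",
    ("standing", (0, 1)): "standing and kicking forward",
    ("crouching", (1, 0)): "crouching and punching forward",
}


def identify(body_outputs, limb_outputs, cutoff=10):
    """Stage 1: general gesture from body outputs. Stage 2: specific gesture."""
    general = GENERAL_TABLE.get(quantize(body_outputs, cutoff))
    if general is None:
        return None  # no rule matched
    key = (general, quantize(limb_outputs, cutoff))
    # Fall back to the general gesture when no specific rule applies.
    return SPECIFIC_TABLE.get(key, general)


kick = identify([30, 40, 25], [2, 18])
crouch = identify([3, 40, 25], [0, 0])
```

Splitting the decision this way keeps the second table small, because each specific rule only has to be written for the general gestures under which it can occur.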
- the present invention repeats the above process to recognize the gestures of the images of the player as the player moves.
- the present invention is much more efficient than prior art techniques.
- the present invention does not depend on calculating derivatives.
- the template-matcher maps the player's image directly to all the holograms to generate all the template outputs simultaneously. Such mapping techniques save a lot of computation time.
- Another way to save computation time is through the neural network, which is an efficient way to generate the pre-defined gestures from the template outputs.
- an interactive game is played by more than one player, with each player controlling a character in the game.
- the method includes the steps of identifying an action being exerted on a first character by another character in the game, generating a force corresponding to the action exerted, and applying the force to the first player who controls the first character in the game.
- the method includes the step of providing non-uniform background light for the player to create an image of the player.
- FIG. 1 shows an electronic game station incorporating one embodiment of the present invention.
- FIG. 2 shows one embodiment of the present invention.
- FIG. 3 shows one set of steps to implement the present invention.
- FIG. 4 shows one embodiment to remove background images in the present invention.
- FIG. 5 shows one set of steps on thresholding in the present invention.
- FIG. 6 shows one set of steps on setting the threshold value in the present invention.
- FIGS. 7A-H show examples of different representations of one embodiment of templates used in the present invention.
- FIGS. 8A-C show examples of different representations of another embodiment of templates used in the present invention.
- FIG. 9 depicts an optical apparatus to map the player's image to templates for the present invention.
- FIG. 10 shows an electronic approach to map the player's image to templates of the present invention.
- FIGS. 11A-G depict examples of template outputs in the present invention.
- FIG. 12 shows one set of steps analyzing template outputs performed by the post-processor in the present invention.
- FIG. 13 depicts one embodiment of a post-processor in the present invention.
- FIG. 14 shows one set of steps performed by the tracker in the present invention.
- FIG. 15 shows one set of steps using the change in a specific location on the player's image to help identify gestures.
- FIG. 16 shows one set of steps to determine when to update tracking information in the present invention.
- FIG. 17 illustrates another embodiment of the present invention for more than one player.
- FIG. 18 illustrates one set of steps to implement an embodiment of the present invention that generates a force due to an action of a character in a game.
- FIG. 19 shows one embodiment that implements the set of steps shown in FIG. 18.
- FIG. 20 shows one embodiment of the life bar of a character in a game of the present invention.
- FIG. 21 shows one embodiment of the present invention where a player is illuminated by non-uniform light.
- FIG. 22 shows one embodiment of the present invention where more than one player is illuminated by non-uniform light while playing a game.
- a player 102, standing in front of a background 104, is playing a game. His image is captured by a detector 106, which can be a camera, such as a charge-coupled-device camera, and is analyzed by the apparatus 100. Based on the analysis, the apparatus 100 identifies the player's gesture to be one of a number of pre-defined gestures, and incorporates that gesture into the game. The incorporation can be translating the gesture into the action of a character in the game shown on a display 108. For example, when he jumps, the character in the game jumps accordingly.
- the set of pre-defined gestures is separated into pre-defined general gestures and pre-defined specific gestures.
- pre-defined general gestures include jumping, standing and crouching positions; and examples of pre-defined specific gestures include one or more general gestures plus one or more of the following: punching upward, punching forward, punching downward, kicking forward, kicking backward, stepping forward and stepping backward, etc.
- the specific gestures are more detailed than the general gestures. For example, a general gesture is standing, while a specific gesture is standing and kicking forward.
- FIG. 2 shows one embodiment of the apparatus 100, which includes a pre-processor 152, a template matcher 154 and a post-processor 156.
- FIG. 3 shows one set of steps 175 illustrating in general a method to implement the present invention.
- the pre-processor 152 retrieves (step 177) the image of the background 104, and retrieves (step 179) the current image.
- the images were captured by the detector 106.
- the current image shows the player's image merged in the background image. However, the background image only includes the image of the background, without the image of the player.
- the pre-processor 152 removes (step 181) at least a portion of the background from the current image to generate the player's image.
- the template matcher 154 maps (step 183) the player's image to a number of templates to generate template outputs.
- the post-processor 156 identifies one or more of the pre-defined gestures that correspond to the gesture in the player's image.
- the identification process is based on a set of rules.
- the rules are typically stored in a database or in a look-up table residing in a storage medium, which the post-processor 156 can access.
- the present invention can repeatedly identify the player's gestures. In one embodiment, after identifying one gesture (step 185) and using the gesture in the game, the present invention can repeat from the step of retrieving a current image (step 179) to identify another gesture.
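The loop over steps 177 to 185 can be sketched as a small driver that retrieves the background once and then repeats the remaining steps for each current image. This is only an outline under stated assumptions: the four callables and the toy one-dimensional "images" in the usage lines are hypothetical stand-ins for the detector, pre-processor, template matcher and post-processor.

```python
def recognize_gestures(capture, remove_background, match_templates, identify, frame_count):
    """Run the recognition loop for a fixed number of current images."""
    background = capture()                               # step 177: background image
    gestures = []
    for _ in range(frame_count):
        current = capture()                              # step 179: current image
        player = remove_background(background, current)  # step 181
        outputs = match_templates(player)                # step 183
        gestures.append(identify(outputs))               # step 185
    return gestures


# Toy usage with hypothetical three-pixel "images" and gesture names.
NAMES = ["punch", "kick", "step"]
frames = iter([[0, 0, 0], [0, 9, 0], [9, 0, 0]])
result = recognize_gestures(
    capture=lambda: next(frames),
    remove_background=lambda bg, cur: [1 if abs(c - b) >= 5 else 0 for b, c in zip(bg, cur)],
    match_templates=lambda player: player,
    identify=lambda outputs: NAMES[outputs.index(1)],
    frame_count=2,
)
```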
- FIG. 4 shows one embodiment of a background remover 200 in the pre-processor 152.
- the images, whether they are the background or the current images, have many pixels, and are in gray-scale, with many possible levels of intensity. Also, the images can be in color or in monochrome, or can be frames from a video signal.
- the background remover 200 modifies the values of the pixels in the current image to generate the player's image.
- the background remover 200 acts on frames from a video signal. First, the detector captures the background image and generates the video signal, and the remover 200 retrieves it. The first frame grabber 204 grabs the background image and stores it. Then, the detector captures the current image and generates its video signal, and again the background remover 200 retrieves it. A sync separator 202 extracts the sync signal from the video signal, and provides it to two frame grabbers 204 and 206. The frame grabber 204 generates the video signal of the stored background image using the sync signal. The differential amplifier 208 receives both the current image and the background image, and subtracts one from the other. The second frame grabber
- FIG. 5 shows one set 225 of steps on thresholding by the thresholder
- the intensity value of every pixel digitized by the second frame grabber 206 can be positive, negative or zero.
- the thresholder 210 finds the magnitude (step 227) of the value at that pixel.
- the thresholder 210 compares (step 229) the magnitude or the absolute value with a threshold value. If the absolute value is smaller, the value of that pixel becomes zero; otherwise it becomes one. This "binarizes" the player's image, or changes the values in each pixel of the player's image to one of two values.
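Steps 227 and 229 amount to a per-pixel comparison of the absolute difference against the threshold. A minimal sketch follows, assuming the images are plain nested lists of integer intensities; the function name and sample values are illustrative only, and the subtraction that the text assigns to the differential amplifier is folded into the same function for brevity.

```python
def binarize_difference(background, current, threshold):
    """Return a binary player image from a background and a current image.

    A pixel becomes 0 when |current - background| is smaller than the
    threshold, and 1 otherwise.
    """
    player = []
    for bg_row, cur_row in zip(background, current):
        row = []
        for bg, cur in zip(bg_row, cur_row):
            diff = cur - bg  # can be positive, negative or zero
            row.append(0 if abs(diff) < threshold else 1)
        player.append(row)
    return player


background = [[10, 10, 10], [10, 10, 10]]
current = [[10, 90, 12], [10, 5, 200]]
player_image = binarize_difference(background, current, threshold=5)
```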
- the process to remove at least a portion of the background so as to identify the profile of an image can take a number of factors into consideration.
- the player 102 casts a shadow on the background 104, but the player's image should not include the shadow, which is a function of the lighting of the environment.
- Another factor is that the colors in the player's clothes may be similar to the background color; for example, the player may wear a white shirt, and the background may also be white in color. It might be difficult to separate the white shirt from the white background.
- One method to minimize this color-blurring problem is to have a special or peculiar pattern on the background. This will reduce the likelihood of anyone wearing clothes with a similar design.
- Another technique to resolve such problems is to have the thresholder 210 modify the threshold value.
- FIG. 6 shows one set 250 of steps for the thresholder 210 to set the threshold value.
- the thresholder 210 thresholds (step 252) or binarizes the player's image.
- the thresholder 210 generates (step 254) an energy level from the thresholded image. This level reflects the size of the player's image. In one embodiment, the energy level is the sum of the values of all the pixels in the thresholded image. The size of an average player should be within an upper and a lower limit, which can be previously entered into the thresholder 210.
- If the energy level is higher than the upper limit value, the size of the player is too big; the thresholder 210 will increase (step 256) the threshold value by delta and perform thresholding (step 252) on the output from the second frame grabber 206 again. If the energy level is lower than the lower limit value, the size of the player is too small; the thresholder 210 will decrease (step 258) the threshold value by delta and perform thresholding (step 252) again on the output from the second frame grabber 206. If the energy level is within the upper and the lower limit, the threshold value is considered appropriate, and thresholding is accomplished (step 260).
- the value for delta is a compromise between the speed of convergence and the quality of the player's image. In one embodiment, the gray scale image has 256 levels, and the delta is 2.
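The threshold-setting steps of FIG. 6 can be sketched as a loop that re-thresholds the difference image until the energy level falls between the two limits. In this illustration the difference image is a nested list of signed integers; the `max_steps` cap and the stop on a non-positive threshold are safeguards added here as assumptions, since the text does not say what happens if the loop fails to converge.

```python
def threshold_image(diff, threshold):
    """Step 252: binarize the difference image."""
    return [[0 if abs(value) < threshold else 1 for value in row] for row in diff]


def energy(binary_image):
    """Step 254: energy level as the sum of all pixel values."""
    return sum(sum(row) for row in binary_image)


def settle_threshold(diff, threshold, lower, upper, delta=2, max_steps=256):
    """Adjust the threshold by delta until lower <= energy <= upper."""
    for _ in range(max_steps):        # assumed guard against oscillation
        if threshold <= 0:            # assumed guard: threshold must stay positive
            return None
        image = threshold_image(diff, threshold)
        level = energy(image)
        if level > upper:
            threshold += delta        # step 256: image too big
        elif level < lower:
            threshold -= delta        # step 258: image too small
        else:
            return threshold, image   # step 260: thresholding accomplished
    return None


diff = [[0, 3, 40], [-4, 50, -60], [1, 45, 2]]
result = settle_threshold(diff, threshold=1, lower=3, upper=5)
```

A larger delta reaches an acceptable threshold in fewer passes but can step over the range in which the limits are satisfied, which is the speed and quality compromise the text mentions.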
- the upper and the lower limits can be set so that the thresholder 210 can obtain a well-defined binary image of the player under various conditions, such as players of different sizes, clothes having different colors, and different lighting conditions.
- One approach to set the limits is to sample many individuals. To set the lower limit, the approach picks the largest individual from the group. The lower limit is the lowest value to which the limit can be set while the apparatus 100 can still identify the player's image. Similarly, to set the upper limit, the approach picks the smallest individual from the group. The upper limit is the largest value to which the limit can be set while the apparatus 100 can still identify the player's image.
- the energy level can be used to determine if the player is in front of the background.
- the player is deemed to have left if the energy level of the player's image is lower than a minimum level.
- For example, if the detector is made up of an array of 100 by 100 sub-detectors, the minimum level can be 40 units. This means that after thresholding, only 40 or fewer pixels register the value of one.
- the pre-processor can retrieve another background image (step 177). This ensures that if, for example, the lighting has changed, the background image is updated.
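The presence check can be sketched as a comparison of the energy level against the minimum level. The example uses the 100 by 100 array and the 40-unit minimum from the text. Treating exactly 40 lit pixels as "player has left" follows the "40 or fewer" wording, and that boundary choice is an assumption; the function name is illustrative.

```python
def player_present(binary_image, minimum_level=40):
    """Return True when more than minimum_level pixels register the value one."""
    energy_level = sum(sum(row) for row in binary_image)
    return energy_level > minimum_level


# A 100 by 100 thresholded frame with exactly 40 lit pixels: player deemed absent.
frame = [[0] * 100 for _ in range(100)]
for column in range(40):
    frame[50][column] = 1
present = player_present(frame)
```

When the check reports that the player has left, a system could use that moment to retrieve a fresh background image.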
- the template matcher 154 maps (step 183) the player's image to a number of templates to generate template outputs.
- the template-matching output of a particular template is obtained by performing either a correlation operation or an inner-product operation between the player's image and that template.
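The two operations can be written out as follows, using notation introduced here for illustration: $p(x, y)$ is the binary player's image, $t_k(x, y)$ is the $k$-th template, and $(u, v)$ is a shift. The inner product is the correlation evaluated at zero shift, and for a binary bar template it simply counts the player pixels that fall inside the bar.

```latex
% Inner product of the player's image with template k
o_k = \sum_{x} \sum_{y} p(x, y)\, t_k(x, y)

% Correlation of the player's image with template k at shift (u, v)
c_k(u, v) = \sum_{x} \sum_{y} p(x + u,\, y + v)\, t_k(x, y)

% The inner product is the zero-shift value of the correlation
o_k = c_k(0, 0)
```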
- all of the templates have very simple patterns.
- each template has a bar of pixels whose values are non-zero, such as one.
- pixels with values being one are transparent, while pixels with values being zero are opaque.
- FIGS. 7A-B show one embodiment of examples of such templates. In FIG. 7B, the size of that template has 120 by
- the templates are typically stored in a storage medium in the template matcher 154. They are separated into a number of sets of templates, with the bars in the templates within each set being substantially parallel, and with the orientations of the bars in different sets being different.
- One embodiment uses eight sets of templates, with the eight different orientations being 0°, 22.5°, 45°,
- FIGS. 7C-H show another representation of the bars. This representation only shows the boundaries for each bar. For example, in FIG. 7C, the boundaries 275 and 277 for a vertical bar 279 are shown. In another embodiment, there are altogether 118 templates.
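A set of parallel bar templates that together cover the image can be generated with a few lines of code. The sketch below builds only horizontal and vertical sets; the other orientations mentioned in the text are omitted, and the image size, bar thickness and function name are illustrative assumptions.

```python
def bar_templates(height, width, bar_thickness, orientation):
    """Build one set of parallel bar templates that tile a height x width image.

    Each template is a nested list holding ones inside its bar and zeros elsewhere.
    orientation is "horizontal" or "vertical".
    """
    extent = height if orientation == "horizontal" else width
    templates = []
    for start in range(0, extent, bar_thickness):
        stop = min(start + bar_thickness, extent)
        if orientation == "horizontal":
            template = [[1 if start <= r < stop else 0 for _ in range(width)]
                        for r in range(height)]
        else:
            template = [[1 if start <= c < stop else 0 for c in range(width)]
                        for _ in range(height)]
        templates.append(template)
    return templates


horizontal = bar_templates(4, 6, 2, "horizontal")  # 2 templates
vertical = bar_templates(4, 6, 2, "vertical")      # 3 templates
```

Because the bars in a set do not overlap and leave no gaps, every pixel of the player's image contributes to exactly one template output per set.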
- FIGS. 8A-C show another embodiment for the templates, with FIGS. 8A-B illustrating pixels with values equal to one being white or transparent, and values equal to zero being black or opaque, and with FIG. 8C showing the boundaries of a set of templates in one embodiment.
- a researcher can select one or more patterns that characterize this gesture, such as a block of pixels covering the area where the legs extend backwards, and a block covering the area where the bodies extend forward; each block should be of minimal size, but is big enough to cover all of the ten candidates when they kick backwards.
- the one or more patterns constitute one template. Then each candidate will perform another pre-defined gesture to determine one or more blocks, until all the pre-defined gestures are exhausted.
- every pre-defined gesture can be characterized by one or more templates so that every pre-defined gesture can be identified by the present invention.
- the word "characterizing" used in the phrase, "a template characterizing a gesture," means "a template to be used for measuring characteristics of or in a gesture."
- the clause "can be characterized" used in the sentence, "a gesture can be characterized by a template," means "characteristics in or of a gesture can be measured using a template."
- Another approach to generate the templates is by a process of elimination.
- the template starts with blocks of pixels with values equal to one, having equal dimensions, such as 5 pixels by 5 pixels, and regularly positioned, such as with the centers of adjacent blocks 10 pixels apart.
- Each template includes one of the blocks.
- the candidates perform all of the pre-defined gestures. Based on the template outputs, the present invention uniquely defines each gesture. Templates are then removed, for example one at a time, until there is ambiguity in defining one or more gestures. Note that for this approach the dimensions of the blocks do not have to be equal, nor do they have to be regularly positioned.
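The elimination process can be sketched as a greedy search over template indices. This illustration assumes each gesture has already been reduced to a tuple of quantized template outputs (its "signature"); a template is dropped only if every gesture still has a distinct signature afterwards. The signature values and gesture names are hypothetical, and unlike the text, which stops at the first ambiguity, this variant skips the offending template and keeps trying the rest.

```python
def signatures_unique(signatures, kept):
    """True when all gestures remain distinguishable using only the kept templates."""
    reduced = [tuple(sig[i] for i in kept) for sig in signatures.values()]
    return len(set(reduced)) == len(reduced)


def eliminate_templates(signatures, template_count):
    """Remove templates one at a time while each gesture stays uniquely defined."""
    kept = list(range(template_count))
    for candidate in range(template_count):
        trial = [i for i in kept if i != candidate]
        if trial and signatures_unique(signatures, trial):
            kept = trial  # the candidate template was redundant
    return kept


signatures = {
    "standing": (1, 1, 0, 0),
    "crouching": (0, 1, 0, 0),
    "kicking": (1, 1, 1, 0),
    "punching": (1, 1, 0, 1),
}
kept = eliminate_templates(signatures, template_count=4)
```

In this example template 1 responds identically to every gesture, so it is the only one removed.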
- Yet another approach to generate the templates is by a process of addition.
- the process starts with two templates, each with one block of pixels with values equal to one, having dimensions, such as 3 pixels by 4 pixels; the blocks are randomly positioned on the templates.
- the candidates perform all of the pre-defined gestures. If, based on the template outputs, the present invention cannot uniquely define each gesture, another template with a block of pixels with values equal to one is added. The process repeats until the present invention can uniquely define all of the pre-defined gestures.
- FIG. 9 depicts the template matcher having an optical apparatus perform the mapping.
- holographic recording of the templates can be done in the following way.
- the image of a template is displayed on a spatial light modulator (SLM) 302.
- a plane wave 301 illuminates the SLM 302, whose output (called the object beam) is focused by a lens 304 onto a spot 312 on a holographic medium 306.
- the holographic medium is also illuminated by a plane wave 314, which is called the reference beam and is coherent with the object beam.
- the interference pattern formed by the object beam and the reference beam 314 is recorded in the holographic medium 306 as either a volume hologram or a 2-D hologram.
- This hologram is referred to as the hologram of the template displayed on the SLM 302.
- the hologram of a second template is recorded at the same location 312 of the medium 306 by
- each beam, when focused by a lens 308, falls onto a different element of the detector array 310, which is located at the focal plane of the lens 308; for example, one focused beam can fall onto one detector element 318, and one falls onto another detector element 320.
- the intensity profile of the reconstructed beam from each template is the template output for that template.
- the template output is proportional to the inner product of the player's image and the corresponding template. If the holographic medium 306 is located at or near the focal plane of the lens 304, then the template output is proportional to the correlation between the player's image and the corresponding template. In one embodiment, the inner-product operation between the player's image and each template is performed during template matching. With all the holograms stored at the same spot, all of the template outputs are generated simultaneously. In one embodiment, the detector array 310 is a linear array with 118 detector elements, one element per template output.
- the template holograms are stored at different locations.
- the output from the spatial light modulator 302 scans or is duplicated to the different locations to map the player's image over each of the templates.
- Such template outputs can be designed to again fall onto different detectors, or onto the same detector at different times.
- the techniques to generate such holographic responses should be obvious to those skilled in the art, and can be found in "Holographic Memories," written by D. Psaltis and F. Mok in the November 1995 issue of Scientific American.
- the above mapping approach to perform inner products can be done electronically also.
- the value at each pixel of the player's image is multiplied by the value at the corresponding pixel in each template. After the multiplication process, the products over all the pixels of a template are summed to get the inner product, or template output, for that template.
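The electronic inner product described above can be sketched in a few lines. This is a minimal illustration only, assuming images and templates are equal-sized nested lists of pixel values; the function names and the tiny 3x4 arrays are hypothetical and do not come from the specification.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]  # rows of pixel values (assumed layout)


def template_output(image: Image, template: Image) -> int:
    """Inner product: multiply corresponding pixels, then sum the products."""
    if len(image) != len(template):
        raise ValueError("image and template must have the same height")
    total = 0
    for img_row, tpl_row in zip(image, template):
        if len(img_row) != len(tpl_row):
            raise ValueError("image and template must have the same width")
        total += sum(p * t for p, t in zip(img_row, tpl_row))
    return total


def template_outputs(image: Image, templates: Sequence[Image]) -> List[int]:
    """One template output per template."""
    return [template_output(image, t) for t in templates]


# Hypothetical binarized player image and two bar-shaped templates.
player = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
left_bar = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
right_bar = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
outputs = template_outputs(player, [left_bar, right_bar])
print(outputs)  # [3, 2]
```

Because nothing in the arithmetic depends on the pixels being one bit wide, the same sketch also covers gray-level images.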
- FIG. 10 illustrates one embodiment showing one set 350 of steps for an electronic mapping approach.
- the template matcher 154 compresses (step 352) the digitized player's image.
- the digitized templates have previously been compressed in the same fashion.
- the template matcher 154 performs inner products (step 354) of the compressed player's image with each template that has been pre-compressed to generate template outputs.
- the digitized player's image and the digitized templates are binarized.
- the compression can be combining the numerous pixels, each represented by one bit, on the player's image into bytes, or into words with many bits, such as 64 bits.
- the inner product is calculated by first performing a bitwise AND operation between each word from the compressed player's image and the corresponding word from the compressed template, then finding the number of logic-one's in the resulting word either by counting the number of one's or by using a look-up table. This process is repeated for all the words in the image if there is more than one word, with the results from all the word-matching processes summed.
- the simultaneous pixel matching and the use of a look-up table increase the speed of computation.
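The compressed matching steps above (pack one-bit pixels into words, AND the words, count the logic ones with a look-up table, sum over all words) can be sketched as follows. This is an illustrative sketch under stated assumptions: the 64-bit word size is the example size mentioned in the text, while the byte-wide look-up table, the function names, and the sample data are hypothetical choices.

```python
from typing import List, Sequence

WORD_BITS = 64  # example word size from the text

# Look-up table: number of logic ones in every possible byte value.
ONES_IN_BYTE = [bin(value).count("1") for value in range(256)]


def pack_bits(pixels: Sequence[int], word_bits: int = WORD_BITS) -> List[int]:
    """Combine one-bit pixels into words of up to `word_bits` bits each."""
    words = []
    for start in range(0, len(pixels), word_bits):
        word = 0
        for bit in pixels[start:start + word_bits]:
            word = (word << 1) | (bit & 1)
        words.append(word)
    return words


def count_ones(word: int) -> int:
    """Count logic ones in a word, one byte at a time, via the look-up table."""
    count = 0
    while word:
        count += ONES_IN_BYTE[word & 0xFF]
        word >>= 8
    return count


def packed_inner_product(image_words: Sequence[int],
                         template_words: Sequence[int]) -> int:
    """Bitwise AND each word pair, count the ones, and sum over all words."""
    if len(image_words) != len(template_words):
        raise ValueError("image and template must have the same word count")
    return sum(count_ones(i & t) for i, t in zip(image_words, template_words))


# Hypothetical flattened binarized image and template, 80 pixels each.
image_bits = [1, 0, 1, 1] * 20
template_bits = [1, 1, 0, 1] * 20
result = packed_inner_product(pack_bits(image_bits), pack_bits(template_bits))
print(result)  # 40
```

The image and the template must be packed the same way so that corresponding words cover the same pixels, which mirrors the requirement that the templates be pre-compressed in the same fashion as the image.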
- FIGS. 11A-G depict examples of template outputs.
- FIG. 11A shows the player's image mapped onto a set of vertical height templates; and
- FIG. 11B shows template outputs in a histogram, which also indicates the height of the player 102.
- FIGS. 11C and 11D show the corresponding outputs for horizontal position templates, which also indicate the width of the player 102. Outputs indicating height and width generally provide the player's body positions.
- FIGS. 11E and 11F show the outputs for templates with bars having an orientation of 157.5° from the x-axis; the outputs generally show the player's limb positions and their general orientation.
- FIG. 11G depicts a player's image mapped onto the templates shown on FIG. 8C.
- the post processor 156 analyzes those outputs to identify the specific pre-defined gesture that corresponds to the gesture in the player image.
- FIG. 12 shows one set 375 of steps analyzing template outputs performed by the post-processor in the present invention.
- the post-processor 156 tracks (step 377) the player's body position, and may delete (step 378) one or more template outputs.
- One rationale for the deletion is that the outputs from a number of templates may be null, as illustrated, for example, in the far-left and the far-right templates in FIGS. 11C & D.
- the postprocessor 156 may not analyze the outputs from those templates.
- the template matcher 154 may not even perform the matching of the player's image with a number of templates. In other words, there will be fewer template outputs than the number of templates.
- the post-processor 156 identifies (step 379) pre-defined general gestures, identifies (step 381) predefined specific gestures, and analyzes (step 383) the specific gestures based on a set of rules to identify the one or more of them that correspond to the gesture in the player's image.
- the set of rules can be generated based on known inputs and known outputs. Generating such rules with known inputs and outputs should be obvious to those skilled in the art.
- FIG. 13 depicts one embodiment of a post-processor 156.
- a tracker 400 receives the template outputs, and performs the tracking and the deleting template-outputs functions. From the remaining template outputs, a neural network 402 generates one or more pre-defined specific gestures. Then, a rule-based analyzer 404 analyzes the specific gestures to identify the gesture in the player's image.
- FIG. 14 shows one set 450 of steps of the tracker 400 deleting template outputs.
- the tracker starts by identifying the center (step 453) of the player's image, and the height (step 455) of the player's image. Height and center identification can be done, for example, through the template-output histograms shown in FIGS. 11B and 11D. The highest non-zero point of the histogram in FIG. 11B shows the player's height, while the maximum-value point of the histogram shown in FIG. 11D shows the player's center. Note that determining the player's height and center only requires mapping the player's image to two sets of templates, and does not require mapping the image to all of the sets of templates.
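Reading the height and the center off the two histograms can be sketched as below. This is only an illustration, assuming each histogram is a list of template outputs, with the height outputs ordered from the floor (index 0) upward and the position outputs ordered from left to right; the function names and sample values are hypothetical.

```python
from typing import Optional, Sequence


def player_height_index(height_outputs: Sequence[int]) -> Optional[int]:
    """Index of the highest non-zero height output; None if all are zero."""
    for index in range(len(height_outputs) - 1, -1, -1):
        if height_outputs[index] != 0:
            return index
    return None


def player_center_index(position_outputs: Sequence[int]) -> Optional[int]:
    """Index of the maximum-value position output; None if all are zero."""
    if not position_outputs or max(position_outputs) == 0:
        return None
    return max(range(len(position_outputs)), key=position_outputs.__getitem__)


# Hypothetical histograms in the style of FIGS. 11B and 11D.
height_outputs = [9, 12, 14, 10, 6, 0, 0]
position_outputs = [0, 3, 11, 17, 9, 2, 0, 0]

print(player_height_index(height_outputs))    # 4
print(player_center_index(position_outputs))  # 3
```

When several position outputs share the maximum value, this sketch returns the left-most one; the specification does not say how ties are resolved.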
- the tracker 400 may delete (step 457) one or more template outputs from further analysis.
- the player's image has previously been mapped to all the templates.
- the tracker 400 may delete one or more template outputs.
- the template matcher 154 deletes at least one template from one set of templates from the matching process; the one or more deleted templates have not been mapped to the player's image before. Then, the template matcher 154 maps the player's image to the remaining un-mapped templates to generate template outputs. In this approach, initially, the template matcher 154 only maps the player's image to the templates required to identify the player's center and height.
- one or more templates will not be needed to map with the player's image; they are relevant to areas that the player's image is not located at.
- the template matcher deletes those one or more templates for this particular player's image, with the remaining templates mapped to generate additional template outputs.
- This embodiment can be implemented by the electronic mapping approach.
- the post-processor 156, based on the height and the center information, identifies the player's gesture at a similar speed whether the player shifts around, such as in the left or right direction, or towards or away from the detector 106. In other words, the post-processor 156 becomes substantially shift and scale invariant. It is shift invariant because, though the player shifts around, the tracker can identify his center, and perform the analysis from there. The post-processor 156 is scale invariant because, though the player can be of different heights, or can move towards or away from the detector, the tracker can identify his height, and perform the analysis from there.
- the change in one or more specific locations on the player can be used to help recognize gestures, such as the player stepping forward or stepping backward.
- FIG. 15 shows one set 461 of steps that uses the change in a specific location on the player's image to identify one or more gestures. Assume that the player's image has already been generated.
- that image is retrieved (Step 463) and is mapped (Step 465) by the template matcher to at least one set of templates to generate template outputs. Based on the template outputs, one specific location on the player's image is identified (Step 467) by the tracker.
- this position can be the center of the player's image or can be the top of the player's head.
- steps 463 to 467 are repeated (Step 469) to identify the change in the specific location by the tracker.
- template outputs are analyzed (Step 471) by the post-processor to identify the player's gesture. For example, in a player's image, template outputs indicate that the player is standing with his hand extended forward. In a subsequent image, template outputs indicate that the player is still standing with his hand extended forward. However, the center of the player has moved forward also. Based on the movement of the center of the player, the apparatus 100 can determine that the player has moved forward while standing with his hand extended forward.
- the player is standing with his hand extended downward; the subsequent image shows the same gesture, except the top of the player's head has moved upward. Based on the movement of the top of the head, the apparatus 100 can determine that the player has jumped up with his hand extended downward.
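The two examples above (center moving forward, top of the head moving upward) can be sketched as a comparison of one tracked location across two images. Everything specific here is an assumption made for illustration: the coordinate convention (x grows in the "forward" direction, y grows upward), the `min_shift` threshold, and the function name are hypothetical and not part of the specification.

```python
def refine_gesture(gesture: str,
                   center_x_before: float, center_x_after: float,
                   head_y_before: float, head_y_after: float,
                   min_shift: float = 5.0) -> str:
    """Append a motion note when the same gesture is seen in both images.

    Assumes x grows in the forward direction and y grows upward;
    `min_shift` is a hypothetical threshold that ignores small jitter.
    """
    notes = []
    dx = center_x_after - center_x_before
    dy = head_y_after - head_y_before
    if dx >= min_shift:
        notes.append("moving forward")
    elif dx <= -min_shift:
        notes.append("moving backward")
    if dy >= min_shift:
        notes.append("jumping up")
    if not notes:
        return gesture
    return gesture + " while " + " and ".join(notes)


# Same gesture in both images, but the center has moved forward.
print(refine_gesture("standing, hand extended forward", 100, 112, 200, 200))
# Same gesture in both images, but the top of the head has moved upward.
print(refine_gesture("standing, hand extended downward", 100, 101, 200, 215))
```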
- the tracker 400 may not want to update the center or the height information all the time. In one embodiment, updating occurs when the player 102 is at rest, and when updating has not been done for a period of time.
- FIG. 16 shows one set 475 of steps to determine when to update the center and the height information. First, assume that the player's gesture has already been identified (step 185).
- the tracker 400 determines (step 477) whether the gesture is a rest gesture, which, in one embodiment, is defined as a standing gesture with both hands hanging down.
- the tracker 400 also decides (step 479) whether m seconds have passed since the player was in the rest gesture, where m can be 2. If m seconds have passed, and if the player is in the rest position, the tracker 400 will identify and update the height and the center of the player. The above description focuses on both the center and the height of the player. In another embodiment, only one location is tracked. In yet another embodiment, more than 2 locations are tracked.
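The update decision of FIG. 16 can be sketched as a small scheduler. This follows one reading of the text, namely that an update happens only when the player is in the rest gesture and at least m seconds have elapsed since the previous update; the class name, method name, and the use of plain floating-point timestamps are hypothetical.

```python
from typing import Optional


class UpdateScheduler:
    """Decide when to refresh the player's height and center (one reading)."""

    def __init__(self, m_seconds: float = 2.0) -> None:
        self.m_seconds = m_seconds          # m can be 2, per the text
        self.last_update: Optional[float] = None

    def should_update(self, gesture_is_rest: bool, now: float) -> bool:
        """Return True when the tracker should update height and center."""
        if not gesture_is_rest:
            return False
        if (self.last_update is not None
                and now - self.last_update < self.m_seconds):
            return False
        self.last_update = now
        return True


scheduler = UpdateScheduler(m_seconds=2.0)
decisions = [
    scheduler.should_update(True, 0.0),   # at rest, never updated
    scheduler.should_update(True, 1.0),   # at rest, only 1 s since update
    scheduler.should_update(False, 3.0),  # not at rest
    scheduler.should_update(True, 3.5),   # at rest, 3.5 s since update
]
print(decisions)  # [True, False, False, True]
```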
- the neural network 402 analyzes the template outputs.
- a neural network is a network whose circuit parameters are learnt through known inputs and known outputs. Parameters, also known as weights of interconnections, in the circuit are modified until the known inputs provide the known outputs. Generating such networks should be obvious to those skilled in the art.
- Using a neural network is just one approach to analyze the outputs.
- Other approaches are applicable, such as a rule-based system.
- the rules are set based on the known inputs and known outputs. When an input is received, the rule-based system compares it with its rules to generate the appropriate output. Generating such rule-based systems should also be obvious to those skilled in the art. Whether a neural network or a rule-based system is used, the analysis can be modified into a two-step approach.
- the template outputs are analyzed to identify one or more pre-defined general gestures that the image's gesture belongs to. Then, in a second step, one or more pre-defined specific gestures that correspond to the image's gesture are identified. For example, the first step determines the player's general body information, which indicates that he is crouching. Then the next step determines the player's limb locations and orientations while he is crouching, and they indicate that he is punching up. In one embodiment, the first step uses only a portion of the template outputs, while the second step uses the results from the first step and some of the template outputs. Neither step needs to analyze all of the template outputs at once. Such a two-step approach can reduce the complexity of the problem and increase the accuracy of recognition. Based on the above description, it should be obvious to those skilled in the art that the problem of analyzing the template outputs can be further separated into more than two steps.
- the neural network is either a two-layer neural network, or two sequential neural networks.
- the first layer or the first neural network of the sequence identifies the one or more pre-defined general gestures that the image's gesture belongs to.
- the second layer or the second neural network of the sequence identifies the one or more pre-defined specific gestures that the image's gesture belongs to.
- Using more than one neural network or using a neural network with more than one layer reduces the complexity of designing a neural network to analyze all of the template outputs. Based on the description, it should be obvious to those skilled in the art that more than two neural networks, or a neural network with more than two layers can be used to analyze the template outputs.
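The two-step structure can be illustrated with hand-written rules standing in for the two layers or two networks. This sketch is not a trained neural network and is not the specification's classifier: the thresholds, the gesture labels, and the idea of keying limb outputs by name are all hypothetical, chosen only to show step 1 using a portion of the template outputs and step 2 using step 1's result plus other outputs.

```python
from typing import Dict, Sequence


def general_gesture(height_outputs: Sequence[int], crouch_limit: int = 3) -> str:
    """Step 1: use only the height outputs to pick a general gesture.

    `crouch_limit` is a hypothetical index below which the player is
    treated as crouching.
    """
    top = -1
    for index, value in enumerate(height_outputs):
        if value != 0:
            top = index
    return "crouching" if top < crouch_limit else "standing"


def specific_gesture(general: str, limb_outputs: Dict[str, int],
                     min_output: int = 5) -> str:
    """Step 2: combine the step-1 result with the limb-template outputs."""
    if not limb_outputs:
        return general
    strongest = max(limb_outputs, key=limb_outputs.get)
    if limb_outputs[strongest] < min_output:
        return general
    return f"{general}, {strongest}"


# Hypothetical outputs for a crouching player who is punching up.
height_outputs = [8, 9, 7, 0, 0, 0]
limb_outputs = {"punching up": 12, "kicking forward": 2}

step1 = general_gesture(height_outputs)
step2 = specific_gesture(step1, limb_outputs)
print(step1)  # crouching
print(step2)  # crouching, punching up
```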
- the one or more outputs from the neural network 402 identify the pre-defined specific gesture that corresponds to the gesture in the image.
- a rule-based analyzer 404 identifies one specific gesture that corresponds to the gesture in the image.
- the rule-based analyzer can be a look-up table.
- the types of rules used can depend on the types of games a player is in, and the type of templates used. If the game is for a player fighting only one opponent, then an example of a rule is: "If the pre-defined specific gesture is kicking backwards and punching upwards while the player is standing, actually the player's gesture is kicking backwards with his body leaning forward while the player is standing." The reason is that a player would not kick backwards and punch upwards at the same time, because there is only one opponent. However, when the player kicks backwards, to maintain balance, he typically leans forward. With the player leaning forward, the templates in FIGS. 7E and 7G, which can be used to indicate that the player punches upwards, would probably register some values.
- the above rule is established. Based on the types of games and the templates used, a set of rules is generated. These rules can be stored in a lookup table, which is kept in a medium that can be accessed by the post-processor.
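A look-up table holding such rules can be sketched as a dictionary keyed by the set of specific gestures reported for one image. The single entry encodes the one-opponent example from the text; the gesture labels, the fallback behavior for unmatched inputs, and the function name are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical rule table for a game with a single opponent.
RULES: Dict[FrozenSet[str], str] = {
    frozenset({"standing", "kicking backwards", "punching upwards"}):
        "standing, kicking backwards, leaning forward",
}


def apply_rules(specific_gestures: Iterable[str],
                rules: Dict[FrozenSet[str], str] = RULES) -> str:
    """Look up the reported gestures; fall back to listing them unchanged."""
    key = frozenset(specific_gestures)
    if key in rules:
        return rules[key]
    return ", ".join(sorted(key))


print(apply_rules(["standing", "kicking backwards", "punching upwards"]))
# standing, kicking backwards, leaning forward
print(apply_rules(["standing", "punching forward"]))
# punching forward, standing
```

Using a set as the key makes the look-up independent of the order in which the earlier stage reports the gestures.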
- the present invention is not limited to gestures.
- the player's sound is also captured. As the character on the monitor moves following the gesture of the player, the character also makes the same sound as the player did.
- the player's face is digitized, and is used as the face of the character in the game.
- the present invention is not limited to players.
- the present invention is also applicable to recognizing the gestures of other living beings, such as dogs.
- the invention is also applicable to robots simulating a living being.
- the living beings and the robots are collectively known as beings, which, in the present invention, include living beings and robots simulating living beings.
- the being with the non-being is collectively considered as a being. For example, if a man is holding a knife, the man and the knife together are considered as a being.
- the present invention is not limited to playing games.
- Gesture recognition can be applied to manipulating objects.
- a robot can follow a man's gesture. As he moves, its gesture is recognized, and the robot follows it. For example, as he raises his hand, the robot moves an object up accordingly.
- images are not binarized, or thresholded.
- Template outputs can be generated, for example, through inner products performed on the corresponding non-binarized images. This can be done through digital electronics or through holography.
- the word “gesture” can have a number of meanings.
- the word “gesture” implies the use of movement of limbs or a body as a means of expression. This includes the body in resting position, such as standing, because the expression conveyed is resting.
- the word “gesture” implies the change in relative positions of some parts of a being. For example, the player may be holding a knife. By moving her fingers a little bit, the knife can move significantly. This knife movement is considered as a gesture in the present invention because there is a change in relative positions of some parts of the being, which is the player holding the knife.
- the present invention includes the pre-processor, the template-matcher, the post-processor and the detector 106.
- the present invention also includes the monitor 108.
- the present invention further includes the background 104.
- FIG. 1 shows only one detector 106 capturing images.
- images can be captured by more than one detector or camera, with the different detectors capturing images from different directions.
- the present invention describes one technique to obtain the images of a player by current images and background images.
- the player can be playing in a controlled environment.
- the background is black and the player's clothing is white.
- the current image is substantially the player's image.
- the present invention also includes one or more storage media. They can be used in many ways. For example, after the player's image has been generated, one storage medium can store the image. Then the image is retrieved to be analyzed.
- the storage media can be in one or more of the following: the pre-processor, the template matcher and the post-processor.
- FIG. 17 shows another embodiment 500 of the present invention for two players.
- This embodiment includes two detectors, 502 and 504, two backgrounds, 506 and
- the controller 510 which again can include the preprocessor, template matcher and the post processor.
- the pre-processor is configured to retrieve the images of the players; the template matcher coupled to the pre-processor retrieves a number of templates, with each template characterizing at least part of a pre-defined gesture, and maps the players' images directly to more than one template to generate a number of template outputs; and the post-processor coupled to the template matcher analyzes the template outputs to identify specific pre-defined gestures. Those gestures correspond to the gestures in the images, and they are used to replicate the identified gestures for controlling characters interacting in the game.
- the display 512 shows the interactive game.
- One advantage of this embodiment 500 is that as a player, such as 514, is playing, with his movement replicated through gesture-recognition by a character on the display 512, he can see the character following his movement.
- the embodiment 500 can further include two mirrors, 524 and 526. The mirrors are positioned so that one player can see the other player easily, while he is watching the display. This is because the mirrors are in close proximity to the display 512.
- the present invention only describes two players playing. It should be obvious to those skilled in the art that the present invention can be extended to more than two players playing simultaneously, with their gestures captured and replicated by characters in the game.
- the present invention describes different approaches to recognize gestures. In the embodiments with more than one player, other gesture recognition approaches known to those skilled in the art can also be used. As described, when a player moves, his motion can control the actions of a character in a game. With two or more players playing the same game, their corresponding characters can be interacting in the game.
- the player corresponding to the second character, that is, the second player, will feel it; that player will feel a force.
- an impulse will be transmitted to the players, or to the player whose character has been hit.
- the player(s) will feel some vibration or force due to the impulse.
- This impulse can be sound waves from a woofer, or can be the movement of a plate a player is standing on.
- FIG. 18 illustrates one set 600 of steps to implement one such embodiment, 610, as shown in FIG. 19.
- Christine is playing with Linda.
- Christine's character hits Linda's character.
- an identifier 612 identifies (Step 602) the action of Christine's character hitting Linda's character.
- the action of hitting can be identified in a number of ways.
- the identifier 612 retrieves a number of images of the game, such as through a frame grabber grabbing a number of frames. At least two images have a temporal relationship, with one being earlier in time than another. Certain patterns in at least the two images with temporal relationship are recognized.
- each character in the game has a life bar to show the health of the character. Every time a character receives a hit, the length of its life bar decreases. When the length of a life bar diminishes to zero, the corresponding character is considered dead, and the game is over at least for that player.
- One way to identify if Linda's character has been hit is to measure the change in length of the life bar of Linda's character.
- the bar 625 is made of many vertical columns of bits, such as 627 and 629.
- the identifier 612 can detect the change in the length of the life bar. To accommodate noise in the system, if the difference is, for example, only one column, Linda's character will not be considered as being hit.
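Detecting a hit from the life bar can be sketched by counting lit columns in two frames that have a temporal relationship and ignoring a one-column difference as noise. The representation of the bar as rows of 0/1 bits, the function names, and the sample frames are hypothetical; only the one-column noise tolerance comes from the text.

```python
from typing import Sequence

Bar = Sequence[Sequence[int]]  # rows of bits covering the life-bar region


def bar_length(bar: Bar) -> int:
    """Number of vertical columns that contain at least one lit bit."""
    return sum(1 for column in zip(*bar) if any(column))


def was_hit(earlier: Bar, later: Bar, noise_columns: int = 1) -> bool:
    """A hit is a decrease in length larger than the noise tolerance."""
    return bar_length(earlier) - bar_length(later) > noise_columns


# Hypothetical 2-row, 10-column life bars taken from three frames.
frame_a = [[1] * 8 + [0] * 2, [1] * 8 + [0] * 2]  # length 8
frame_b = [[1] * 7 + [0] * 3, [1] * 7 + [0] * 3]  # length 7 (noise)
frame_c = [[1] * 5 + [0] * 5, [1] * 5 + [0] * 5]  # length 5 (hit)

print(was_hit(frame_a, frame_b))  # False
print(was_hit(frame_a, frame_c))  # True
```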
- if the identifier decides that Linda's character has been hit, it sends a message to a generator 614, which will generate (Step 604) a force in response to the action exerted onto Linda's character.
- This generation can be the production of an electrical signal to run an instrument.
- an applicator 616 applies (Step 606) the force onto Linda.
- the force can be an impulse directed towards Linda, which can be generated by a woofer producing low-frequency sound waves for one or two seconds. If strong enough, the sound wave can be felt.
- the directionality of the sound waves can be tailored so that only Linda feels the wave, or all other players also feel the wave.
- each player stands on a plate, and the impulse can be applied onto Linda by vibrating for a short duration of time, such as a second, the plate Linda is standing on.
- FIG. 21 shows another embodiment of the invention, where the embodiment provides background illumination for a player 650 so as to increase the signal-to-noise ratio of the player's image at a detector 680 of the image.
- the detector 680 can be the detector 106, and can be connected to the pre-processor 152, the template matcher 154 and the postprocessor 156.
- background light is generated to be as uniform as possible when the images of a player are measured.
- non-uniform light can create an atmosphere of fun and excitement.
- the background light generated is spatially non-uniform.
- FIG. 21 shows the top view of one such approach.
- the background light is emitted from three light boxes, 656, 658 and 660, with two being approximately 4 feet wide by 8.375 feet high, and one being approximately 2 feet wide and 8.375 feet high.
- Each of the 4-foot-wide boxes holds three 8-foot-long fluorescent light tubes, such as 652 and 654, and the 2-foot-wide box holds one such fluorescent light tube.
- the seven light tubes are spaced substantially evenly apart.
- the front covers of the light boxes, such as 662, are translucent and blue in color. Light generated is conspicuously non-uniform. Note that spatial non-uniformity does not mean that light is non-uniform only close to the edge of, for example, a light box; the background light is spatially non-uniform before the edge of the box.
- the ratio of the intensity of light between a peak and a valley at the front covers of the light boxes is more than two to one to create the spatial non-uniformity.
- the generated background light helps to create an image of the player at the detector 680.
- the intensity of background light at the valley is more than the intensity of light at any point within at least 80% of the image of the player 650 at the detector 680. If the color of the player's clothing is dark, then the intensity of light at the valley can be more than the intensity of light at almost any point within the image of the player at the detector.
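The two intensity conditions above (a peak-to-valley ratio of more than two to one, and a valley brighter than at least 80% of the player's image) can be checked numerically as sketched below. This is a simplification under stated assumptions: the global maximum and minimum of sampled cover intensities stand in for "a peak and a valley", the intensity units are arbitrary, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def peak_to_valley_ratio(cover_intensities: Sequence[float]) -> float:
    """Ratio of the brightest to the dimmest sample on the front covers."""
    valley = min(cover_intensities)
    if valley <= 0:
        raise ValueError("valley intensity must be positive")
    return max(cover_intensities) / valley


def fraction_darker_than_valley(player_pixels: Sequence[float],
                                valley: float) -> float:
    """Fraction of player-image pixels dimmer than the background valley."""
    if not player_pixels:
        raise ValueError("player image has no pixels")
    darker = sum(1 for value in player_pixels if value < valley)
    return darker / len(player_pixels)


# Hypothetical samples across the light-box covers and the player's image.
cover = [90, 40, 95, 42, 88]
player_pixels = [10, 12, 35, 45, 20, 8, 15, 30, 38, 22]

ratio = peak_to_valley_ratio(cover)
fraction = fraction_darker_than_valley(player_pixels, min(cover))
print(ratio > 2.0)      # True
print(fraction >= 0.8)  # True
```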
- the non-uniform light is generated by light bulbs, instead of light tubes.
- the non-uniform light is generated by fluorescent tubes that have a meandering shape.
- the background light generated is temporally non-uniform.
- the light generated blinks as a function of time, where the blinking rate synchronizes with the time when the player's image is detected. Blinking can mean that light is turned on and off, and the player's image is detected when light is turned on.
- FIG 22 shows another embodiment with two players, 704 and
- the non-uniform light generated by light boxes 700 and 702 provides background light for the players.
- the image of the player 704 is measured by the detector 708, and the image of the player 706 is measured by the detector 710.
- Other methods can be used to identify the gestures of the image of a player. For example, through edge detection techniques, one can identify the edges of the image. Based on the edge information, one can identify the gestures of the player. One way to enhance edge identification is to illuminate the player with background light.
- the image of the player when measured from the front can be in the form of a silhouette, with the edges of the silhouette having the biggest change in intensity. Through identifying such large changes, one can find out the edges of the image, and, in turn, the gestures in the image.
- edge detection techniques should be obvious to those skilled in the art, and will not be further described in this specification.
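Since the specification leaves the edge detection technique open, the following is only one simple possibility, not the patent's method: scan each row of a back-lit silhouette image and mark the positions where adjacent pixels differ by a large step. The `min_step` threshold, the function names, and the sample rows are hypothetical.

```python
from typing import List, Sequence


def row_edges(row: Sequence[int], min_step: int) -> List[int]:
    """Indices where intensity changes by at least `min_step` from the left."""
    return [i for i in range(1, len(row))
            if abs(row[i] - row[i - 1]) >= min_step]


def image_edges(image: Sequence[Sequence[int]], min_step: int) -> List[List[int]]:
    """Edge positions for every row of the image."""
    return [row_edges(row, min_step) for row in image]


# Hypothetical rows: bright background, dark silhouette in the middle.
silhouette = [
    [200, 198, 30, 28, 31, 199, 201],
    [201, 199, 197, 29, 30, 200, 198],
]
edges = image_edges(silhouette, min_step=100)
print(edges)  # [[2, 5], [3, 5]]
```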
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Image Analysis (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU81411/98A AU8141198A (en) | 1997-06-20 | 1998-06-22 | Methods and apparatus for gesture recognition |
JP11504582A JP2000517087A (ja) | 1997-06-20 | 1998-06-22 | ジェスチャ認識のための方法及び装置 |
EP98931238A EP0920670A1 (fr) | 1997-06-20 | 1998-06-22 | Procedes et dispositifs pour la reconnaissance des gestes |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/879,854 US6075895A (en) | 1997-06-20 | 1997-06-20 | Methods and apparatus for gesture recognition based on templates |
US90637697A | 1997-08-05 | 1997-08-05 | |
US95388197A | 1997-10-20 | 1997-10-20 | |
US2277098A | 1998-02-12 | 1998-02-12 | |
US08/953,881 | 1998-02-12 | ||
US08/906,376 | 1998-02-12 | ||
US08/879,854 | 1998-02-12 | ||
US09/022,770 | 1998-02-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1998059312A1 true WO1998059312A1 (fr) | 1998-12-30 |
Family
ID=27487144
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/012169 WO1998059312A1 (fr) | 1997-06-20 | 1998-06-22 | Procedes et dispositifs pour la reconnaissance des gestes |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP0920670A1 (fr) |
JP (1) | JP2000517087A (fr) |
AU (1) | AU8141198A (fr) |
WO (1) | WO1998059312A1 (fr) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2359623A (en) * | 1999-11-22 | 2001-08-29 | Namco Ltd | Computer game system with sign perception system |
WO2001075568A1 (fr) * | 2000-03-30 | 2001-10-11 | Ideogramic Aps | Procede de modelisation fonde sur le mouvement |
CN101332362A (zh) * | 2008-08-05 | 2008-12-31 | 北京中星微电子有限公司 | 基于人体姿态识别的互动娱乐系统及其实现方法 |
WO2010007587A1 (fr) * | 2008-07-16 | 2010-01-21 | Nxp B.V. | Système et procédé pour réaliser une commande de mouvement avec une compensation de luminance d'affichage |
WO2011123845A3 (fr) * | 2010-04-01 | 2012-06-28 | Qualcomm Incorporated | Interface de dispositif informatique |
WO2013103410A1 (fr) * | 2012-01-05 | 2013-07-11 | California Institute Of Technology | Systèmes surround d'imagerie pour commande d'affichage sans contact |
US9530213B2 (en) | 2013-01-02 | 2016-12-27 | California Institute Of Technology | Single-sensor system for extracting depth information from image blur |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1428108B1 (fr) * | 2001-05-14 | 2013-02-13 | Koninklijke Philips Electronics N.V. | Procede destine a interagir avec des flux de contenu en temps reel |
JP3933107B2 (ja) * | 2003-07-31 | 2007-06-20 | 日産自動車株式会社 | 車輌用入力装置 |
-
1998
- 1998-06-22 WO PCT/US1998/012169 patent/WO1998059312A1/fr not_active Application Discontinuation
- 1998-06-22 AU AU81411/98A patent/AU8141198A/en not_active Abandoned
- 1998-06-22 EP EP98931238A patent/EP0920670A1/fr not_active Withdrawn
- 1998-06-22 JP JP11504582A patent/JP2000517087A/ja active Pending
Non-Patent Citations (2)
Title |
---|
TOMITA A ET AL: "EXTRACTION OF A PERSON'S HANDSHAPE FOR APPLICATION IN A HUMAN INTERFACE", IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS, COMMUNICATIONS AND COMPUTER SCIENCES, vol. E78-A, no. 8, August 1995 (1995-08-01), pages 951 - 956, XP000536050 * |
WATANABE T ET AL: "REAL-TIME GESTURE RECOGNITION USING MASKABLE TEMPLATE MODEL", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, 17 June 1996 (1996-06-17), pages 341 - 348, XP000676114 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2359623A (en) * | 1999-11-22 | 2001-08-29 | Namco Ltd | Computer game system with sign perception system |
US6642917B1 (en) | 1999-11-22 | 2003-11-04 | Namco, Ltd. | Sign perception system, game system, and computer-readable recording medium having game program recorded thereon |
GB2359623B (en) * | 1999-11-22 | 2004-04-14 | Namco Ltd | A game system and a method of enabling a player to play a game |
WO2001075568A1 (fr) * | 2000-03-30 | 2001-10-11 | Ideogramic Aps | Procede de modelisation fonde sur le mouvement |
US20110109644A1 (en) * | 2008-07-16 | 2011-05-12 | Nxp B.V. | System and method for performing motion control with display luminance compensation |
WO2010007587A1 (fr) * | 2008-07-16 | 2010-01-21 | Nxp B.V. | Système et procédé pour réaliser une commande de mouvement avec une compensation de luminance d'affichage |
CN101332362A (zh) * | 2008-08-05 | 2008-12-31 | 北京中星微电子有限公司 | 基于人体姿态识别的互动娱乐系统及其实现方法 |
CN101332362B (zh) * | 2008-08-05 | 2012-09-19 | 北京中星微电子有限公司 | 基于人体姿态识别的互动娱乐系统及其实现方法 |
WO2011123845A3 (fr) * | 2010-04-01 | 2012-06-28 | Qualcomm Incorporated | Interface de dispositif informatique |
US8818027B2 (en) | 2010-04-01 | 2014-08-26 | Qualcomm Incorporated | Computing device interface |
WO2013103410A1 (fr) * | 2012-01-05 | 2013-07-11 | California Institute Of Technology | Systèmes surround d'imagerie pour commande d'affichage sans contact |
US9524021B2 (en) | 2012-01-05 | 2016-12-20 | California Institute Of Technology | Imaging surround system for touch-free display control |
US9530213B2 (en) | 2013-01-02 | 2016-12-27 | California Institute Of Technology | Single-sensor system for extracting depth information from image blur |
US10291894B2 (en) | 2013-01-02 | 2019-05-14 | California Institute Of Technology | Single-sensor system for extracting depth information from image blur |
Also Published As
Publication number | Publication date |
---|---|
AU8141198A (en) | 1999-01-04 |
EP0920670A1 (fr) | 1999-06-09 |
JP2000517087A (ja) | 2000-12-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6075895A (en) | Methods and apparatus for gesture recognition based on templates | |
Calabrese et al. | DHP19: Dynamic vision sensor 3D human pose dataset | |
US8411149B2 (en) | Method and device for identifying and extracting images of multiple users, and for recognizing user gestures | |
Montserrat et al. | Training object detection and recognition CNN models using data augmentation | |
US6002808A (en) | Hand gesture control system | |
KR100474848B1 | | Face detection and tracking system and method for detecting and tracking multiple faces in real time by combining visual image information |
Anderson et al. | A real-time automated system for the recognition of human facial expressions | |
US6499025B1 (en) | System and method for tracking objects by fusing results of multiple sensing modalities | |
US8407625B2 (en) | Behavior recognition system | |
CN108090561B | | Storage medium, electronic device, and method and device for executing game operations |
US20090180669A1 (en) | Device, system and method for determining compliance with a positioning instruction by a figure in an image | |
US20120033856A1 (en) | System and method for enabling meaningful interaction with video based characters and objects | |
JP2012518236A | | Method and system for gesture recognition |
CN106815578A | | Gesture recognition method based on depth motion map and scale-invariant feature transform |
WO1998059312A1 | | Methods and apparatus for gesture recognition |
Fiaz et al. | Vision based human activity tracking using artificial neural networks | |
CN106851937A | | Method and device for gesture control of a desk lamp |
CN112991282A | | Robot-based automated testing method for keyboard input devices |
Jetley et al. | 3D activity recognition using motion history and binary shape templates | |
Ebner | On the evolution of edge detectors for robot vision using genetic programming | |
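KR20020011851A | | Motion-sensing game apparatus and method using artificial vision and pattern recognition |
CN115063724A | | Method for identifying fruit tree field ridges and electronic device |
JP4221681B2 | | Gesture recognition device |
KR200239844Y1 | | Motion-sensing game apparatus using artificial vision and pattern recognition |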
Mao | Tracking a tennis ball using image processing techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM GW HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
| | ENP | Entry into the national phase | Ref country code: JP Ref document number: 1999 504582 Kind code of ref document: A Format of ref document f/p: F |
| | WWE | WIPO information: entry into national phase | Ref document number: 1998931238 Country of ref document: EP |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| | WWP | WIPO information: published in national office | Ref document number: 1998931238 Country of ref document: EP |
| | REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
| | NENP | Non-entry into the national phase | Ref country code: CA |
| | WWW | WIPO information: withdrawn in national office | Ref document number: 1998931238 Country of ref document: EP |