WO2008053433A2 - Hand gesture recognition by scanning line-wise hand images and by extracting contour extreme points - Google Patents

Hand gesture recognition by scanning line-wise hand images and by extracting contour extreme points

Info

Publication number
WO2008053433A2
Authority
WO
WIPO (PCT)
Prior art keywords
line
image
pixels
pixel
qualified
Prior art date
Application number
PCT/IB2007/054412
Other languages
French (fr)
Other versions
WO2008053433A3 (en)
Inventor
Alexander A. Danilin
Yannick Bihan
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Priority to JP2009535171A (JP2010509651A)
Publication of WO2008053433A2
Publication of WO2008053433A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present invention relates to apparatus and a method for image recognition and is concerned particularly, although not exclusively, with apparatus and a method for the recognition of a human hand shape.
  • Apparatus for the recognition of objects are well known and several prior systems exist which aim to recognize different hand signals using cameras and electronic processing apparatus.
  • Such prior systems typically require a relatively large amount of memory and/or involve relatively intensive computations in order to distinguish between different hand signs. Because of this they consume relatively large amounts of power.
  • One such prior system for recognizing an image of a human hand is described in Japanese patent number JP2003-346162.
  • the technique requires the detection of a hand, based on a skin tone recognition procedure.
  • a detailed polygonal shape, described by points and angles, is built up and is used to determine how many fingers are raised.
  • the calculations necessary to process the image are numerous and thus the electronic processing capacity and memory of the apparatus used in this technique are both necessarily relatively large, as is its power consumption.
  • Embodiments of the present invention aim to provide a robust technique for recognizing the shape of an object, such as a human hand, which requires relatively little electronic processing power and memory, and involves low power consumption, and which may therefore be suitable for applications which use wireless "smart camera" apparatus.
  • Smart cameras i.e. cameras with built in processing capability, process locally the raw image data and send only keywords of information by wireless transmission, to a host system.
  • the inventors found that this is more efficient for power consumption than broadcasting live video to an analyzing host computer.
  • an image is electronically scanned line-by-line.
  • the lines of pixels are processed and pixels which qualify according to both tonal and positional criteria are selected as contour points.
  • a set of contour points is compared with a stored reference to determine the nature of the image.
  • image recognition apparatus comprising an image sensor, a first electronic processor, a second electronic processor, and a memory, wherein the first electronic processor is a parallel video processor comprising a plurality of line memories,
  • the first electronic processor sweeps horizontally line by line an image sensed by the image sensor
  • each line of pixels is processed and stored in one of the line memories, and wherein each pixel in a line stored in a line memory is first compared with a qualification criterion based upon its tone, and, for each line after the first line in which a tonally qualified pixel was detected, the first and last qualifying pixels on each line are compared with positional criteria, relative to tonally qualified pixels on a previous line,
  • pixels which qualify according to both their tone and position are selected as contour points
  • the apparatus is arranged to store contour points
  • a set of the stored contour points is processed by the second electronic processor which compares them with stored information to determine the nature of the image as described by the contour points.
  • the sensor registers the image comprising a plurality of pixels organized in a line by line basis.
  • the image is communicated to the first electronic processor.
  • the image is horizontally swept line by line by the first electronic processor in order to detect the presence of the hand in the image based on the tone of the pixels. For each of the lines, when the tonally qualified pixel is detected, the first and last qualifying pixels are validated as the contour points.
  • the points of the contour are further provided to the second electronic processor that compares the received contour points with the stored information in order to determine the nature of the image as described by the contour points.
  • the advantage of such image recognition apparatus is that it requires only the most significant points of the contour in order to determine the nature of the image.
  • the determination of the contour points takes advantage of the line organization of the memory and of the parallel video processor by processing the received image in line by line manner. Since the apparatus processes the image line by line in a parallel way, power consumption can be kept to a minimum.
  • the apparatus comprises a wireless camera, as the image sensor, with embedded first and second electronic processors.
  • the tonal criterion qualification of a pixel is that the pixel has a value within a predetermined range of values, which is indicative of a skin tone, in UV color space.
  • Such tonal criterion qualification is a simple and convenient way of determining the presence of the hand in the image.
  • the apparatus is arranged to recognize images of a human hand.
  • the camera comprises at least one filter. This is for the purpose of filtering skin color pixels.
  • the invention also provides a method of electronically identifying an image, the method comprising the steps of: sweeping the image horizontally as a number of lines;
  • storing pixels, which are qualified according to both tonal and positional criteria, in a memory as contour points; and processing a set of contour points by comparing them with stored contours to identify the nature of the image.
  • the first contour point is taken as the first detected pixel that qualifies according to the tonal criterion.
  • a pixel is considered to meet the positional criterion when a difference in its position, as compared with a corresponding maximum or minimum value pixel from a preceding line, falls within a predetermined range.
  • the method comprises a method of recognizing an image of a human hand, wherein a tonally qualified pixel may be one for which its value lies within a predetermined range of values of UV color space.
  • the method comprises comparing contours with a set of stored contours each of which corresponds to a different hand shape or sign.
  • the method comprises determining a sequence of hand shapes or signs in order to identify a hand gesture.
  • Figure 1 is a schematic view of image recognition apparatus according to an embodiment of the present invention
  • Figure 2 is a schematic flow diagram showing a method of acquiring contour points from a scanned image
  • Figures 3a and 3b show schematically examples of a line-scan technique for different images
  • Figure 4 shows an image contour derived from the image scan shown in Figure 3a
  • Figure 5 shows a plurality of images and their corresponding image contours, derived by a technique according to the present invention.
  • the embodiment described herein uses a wireless "smart camera" to detect images of a human hand. Power consumption must be kept to a minimum to prolong battery life. Accordingly, a parallel processing architecture is used since this keeps to a minimum the number of memory accesses, the clock speed and the decoding of instructions. Processing the image data using a parallel video processor in the wireless camera is more power efficient than transmitting raw captured data to a fixed device.
  • the smart camera consists of basically four components, one or two image sensors, an SIMD (Single Instruction Multiple Data) processor for low level image processing, a general purpose processor for intermediate and high level processing and control, and a communication module. Both processors are coupled using a dual ported RAM that enables them to work in a shared workspace at their own processing speed.
  • FIG. 1 shows schematically the architecture of an embodiment of image recognition apparatus according to the invention.
  • the apparatus comprises a smart camera, shown generally at 10.
  • the apparatus comprises a sensor 12, a video processor 14, a general processor 16, a dual ported RAM (DPRAM) 18, an EEPROM 20, a wireless communication subsystem 22, an inter-chip control device (I2C) 24, connecting the video processor 14 and the general processor 16, DPRAM buses 26 and 28, an EEPROM bus 30, and UART (or other serial alternative) bus 32.
  • DPRAM: dual ported RAM
  • I2C: inter-chip control device
  • the video processor 14 comprises a linear processor array (LPA) and a plurality of line memories (not shown).
  • LPA: linear processor array
  • the video processor is a parallel processor and may be realized by an IC3D, which is a member of the Philips Xetal family of SIMD processors.
  • an Atmel 8051 device may be used as the general processor.
  • the heart of the video processor 14 is formed by the linear processor array (LPA) with 320 Reduced Instruction Set Computer (RISC) processors. Each of these processors has simultaneous read and write access within one clock cycle to memory positions in the LPA. Both the memory address and the instruction are shared among the processors in the SIMD sense. Each processor in the LPA can also read the memory data of its left and right neighbors directly. At the extremes of the linear array the inputs of these processors are optionally coupled or mirrored.
  • the LPA processors can handle up to 64 instructions ranging from arithmetic and single cycle multiply-accumulate to compound instructions. In addition to these there are conditional guarding instructions enabling data dependent operations. Data paths are 10 bits wide. Each processor has two word registers and a flag register.
  • the line memories in the video processor store 64 lines of 3,200 bits. Pixels of the image lines are placed in an interlaced way on this memory.
  • GOPS: Giga Operations Per Second
  • the device is inherently a low power processor as not only instruction decoding is shared between all 320 processing elements, but also memory access is on ultra wide memory words that contain complete image lines instead of energy consuming access to multiple-pixel-wide memory locations.
  • the power consumption is measured to be below 100 mW in active processing modes.
  • Programs for the video processor 14 are stored in the EEPROM and can be uploaded from the general processor 16 via the I2C 24.
  • the general processor 16 can load a program into the video processor for a specific task that has to be carried out for an image.
  • the software for the device 10 consists of three parts that are almost independently developed.
  • Programs for the video (parallel) processor 14 are written in a C++ language with implicit parallel data types. All programs are written in a line-based manner where complete image lines are processed in single clock cycle instructions. By guarding constructions, data adaptive software structures can be implemented. Typical functions, which can run on this processor, are image improvement, motion analysis, object detection and tracking algorithms.
  • the programs on the general processor 16 are dedicated to keep track of the object data over time.
  • the general processor performs the host function (running the operating system) and can decide to transmit events to a host system via the communications subsystem 22.
  • the purpose of the video processor program is, in this embodiment, to detect "contour points" of a hand and to store them in the DPRAM.
  • the video processor receives information from one or two VGA sensors 12 (only one in this embodiment) on four channels with a YUYV format (depicted by the element 34 in the figure). Also other formats are possible for communicating the image from the sensor to the video processor.
  • the first step consists in filtering skin color pixels. Low-pass or median filters with appropriate thresholds are employed in order to remove noise from the detected image because for the next step a very robust detection with minimal noise is required.
  • the video processor is an SIMD processor so it can process the image only on a complete line and not pixel by pixel. After detecting a hand, a line sweep technique is used to build an object contour, which in this embodiment is a hand contour.
  • the technique involves sweeping a horizontal line across the image, keeping track of certain data, and performing certain actions every time a certain event is encountered during the line sweep.
  • Pixels which qualify as contour points are stored in the DPRAM 18.
  • the contour thus derived is then analyzed by the general processor 16, by comparison with stored reference contours, in order to determine the nature of the object, or in this particular case to determine the nature of a hand sign.
  • a determination is made as to which pixels qualify, due to their tone, as image pixels (i.e. of the object in question).
  • the tone which is of interest is a skin tone, and accordingly pixels whose value falls within a predetermined range of values appropriate to skin color in UV color space are selected as pixels which qualify tonally.
  • Figure 2 shows schematically a method of building a set of contour points defining the image, according to this embodiment of the invention.
  • a line of pixels is read into the processor 14 and at step 110 a determination is made as to whether the line of pixels contains a tonally qualified pixel (in this case a skin tone pixel).
  • at step 130 the leftmost (MinX) and rightmost (MaxX) tonally qualified pixels on the line are obtained.
  • a noise reduction process is performed.
  • these pixels are compared with their counterparts from the previously considered line.
  • at step 150 a determination is made as to whether the pixels qualify positionally.
  • Figure 3 shows an example of the line sweep technique and contour points.
  • MinX and MaxX are the minimum and maximum X coordinates of tonally qualified pixels in the line (i.e. skin tone pixels in the line).
  • the table below shows an example of the minimum and maximum values of X for each of the contour points.
  • a contour point is generated only when ΔMinX is within a predetermined range (Δ1, Δ2) or ΔMaxX is within the predetermined range (Δ1, Δ2), where ΔMinX and ΔMaxX denote the differences in MinX and MaxX relative to the preceding line.
  • the reason for imposing this positional qualification criterion is that pixels can be ignored when they are too close to contour points on a preceding line. Such pixels may, for example, indicate merely the curvature of the hand and are not needed to form the uniquely identifying contour of the hand. The exception is the first contour point E1, which is generated for the very first skin tone pixel to be detected. Using this approach a contour C (E1 ... En) can be built.
  • Figure 4 shows the contour derived from the five finger hand sign depicted in Figure 3.
  • Figure 5 shows a number of other hand signs and their contours, which may be derived in the above described manner.
  • every contour is described by its X and Y ratios. For example in Figure 4, if Y5 is greater than Y4 then this is a right hand, otherwise it is a left hand. Different hand signs correspond to different X and Y ratios.
  • a sequence of hand shapes may be used to determine a hand gesture. While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments.
  • a computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)
  • Image Processing (AREA)

Abstract

Apparatus and a method for image recognition are disclosed, in which an image is electronically scanned line-by-line. Each line of pixels is processed and pixels which qualify according to both tonal and positional criteria are selected as contour points. A set of contour points is compared with a stored reference to determine the nature of the image.

Description

IMAGE RECOGNITION APPARATUS AND METHOD
FIELD OF THE INVENTION
The present invention relates to apparatus and a method for image recognition and is concerned particularly, although not exclusively, with apparatus and a method for the recognition of a human hand shape.
BACKGROUND OF THE INVENTION
Apparatus for the recognition of objects, such as human hand shapes, are well known and several prior systems exist which aim to recognize different hand signals using cameras and electronic processing apparatus. Such prior systems typically require a relatively large amount of memory and/or involve relatively intensive computations in order to distinguish between different hand signs. Because of this they consume relatively large amounts of power.
One such prior system for recognizing an image of a human hand is described in Japanese patent number JP2003-346162. This describes a computer input system that aims to recognize which finger extends from a human hand by the electronic processing of an image of the hand. The technique requires the detection of a hand, based on a skin tone recognition procedure. A detailed polygonal shape, described by points and angles, is built up and is used to determine how many fingers are raised. The calculations necessary to process the image are numerous and thus the electronic processing capacity and memory of the apparatus used in this technique are both necessarily relatively large, as is its power consumption.
SUMMARY OF THE INVENTION
Embodiments of the present invention aim to provide a robust technique for recognizing the shape of an object, such as a human hand, which requires relatively little electronic processing power and memory, and involves low power consumption, and which may therefore be suitable for applications which use wireless "smart camera" apparatus.
Smart cameras, i.e. cameras with built in processing capability, process locally the raw image data and send only keywords of information by wireless transmission, to a host system. The inventors found that this is more efficient for power consumption than broadcasting live video to an analyzing host computer.
In accordance with the invention an image is electronically scanned line-by-line. The lines of pixels are processed and pixels which qualify according to both tonal and positional criteria are selected as contour points. A set of contour points is compared with a stored reference to determine the nature of the image.
According to one aspect of the present invention there is provided image recognition apparatus comprising an image sensor, a first electronic processor, a second electronic processor, and a memory,
- wherein the first electronic processor is a parallel video processor comprising a plurality of line memories,
- wherein the first electronic processor sweeps horizontally, line by line, an image sensed by the image sensor,
- wherein each line of pixels is processed and stored in one of the line memories,
- wherein each pixel in a line stored in a line memory is first compared with a qualification criterion based upon its tone, and, for each line after the first line in which a tonally qualified pixel was detected, the first and last qualifying pixels on each line are compared with positional criteria, relative to tonally qualified pixels on a previous line,
- wherein pixels which qualify according to both their tone and position are selected as contour points, and
- wherein the apparatus is arranged to store contour points, and a set of the stored contour points is processed by the second electronic processor, which compares them with stored information to determine the nature of the image as described by the contour points.
The sensor registers the image as a plurality of pixels organized line by line. The image is communicated to the first electronic processor. The image is horizontally swept line by line by the first electronic processor in order to detect the presence of the hand in the image based on the tone of the pixels. For each of the lines, when a tonally qualified pixel is detected, the first and last qualifying pixels are validated as contour points. The points of the contour are further provided to the second electronic processor, which compares the received contour points with the stored information in order to determine the nature of the image as described by the contour points.
The advantage of such image recognition apparatus is that it requires only the most significant points of the contour in order to determine the nature of the image. The determination of the contour points takes advantage of the line organization of the memory and of the parallel video processor by processing the received image line by line. Since the apparatus processes the image line by line in a parallel way, power consumption can be kept to a minimum.
In a preferred arrangement the apparatus comprises a wireless camera, as the image sensor, with embedded first and second electronic processors. Using a wireless camera as the image sensor is advantageous because the camera can be positioned independently of the availability of a power supply in its vicinity.
In a preferred arrangement the tonal criterion qualification of a pixel is that the pixel has a value within a predetermined range of values, which is indicative of a skin tone, in UV color space. Such tonal criterion qualification is a simple and convenient way of determining the presence of the hand in the image.
In a preferred arrangement the apparatus is arranged to recognize images of a human hand.
In a preferred arrangement the camera comprises at least one filter. This is for the purpose of filtering skin color pixels.
The invention also provides a method of electronically identifying an image, the method comprising the steps of:
- sweeping the image horizontally as a number of lines;
- for each swept line of the image, detecting which pixels qualify according to a predetermined tonal criterion;
- for each line after the first in which a tonally qualified pixel was detected, determining which are the minimum and maximum qualifying pixels in terms of their position on the line;
- comparing qualified minimum and maximum pixels with counterparts on a preceding line to determine which pixels qualify according to a positional criterion;
- storing pixels, which are qualified according to both tonal and positional criteria, in a memory as contour points; and
- processing a set of contour points by comparing them with stored contours to identify the nature of the image.
In a preferred arrangement the first contour point is taken as the first detected pixel that qualifies according to the tonal criterion. Preferably, a pixel is considered to meet the positional criterion when a difference in its position, as compared with a corresponding maximum or minimum value pixel from a preceding line, falls within a predetermined range.
The method comprises a method of recognizing an image of a human hand, wherein a tonally qualified pixel may be one for which its value lies within a predetermined range of values of UV color space.
Preferably the method comprises comparing contours with a set of stored contours each of which corresponds to a different hand shape or sign.
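By way of illustration only, the comparison of a derived contour with a set of stored contours might be sketched as follows. The description does not specify the comparison metric; the bounding-box normalization, the squared-distance cost and the names `normalize`, `classify` and `references` are assumptions introduced for this sketch and are not taken from the source.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) coordinates of a contour point


def normalize(contour: Sequence[Point]) -> List[Tuple[float, float]]:
    """Scale points into the unit square of the contour's bounding box.

    Assumption: working with ratios makes the comparison independent of the
    size and position of the hand in the image.
    """
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width = max(max(xs) - min(xs), 1)   # avoid division by zero
    height = max(max(ys) - min(ys), 1)
    return [((x - min(xs)) / width, (y - min(ys)) / height) for x, y in contour]


def classify(contour: Sequence[Point],
             references: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the label of the closest stored contour with the same point count.

    Hypothetical metric: sum of squared distances between corresponding
    normalized points. Returns None when no stored contour is comparable.
    """
    if not contour:
        return None
    probe = normalize(contour)
    best_label: Optional[str] = None
    best_cost = float("inf")
    for label, ref in references.items():
        if len(ref) != len(contour):
            continue  # only contours with the same number of points are compared
        cost = sum((px - rx) ** 2 + (py - ry) ** 2
                   for (px, py), (rx, ry) in zip(probe, normalize(ref)))
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

In such a sketch each stored contour would correspond to one hand shape or sign, and the label returned for successive images could then be collected into a sequence.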
The method comprises determining a sequence of hand shapes or signs in order to identify a hand gesture.
BRIEF DESCRIPTION OF DRAWINGS
A preferred embodiment of the present invention will now be described by way of example only with reference to the accompanying drawings in which:
Figure 1 is a schematic view of image recognition apparatus according to an embodiment of the present invention;
Figure 2 is a schematic flow diagram showing a method of acquiring contour points from a scanned image;
Figures 3a and 3b show schematically examples of a line-scan technique for different images;
Figure 4 shows an image contour derived from the image scan shown in Figure 3a; and
Figure 5 shows a plurality of images and their corresponding image contours, derived by a technique according to the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENT
The embodiment described herein uses a wireless "smart camera" to detect images of a human hand. Power consumption must be kept to a minimum to prolong battery life. Accordingly, a parallel processing architecture is used since this keeps to a minimum the number of memory accesses, the clock speed and the decoding of instructions. Processing the image data using a parallel video processor in the wireless camera is more power efficient than transmitting raw captured data to a fixed device.
The smart camera consists of basically four components: one or two image sensors, an SIMD (Single Instruction Multiple Data) processor for low level image processing, a general purpose processor for intermediate and high level processing and control, and a communication module. Both processors are coupled using a dual ported RAM that enables them to work in a shared workspace at their own processing speed.
Figure 1 shows schematically the architecture of an embodiment of image recognition apparatus according to the invention. In this case the apparatus comprises a smart camera, shown generally at 10. The apparatus comprises a sensor 12, a video processor 14, a general processor 16, a dual ported RAM (DPRAM) 18, an EEPROM 20, a wireless communication subsystem 22, an inter-chip control device (I2C) 24, connecting the video processor 14 and the general processor 16, DPRAM buses 26 and 28, an EEPROM bus 30, and UART (or other serial alternative) bus 32.
The video processor 14 comprises a linear processor array (LPA) and a plurality of line memories (not shown).
The video processor is a parallel processor and may be realized by an IC3D, which is a member of the Philips Xetal family of SIMD processors. As the general processor an Atmel 8051 device may be used.
The heart of the video processor 14 is formed by the linear processor array (LPA) with 320 Reduced Instruction Set Computer (RISC) processors. Each of these processors has simultaneous read and write access within one clock cycle to memory positions in the LPA. Both the memory address and the instruction are shared among the processors in the SIMD sense. Each processor in the LPA can also read the memory data of its left and right neighbors directly. At the extremes of the linear array the inputs of these processors are optionally coupled or mirrored. The LPA processors can handle up to 64 instructions ranging from arithmetic and single cycle multiply-accumulate to compound instructions. In addition to these there are conditional guarding instructions enabling data dependent operations. Data paths are 10 bits wide. Each processor has two word registers and a flag register.
The line memories in the video processor store 64 lines of 3,200 bits. Pixels of the image lines are placed in an interlaced way on this memory.
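The figures given above can be checked against one another with a few lines of arithmetic. The following is only a sketch: the 640-pixel VGA line width is an assumption (the description says "VGA" without stating a width), and the reading that each processor therefore serves two pixels per line is an inference, not a statement from the source.

```python
# Figures taken from the description
PROCESSORS = 320        # processing elements in the LPA
DATA_PATH_BITS = 10     # width of each data path
LINE_BITS = 3_200       # bits per line memory
LINES = 64              # number of line memories

# Assumption, not stated in the description: standard VGA line width
VGA_WIDTH = 640

# One 10-bit word per processor fills a 3,200-bit line exactly.
words_per_line = LINE_BITS // DATA_PATH_BITS

# Total line-memory capacity in bits and in bytes.
total_bits = LINES * LINE_BITS
total_bytes = total_bits // 8

# Under the VGA assumption, each processor would handle two pixels per line.
pixels_per_processor = VGA_WIDTH // PROCESSORS

print(words_per_line, total_bits, total_bytes, pixels_per_processor)
```

A line memory of 3,200 bits is thus consistent with 320 processors each contributing one 10-bit word.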
The peak pixel performance of the video processor is around 50 GOPS (Giga Operations Per Second). Despite its high pixel performance the device is inherently a low power processor: not only is instruction decoding shared between all 320 processing elements, but memory access is also on ultra wide memory words that contain complete image lines, instead of energy consuming access to multiple-pixel-wide memory locations. For typical applications such as feature finding, the power consumption is measured to be below 100 mW in active processing modes.
Programs for the video processor 14 are stored in the EEPROM and can be uploaded from the general processor 16 via the I2C 24. The general processor 16 can load a program into the video processor for a specific task that has to be carried out for an image.
The software for the device 10 consists of three parts that are almost independently developed. Programs for the video (parallel) processor 14 are written in a C++ language with implicit parallel data types. All programs are written in a line-based manner where complete image lines are processed in single clock cycle instructions. By guarding constructions, data adaptive software structures can be implemented. Typical functions, which can run on this processor, are image improvement, motion analysis, object detection and tracking algorithms. The programs on the general processor 16 are dedicated to keep track of the object data over time. The general processor performs the host function (running the operating system) and can decide to transmit events to a host system via the communications subsystem 22.
The purpose of the video processor program is, in this embodiment, to detect "contour points" of a hand and to store them in the DPRAM.
The video processor receives information from one or two VGA sensors 12 (only one in this embodiment) on four channels in a YUYV format (depicted by the element 34 in the figure). Other formats are also possible for communicating the image from the sensor to the video processor. The first step consists in filtering skin color pixels. Low-pass or median filters with appropriate thresholds are employed in order to remove noise from the detected image, because the next step requires a very robust detection with minimal noise. In this embodiment the video processor is an SIMD processor, so it can process the image only a complete line at a time and not pixel by pixel.
After detecting a hand, a line sweep technique is used to build an object contour, which in this embodiment is a hand contour. The technique involves sweeping a horizontal line across the image, keeping track of certain data, and performing certain actions every time a certain event is encountered during the line sweep. Pixels which qualify as contour points are stored in the DPRAM 18. The contour thus derived is then analyzed by the general processor 16, by comparison with stored reference contours, in order to determine the nature of the object, or in this particular case to determine the nature of a hand sign.
When pixels are processed initially by the processor 14 a determination is made as to which pixels qualify, due to their tone, as image pixels (i.e. of the object in question). In the present embodiment the tone which is of interest is a skin tone, and accordingly pixels whose value falls within a predetermined range of values appropriate to skin color in UV color space are selected as pixels which qualify tonally.
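The tonal qualification of one image line may be illustrated by the following minimal sketch, written sequentially for readability rather than in the line-parallel form used on the video processor. It assumes a packed YUYV line in which each group of four bytes (Y0, U, Y1, V) carries two pixels sharing one U and one V value. The threshold values `U_RANGE` and `V_RANGE` and the function name `tonal_mask_yuyv` are hypothetical; the description states only that a predetermined range is used.

```python
from typing import List, Tuple

# Hypothetical thresholds for illustration; the description gives no values.
U_RANGE: Tuple[int, int] = (77, 127)
V_RANGE: Tuple[int, int] = (133, 173)


def tonal_mask_yuyv(line: bytes,
                    u_range: Tuple[int, int] = U_RANGE,
                    v_range: Tuple[int, int] = V_RANGE) -> List[bool]:
    """Return one flag per pixel: True if the pixel's (U, V) lies in range."""
    if len(line) % 4 != 0:
        raise ValueError("YUYV line length must be a multiple of 4 bytes")
    mask: List[bool] = []
    # Each 4-byte group Y0 U Y1 V describes two pixels with a shared U and V.
    for i in range(0, len(line), 4):
        u, v = line[i + 1], line[i + 3]
        qualifies = (u_range[0] <= u <= u_range[1]
                     and v_range[0] <= v <= v_range[1])
        mask.extend((qualifies, qualifies))
    return mask
```

The luminance bytes are ignored in this sketch, since the criterion described is expressed in UV color space only.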
Figure 2 shows schematically a method of building a set of contour points defining the image, according to this embodiment of the invention.
At a step 100, a line of pixels is read into the processor 14 and at step 110 a determination is made as to whether the line of pixels contains a tonally qualified pixel (in this case a skin tone pixel).
If no tonally qualified pixel is detected then the next line of pixels is read into the processor.
If a tonally qualified pixel is detected, a determination is made at step 120 as to whether the pixel is the first such pixel detected. If so, the pixel with the corresponding coordinates is stored as a contour point at step 160.
If the pixel is not the first such pixel then at step 130 the leftmost (MinX) and rightmost (MaxX) tonally qualified pixels on the line are obtained.
At step 135 a noise reduction process is performed. At step 140 these pixels (MinX and MaxX) are compared with their counterparts from the previously considered line.
At step 150 a determination is made as to whether the pixels qualify positionally.
If neither of the pixels meets the positional qualification criteria then the next line of pixels is read into the processor 14 at step 100.
If either or both of the pixels meet the positional qualification criteria then it or they are stored as a contour point at step 160 and the next line of pixels is read in.
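The flow of steps 100 to 160 can be sketched as a single loop. This is an illustrative Python sketch under stated assumptions, not the embodiment's code: the name `sweep_contour` is hypothetical, each line is assumed to arrive as a list of booleans marking tonally qualified pixels, the positional test is passed in as a caller-supplied function `qualifies(previous_x, current_x)`, the noise reduction of step 135 is omitted, and the comparison values are assumed to be those of the previously considered line that contained a qualified pixel.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def sweep_contour(masks: Iterable[Sequence[bool]],
                  qualifies: Callable[[int, int], bool]) -> List[Point]:
    """Collect contour points (x, y) while sweeping the image line by line."""
    contour: List[Point] = []
    previous: Optional[Tuple[int, int]] = None  # (MinX, MaxX) of the last qualified line
    for y, mask in enumerate(masks):                 # step 100: read the next line
        xs = [x for x, ok in enumerate(mask) if ok]
        if not xs:                                   # step 110: no qualified pixel
            continue
        min_x, max_x = xs[0], xs[-1]                 # step 130: leftmost and rightmost
        if previous is None:                         # step 120: first qualified pixel
            contour.append((min_x, y))               # step 160
        else:
            # Step 135 (noise reduction) is omitted in this sketch.
            # Steps 140 and 150: compare with the counterparts of the previous line.
            if qualifies(previous[0], min_x):
                contour.append((min_x, y))           # step 160
            if max_x != min_x and qualifies(previous[1], max_x):
                contour.append((max_x, y))           # step 160
        previous = (min_x, max_x)
    return contour
```

For example, calling `sweep_contour(masks, lambda prev, cur: abs(prev - cur) >= 2)` stores a point only where an extreme jumps by at least two pixels between consecutive qualified lines; that threshold is purely illustrative.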
Figure 3 shows an example of the line sweep technique and contour points. In Figure 3a the line begins sweeping from the top of the screen (Line = Line 0) and moves down, keeping track of MinX and MaxX values, where MinX and MaxX are the minimum and maximum X coordinates of tonally qualified pixels in the line (i.e. skin tone pixels in the line). For the first contour point E1 (X1, Y1), Line = Y1 and MinX == MaxX == X1. For the second point E2 (X2, Y2), Line = Y2, MinX == X1 and MaxX == X2, and so on. The table below shows an example of the minimum and maximum values of X for each of the contour points.
[Table image not reproduced: MinX and MaxX values for each contour point.]
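The bookkeeping behind such a table can be sketched in a few lines of Python. The function name `track_extremes` and the input representation (one list of booleans per line, True marking a skin tone pixel) are assumptions made for this illustration; the values it produces are not those of the original table.

```python
from typing import Iterable, List, Sequence, Tuple


def track_extremes(masks: Iterable[Sequence[bool]]) -> List[Tuple[int, int, int]]:
    """Return (Line, MinX, MaxX) for every line holding a tonally qualified pixel."""
    rows: List[Tuple[int, int, int]] = []
    for line, mask in enumerate(masks):
        xs = [x for x, ok in enumerate(mask) if ok]
        if xs:
            # xs is in ascending order, so its ends are the extremes of the line.
            rows.append((line, xs[0], xs[-1]))
    return rows
```

On the first line that contains a single qualified pixel, the row has MinX equal to MaxX, which corresponds to the situation described for the first contour point.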
If ΔMinX = MinX - Current MinX and ΔMaxX = MaxX - Current MaxX, where Current MaxX and Current MinX are respectively the maximum and minimum values of X coordinates for skin tone pixels, then a contour point is generated only when ΔMinX is within a predetermined range (Δ1, Δ2) or ΔMaxX is within the predetermined range (Δ1, Δ2). The reason for imposing these positional qualification criteria is that pixels can be ignored when they are too close to contour points on a preceding line. Such pixels may, for example, merely indicate the curvature of the hand and are not needed to form the uniquely identifying contour of the hand. The exception is the first contour point E1, which is generated for the very first skin tone pixel to be detected. Using this approach a contour C (E1 ... En) can be built.
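The positional test can be illustrated with a small worked example in Python. The function names and the numeric range are hypothetical. Two points are assumptions rather than statements of the description: the range is treated as an open interval, and the difference is taken with its sign exactly as written (previous value minus current value), so a range intended to catch MaxX moving to the right would have to cover negative values.

```python
from typing import List, Tuple


def delta_qualifies(previous_x: int, current_x: int,
                    delta_1: int, delta_2: int) -> bool:
    """True when (previous - current) lies strictly inside (delta_1, delta_2)."""
    delta = previous_x - current_x
    return delta_1 < delta < delta_2


def new_contour_points(previous: Tuple[int, int], current: Tuple[int, int],
                       line: int, delta_1: int, delta_2: int) -> List[Tuple[int, int]]:
    """Return the extreme pixels of the current line that qualify positionally."""
    (prev_min, prev_max), (cur_min, cur_max) = previous, current
    points: List[Tuple[int, int]] = []
    if delta_qualifies(prev_min, cur_min, delta_1, delta_2):
        points.append((cur_min, line))
    # Avoid storing the same pixel twice when the line holds a single pixel.
    if cur_max != cur_min and delta_qualifies(prev_max, cur_max, delta_1, delta_2):
        points.append((cur_max, line))
    return points
```

With a previous line of (MinX, MaxX) = (40, 60), a current line of (25, 62) and an illustrative range (10, 100), the difference for MinX is 15 and qualifies, while the difference for MaxX is -2 and does not, so only the pixel at X = 25 would be stored.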
Figure 4 shows the contour derived from the five finger hand sign depicted in Figure 3. Figure 5 shows a number of other hand signs and their contours, which may be derived in the above described manner.
Importantly, the contour of every hand sign shown in Figure 5 is distinct from all the other contours.
Every contour is described by its X and Y ratios. For example, in Figure 4, if Y5 is greater than Y4 then this is a right hand; otherwise it is a left hand. Different hand signs correspond to different X and Y ratios.
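How the general processor might use these ratios can be sketched as follows. This Python example is an assumption-laden illustration and not the method of the embodiment: the description does not define the ratios or the comparison, so the sketch interprets "X and Y ratios" as coordinates scaled to the bounding box of the contour and uses a mean point-to-point distance to pick the closest stored contour. All function names are hypothetical. The handedness rule is the one stated for Figure 4 and assumes the contour points are ordered E1, E2, and so on.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def handedness(contour: Sequence[Point]) -> str:
    """Apply the Figure 4 rule: Y5 greater than Y4 means a right hand."""
    if len(contour) < 5:
        raise ValueError("the rule needs at least the points E1 to E5")
    return "right" if contour[4][1] > contour[3][1] else "left"


def normalise(contour: Sequence[Point]) -> List[Point]:
    """Scale a contour to its bounding box (one possible reading of 'ratios')."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = (max(xs) - min(xs)) or 1   # guard against zero extent
    height = (max(ys) - min(ys)) or 1
    return [((x - min(xs)) / width, (y - min(ys)) / height) for x, y in contour]


def closest_reference(contour: Sequence[Point],
                      references: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the stored contour nearest to the given one, if any."""
    probe = normalise(contour)
    best_name, best_cost = None, math.inf
    for name, reference in references.items():
        if len(reference) != len(probe):
            continue  # this simple sketch only compares equal-length contours
        cost = sum(math.hypot(px - rx, py - ry)
                   for (px, py), (rx, ry) in zip(probe, normalise(reference)))
        cost /= len(probe)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name
```

Scaling to the bounding box makes the comparison independent of where the hand appears in the image and of its apparent size, which is one plausible motivation for describing contours by ratios rather than by absolute coordinates.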
A sequence of hand shapes may be used to determine a hand gesture.
While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments.
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Claims

CLAIMS:
1. Image recognition apparatus (10) comprising an image sensor (12), a first electronic processor (14), a second electronic processor (16), and a memory, wherein the first electronic processor (14) is a parallel video processor comprising a plurality of line memories;
- wherein the first electronic processor is arranged to sweep horizontally line by line an image sensed by the image sensor (12),
- each line of pixels is processed by the video processor and stored in one of the line memories,
- and wherein each pixel in a line stored in a line memory is first compared with a qualification criterion based upon its tone, and, for each line after the first line in which a tonally qualified pixel was detected, the first and last qualifying pixels on each line are compared with positional criteria, relative to tonally qualified pixels on a previous line,
- pixels which qualify according to both their tone and position are selected as contour points,
- and wherein the apparatus is arranged to store contour points, and a set of the stored contour points is processed by the second electronic processor which compares them with stored information to determine the nature of the image as described by the contour points.
2. Image recognition apparatus according to Claim 1 comprising a wireless camera as the image sensor (12), with embedded first (14) and second (16) electronic processors.
3. Image recognition apparatus according to Claim 1 or Claim 2, wherein the tonal criterion qualification of a pixel is that the pixel has a value within a predetermined range of values, which is indicative of a skin tone, in UV color space.
4. Image recognition apparatus according to Claim 3 wherein the apparatus is arranged to recognize images of a human hand.
5. Image recognition apparatus according to any of the preceding claims, wherein the sensor comprises at least one filter.
6. A method of electronically identifying an image, the method comprising the steps of : electronically sweeping the image horizontally as a number of lines; for each swept line of the image, detecting which pixels qualify according to predetermined tonal criteria; for each line after the first in which a tonally qualified pixel was detected, determining which are the minimum and maximum qualifying pixels in terms of their position on the line; comparing qualified maximum and minimum pixels with counterparts on a preceding line to determine which pixels qualify according to a positional criterion; storing pixels, which are qualified according to both tonal and positional criteria, in a memory as contour points; and processing a set of contour points by comparing them with stored contours to identify the nature of the image.
7. A method according to Claim 6, wherein the first contour point is taken as the first detected pixel that qualifies according to the tonal criterion.
8. A method according to Claim 6 or Claim 7 wherein a pixel is considered to meet the positional criterion when a difference in its position, as compared with a corresponding maximum or minimum value pixel from a preceding line, falls within a predetermined range.
9. A method according to any of the Claims 6 to 8 wherein a tonally qualified pixel is one for which its value lies within a predetermined range of values of UV color space.
10. A method according to Claim 9 wherein the method comprises a method of recognizing an image of a human hand.
11. A method according to any of Claims 6 to 10, wherein the method further comprises comparing contours with a set of stored contours each of which corresponds to a different hand shape or sign.
12. A computer program comprising instructions for enabling a computer to perform a method according to any of Claims 6 to 11.
13. A computer-readable storage medium having recorded thereon a computer program according to Claim 12.
PCT/IB2007/054412 2006-11-02 2007-10-31 Hand gesture recognition by scanning line-wise hand images and by extracting contour extreme points WO2008053433A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009535171A JP2010509651A (en) 2006-11-02 2007-10-31 Image recognition apparatus and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06123366 2006-11-02
EP06123366.4 2006-11-02

Publications (2)

Publication Number Publication Date
WO2008053433A2 true WO2008053433A2 (en) 2008-05-08
WO2008053433A3 WO2008053433A3 (en) 2009-03-19

Family

ID=39344681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/054412 WO2008053433A2 (en) 2006-11-02 2007-10-31 Hand gesture recognition by scanning line-wise hand images and by extracting contour extreme points

Country Status (3)

Country Link
JP (1) JP2010509651A (en)
CN (1) CN101536032A (en)
WO (1) WO2008053433A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9922241B2 (en) 2011-12-01 2018-03-20 Nokia Technologies Oy Gesture recognition method, an apparatus and a computer program for the same
US10614332B2 (en) 2016-12-16 2020-04-07 Qualcomm Incorporated Light source modulation for iris size adjustment
US10984235B2 (en) 2016-12-16 2021-04-20 Qualcomm Incorporated Low power data generation for iris-related detection and authentication
US11068712B2 (en) 2014-09-30 2021-07-20 Qualcomm Incorporated Low-power iris scan initialization

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8761448B1 (en) * 2012-12-13 2014-06-24 Intel Corporation Gesture pre-processing of video stream using a markered region
JP5886809B2 (en) * 2013-09-27 2016-03-16 富士重工業株式会社 Outside environment recognition device
US9838635B2 (en) * 2014-09-30 2017-12-05 Qualcomm Incorporated Feature computation in a sensor element array
KR101774549B1 (en) 2016-06-28 2017-09-21 주식회사 팀엘리시움 Apparatus for recognizing menstrual blood point, method thereof and computer recordable medium storing the method
CN111563477A (en) * 2020-05-21 2020-08-21 苏州沃柯雷克智能系统有限公司 Method, device, equipment and storage medium for acquiring qualified hand photos

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050063564A1 (en) * 2003-08-11 2005-03-24 Keiichi Yamamoto Hand pattern switch device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KLEIHORST R P ET AL: "Xetal: a low-power high-performance smart camera processor" ISCAS 2001. PROCEEDINGS OF THE 2001 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS. SYDNEY, AUSTRALIA, MAY 6 - 9, 2001; [IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS], NEW YORK, NY : IEEE, US, vol. 5, 6 May 2001 (2001-05-06), pages 215-218, XP010542070 ISBN: 978-0-7803-6685-5 *
KULKARNI ASHOK V: "Sequential shape feature extraction from line drawings" PRIP. PATTERN RECOGNITION AND INFORMATION PROCESSING, XX, XX, 1 January 1978 (1978-01-01), pages 230-237, XP008089498 *
NATASHA MOHANTY ET AL: "Learning Shapes for Image Classification and Retrieval" IMAGE AND VIDEO RETRIEVAL; [LECTURE NOTES IN COMPUTER SCIENCE;;LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, vol. 3568, 4 August 2005 (2005-08-04), pages 589-598, XP019012855 ISBN: 978-3-540-27858-0 *
O'HAGAN R ET AL: "Visual gesture interfaces for virtual environments" USER INTERFACE CONFERENCE, 2000. AUIC 2000. FIRST AUSTRALASIAN CANBERRA, ACT, AUSTRALIA 31 JAN.-3 FEB. 2000, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 31 January 2000 (2000-01-31), pages 73-80, XP010371183 ISBN: 978-0-7695-0515-2 *

Also Published As

Publication number Publication date
WO2008053433A3 (en) 2009-03-19
JP2010509651A (en) 2010-03-25
CN101536032A (en) 2009-09-16

Similar Documents

Publication Publication Date Title
WO2008053433A2 (en) Hand gesture recognition by scanning line-wise hand images and by extracting contour extreme points
CN101443784B (en) Fingerprint preview quality and segmentation
JP5739522B2 (en) System and method for processing image data relating to the focus of attention within a whole image
US9189670B2 (en) System and method for capturing and detecting symbology features and parameters
US9451142B2 (en) Vision sensors, systems, and methods
US7203347B2 (en) Method and system for extracting an area of interest from within a swipe image of a biological surface
US20090097704A1 (en) On-chip camera system for multiple object tracking and identification
US9958961B2 (en) Optical pointing system
US20060245652A1 (en) Method for recognizing objects in an image without recording the image in its entirety
US11908227B2 (en) Method and device for reference imaging and optical object recognition
CN101794450B (en) Method and device for detecting smoke in video image sequence
CN1042981C (en) Reader for symbolic information
US9313412B2 (en) Image sensor and operating method thereof
CN111797715A (en) Parking space detection method and device, electronic equipment and storage medium
US20100008542A1 (en) Object detection method and apparatus
US10380463B2 (en) Image processing device, setting support method, and non-transitory computer-readable media
CN113033551A (en) Object detection method, device, equipment and storage medium
US9123131B2 (en) Parallel correlation method and correlation apparatus using the same
US20230328368A1 (en) Signal processing device, imaging device, and signal processing method
US11468703B2 (en) Method, storage media and device for biometric identification driving
CN110929093B (en) Method, apparatus, device and medium for search control
Sakurai et al. Overtaking vehicle detection method and its implementation using IMAPCAR highly parallel image processor
CN115439772A (en) Video analysis method and device, computer equipment and storage medium
US20130044911A1 (en) Particle filter
CN113449580A (en) Method for performing fingerprint sensing, electronic module and computing device

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 200780040994.8; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 07826925; Country of ref document: EP; Kind code of ref document: A2)
WWE Wipo information: entry into national phase (Ref document number: 2007826925; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2009535171; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 3020/CHENP/2009; Country of ref document: IN)