JP4153818B2 - Gesture recognition device, gesture recognition method, and gesture recognition program - Google Patents

Info

Publication number: JP4153818B2
Application number: JP2003096271A
Authority: JP (Japan)
Prior art keywords: posture, gesture, position, hand, face
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Japanese (ja)
Other versions: JP2004302992A
Inventors: 貴通 嶋田, 信男 檜垣
Original assignee: 本田技研工業株式会社 (Honda Motor Co., Ltd.)
Priority: JP2003096271A; priority claimed from DE602004006190T8 (external)
Related publications: JP2004302992A (application), JP4153818B2 (granted patent)

Abstract

PROBLEM TO BE SOLVED: To provide a gesture recognition device capable of reducing the amount of calculation required for posture recognition processing or gesture recognition processing.

SOLUTION: The gesture recognition device 4 is provided with: a face and finger position detection means 41 for detecting a face position and finger positions in the real space of a target person; and a posture/gesture recognition means 42 for recognizing the posture or gesture of the target person on the basis of the face position and finger positions detected by the face and finger position detection means 41. The posture/gesture recognition means 42 detects the "relative positional relation between the face position and the finger positions" and the "displacement of the finger positions based on the face position" from "the face position in the real space" and "the finger positions in the real space", and compares the detection results with the posture data or the gesture data stored in the posture/gesture storage part 12A to recognize the posture or gesture of the target person.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an apparatus, a method, and a program for recognizing a posture (stance) or a gesture (motion) of a target person from images obtained by capturing the target person with cameras.
[0002]
[Prior art]
Conventionally, many gesture recognition methods have been proposed in which points (feature points) characterizing the motion of a target person are detected from images of the target person captured by a camera, and the gesture of the target person is estimated based on the feature points (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
Japanese Unexamined Patent Publication No. 2000-149025 (pages 3 to 6, FIG. 1)
[0004]
[Problems to be solved by the invention]
However, the conventional gesture recognition methods have a problem in that the amount of calculation required for the posture recognition process or the gesture recognition process becomes large, because the feature points must be detected one by one when recognizing the gesture of the target person.
[0005]
The present invention has been made in view of the above problems, and an object of the present invention is to provide a gesture recognition device that can reduce the amount of calculation required for posture recognition processing or gesture recognition processing.
[0006]
[Means for Solving the Problems]
The gesture recognition apparatus according to claim 1 is an apparatus for recognizing a posture or a gesture of a target person from images obtained by capturing the target person with cameras. It comprises face/hand position detection means for detecting the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the captured images, and posture/gesture recognition means for detecting, from the face position and the hand position, the relative positional relationship between the face position and the hand position and the variation of the hand position relative to the face position, and for recognizing the posture or gesture of the target person by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position. For postures or gestures whose relative positional relationship between the face position and the hand position is similar, the posture/gesture recognition means sets a determination area of a size large enough for the hand of the target person to enter, and distinguishes between them by comparing the area of the hand with the area of the determination area.
[0007]
In this apparatus, the face/hand position detection means first detects the face position and the hand position of the target person in real space based on the contour information and skin color area information of the target person generated from the images. Next, the posture/gesture recognition means detects, from the face position and the hand position, "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position". The posture or gesture of the target person is then recognized by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position".
[0008]
The "relative positional relationship between the face position and the hand position" examined by the posture/gesture recognition means specifically refers to "the heights of the face position and the hand position" and "the distances of the face position and the hand position from the cameras" (claim 2). With this configuration, the "relative positional relationship between the face position and the hand position" can easily be detected by comparing the height of the face position with the height of the hand position, and the distance from the cameras to the face position with the distance from the cameras to the hand position. The posture/gesture recognition means can also detect "the relative positional relationship between the face position and the hand position" by examining the horizontal shift between the face position and the hand position on the image.
[0009]
The posture/gesture recognition means may use a pattern matching method when recognizing the posture or gesture of the target person (claim 3). With this configuration, the posture or gesture of the target person can easily be recognized by superimposing an "input pattern", consisting of "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position", on the posture data or gesture data stored in advance and searching for the most similar pattern.
[0010]
In addition, the posture/gesture recognition means sets a determination area of a size large enough for the hand of the target person to enter, and distinguishes between postures or gestures whose relative positional relationship between the face position and the hand position is similar by comparing the area of the hand with the area of the determination area (claim 1). With this configuration, it is possible, for example, to distinguish the posture "HANDSHAKE" (see FIG. 9(d)) from the gesture "COME HERE" (see FIG. 10(c)), for both of which the hand position is lower than the face position and the distance from the camera to the hand position is shorter than the distance from the camera to the face position. Specifically, when the area of the hand is larger than 1/2 of the area of the determination circle used as the determination area, it is determined to be "COME HERE"; when the area of the hand is equal to or less than 1/2 of the area of the determination circle, it is determined to be "HANDSHAKE".
[0011]
The gesture recognition method according to claim 4 is a method for recognizing a posture or a gesture of a target person from images obtained by capturing the target person with cameras. It comprises a face/hand position detection step of detecting, by face/hand position detection means, the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the images, and a posture/gesture recognition step of detecting, by posture/gesture recognition means, the relative positional relationship between the face position and the hand position and the variation of the hand position relative to the face position based on the face position and the hand position, and recognizing the posture or gesture of the target person by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position. In the posture/gesture recognition step, for postures or gestures whose relative positional relationship between the face position and the hand position is similar, the posture/gesture recognition means sets a determination area of a size large enough for the hand of the target person to enter, and distinguishes between them by comparing the area of the hand with the area of the determination area.
[0012]
In this method, the face/hand position detection step first detects the face position and the hand position of the target person in real space based on the contour information and skin color area information of the target person generated from the images. Next, the posture/gesture recognition step detects "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position" from the face position and the hand position. The posture or gesture of the target person is then recognized by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position".
[0013]
The gesture recognition program according to claim 5 causes a computer to function, in order to recognize a posture or a gesture of a target person from images obtained by capturing the target person with cameras, as: face/hand position detection means for detecting the face position and the hand position of the target person in real space based on contour information and skin color area information of the target person generated from the images; and posture/gesture recognition means for detecting, from the face position and the hand position, the relative positional relationship between the face position and the hand position and the variation of the hand position relative to the face position, and for recognizing the posture or gesture of the target person by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to the relative positional relationship between the face position and the hand position and to the variation of the hand position relative to the face position. For postures or gestures whose relative positional relationship between the face position and the hand position is similar, the posture/gesture recognition means sets a determination area of a size large enough for the hand of the target person to enter, and distinguishes between them by comparing the area of the hand with the area of the determination area.
[0014]
This program causes a computer to first detect, by the face/hand position detection means, the face position and the hand position of the target person in real space based on the contour information and skin color area information of the target person generated from the images. Next, the posture/gesture recognition means detects "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position" from the face position and the hand position. The posture or gesture of the target person is then recognized by comparing the detection results with posture data or gesture data describing postures or gestures that correspond to "the relative positional relationship between the face position and the hand position" and "the variation of the hand position relative to the face position".
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings as appropriate. First, the configuration of a gesture recognition system including a gesture recognition device according to the present invention will be described with reference to FIGS. 1 to 19, and then the operation of the gesture recognition system will be described with reference to FIGS. 20 and 21.
[0016]
(Configuration of gesture recognition system A)
First, an overall configuration of a gesture recognition system A including a gesture recognition device 4 according to the present invention will be described with reference to FIG. FIG. 1 is a block diagram showing the overall configuration of the gesture recognition system A.
[0017]
As shown in FIG. 1, the gesture recognition system A comprises: two cameras 1 (1a, 1b) that capture a target person (not shown); a captured image analysis device 2 that analyzes the images (captured images) taken by the cameras 1 and generates various information; a contour extraction device 3 that extracts the contour of the target person based on the information generated by the captured image analysis device 2; and a gesture recognition device 4 that recognizes a posture (stance) or a gesture (motion) of the target person based on the information generated by the captured image analysis device 2 and the contour (contour information) of the target person extracted by the contour extraction device 3. Hereinafter, the camera 1, the captured image analysis device 2, the contour extraction device 3, and the gesture recognition device 4 will be described in order.
[0018]
(Camera 1)
The cameras 1 (1a, 1b) are color CCD cameras, and the right camera 1a and the left camera 1b are arranged left and right, separated by a distance B. The right camera 1a is used as the reference camera. The images (captured images) taken by the cameras 1a and 1b are stored frame by frame in a frame grabber (not shown) and then input to the captured image analysis device 2 in synchronization.
[0019]
Note that the images (captured images) taken by the cameras 1a and 1b are corrected by a correction device (not shown) through calibration and rectification processing before being input to the captured image analysis device 2.
[0020]
(Captured image analysis device 2)
The captured image analysis device 2 is a device that analyzes the images (captured images) input from the cameras 1a and 1b and generates "distance information", "motion information", "edge information", and "skin color area information" (see FIG. 1).
[0021]
FIG. 2 is a block diagram illustrating the configurations of the captured image analysis device 2 and the contour extraction device 3 included in the gesture recognition system A illustrated in FIG. 1. As illustrated in FIG. 2, the captured image analysis device 2 includes a distance information generation unit 21 that generates "distance information", a motion information generation unit 22 that generates "motion information", an edge information generation unit 23 that generates "edge information", and a skin color region information generation unit 24 that generates "skin color area information".
[0022]
(Distance information generator 21)
The distance information generation unit 21 detects, for each pixel, the distance from the camera 1 based on the parallax between the two captured images taken at the same time by the cameras 1a and 1b. Specifically, the parallax is obtained by the block correlation method from the first captured image taken by the camera 1a (the reference camera) and the second captured image taken by the camera 1b, and the distance from the camera 1 to the object captured at each pixel is determined from the parallax by triangulation. The obtained distance is associated with each pixel of the first captured image to generate a distance image D1 (see FIG. 3(a)) in which the distance is expressed as a pixel value. This distance image D1 serves as the distance information. In the example of FIG. 3(a), the target person C exists at a single distance.
[0023]
The block correlation method detects parallax by comparing a block of a specific size (for example, 8 × 3 pixels) at the same position in the first captured image and the second captured image, and examining by how many pixels the subject in the block is shifted between the two images.
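As a rough illustration of this step, the following sketch estimates the disparity of one block by sum-of-absolute-differences (SAD) matching and converts it to a distance by triangulation. The block size, search range, shift direction, and camera parameters (baseline B in metres, focal length in pixels) are assumptions for illustration; the patent specifies only the 8 × 3 pixel block example.

```python
import numpy as np

def block_disparity(ref_img, second_img, row, col, block=(3, 8), max_disp=64):
    """SAD block matching: find by how many pixels the block at (row, col) in the
    reference image (camera 1a) is shifted in the second image (camera 1b).
    The assumed shift direction depends on the camera arrangement."""
    h, w = block
    ref = ref_img[row:row + h, col:col + w].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp):
        if col + d + w > second_img.shape[1]:
            break
        cand = second_img[row:row + h, col + d:col + d + w].astype(np.float32)
        cost = np.abs(ref - cand).sum()              # sum of absolute differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

def disparity_to_distance(disparity_px, baseline_b_m, focal_px):
    """Triangulation: Z = f * B / d; a larger disparity means a closer object."""
    return np.inf if disparity_px == 0 else focal_px * baseline_b_m / disparity_px
```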
[0024]
(Motion information generating unit 22)
The motion information generation unit 22 detects the movement of the target person based on the difference between the "captured image (t)" taken at time t and the "captured image (t+Δt)" taken at time t+Δt, captured in time series by the camera 1a (the reference camera). Specifically, the difference between the "captured image (t)" and the "captured image (t+Δt)" is taken to examine the displacement of each pixel. A displacement vector is then obtained from the examined displacement, and a difference image D2 (see FIG. 3(b)) in which the obtained displacement vectors are expressed as pixel values is generated. This difference image D2 serves as the motion information. In the example of FIG. 3(b), motion is detected in the left arm of the target person C.
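A simplified stand-in for this frame differencing is sketched below; it assumes grayscale frames and a fixed threshold, and produces a binary change map rather than the per-pixel displacement vectors described in the patent.

```python
import numpy as np

def motion_difference(frame_t, frame_t_dt, threshold=15):
    """Mark pixels whose intensity changed by more than `threshold` between
    time t and t + dt. Threshold and binary output are assumptions; the patent
    obtains displacement vectors per pixel."""
    diff = np.abs(frame_t_dt.astype(np.int16) - frame_t.astype(np.int16))
    return (diff > threshold).astype(np.uint8) * 255   # simplified difference image D2
```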
[0025]
(Edge information generation unit 23)
The edge information generation unit 23 generates an edge image by extracting the edges present in the captured image, based on the density information or color information of each pixel in the image (captured image) taken by the camera 1a (the reference camera). Specifically, based on the luminance of each pixel in the captured image, portions where the luminance changes greatly are detected as edges, and an edge image D3 (see FIG. 3(c)) consisting only of the edges is generated. This edge image D3 serves as the edge information.
[0026]
For example, the Sobel operator is applied to each pixel, and segments that differ from adjacent segments by at least a predetermined amount are detected as edges (horizontal edges or vertical edges) row by row or column by column. The Sobel operator is a coefficient matrix that assigns weighting factors to the pixels in the neighborhood of a given pixel.
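A minimal sketch of Sobel-based edge extraction, assuming a grayscale input and a fixed magnitude threshold (the threshold value is an assumption; the patent only requires a "predetermined difference"):

```python
import numpy as np
from scipy.ndimage import convolve

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

def sobel_edges(gray, threshold=80):
    """Edge image D3 sketch: gradient magnitude from horizontal and vertical
    Sobel responses, thresholded into an edge / non-edge map."""
    gx = convolve(gray.astype(np.float32), SOBEL_X)    # horizontal gradient
    gy = convolve(gray.astype(np.float32), SOBEL_X.T)  # vertical gradient
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8) * 255
```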
[0027]
(Skin color area information generation unit 24)
The skin color area information generation unit 24 extracts the skin color regions of the target person present in the captured image from the image (captured image) taken by the camera 1a (the reference camera). Specifically, the RGB values of all pixels in the captured image are converted into HLS space, consisting of hue, lightness, and saturation, and pixels whose hue, lightness, and saturation fall within preset threshold ranges are extracted as skin color regions (see FIG. 3(d)). In the example of FIG. 3(d), the face of the target person C is extracted as the skin color region R1, and the hand is extracted as the skin color region R2. The skin color regions R1 and R2 serve as the skin color area information.
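A sketch of the HLS thresholding, using OpenCV for the color conversion; the threshold ranges below are illustrative assumptions, since the patent only states that preset ranges on hue, lightness, and saturation are used.

```python
import cv2
import numpy as np

# Assumed, illustrative threshold ranges (OpenCV hue is 0-179).
HUE_RANGE = (0, 30)
LIGHT_RANGE = (40, 220)
SAT_RANGE = (30, 255)

def skin_color_mask(bgr_image):
    """Extract candidate skin color regions by thresholding in HLS space."""
    hls = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HLS)
    h, l, s = cv2.split(hls)
    mask = ((h >= HUE_RANGE[0]) & (h <= HUE_RANGE[1]) &
            (l >= LIGHT_RANGE[0]) & (l <= LIGHT_RANGE[1]) &
            (s >= SAT_RANGE[0]) & (s <= SAT_RANGE[1]))
    return mask.astype(np.uint8) * 255   # candidate skin color regions R1, R2
```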
[0028]
The “distance information (distance image D1)”, “motion information (difference image D2)”, and “edge information (edge image D3)” generated by the captured image analysis device 2 are input to the contour extraction device 3. Further, “distance information (distance image D1)” and “skin color area information (skin color areas R1, R2)” generated by the captured image analysis apparatus 2 are input to the gesture recognition apparatus 4.
[0029]
(Contour extraction device 3)
The contour extraction device 3 is a device that extracts the contour of the target person based on the "distance information (distance image D1)", "motion information (difference image D2)", and "edge information (edge image D3)" generated by the captured image analysis device 2 (see FIG. 1).
[0030]
As illustrated in FIG. 2, the contour extraction device 3 includes a target distance setting unit 31 that sets a "target distance", which is the distance at which the target person exists, a target distance image generation unit 32 that generates a "target distance image" based on the target distance, a target region setting unit 33 that sets a "target region" within the target distance image, and a contour extraction unit 34 that extracts the "contour of the target person" from within the target region.
[0031]
(Target distance setting unit 31)
The target distance setting unit 31 sets the "target distance", which is the distance at which the target person exists, based on the distance image D1 (see FIG. 3(a)) and the difference image D2 (see FIG. 3(b)) generated by the captured image analysis device 2. Specifically, pixels having the same pixel value in the distance image D1 are grouped into pixel groups, and the pixel values of each pixel group in the difference image D2 are accumulated. The moving object with the largest amount of movement, that is, the target person, is considered to exist in the region whose cumulative pixel value is larger than a predetermined value and which is closest to the camera 1, and that distance is set as the target distance (see FIG. 4(a)). In the example of FIG. 4(a), the target distance is set to 2.2 m. The target distance set by the target distance setting unit 31 is input to the target distance image generation unit 32.
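A sketch of this selection, assuming the distance image is quantized so that pixels sharing one distance value form one pixel group; the motion threshold is an assumption standing in for the patent's "predetermined value".

```python
import numpy as np

def set_target_distance(distance_img, diff_img, min_motion=500):
    """For each distance value (pixel group) in the distance image D1,
    accumulate the corresponding difference-image values, then take the
    closest distance whose accumulated motion exceeds the threshold."""
    candidates = []
    for d in np.unique(distance_img):
        if d <= 0:                                 # skip pixels with no valid distance
            continue
        motion = diff_img[distance_img == d].sum()
        if motion > min_motion:
            candidates.append(d)
    return min(candidates) if candidates else None  # closest moving region = target distance
```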
[0032]
(Target distance image generation unit 32)
The target distance image generation unit 32 refers to the distance image D1 generated by the captured image analysis device 2 (see FIG. 3(a)) and generates a "target distance image" by extracting, from the edge image D3 (see FIG. 3(c)), the pixels that exist within the target distance ± α m set by the target distance setting unit 31. Specifically, the pixels in the distance image D1 corresponding to the target distance ± α m input from the target distance setting unit 31 are obtained, and only those pixels are extracted from the edge image D3 generated by the edge information generation unit 23 to generate the target distance image D4 (see FIG. 4(b)). The target distance image D4 is therefore an image that represents, by edges, the target person existing at the target distance. The target distance image D4 generated by the target distance image generation unit 32 is input to the target region setting unit 33 and the contour extraction unit 34.
[0033]
(Target area setting unit 33)
The target area setting unit 33 sets a "target area" in the target distance image D4 (see FIG. 4(b)) generated by the target distance image generation unit 32. Specifically, a histogram H is generated by accumulating the pixel values of the target distance image D4 in the vertical direction, and the position where the frequency of the histogram H is maximum is identified as the horizontal center position of the target person C (see FIG. 5(a)). A range of a specific size (for example, 0.5 m) to the left and right of the identified center position is then set as the target area T (see FIG. 5(b)). The vertical range of the target area T is set to a specific size (for example, 2 m). When setting the target area T, its setting range is corrected with reference to camera parameters such as the tilt angle and height of the camera 1. The target area T set by the target area setting unit 33 is input to the contour extraction unit 34.
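For illustration, the column-wise histogram and the fixed-size area around its peak can be sketched as follows; the pixel-unit sizes are assumptions, since the patent specifies 0.5 m to each side and a 2 m height converted via the camera parameters.

```python
import numpy as np

def set_target_area(target_distance_img, half_width_px, height_px):
    """Target area T sketch: accumulate edge pixels of the target distance
    image D4 column by column (histogram H) and centre a fixed-size area on
    the peak column, taken as the horizontal centre of the target person."""
    histogram = (target_distance_img > 0).sum(axis=0)   # vertical accumulation per column
    center_col = int(np.argmax(histogram))
    left = max(center_col - half_width_px, 0)
    right = min(center_col + half_width_px, target_distance_img.shape[1])
    return left, right, height_px
```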
[0034]
(Contour extraction unit 34)
The contour extraction unit 34 extracts the contour O of the target person C from within the target area T set by the target area setting unit 33, in the target distance image D4 (see FIG. 4(b)) generated by the target distance image generation unit 32 (see FIG. 5(c)). Specifically, the contour O of the target person C is extracted using a method based on an active contour model consisting of a closed curve known as "Snakes" (hereinafter the "snake method"). The snake method extracts the contour of an object by shrinking and deforming the active contour model "Snakes" so that a predefined energy function is minimized. The contour O of the target person C extracted by the contour extraction unit 34 is input to the gesture recognition device 4 as "contour information" (see FIG. 1).
[0035]
(Gesture recognition device 4)
The gesture recognition device 4 is a device that recognizes the posture or gesture of the target person based on the "distance information" and "skin color area information" generated by the captured image analysis device 2 and the "contour information" generated by the contour extraction device 3, and outputs the recognition result (see FIG. 1).
[0036]
FIG. 6 is a block diagram showing the configuration of the gesture recognition device 4 included in the gesture recognition system A shown in FIG. 1. As shown in FIG. 6, the gesture recognition device 4 includes face/hand position detection means 41 that detects the face position and hand position of the target person C in real space, and posture/gesture recognition means 42 that recognizes the posture or gesture of the target person based on the face position and hand position detected by the face/hand position detection means 41.
[0037]
The face/hand position detection means 41 includes a head position detection unit 41A that detects the "head top position" of the target person in real space, a face position detection unit 41B that detects the "face position" of the target person, a hand position detection unit 41C that detects the "hand position" (the position of the hand, meaning the part made up of the arm and the hand) of the target person, and a hand position detection unit 41D that detects the "hand tip position" (the position of the tip of the hand) of the target person. The hand tip position m4 detected by the unit 41D is used as the hand position in the recognition processing described below.
[0038]
(Head position detector 41A)
The head position detection unit 41A detects the "head top position" of the target person C based on the contour information generated by the contour extraction device 3. The method for detecting the head top position will be described with reference to FIG. 7(a). First, the center of gravity G of the contour O is obtained (1). Next, an area (head top position search area) F1 for searching for the head top position is set (2). The lateral width (width in the X-axis direction) of the head top position search area F1 is set to a preset average human shoulder width W, centered on the X coordinate of the center of gravity G. The average human shoulder width W is set with reference to the distance information generated by the captured image analysis device 2. The vertical width (width in the Y-axis direction) of the head top position search area F1 is set to a width that can cover the contour O. The uppermost point of the contour O within the head top position search area F1 is then taken as the head top position m1 (3). The head top position m1 detected by the head position detection unit 41A is input to the face position detection unit 41B.
[0039]
(Face position detection unit 41B)
The face position detection unit 41B detects the "face position" of the target person C based on the head top position m1 detected by the head position detection unit 41A and the skin color area information generated by the captured image analysis device 2. The face position detection method will be described with reference to FIG. 7(b). First, an area (face position search area) F2 for searching for the face position is set (4). The range of the face position search area F2 is set to a preset "approximate size that covers a human head", with reference to the head top position m1. The range of the face position search area F2 is set with reference to the distance information generated by the captured image analysis device 2.
[0040]
Next, the center of gravity of the skin color area R1 in the face position search area F2 is set as the face position m2 on the image (5). For the skin color region R1, the skin color region information generated by the captured image analysis device 2 is referred to. Then, the face position m2t (Xft, Yft, Zft) in the real space is obtained from the face position m2 (Xf, Yf) on the image with reference to the distance information generated by the captured image analysis apparatus 2.
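A sketch of this step: the centroid of the skin color region gives the face position m2 on the image, and back-projection with the distance information gives m2t in real space. The pinhole intrinsics (fx, fy, cx, cy) used for the back-projection are assumptions; the patent only says the distance information is referenced.

```python
import numpy as np

def face_position_3d(face_skin_mask, distance_img, fx, fy, cx, cy):
    """Centroid of the skin color region R1 inside the face position search
    area F2 -> face position m2 on the image; back-projection with the
    distance image D1 -> face position m2t in real space."""
    ys, xs = np.nonzero(face_skin_mask)
    if xs.size == 0:
        return None
    u, v = xs.mean(), ys.mean()                 # face position m2 = (Xf, Yf) on the image
    z = float(distance_img[int(v), int(u)])     # depth taken from the distance image
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])                  # face position m2t = (Xft, Yft, Zft)
```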
[0041]
The “face position m2 on the image” detected by the face position detection unit 41B is input to the hand position detection unit 41C and the hand position detection unit 41D. Further, the “face position m2t in the real space” detected by the face position detection unit 41B is stored in a storage unit (not shown), and the target is detected by the posture / gesture recognition unit 42B (see FIG. 6) of the posture / gesture recognition unit 42. Used when recognizing the posture or gesture of the person C.
[0042]
(Hand position detector 41C)
The hand position detection unit 41C detects the "hand position" of the target person C based on the skin color area information generated by the captured image analysis device 2 and the contour information generated by the contour extraction device 3. Here, of the skin color area information, the information of the area excluding the vicinity of the face position m2 is used. The hand position detection method will be described with reference to FIG. 8(a). First, areas (hand position search areas) F3 (F3R, F3L) for searching for the hand position are set (6). The hand position search areas F3 are set to a preset "range within which the hands can reach (the range the left and right hands can reach)", with reference to the face position m2 detected by the face position detection unit 41B. The size of the hand position search areas F3 is set with reference to the distance information generated by the captured image analysis device 2.
[0043]
Next, the center of gravity of the skin color region R2 within the hand position search area F3 is set as the hand position m3 on the image (7). For the skin color region R2, the skin color area information generated by the captured image analysis device 2 is referred to; as noted above, the information of the area excluding the vicinity of the face position m2 is used. In the example of FIG. 8(a), since a skin color region exists only in the hand position search area F3L, the hand position m3 is detected only in the hand position search area F3L. Also, in this example the target person is wearing long-sleeved clothing and only the part beyond the wrist is exposed, so the hand position m3 corresponds to the position of the hand (HAND). The "hand position m3 on the image" detected by the hand position detection unit 41C is input to the hand position detection unit 41D.
[0044]
(Hand position detector 41D)
The hand position detector 41D detects the “hand position” of the target person C based on the face position m2 detected by the face position detector 41B and the hand position m3 detected by the hand position detector 41C. The method for detecting the hand position will be described with reference to FIG. 8B. First, an area (hand position search range) F4 for searching for the hand position is set in the hand position search area F3L (8). The hand position search range F4 is set to a preset “approximate size to cover the hand” with the hand position m3 as the center. The range of the hand position search range F4 is set with reference to the distance information generated by the captured image analysis device 2.
[0045]
Subsequently, the top, bottom, left, and right end points m4a to m4d of the skin color region R2 within the hand position search range F4 are detected (9). For the skin color region R2, the skin color area information generated by the captured image analysis device 2 is referred to. The vertical distance d1 between the top and bottom end points (m4a and m4b) is then compared with the horizontal distance d2 between the left and right end points (m4c and m4d), and the direction of the longer distance is determined to be the direction in which the hand extends (10). In the example of FIG. 8(b), since the vertical distance d1 is longer than the horizontal distance d2, it is determined that the hand extends in the vertical direction.
[0046]
Next, based on the positional relationship between the face position m2 on the image and the hand position m3 on the image, it is determined which of the top and bottom end points m4a and m4b (or the left and right end points m4c and m4d) is the tip position. Specifically, when the hand position m3 is far from the face position m2, the arm is considered to be extended, and the end point farther from the face position m2 is determined to be the hand position (hand position on the image) m4. Conversely, when the hand position m3 is close to the face position m2, the elbow is considered to be bent, and the end point closer to the face position m2 is determined to be the hand position m4. In the example of FIG. 8(b), since the hand position m3 is far from the face position m2 and the upper end point m4a is farther from the face position m2 than the lower end point m4b, the upper end point m4a is determined to be the hand position m4 (11).
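A sketch of this end-point selection; the pixel threshold standing in for the far/close judgement between m3 and m2 is an assumption.

```python
import numpy as np

def hand_tip(face_m2, hand_m3, top, bottom, left, right, near_thresh_px=80):
    """Hand position m4 sketch: decide the extension direction from the longer
    of the vertical/horizontal spans of the end points of region R2, then pick
    the end point farther from the face when the hand is far from the face
    (arm extended), or the nearer one when it is close (elbow bent)."""
    face_m2, hand_m3 = np.asarray(face_m2, float), np.asarray(hand_m3, float)
    d1 = np.linalg.norm(np.asarray(top, float) - np.asarray(bottom, float))   # vertical span
    d2 = np.linalg.norm(np.asarray(left, float) - np.asarray(right, float))   # horizontal span
    a, b = (top, bottom) if d1 >= d2 else (left, right)
    dist_a = np.linalg.norm(np.asarray(a, float) - face_m2)
    dist_b = np.linalg.norm(np.asarray(b, float) - face_m2)
    if np.linalg.norm(hand_m3 - face_m2) > near_thresh_px:   # arm extended
        return a if dist_a >= dist_b else b                  # farther end point is m4
    return a if dist_a < dist_b else b                       # elbow bent: nearer end point is m4
```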
[0047]
Then, the hand position m4t (Xht, Yht, Zht) in the real space is obtained from the hand position m4 (Xh, Yh) on the image with reference to the distance information generated by the captured image analysis apparatus 2. The “hand position m4t in real space” detected by the hand position detection unit 41D is stored in a storage unit (not shown), and the target person C is detected by the posture / gesture recognition unit 42B (see FIG. 6) of the posture / gesture recognition unit 42. Used when recognizing a posture or gesture.
[0048]
(Posture / gesture recognition means 42)
The posture/gesture recognition means 42 includes a posture/gesture data storage unit 42A that stores posture data and gesture data, and a posture/gesture recognition unit 42B that recognizes the posture or gesture of the target person based on the "face position m2t in real space" and the "hand position m4t in real space" detected by the face/hand position detection means 41 (see FIG. 6).
[0049]
(Posture / gesture data storage unit 42A)
The posture/gesture data storage unit 42A stores posture data P1 to P6 (see FIG. 9) and gesture data J1 to J4 (see FIG. 10). The posture data P1 to P6 and the gesture data J1 to J4 are data describing postures or gestures corresponding to "the relative positional relationship between the face position and the hand position in real space" and "the variation of the hand position relative to the face position". The "relative positional relationship between the face position and the hand position" specifically refers to "the heights of the face position and the hand position" and "the distances of the face position and the hand position from the camera 1". The posture/gesture recognition means 42 can also detect "the relative positional relationship between the face position and the hand position" by examining the horizontal shift between the face position and the hand position on the image. The posture data P1 to P6 and the gesture data J1 to J4 are used when the posture/gesture recognition unit 42B recognizes the posture or gesture of the target person.
[0050]
The posture data P1 to P6 will be described with reference to FIG. 9. "Posture P1: FACE SIDE" shown in FIG. 9(a) is a posture meaning "hello", "Posture P2: HIGH HAND" is shown in FIG. 9(b), "Posture P3: STOP" shown in FIG. 9(c) is a posture meaning "stop", "Posture P4: HANDSHAKE" shown in FIG. 9(d) is a posture meaning "handshake", "Posture P5: SIDE HAND" shown in FIG. 9(e) is a posture meaning "see the direction of the hand", and "Posture P6: LOW HAND" shown in FIG. 9(f) is a posture meaning "bend in the direction of the hand".
[0051]
The gestures J1 to J4 will be described with reference to FIG. 10. "Gesture J1: HAND SWING" shown in FIG. 10(a) is a gesture meaning "caution", "Gesture J2: BYE BYE" shown in FIG. 10(b) is a gesture meaning "goodbye", "Gesture J3: COME HERE" shown in FIG. 10(c) is a gesture meaning "get closer", and "Gesture J4: HAND CIRCLING" shown in FIG. 10(d) is a gesture meaning "turn".
[0052]
In the present embodiment, the posture / gesture data storage unit 42A (see FIG. 6) stores posture data P1 to P6 (see FIG. 9) and gesture data J1 to J4 (see FIG. 10). Posture data and gesture data to be stored in the posture / gesture data storage unit 42A can be arbitrarily set. The meaning of each posture and each gesture can also be set arbitrarily.
[0053]
The posture/gesture recognition unit 42B detects "the relative positional relationship between the face position m2t and the hand position m4t" and "the variation of the hand position m4t relative to the face position m2t" from the "face position m2t in real space" and the "hand position m4t in real space" detected by the face/hand position detection means 41, and recognizes the posture or gesture of the target person by comparing the detection results with the posture data P1 to P6 (see FIG. 9) and the gesture data J1 to J4 (see FIG. 10) stored in the posture/gesture data storage unit 42A. The recognition results of the posture/gesture recognition unit 42B are stored as a history.
[0054]
Next, the posture and gesture recognition method used in the posture/gesture recognition unit 42B will be described in detail with reference to the flowcharts shown in FIGS. 11 to 14. First, the outline of the processing in the posture/gesture recognition unit 42B will be described with reference to the flowchart shown in FIG. 11. Then, "step S1: posture recognition processing" of the flowchart shown in FIG. 11 will be described with reference to the flowchart shown in FIG. 12, and "step S4: posture/gesture recognition processing" of the flowchart shown in FIG. 11 will be described with reference to the flowcharts shown in FIGS. 13 and 14.
[0055]
(Outline of processing in the posture / gesture recognition unit 42B)
FIG. 11 is a flowchart for explaining an outline of processing in the posture / gesture recognition unit 42B. Referring to the flowchart shown in FIG. 11, first, in step S1, recognition of postures P1 to P4 (see FIG. 9) is attempted. Subsequently, in step S2, it is determined whether or not the posture has been recognized in step S1. If it is determined that the posture has been recognized, the process proceeds to step S3. If it is determined that the posture has not been recognized, the process proceeds to step S4. In step S3, the posture recognized in step S1 is output as a recognition result, and the process ends.
[0056]
In step S4, recognition of postures P5 and P6 (see FIG. 9) or gestures J1 to J4 (see FIG. 10) is attempted. Next, in step S5, it is determined whether or not a posture or gesture has been recognized in step S4. If it is determined that the posture or gesture has been recognized, the process proceeds to step S6. If it is determined that the posture or gesture has not been recognized, the process proceeds to step S8.
[0057]
In step S6, it is determined whether the same posture or gesture has been recognized a predetermined number of times (for example, 5 times) or more in a predetermined number of frames (for example, 10 frames) in the past. If it is determined that the same posture or gesture can be recognized a predetermined number of times or more, the process proceeds to step S7. If it is determined that the same posture or gesture cannot be recognized a predetermined number of times or more, the process proceeds to step S8.
[0058]
In step S7, the posture or gesture recognized in step S4 is output as a recognition result, and the process ends. In step S8, it is output that the posture or gesture could not be recognized, that is, it cannot be recognized, and the process is terminated.
[0059]
(Step S1: Posture recognition processing)
FIG. 12 is a flowchart for explaining "step S1: posture recognition processing" in the flowchart shown in FIG. 11. Referring to the flowchart shown in FIG. 12, first, in step S11, the face position m2t and the hand position m4t in the real space of the target person (hereinafter referred to as "input information") are input from the face/hand position detection means 41. Subsequently, in step S12, based on the face position m2t and the hand position m4t, the distance from the camera 1 to the hand (hereinafter "hand distance") is compared with the distance from the camera 1 to the face (hereinafter "face distance") to determine whether the hand distance and the face distance are substantially the same, that is, whether the difference between the hand distance and the face distance is equal to or less than a predetermined value. If they are determined to be substantially the same, the process proceeds to step S13; if not, the process proceeds to step S18.
[0060]
In step S13, the height of the hand (hereinafter "hand height") is compared with the height of the face (hereinafter "face height") to determine whether the hand height and the face height are substantially the same, that is, whether the difference between the hand height and the face height is equal to or less than a predetermined value. If they are determined to be substantially the same, the process proceeds to step S14; if not, the process proceeds to step S15. In step S14, a recognition result indicating that the posture corresponding to the input information is "posture P1: FACE SIDE" (see FIG. 9(a)) is output, and the process ends.
[0061]
In step S15, the hand height is compared with the face height to determine whether the hand height is higher than the face height. If it is determined that the hand height is higher than the face height, the process proceeds to step S16. If it is determined that the hand height is not higher than the face height, the process proceeds to step S17. In step S16, a recognition result indicating that the posture corresponding to the input information is “posture P2: HIGH HAND” (see FIG. 9B) is output, and the process ends. In step S17, a recognition result indicating that the posture corresponding to the input information is “none” is output, and the process ends.
[0062]
In step S18, the hand height and the face height are compared, and it is determined whether or not the hand height and the face height are substantially the same, that is, whether or not the difference between the hand height and the face height is a predetermined value or less. If it is determined that both are substantially the same, the process proceeds to step S19. If it is determined that both are not substantially the same, the process proceeds to step S20. In step S19, a recognition result indicating that the posture corresponding to the input information is “posture P3: STOP” (see FIG. 9C) is output, and the process ends.
[0063]
In step S20, the hand height is compared with the face height to determine whether the hand height is lower than the face height. If it is determined that the hand height is lower than the face height, the process proceeds to step S21; if it is determined that the hand height is not lower than the face height, the process proceeds to step S22. In step S21, a recognition result indicating that the posture corresponding to the input information is "posture P4: HANDSHAKE" (see FIG. 9(d)) is output, and the process ends. In step S22, a recognition result indicating that the posture corresponding to the input information is "none" is output, and the process ends.
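A compact sketch of this decision tree (steps S12 to S22 of FIG. 12), assuming positions are given as (lateral x, height, distance from camera) in metres; the tolerances standing in for "substantially the same" are assumptions, as is the reading that a hand lower than the face at step S20 leads to "HANDSHAKE".

```python
def recognize_posture(face_pos, hand_pos, dist_tol=0.3, height_tol=0.2):
    """Posture recognition sketch over (x, height, camera distance) inputs."""
    hand_h, hand_dist = hand_pos[1], hand_pos[2]
    face_h, face_dist = face_pos[1], face_pos[2]

    if abs(hand_dist - face_dist) <= dist_tol:     # S12: hand and face distances about equal
        if abs(hand_h - face_h) <= height_tol:     # S13: hand and face heights about equal
            return "P1: FACE SIDE"                 # S14
        if hand_h > face_h:                        # S15: hand above the face
            return "P2: HIGH HAND"                 # S16
        return None                                # S17: no posture
    if abs(hand_h - face_h) <= height_tol:         # S18: same height, different distance
        return "P3: STOP"                          # S19
    if hand_h < face_h:                            # S20: hand below the face
        return "P4: HANDSHAKE"                     # S21
    return None                                    # S22: no posture
```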
[0064]
(Step S4: posture / gesture recognition processing)
FIG. 13 is a first flowchart for explaining the “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. Referring to the flowchart shown in FIG. 13, first, in step S31, input information (a face position m2t and a hand position m4t in the real space of the target person) is input. Subsequently, in step S32, a standard deviation of the hand position m4t with respect to the face position m2t is obtained, and the presence / absence of hand movement is determined based on the obtained standard deviation. Specifically, when the standard deviation of the hand position m4t is equal to or smaller than a predetermined value, it is determined that there is no hand movement, and when it is larger than the predetermined value, it is determined that there is a hand movement. If it is determined that there is no hand movement, the process proceeds to step S33. If it is determined that there is a hand movement, the process proceeds to step S36.
[0065]
In step S33, it is determined whether or not the hand height is just below the face height. If it is determined that the hand height is immediately below the face height, the process proceeds to step S34. If it is determined that the hand height is not immediately below the face height, the process proceeds to step S35. In step S34, a recognition result indicating that the posture or gesture corresponding to the input information is “posture P5: SIDE HAND” (see FIG. 9E) is output, and the process ends. In step S35, a recognition result indicating that the posture or gesture corresponding to the input information is “posture P6: LOW HAND” (see FIG. 9F) is output, and the process ends.
[0066]
In step S36, the hand height is compared with the face height to determine whether the hand height is higher than the face height. If it is determined that the hand height is higher than the face height, the process proceeds to step S37. If it is determined that the hand height is not higher than the face height, the process proceeds to step S41 (see FIG. 14). In step S37, the hand distance and the face distance are compared, and it is determined whether or not the hand distance and the face distance are substantially the same, that is, whether or not the difference between the hand distance and the face distance is equal to or less than a predetermined value. . If it is determined that both are substantially the same, the process proceeds to step S38. If it is determined that both are not substantially the same, the process proceeds to step S40.
[0067]
In step S38, it is determined whether or not the hand is being swung left and right. Here, if it is determined from the horizontal shift between two frames that the hand is being swung left and right, the process proceeds to step S39; if it is determined that the hand is not being swung left and right, the process proceeds to step S40. In step S39, a recognition result indicating that the posture or gesture corresponding to the input information is "gesture J1: HAND SWING" (see FIG. 10(a)) is output, and the process ends. In step S40, a recognition result indicating that the posture or gesture corresponding to the input information is "none" is output, and the process ends.
[0068]
FIG. 14 is a second flowchart for explaining the “step S4: posture / gesture recognition processing” in the flowchart shown in FIG. Referring to the flowchart shown in FIG. 14, in step S41, the hand distance is compared with the face distance to determine whether the hand distance is shorter than the face distance. If it is determined that the hand distance is shorter than the face distance, the process proceeds to step S42. If it is determined that the hand distance is not shorter than the face distance, the process proceeds to step S47.
[0069]
In step S42, it is determined whether or not the hand is swung from side to side. Here, if it is determined that the hand is swinging left and right based on the shift in the horizontal direction between the two frames, the process proceeds to step S43. If it is determined that the hand is not swinging left and right, the process proceeds to step S44. In step S43, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J2: BYE BYE” (see FIG. 10B) is output, and the process ends.
[0070]
In step S44, it is determined whether or not the hand is swung up and down. If it is determined from the vertical shift between the two frames that the hand is swinging up and down, the process proceeds to step S45. If it is determined that the hand is not swinging up and down, the process proceeds to step S46. In step S45, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J3: COME HERE” (see FIG. 10C) is output, and the process ends. In step S46, the recognition result that the posture or gesture corresponding to the input information is “none” is output, and the process ends.
[0071]
In step S47, the hand distance and the face distance are compared, and it is determined whether or not the hand distance and the face distance are substantially the same, that is, whether or not the difference between the hand distance and the face distance is a predetermined value or less. If it is determined that both are substantially the same, the process proceeds to step S48. If it is determined that both are not substantially the same, the process proceeds to step S50. In step S48, it is determined whether or not the hand is swung from side to side. If it is determined that the hand is swinging left or right, the process proceeds to step S49. If it is determined that the hand is not swinging left or right, the process proceeds to step S50.
[0072]
In step S49, a recognition result indicating that the posture or gesture corresponding to the input information is “gesture J4: HAND CIRCLING” (see FIG. 10D) is output, and the process ends. In step S50, the recognition result that the posture or gesture corresponding to the input information is “none” is output, and the process ends.
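For illustration, the branching of FIGS. 13 and 14 can be sketched as follows; positions are assumed to be given as (lateral x, height, distance from camera) over the most recent frames, and the movement and swing thresholds stand in for the patent's "predetermined value" and frame-to-frame shift checks.

```python
import numpy as np

def recognize_gesture(face_hist, hand_hist, dist_tol=0.3, move_tol=0.05, swing_amp=0.1):
    """Sketch of steps S32-S50: standard deviation for hand movement, then
    height/distance comparisons and swing direction tests."""
    face_hist, hand_hist = np.asarray(face_hist), np.asarray(hand_hist)
    face, hand = face_hist[-1], hand_hist[-1]
    rel = hand_hist - face_hist                    # hand position m4t relative to face position m2t

    if rel.std(axis=0).max() <= move_tol:          # S32: no hand movement
        just_below = 0.0 < (face[1] - hand[1]) < 0.3   # 'just below the face height' (assumed band)
        return "P5: SIDE HAND" if just_below else "P6: LOW HAND"   # S34 / S35

    swing_lr = np.ptp(rel[:, 0]) > swing_amp       # left-right swing
    swing_ud = np.ptp(rel[:, 1]) > swing_amp       # up-down swing
    if hand[1] > face[1]:                          # S36: hand higher than the face
        if abs(hand[2] - face[2]) <= dist_tol and swing_lr:
            return "J1: HAND SWING"                # S37-S39
        return None                                # S40
    if hand[2] < face[2] - dist_tol:               # S41: hand closer to the camera than the face
        if swing_lr:
            return "J2: BYE BYE"                   # S42-S43
        if swing_ud:
            return "J3: COME HERE"                 # S44-S45
        return None                                # S46
    if abs(hand[2] - face[2]) <= dist_tol and swing_lr:
        return "J4: HAND CIRCLING"                 # S47-S49
    return None                                    # S50
```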
[0073]
As described above, the posture/gesture recognition unit 42B detects "the relative positional relationship between the face position m2t and the hand position m4t" and "the variation of the hand position m4t relative to the face position m2t" from the input information (the face position m2t and the hand position m4t in the real space of the target person) input from the face/hand position detection means 41, and can recognize the posture or gesture of the target person by comparing the detection results with the posture data P1 to P6 (see FIG. 9) and the gesture data J1 to J4 (see FIG. 10) stored in the posture/gesture data storage unit 42A.
[0074]
Note that the posture/gesture recognition unit 42B can also recognize the posture or gesture of the target person by the following Modifications 1 and 2 in addition to the method described above. "Modification 1 of the processing in the posture/gesture recognition unit 42B" will be described below with reference to FIGS. 15 to 17, and "Modification 2 of the processing in the posture/gesture recognition unit 42B" will be described with reference to FIGS. 18 and 19.
[0075]
(Modification 1 of processing in the posture / gesture recognition unit 42B)
In Modification 1, a pattern matching method is used to recognize the posture or gesture of the target person. FIG. 15 is a flowchart for explaining "Modification 1 of the processing in the posture/gesture recognition unit 42B". Referring to the flowchart shown in FIG. 15, first, in step S61, recognition of a posture or a gesture is attempted. At this time, the posture/gesture recognition unit 42B recognizes the posture or gesture of the target person by superimposing an "input pattern", consisting of the input information (the face position m2t and the hand position m4t in the real space of the target person) input from the face/hand position detection means 41 and "the variation of the hand position m4t relative to the face position m2t", on the posture data P11 to P16 (see FIG. 16) or the gesture data J11 to J14 (see FIG. 17) stored in the posture/gesture data storage unit 42A, and searching for the most similar pattern. Posture data P11 to P16 (see FIG. 16) and gesture data J11 to J14 (see FIG. 17) for pattern matching are stored in the posture/gesture data storage unit 42A in advance.
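A minimal sketch of this nearest-pattern search; using the Euclidean distance as the similarity measure is an assumption, since the patent only states that the most similar pattern is found by superimposing the patterns.

```python
import numpy as np

def match_pattern(input_pattern, stored_patterns):
    """`input_pattern`: flattened vector of relative face/hand positions over a
    fixed number of frames. `stored_patterns`: dict mapping a label such as
    "P11" or "J13" to a reference vector of the same length."""
    input_pattern = np.asarray(input_pattern, dtype=np.float32)
    best_label, best_dist = None, np.inf
    for label, ref in stored_patterns.items():
        dist = np.linalg.norm(input_pattern - np.asarray(ref, dtype=np.float32))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```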
[0076]
Subsequently, in step S62, it is determined whether or not a posture or a gesture has been recognized in step S61. If it is determined that the posture or gesture has been recognized, the process proceeds to step S63. If it is determined that the posture or gesture has not been recognized, the process proceeds to step S65.
[0077]
In step S63, it is determined whether the same posture or gesture has been recognized a predetermined number of times (for example, 5 times) or more in a predetermined number of frames (for example, 10 frames) in the past. If it is determined that the same posture or gesture can be recognized a predetermined number of times or more, the process proceeds to step S64. If it is determined that the same posture or gesture cannot be recognized a predetermined number of times or more, the process proceeds to step S65.
[0078]
In step S64, the posture or gesture recognized in step S61 is output as a recognition result, and the process ends. In step S65, it is output that the posture or the gesture could not be recognized, that is, the recognition is impossible, and the process is terminated.
[0079]
As described above, the posture/gesture recognition unit 42B can recognize the posture or gesture of the target person by pattern matching, that is, by matching an "input pattern", consisting of the input information input from the face/hand position detection means 41 and "the variation of the hand position m4t relative to the face position m2t", against the posture data P11 to P16 (see FIG. 16) and the gesture data J11 to J14 (see FIG. 17) stored in the posture/gesture data storage unit 42A.
[0080]
(Modification 2 of processing in the posture / gesture recognition unit 42B)
In Modification 2, a determination circle E of a size large enough for the hand of the target person to enter is set, and "posture P4: HANDSHAKE" (see FIG. 9(d)) and "gesture J3: COME HERE" (see FIG. 10(c)), whose relative positional relationships between the face position m2t and the hand position m4t are similar, are distinguished by comparing the area of the hand with the area of the determination circle. Note that for both "posture P4: HANDSHAKE" and "gesture J3: COME HERE" the hand position is lower than the face position and the distance from the camera to the hand position is shorter than the distance from the camera to the face position, which makes them difficult to distinguish from each other.
[0081]
FIG. 18 is a flowchart for explaining a second modification of the processing in the posture / gesture recognition unit 42B. Referring to the flowchart shown in FIG. 18, first, in step S71, a determination circle (determination area) E centered on the hand position m3 is set (see FIG. 19). The size of the determination circle E is set to a size that allows the hand to enter the determination circle E. The size (diameter) of the determination circle E is set with reference to the distance information generated by the captured image analysis device 2. In the example of FIG. 19, the diameter of the determination circle E is set to 20 cm.
[0082]
Subsequently, in step S72, it is determined whether or not the area Sh of the skin color region R2 within the determination circle E is 1/2 or more of the area S of the determination circle E. For the skin color region R2, the skin color region information generated by the captured image analysis device 2 is referred to. If it is determined that the area Sh of the skin color region R2 is equal to or greater than 1/2 of the area S of the determination circle E (see FIG. 19B), the process proceeds to step S73; if it is determined that the area Sh is not 1/2 or more of the area S, that is, smaller than 1/2 (see FIG. 19C), the process proceeds to step S74.
[0083]
In step S73, the determination result that the posture or gesture corresponding to the input information is "gesture J3: COME HERE" (see FIG. 10C) is output, and the process ends. In step S74, the determination result that the posture or gesture corresponding to the input information is "posture P4: HANDSHAKE" (see FIG. 9D) is output, and the process ends.
[0084]
As described above, the posture / gesture recognition unit 42B sets the determination circle E having a size that allows the hand of the target person to enter, compares the area Sh of the skin color region R2 within the determination circle E with the area S of the determination circle E, and determines "gesture J3: COME HERE" when the area of the hand is 1/2 or more of the area of the determination circle and "posture P4: HANDSHAKE" when the area of the hand is less than 1/2 of the area of the determination circle. In this way, "gesture J3: COME HERE" and "posture P4: HANDSHAKE" can be distinguished from each other.
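The area test of this second modification can be sketched as follows. This is a hypothetical Python/NumPy illustration only; the image layout, the boolean skin mask, and the conversion of the 20 cm diameter into pixels are assumptions, not the patented implementation.

import numpy as np

def handshake_or_come_here(skin_mask, hand_xy, radius_px):
    """Distinguish "COME HERE" from "HANDSHAKE" by the hand area inside circle E.

    skin_mask : 2D boolean array, True where the pixel belongs to skin region R2.
    hand_xy   : (x, y) pixel coordinates of the hand position (centre of circle E).
    radius_px : radius of circle E in pixels (e.g. a 20 cm diameter converted
                to pixels using the distance information).
    """
    h, w = skin_mask.shape
    ys, xs = np.ogrid[:h, :w]
    inside = (xs - hand_xy[0]) ** 2 + (ys - hand_xy[1]) ** 2 <= radius_px ** 2

    circle_area = inside.sum()              # area S of determination circle E
    skin_area = (skin_mask & inside).sum()  # area Sh of skin region R2 inside E

    # An open hand fills half or more of the circle -> "COME HERE" (gesture J3);
    # otherwise the hand is offered for a handshake -> "HANDSHAKE" (posture P4).
    return "COME HERE" if skin_area >= circle_area / 2 else "HANDSHAKE"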
[0085]
(Operation of gesture recognition system A)
Next, the operation of the gesture recognition system A will be described with reference to the block diagram showing the overall configuration of the gesture recognition system A shown in FIG. 1 and the flowcharts shown in FIGS. 20 and 21. FIG. 20 is a flowchart for explaining the "captured image analysis step" and the "contour extraction step" in the operation of the gesture recognition system A, and FIG. 21 is a flowchart for explaining the "face / hand position detection step" and the "posture / gesture recognition step" in the operation of the gesture recognition system A.
[0086]
<Captured image analysis step>
Referring to the flowchart shown in FIG. 20, in the captured image analysis device 2, when captured images are input from the cameras 1a and 1b (step S101), the distance information generation unit 21 generates a distance image D1 (see FIG. 3A) as distance information from the captured images (step S102), and the motion information generation unit 22 generates a difference image D2 (see FIG. 3B) as motion information from the captured images (step S103). Further, the edge information generation unit 23 generates an edge image D3 (see FIG. 3C) as edge information from the captured images (step S104), and the skin color area information generation unit 24 extracts the skin color regions R1 and R2 (see FIG. 3D) as skin color area information from the captured images (step S105).
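For orientation only, the four kinds of information can be approximated with common OpenCV operations. The following is a hedged sketch of an analogous pipeline, not the device's actual processing; the stereo parameters, the HSV skin-color range, and the Canny thresholds are assumptions.

import cv2
import numpy as np

def analyze_frame(left, right, prev_left):
    """Rough analogues of the four outputs of the captured image analysis device.

    left, right : BGR frames from the two cameras (cf. cameras 1a and 1b).
    prev_left   : previous BGR frame from camera 1a (for motion information).
    """
    left_gray = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

    # Distance information (cf. distance image D1): stereo disparity map
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left_gray, right_gray)

    # Motion information (cf. difference image D2): inter-frame difference
    prev_gray = cv2.cvtColor(prev_left, cv2.COLOR_BGR2GRAY)
    motion = cv2.absdiff(left_gray, prev_gray)

    # Edge information (cf. edge image D3)
    edges = cv2.Canny(left_gray, 100, 200)

    # Skin color area information (cf. regions R1, R2): HSV threshold
    hsv = cv2.cvtColor(left, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))

    return disparity, motion, edges, skin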
[0087]
<Outline extraction step>
Still referring to the flowchart shown in FIG. 20, in the contour extraction device 3, first, the target distance setting unit 31 uses the distance image D1 and the difference image D2 generated in steps S102 and S103 to set, as the target distance, the distance at which the target person exists (step S106). Subsequently, the target distance image generation unit 32 generates a target distance image D4 (see FIG. 4B) by extracting, from the edge image D3 generated in step S104, the pixels existing at the target distance set in step S106 (step S107).
[0088]
Next, the target area setting unit 33 sets the target region T (see FIG. 5B) in the target distance image D4 generated in step S107 (step S108). Then, the contour extracting unit 34 extracts the contour O (see FIG. 5C) of the target person C from within the target region T set in step S108 (step S109).
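A hedged sketch of extracting a contour within a rectangular target region is shown below, using OpenCV as a stand-in. The rectangular region, the use of findContours, and the choice of the largest contour are illustrative assumptions and not the patent's contour extraction method.

import cv2
import numpy as np

def extract_contour(target_distance_image, region_rect):
    """Extract the outline of the target person within a rectangular target region.

    target_distance_image : 8-bit single-channel image containing only the pixels
                            at the target distance (cf. target distance image D4).
    region_rect           : (x, y, w, h) target region T around the person.
    """
    x, y, w, h = region_rect
    mask = np.zeros_like(target_distance_image)
    mask[y:y + h, x:x + w] = target_distance_image[y:y + h, x:x + w]

    # [-2] keeps this working across OpenCV 3.x and 4.x return conventions.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return None
    # Take the largest contour as the outline O of the target person C.
    return max(contours, key=cv2.contourArea)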
[0089]
<Face / hand position detection step>
Referring to the flowchart shown in FIG. 21, in the face / hand position detection means 41 of the gesture recognition device 4, first, the head position detection unit 41A detects the top position m1 (see FIG. 7A) of the target person C based on the contour information generated in step S109 (step S110).
[0090]
Subsequently, the face position detection unit 41B detects the "face position m2 on the image" (see FIG. 7B) based on the top position m1 detected in step S110 and the skin color area information generated in step S105, and obtains the "face position m2t (Xft, Yft, Zft) in real space" from the detected "face position m2 (Xf, Yf) on the image" by referring to the distance information generated in step S102 (step S111).
[0091]
Next, the hand position detection unit 41C detects the "hand position m3 on the image" (see FIG. 8A) from the "face position m2 on the image" detected in step S111 (step S112).
[0092]
Then, the hand position detection unit 41D detects the "hand position m4 on the image" (see FIG. 8B) based on the "face position m2 on the image" detected by the face position detection unit 41B and the hand position m3 detected by the hand position detection unit 41C, and obtains the "hand position m4t (Xht, Yht, Zht) in real space" from the detected "hand position m4 (Xh, Yh) on the image" by referring to the distance information generated in step S102 (step S113).
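The conversion from an image position plus distance information to a real-space position can be illustrated with a pinhole-camera back-projection. This is an assumed model for illustration only; the focal lengths and principal point below are hypothetical calibration parameters, and the patent does not specify this particular formula.

def to_real_space(u, v, depth_m, fx, fy, cx, cy):
    """Back-project an image point (u, v) with depth (metres) into camera coordinates.

    fx, fy : focal lengths in pixels; cx, cy : principal point (assumed known
             from camera calibration). Returns (X, Y, Z) in metres.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example: a face position m2 detected at pixel (320, 180) and seen at 1.5 m
face_m2t = to_real_space(320, 180, 1.5, fx=525.0, fy=525.0, cx=320.0, cy=240.0)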
[0093]
<Posture and gesture recognition step>
Referring again to the flowchart shown in FIG. 21, in the posture / gesture recognition means 42 of the gesture recognition device 4, the posture / gesture recognition unit 42B detects the "relative positional relationship between the face position m2t and the hand position m4t" and the "variation of the hand position m4t with respect to the face position m2t" from the "face position m2t (Xft, Yft, Zft) in real space" and the "hand position m4t (Xht, Yht, Zht) in real space" obtained in steps S111 and S113, and recognizes the posture or gesture of the target person by comparing the detection results with the posture data P1 to P6 (see FIG. 9) and the gesture data J1 to J4 (see FIG. 10) stored in the posture / gesture data storage unit 42A (step S114). Since the details of the posture or gesture recognition method in the posture / gesture recognition unit 42B have been described above, they are omitted here.
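As a toy illustration of classifying from the relative positional relationship by comparing heights and distances from the camera, consider the sketch below. The rules, thresholds, and returned labels are invented for the example; they are not the patent's posture data P1 to P6 or gesture data J1 to J4.

import math

def classify_relative_position(face_xyz, hand_xyz, camera_xyz=(0.0, 0.0, 0.0)):
    """Tiny rule-based sketch: compare hand/face height and distance from camera.

    All positions are real-space (X, Y, Z) coordinates; +Y is assumed to be up.
    """
    hand_higher = hand_xyz[1] > face_xyz[1]
    hand_closer = math.dist(camera_xyz, hand_xyz) < math.dist(camera_xyz, face_xyz)

    if not hand_higher and hand_closer:
        # Hand below the face and nearer the camera: the HANDSHAKE / COME HERE
        # family, which the determination circle of the second modification
        # then tells apart.
        return "hand lowered toward camera"
    if hand_higher and hand_closer:
        return "hand raised toward camera"
    return "other"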
[0094]
Although the gesture recognition system A has been described above, each means of the gesture recognition device 4 included in the gesture recognition system A can also be implemented in a computer as a functional program, and the computer can be operated as a gesture recognition program by combining these functional programs.
[0095]
The gesture recognition system A can be applied to, for example, an autonomous robot. In this case, the autonomous robot can, for example, recognize the posture as "posture P4: HANDSHAKE" (see FIG. 9D) when a person holds out a hand, or recognize the gesture as "gesture J1: HAND SWING" (see FIG. 10A).
[0096]
It should be noted that, compared with instructions by voice, instructions by postures and gestures are not influenced by ambient noise, can be given even in situations where the voice does not reach, and have the advantage that instructions which are difficult (or redundant) to express in words can be given simply.
[0097]
【The invention's effect】
As described above in detail, according to the present invention, when recognizing the posture or gesture of the target person, it is not necessary to detect feature points (points indicating the characteristics of the movement of the target person) one by one, so the amount of calculation required for the posture recognition processing or the gesture recognition processing can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the overall configuration of the gesture recognition system A.
FIG. 2 is a block diagram showing the configuration of the captured image analysis device 2 and the contour extraction device 3 included in the gesture recognition system A shown in FIG. 1.
FIG. 3A is a diagram illustrating a distance image D1, FIG. 3B a difference image D2, FIG. 3C an edge image D3, and FIG. 3D skin color regions R1 and R2.
FIG. 4 is a diagram for explaining a method of setting a target distance.
FIG. 5 is a diagram for explaining a method for setting a target region T and a method for extracting a contour O of a target person C from within the target region T;
FIG. 6 is a block diagram showing the configuration of the gesture recognition device 4 included in the gesture recognition system A shown in FIG. 1.
FIG. 7A is a diagram for explaining a method for detecting the top position m1, and FIG. 7B is a diagram for explaining a method for detecting the face position m2.
FIG. 8A is a diagram for explaining a method for detecting the hand position m3, and FIG. 8B is a diagram for explaining a method for detecting the hand position m4.
FIG. 9 is a diagram showing posture data P1 to P6.
FIG. 10 is a diagram showing gesture data J1 to J4.
FIG. 11 is a flowchart for explaining an outline of processing in a posture / gesture recognition unit 42B;
FIG. 12 is a flowchart for explaining "step S1: posture recognition processing" in the flowchart shown in FIG. 11.
FIG. 13 is a first flowchart for explaining "step S4: posture / gesture recognition processing" in the flowchart shown in FIG. 11.
FIG. 14 is a second flowchart for explaining "step S4: posture / gesture recognition processing" in the flowchart shown in FIG. 11.
FIG. 15 is a flowchart for explaining a first modification of the processing in the posture / gesture recognition unit 42B;
FIG. 16 is a diagram showing posture data P11 to P16.
FIG. 17 is a diagram showing gesture data J11 to J14.
FIG. 18 is a flowchart for explaining a second modification of the processing in the posture / gesture recognition unit 42B;
FIG. 19A is a diagram for explaining a method for setting the determination circle E, FIG. 19B is a diagram showing a case where the area Sh of the skin color region R2 is 1/2 or more of the area S of the determination circle E, and FIG. 19C is a diagram showing a case where the area Sh of the skin color region R2 is less than 1/2 of the area S of the determination circle E.
FIG. 20 is a flowchart shown for explaining the “captured image analysis step” and the “contour extraction step” in the operation of the gesture recognition system A.
FIG. 21 is a flowchart for explaining a “face / hand position detection step” and a “posture / gesture recognition step” in the operation of the gesture recognition system A;
[Explanation of symbols]
A gesture recognition system
1 Camera
2 Captured image analyzer
3 Contour extraction device
4 Gesture recognition device
41 Face / hand position detection means
41A Head position detector
41B face position detection unit
41C Hand position detector
41D Hand position detector
42 Posture and gesture recognition means
42A Posture / gesture data storage
42B Posture / Gesture Recognition Unit

Claims (5)

  1. An apparatus for recognizing a posture or a gesture of the target person from an image obtained by capturing the target person with a camera,
    A face / hand position detecting means for detecting a face position and a hand position in the real space of the target person based on the contour information and skin color area information of the target person generated from the captured image;
    posture / gesture recognition means for detecting, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position when the face position is used as a reference, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing a posture or gesture corresponding to the relative positional relationship between the face position and the hand position and the variation of the hand position when the face position is used as a reference,
    wherein the posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area of a size that the hand of the target person can enter, and distinguishes postures or gestures whose relative positional relationship between the face position and the hand position is similar by comparing the area of the hand with the area of the determination area.
  2.   The gesture recognition apparatus according to claim 1, wherein a relative positional relationship between the face position and the hand position is detected by comparing a height and a distance from the camera.
  3.   The gesture recognition apparatus according to claim 1, wherein the posture / gesture recognition unit recognizes a posture or a gesture of the target person using a pattern matching method.
  4. A method for recognizing a posture or gesture of the target person from an image of the target person captured by a camera,
    A face / hand position detecting step for detecting a face position and a hand position in the real space of the target person by a face / hand position detecting means based on the contour information and skin color area information of the target person generated from the image;
    a posture / gesture recognition step of detecting, by a posture / gesture recognition means, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position when the face position is used as a reference, and recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing a posture or gesture corresponding to the relative positional relationship between the face position and the hand position and the variation of the hand position when the face position is used as a reference,
    wherein, in the posture / gesture recognition step, the posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area of a size that the hand of the target person can enter, and distinguishes postures or gestures whose relative positional relationship between the face position and the hand position is similar by comparing the area of the hand with the area of the determination area.
  5. In order to recognize the posture or gesture of the target person from an image obtained by capturing the target person with a camera,
    Face / hand position detecting means for detecting a face position and a hand position in the real space of the target person based on the contour information and skin color area information of the target person generated from the image;
    posture / gesture recognition means for detecting, from the face position and the hand position, a relative positional relationship between the face position and the hand position and a variation of the hand position when the face position is used as a reference, and for recognizing the posture or gesture of the target person by comparing the detection result with posture data or gesture data describing a posture or gesture corresponding to the relative positional relationship between the face position and the hand position and the variation of the hand position when the face position is used as a reference,
    the gesture recognition program causing a computer to function as the face / hand position detecting means and the posture / gesture recognition means, wherein the posture / gesture recognition means sets, with respect to the relative positional relationship, a determination area of a size that the hand of the target person can enter, and distinguishes postures or gestures whose relative positional relationship between the face position and the hand position is similar by comparing the area of the hand with the area of the determination area.
JP2003096271A 2003-03-31 2003-03-31 Gesture recognition device, gesture recognition method, and gesture recognition program Active JP4153818B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003096271A JP4153818B2 (en) 2003-03-31 2003-03-31 Gesture recognition device, gesture recognition method, and gesture recognition program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2003096271A JP4153818B2 (en) 2003-03-31 2003-03-31 Gesture recognition device, gesture recognition method, and gesture recognition program
DE200460006190 DE602004006190T8 (en) 2003-03-31 2004-03-19 Device, method and program for gesture recognition
EP04006728A EP1477924B1 (en) 2003-03-31 2004-03-19 Gesture recognition apparatus, method and program
US10/805,392 US7593552B2 (en) 2003-03-31 2004-03-22 Gesture recognition apparatus, gesture recognition method, and gesture recognition program

Publications (2)

Publication Number Publication Date
JP2004302992A JP2004302992A (en) 2004-10-28
JP4153818B2 true JP4153818B2 (en) 2008-09-24

Family

ID=33408391

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003096271A Active JP4153818B2 (en) 2003-03-31 2003-03-31 Gesture recognition device, gesture recognition method, and gesture recognition program

Country Status (1)

Country Link
JP (1) JP4153818B2 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4711885B2 (en) * 2006-05-25 2011-06-29 三菱電機株式会社 Remote control device and method
KR100826878B1 (en) * 2006-09-28 2008-05-06 한국전자통신연구원 Hand shafe recognition method and apparatus for thereof
US8487938B2 (en) * 2009-01-30 2013-07-16 Microsoft Corporation Standard Gestures
JP2010238145A (en) * 2009-03-31 2010-10-21 Casio Computer Co Ltd Information output device, remote control method and program
JP5553141B2 (en) 2009-11-11 2014-07-16 ソニー株式会社 Image processing system, image processing apparatus, image processing method, and program
JP5569062B2 (en) * 2010-03-15 2014-08-13 オムロン株式会社 Gesture recognition device, method for controlling gesture recognition device, and control program
JP5625643B2 (en) 2010-09-07 2014-11-19 ソニー株式会社 Information processing apparatus and information processing method
JP5829390B2 (en) * 2010-09-07 2015-12-09 ソニー株式会社 Information processing apparatus and information processing method
JP5604279B2 (en) * 2010-12-08 2014-10-08 日本システムウエア株式会社 Gesture recognition apparatus, method, program, and computer-readable medium storing the program
JP5653206B2 (en) 2010-12-27 2015-01-14 日立マクセル株式会社 Video processing device
EP2671134A1 (en) * 2011-02-04 2013-12-11 Koninklijke Philips N.V. Gesture controllable system uses proprioception to create absolute frame of reference
WO2012143829A2 (en) * 2011-04-20 2012-10-26 Koninklijke Philips Electronics N.V. Gesture based control of element or item
JP5865615B2 (en) * 2011-06-30 2016-02-17 株式会社東芝 Electronic apparatus and control method
US9117274B2 (en) * 2011-08-01 2015-08-25 Fuji Xerox Co., Ltd. System and method for interactive markerless paper documents in 3D space with mobile cameras and projectors
JP5174978B1 (en) * 2012-04-26 2013-04-03 株式会社三菱東京Ufj銀行 Information processing apparatus, electronic device, and program
WO2014030442A1 (en) * 2012-08-22 2014-02-27 日本電気株式会社 Input device, input method, program, and electronic sign
JP5783385B2 (en) * 2013-02-27 2015-09-24 カシオ計算機株式会社 data processing apparatus and program
JP5518225B2 (en) * 2013-03-07 2014-06-11 富士通テン株式会社 Display device
JP6460862B2 (en) * 2014-03-20 2019-01-30 国立研究開発法人産業技術総合研究所 Gesture recognition device, system and program thereof

Also Published As

Publication number Publication date
JP2004302992A (en) 2004-10-28

Similar Documents

Publication Publication Date Title
JP6079832B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
JP5777582B2 (en) Detection and tracking of objects in images
Zhou et al. A novel finger and hand pose estimation technique for real-time hand gesture recognition
KR101381439B1 (en) Face recognition apparatus, and face recognition method
US9129155B2 (en) Systems and methods for initializing motion tracking of human hands using template matching within bounded regions determined using a depth map
US10234957B2 (en) Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data
JP5629803B2 (en) Image processing apparatus, imaging apparatus, and image processing method
US20130322770A1 (en) Information processing apparatus and control method therefor
US8884985B2 (en) Interface apparatus, method, and recording medium
US8218862B2 (en) Automatic mask design and registration and feature detection for computer-aided skin analysis
Liu et al. Hand gesture recognition using depth data
US20140211991A1 (en) Systems and methods for initializing motion tracking of human hands
JP4650669B2 (en) Motion recognition device
US9129154B2 (en) Gesture recognition apparatus, robot system including the same and gesture recognition method using the same
US7194393B2 (en) Numerical model for image feature extraction
JP5715946B2 (en) Motion analysis apparatus and motion analysis method
JP5303652B2 (en) Apparatus, method, and computer program for recognizing gesture in image, and apparatus, method, and computer program for controlling device
WO2019080580A1 (en) 3d face identity authentication method and apparatus
US6072903A (en) Image processing apparatus and image processing method
JP4756660B2 (en) Image processing apparatus and image processing method
JP4929109B2 (en) Gesture recognition apparatus and method
TW569148B (en) Method for locating facial features in an image
KR101700817B1 (en) Apparatus and method for multiple armas and hands detection and traking using 3d image
JP3863809B2 (en) Input system by hand image recognition
Juang et al. Computer vision-based human body segmentation and posture estimation

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20051130

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20080408

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080609

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20080701

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20080704

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110711

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120711

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130711

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140711

Year of fee payment: 6

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250