CN107463873B - Real-time gesture analysis and evaluation method and system based on RGBD depth sensor - Google Patents
- Publication number: CN107463873B (application number CN201710523575.5A)
- Authority
- CN
- China
- Prior art keywords
- palm
- node
- frame
- initial image
- image
- Prior art date
- Legal status: Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses a real-time gesture analysis and evaluation method and system based on an RGBD depth sensor, comprising a static gesture recognition and evaluation system for a train driver's palm and a dynamic gesture recognition and evaluation system for a train driver's arm. The palm system comprises a palm center position determining module, a palm region image extracting module, a denoising module and a gesture recognition and evaluation module; the arm system comprises an arm skeleton node motion sequence extraction module, a dynamic gesture optimal matching module and an arm dynamic gesture evaluation module. The method is strongly robust to environmental background and illumination, and its palm-node-based search for gesture pixels improves the detection of palm gestures. It can monitor driver gestures in real time to ensure safe train operation, removes the need for manual monitoring of train driver gestures, and reduces the consumption of human resources.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a real-time gesture analysis and evaluation method and system based on an RGBD depth sensor.
Background
Gesture recognition is one of the key technologies for future human-computer interaction systems, with significant research value and broad application prospects. Traditional gesture recognition methods typically detect and recognize gestures on an input two-dimensional image. Such methods are sensitive to the input: they work well when the background is simple and ambient light varies little, but their performance drops sharply under complex backgrounds and large lighting changes, which limits their range of application. In recent years, to overcome these shortcomings, three-dimensional image sensing devices have become increasingly popular; such devices capture not only an RGB image but also depth data for the image, which avoids the influence of complex environmental backgrounds, illumination changes and the like on gesture recognition.
At present, gestures are widely used in the traffic field. For example, a train driver must signal the results of instrument checks before and during train operation to ensure safe running and prevent accidents, and a traffic police officer directs road traffic through a series of gestures. However, the working environments of train drivers, traffic police and similar personnel are complex and lighting varies greatly, so traditional methods cannot effectively recognize and evaluate their gestures. Moreover, the standard gestures of drivers are numerous, including both dynamic arm gestures and palm-region gestures, which further increases the difficulty of recognition. Traditional gesture recognition proceeds in two steps: first, the palm and arm are detected in a two-dimensional image; second, the detected gesture is recognized. The quality of detection directly affects recognition, detection is easily disturbed by the background environment and illumination, and detecting palms and arms on two-dimensional images remains a major difficulty. The traditional approach trains a classifier on a large number of gesture samples, but human hands are highly deformable and gestures are diverse, ambiguous and time-varying, making it hard to train an ideal gesture classifier; traditional methods therefore cannot be applied to gesture recognition for drivers and similar personnel.
Disclosure of Invention
The invention aims to provide a real-time gesture analysis and evaluation system based on an RGBD depth sensor that solves the problem of detecting palms and arms under complex background and illumination conditions, can analyze and evaluate the various gestures of workers such as train drivers and traffic police, and has broad application prospects.
The technical scheme adopted by the invention is as follows.
a real-time palm gesture analysis and evaluation method based on an RGBD depth sensor comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

the width W and the height H of the rectangular search region are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
step 3, performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
step 4, recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network, and obtaining the score P_palm of the recognized palm region gesture through formula (1), in which the denoised palm region image of the t-th frame is input into the neural network to obtain the output result of the t-th frame of the neural network; T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
A real-time arm gesture analysis and evaluation method based on an RGBD depth sensor comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
step 2, searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4), in which α is the average of the DTW distances between the standard gesture sequence samples, D_a and D_b represent any two motion sequences in the motion sequence library of driver standard dynamic gesture samples, a = 1, 2, …, N, b = 1, 2, …, N, a ≠ b, and N is the total number of motion sequences in that library.
A real-time gesture analysis and evaluation method based on an RGBD depth sensor comprises the real-time palm gesture analysis and evaluation method and the real-time arm gesture analysis and evaluation method described above.
A real-time palm gesture analysis and evaluation system based on an RGBD depth sensor comprises:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

the width W and the height H of the rectangular search region are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
gesture recognition and evaluation module: used for recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network and obtaining the score P_palm of the recognized palm region gesture through formula (1), in which the denoised palm region image of the t-th frame is input into the neural network to obtain the output result of the t-th frame of the neural network; T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
A real-time arm gesture analysis and evaluation system based on an RGBD depth sensor comprises:
the arm skeleton node motion sequence extraction module is used for acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
the dynamic gesture optimal matching module, used for searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4), in which α is the average of the DTW distances between the standard gesture sequence samples; D_a and D_b represent any two motion sequences in the motion sequence library of driver standard dynamic gesture samples, where a = 1, 2, …, N, b = 1, 2, …, N, a ≠ b, and N is the total number of motion sequences in that library.
A real-time gesture analysis and evaluation system based on RGBD depth sensors comprises the real-time palm gesture analysis and evaluation system and the real-time arm gesture analysis and evaluation system described above.
The invention has the following advantages.
First, the invention uses an RGBD depth sensor to obtain depth data alongside the two-dimensional image, so that key regions such as the palm and arm can be extracted well by the corresponding algorithms. Second, the invention recognizes the driver's palm gesture and arm dynamic gesture simultaneously, evaluates the driver's gestures for conformance based on the output of the recognition algorithms, and gives scores for the palm gesture and the arm dynamic gesture. It can monitor driver gestures in real time to ensure safe train operation, avoids manual monitoring of train driver gestures, and reduces the consumption of human resources.
Drawings
FIG. 1 is a flow chart of train driver palm area gesture recognition and evaluation;
FIG. 2 is a schematic view of nodes of a train driver's palm and arm;
FIG. 3 is a flow chart of train driver arm dynamic gesture recognition and evaluation;
FIG. 4 is a diagram of an application scenario of the present invention.
Detailed Description
Train driver gestures usually include palm region gestures and arm dynamic gestures; the recognition and evaluation processes for each are described in detail below.
Example 1
A real-time palm gesture analysis and evaluation method based on an RGBD depth sensor is characterized by comprising the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
as shown in fig. 2, the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
as shown in fig. 4, the palm node coordinate P is in a coordinate system with the center of the RGBD sensor as the origin, the horizontal direction as the X axis, the vertical direction as the Y axis, and the direction of the sensor pointing to the driver as the Z axis;
because the RGBD sensor is prone to node drift when tracking human skeleton nodes, the measured distance between the palm and the sensor can deviate; to reduce this deviation, the initial palm node needs to be corrected as in step 11 to obtain the accurate coordinates of the palm node P;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
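For illustration, the palm node correction of step 11 can be sketched as follows in Python with NumPy. This is a minimal sketch, assuming the skeleton tracker supplies integer pixel coordinates and a binary hand mask; all function and variable names are illustrative, not part of the patent.

```python
import numpy as np

def correct_palm_node(depth, hand_mask, p1, wrist):
    """Correct the initial palm node P1 (step 11): average the white (hand)
    pixels inside the region circle centered at P1 whose radius is the
    distance between the initial palm node and the wrist node.
    depth     : HxW array, distance of each pixel to the RGBD sensor (z_i)
    hand_mask : HxW boolean array, True where the pixel belongs to the hand
    p1, wrist : (x, y) integer pixel coordinates of the palm and wrist nodes
    """
    radius = np.hypot(p1[0] - wrist[0], p1[1] - wrist[1])
    ys, xs = np.nonzero(hand_mask)                      # white pixel coordinates
    inside = np.hypot(xs - p1[0], ys - p1[1]) <= radius
    xs, ys = xs[inside], ys[inside]
    if xs.size == 0:                                    # no white pixels: keep P1
        return float(p1[0]), float(p1[1]), float(depth[p1[1], p1[0]])
    zs = depth[ys, xs]
    # P = (mean of x_i, mean of y_i, mean of z_i) over the M pixels in the circle
    return xs.mean(), ys.mean(), zs.mean()
```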
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

W and H of the rectangular search region cannot be set too small, so that changes in the size of the gesture region do not cause incomplete gesture detection; the width W and the height H are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
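A sketch of the palm region extraction in step 2, in the same vein: every pixel in the W×H window whose sensor distance lies within the threshold band around d_p is kept, a vectorized shortcut for the iterative set construction of formula (2). The sizing rule W = H = 2·|wrist − elbow| is an assumption, since the patent's exact expression for W and H is not reproduced above.

```python
import numpy as np

def extract_palm_region(depth, palm_xyz, wrist, elbow, threshold=30):
    """Extract the palm region around the palm node (step 2).
    Keeps pixels g_i in a WxH rectangle centered on the palm node whose
    distance d_i to the sensor satisfies abs(d_p - d_i) <= threshold,
    with 25 <= threshold <= 35 as stated for formula (2).
    W = H = 2 * |wrist - elbow| is an assumed sizing rule.
    """
    px, py, dp = palm_xyz
    side = 2.0 * np.hypot(wrist[0] - elbow[0], wrist[1] - elbow[1])
    half = max(1, int(side // 2))
    x0, x1 = max(0, int(px) - half), min(depth.shape[1], int(px) + half)
    y0, y1 = max(0, int(py) - half), min(depth.shape[0], int(py) + half)
    mask = np.zeros(depth.shape, dtype=np.uint8)
    window = depth[y0:y1, x0:x1].astype(np.float64)
    # keep pixels whose depth lies within the threshold band around d_p
    mask[y0:y1, x0:x1] = (np.abs(window - dp) <= threshold).astype(np.uint8) * 255
    return mask
```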
step 3, performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
the palm area image usually contains some noises, which include some burrs of the gesture edge and some holes inside the image, and in order to obtain a more accurate gesture image, it is necessary to perform dilation and erosion operations, the dilation can remove some burrs of the binary gesture image edge and scattered noise points, and the erosion can fill some holes inside the image.
Step 4, recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network, and obtaining the score P_palm of the recognized palm region gesture through formula (1), in which the denoised palm region image of the t-th frame is input into the neural network to obtain the output result of the t-th frame of the neural network; T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
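A sketch of the scoring in step 4, assuming formula (1) is the rounded mean of the per-frame neural network outputs, which matches the definitions given for T and round(·); `classify` is a hypothetical stand-in for the patent's trained network.

```python
import numpy as np

def palm_gesture_score(denoised_frames, classify):
    """Step 4: score the palm gesture over T denoised palm region frames.
    classify(frame) is assumed to return a numeric per-frame output; the
    score P_palm is taken as the rounded mean of the T outputs."""
    outputs = [classify(frame) for frame in denoised_frames]   # one per frame t
    return int(round(np.mean(outputs)))                        # P_palm
```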
Example 2
A real-time arm gesture analysis and evaluation method based on an RGBD depth sensor, as shown in FIG. 3, comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
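A sketch of this extraction step, computing D_sn^t of formula (3) for all four arm skeleton nodes of every frame at once; the array shapes are assumptions of this sketch.

```python
import numpy as np

def skeleton_motion_sequences(joints, shoulder_center):
    """Step 1 of the arm method: per-frame distance of each arm skeleton
    node to the shoulder center node (formula (3)).
    joints          : (T, 4, 3) array with (x, y, z) of the palm, wrist,
                      elbow and shoulder nodes P_n^t for each frame t
    shoulder_center : (T, 3) array with (x, y, z) of P_s^t for each frame t
    Returns D of shape (4, T) with D[n-1, t-1] = D_sn^t.
    """
    diff = joints - shoulder_center[:, None, :]    # broadcast over the 4 nodes
    return np.linalg.norm(diff, axis=2).T          # Euclidean distance, (4, T)
```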
step 2, searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
in this embodiment, the DTW algorithm is used to handle the different lengths of the two motion sequences. Let the motion sequence of a driver standard dynamic gesture sample be D_a = (D_a^1, D_a^2, …, D_a^{T′}) and the motion sequence of the arm skeleton nodes in the initial images be D_sn = (D_sn^1, D_sn^2, …, D_sn^T), and let the point-pair relationship between the two sequences be φ(k) = (φ_s(k), φ_a(k)), where 1 ≤ φ_s(k) ≤ T, 1 ≤ φ_a(k) ≤ T′, and max(T, T′) ≤ k ≤ T + T′; the DTW algorithm aims to find the optimal point-pair relationship φ(k) between the two sequences such that the sum of distances between corresponding points, DTW(D_a, D_sn), is minimal.
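A sketch of this matching step: a textbook DTW dynamic program plus a minimum search over the standard gesture library. The patent names DTW but no particular variant, so the plain O(T·T′) recursion is assumed here.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping between a standard sequence D_a (length T') and
    an observed sequence D_sn (length T): the minimal sum of distances
    between corresponding points over all monotone alignments phi(k)."""
    Tp, T = len(seq_a), len(seq_b)
    cost = np.full((Tp + 1, T + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Tp + 1):
        for j in range(1, T + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])     # point-to-point distance
            cost[i, j] = d + min(cost[i - 1, j],     # stretch the observed sequence
                                 cost[i, j - 1],     # stretch the standard sequence
                                 cost[i - 1, j - 1]) # advance both
    return cost[Tp, T]

def best_match(library, observed):
    """Step 2: the standard dynamic gesture sample whose motion sequence has
    the minimal DTW distance to the observed arm skeleton node sequence."""
    return min(library, key=lambda ref: dtw_distance(ref, observed))
```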
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4), where α is the average of the DTW distances between the standard gesture sequence samples.
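A sketch of the scoring in step 3, reusing dtw_distance from the sketch above. α is computed as the average pairwise DTW distance over the standard sample library, as defined for formula (4); the exponential mapping of the minimal DTW distance to a 0-100 score is a hypothetical choice, since the body of formula (4) is not reproduced here.

```python
import numpy as np
from itertools import combinations

def arm_gesture_score(library, observed):
    """Step 3: map the best-match DTW distance to a score P_arm.
    alpha = average DTW distance between all pairs of standard samples;
    the exp(-d/alpha) mapping to 0-100 is an assumed realization of (4)."""
    alpha = np.mean([dtw_distance(a, b) for a, b in combinations(library, 2)])
    d_min = min(dtw_distance(ref, observed) for ref in library)
    return int(round(100.0 * np.exp(-d_min / alpha)))   # hypothetical mapping
```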
Example 3
Building on embodiments 1 and 2, this embodiment provides a real-time gesture analysis and evaluation method based on an RGBD depth sensor, which includes the real-time palm gesture analysis and evaluation method provided in embodiment 1 and the real-time arm gesture analysis and evaluation method provided in embodiment 2. The embodiment recognizes the driver's palm gesture and arm dynamic gesture simultaneously, evaluates the driver's gestures for conformance based on the output of the recognition algorithms, and gives scores for the palm gesture and the arm dynamic gesture; it can supervise driver gestures in real time to ensure safe train operation, avoids manual monitoring of train driver gestures, and reduces the consumption of human resources.
Example 4
The embodiment provides a static gesture recognition and evaluation system for train driver palms, as shown in fig. 1, including:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
as shown in fig. 2, the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
as shown in fig. 4, the palm node coordinate P is in a coordinate system with the center of the RGBD sensor as the origin, the horizontal direction as the X axis, the vertical direction as the Y axis, and the direction of the sensor pointing to the driver as the Z axis;
because the RGBD sensor is prone to node drift when tracking human skeleton nodes, the measured distance between the palm and the sensor can deviate; to reduce this deviation, the initial palm node needs to be corrected as in step 11 to obtain the accurate coordinates of the palm node P;
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

W and H of the rectangular search region cannot be set too small, so that changes in the size of the gesture region do not cause incomplete gesture detection; the width W and the height H are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
the palm area image usually contains some noises, which include some burrs of the gesture edge and some holes inside the image, and in order to obtain a more accurate gesture image, it is necessary to perform dilation and erosion operations, the dilation can remove some burrs of the binary gesture image edge and scattered noise points, and the erosion can fill some holes inside the image.
Gesture recognition and evaluation module: used for recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network and obtaining the score P_palm of the recognized palm region gesture through formula (1).
Example 5
The embodiment provides a dynamic gesture recognition and evaluation system for train driver arms, as shown in fig. 3, including:
the arm skeleton node motion sequence extraction module is used for acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
the dynamic gesture optimal matching module, used for searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
in this embodiment, the DTW algorithm is used to handle the different lengths of the two motion sequences. Let the motion sequence of a driver standard dynamic gesture sample be D_a = (D_a^1, D_a^2, …, D_a^{T′}) and the motion sequence of the arm skeleton nodes in the initial images be D_sn = (D_sn^1, D_sn^2, …, D_sn^T), and let the point-pair relationship between the two sequences be φ(k) = (φ_s(k), φ_a(k)), where 1 ≤ φ_s(k) ≤ T, 1 ≤ φ_a(k) ≤ T′, and max(T, T′) ≤ k ≤ T + T′; the DTW algorithm aims to find the optimal point-pair relationship φ(k) between the two sequences such that the sum of distances between corresponding points, DTW(D_a, D_sn), is minimal.
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4), in which α is the average of the DTW distances between the standard gesture sequence samples; D_a and D_b represent any two motion sequences in the motion sequence library of driver standard dynamic gesture samples, where a = 1, 2, …, N, b = 1, 2, …, N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of driver standard dynamic gesture samples; D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) is the motion sequence of the arm skeleton nodes in the initial images.
Example 6
On the basis of embodiments 4 and 5, this embodiment provides a real-time gesture analysis and evaluation system based on an RGBD depth sensor, which includes the real-time palm gesture analysis and evaluation system provided in embodiment 4 and the real-time arm gesture analysis and evaluation system provided in embodiment 5. The embodiment recognizes the driver's palm gesture and arm dynamic gesture simultaneously, evaluates the driver's gestures for conformance based on the output of the recognition algorithms, and gives scores for the palm gesture and the arm dynamic gesture; it can supervise driver gestures in real time to ensure safe train operation, avoids manual monitoring of train driver gestures, and reduces the consumption of human resources.
Claims (4)
1. A real-time palm gesture analysis and evaluation method based on an RGBD depth sensor is characterized by comprising the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

the width W and the height H of the rectangular search region are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
step 3, performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
step 4, recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network, and obtaining the score P_palm of the recognized palm region gesture through formula (1), in which the denoised palm region image of the t-th frame is input into the neural network to obtain the output result of the t-th frame of the neural network; T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
2. A real-time gesture analysis and evaluation method based on an RGBD depth sensor, characterized by comprising the real-time palm gesture analysis and evaluation method of claim 1 and a real-time arm gesture analysis and evaluation method;
the real-time arm gesture analysis and evaluation method comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
step 2, searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4).
3. A real-time palm gesture analysis and evaluation system based on an RGBD depth sensor is characterized by comprising:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as the mean of the white pixel points in the region circle:

P = ( (1/M)·Σ_{i=1}^{M} x_i , (1/M)·Σ_{i=1}^{M} y_i , (1/M)·Σ_{i=1}^{M} z_i )

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel point, y_i represents the ordinate of the i-th pixel point, and z_i represents the distance from the i-th pixel point to the RGBD sensor;
the region circle is a circle centered at the initial palm node P_1, with the distance between the initial palm node and the wrist node as the radius;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the method for extracting the palm region image of the current initial image is as follows:
taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k to obtain the current palm region image:

S_k = S_{k-1} ∪ { g_i : abs(d_p − d_i) ≤ threshold }    (2)

in formula (2), k = 1, 2, … indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p − d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold value with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points found in the k-th search, and S_{k-1} represents the set of gesture pixel points found in the (k-1)-th search;

the width W and the height H of the rectangular search region are determined from the coordinates (x_w, y_w) of the wrist node and (x_e, y_e) of the elbow node of the current initial image;
step 22, repeating step 21 until the palm region images of the T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm region image extracted from the T frames of initial images to obtain T frames of denoised palm region images;
gesture recognition and evaluation module: used for recognizing the gesture of the palm region in the T frames of denoised palm region images through a neural network and obtaining the score P_palm of the recognized palm region gesture through formula (1).
4. A real-time gesture analysis and evaluation system based on an RGBD depth sensor, characterized by comprising the real-time palm gesture analysis and evaluation system of claim 3 and a real-time arm gesture analysis and evaluation system;
the real-time arm gesture analysis and evaluation system comprises:
the arm skeleton node motion sequence extraction module is used for acquiring T frames of initial images within a period of time by using an RGBD sensor, selecting one frame of initial image from the T frames of initial images as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

D_sn^t = sqrt( (x_n^t − x_s^t)^2 + (y_n^t − y_s^t)^2 + (z_n^t − z_s^t)^2 )    (3)

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, …, T; D_sn^t indicates the distance from node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of the initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm, wrist, elbow and shoulder nodes in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;
obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, …, D_sn^t, …, D_sn^T) of the arm skeleton nodes in the initial images, where the arm skeleton nodes comprise the palm node, wrist node, elbow node and shoulder node;
the dynamic gesture optimal matching module, used for searching the motion sequence library of driver standard dynamic gesture samples for the standard motion sequence that minimizes the sum of distances between corresponding points with the motion sequence of the arm skeleton nodes in the initial images;
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710523575.5A CN107463873B (en) | 2017-06-30 | 2017-06-30 | Real-time gesture analysis and evaluation method and system based on RGBD depth sensor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107463873A CN107463873A (en) | 2017-12-12 |
CN107463873B true CN107463873B (en) | 2020-02-21 |
Family
ID=60546461
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710523575.5A Expired - Fee Related CN107463873B (en) | 2017-06-30 | 2017-06-30 | Real-time gesture analysis and evaluation method and system based on RGBD depth sensor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107463873B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6962878B2 (en) * | 2018-07-24 | 2021-11-05 | 本田技研工業株式会社 | Operation assistance system and operation assistance method |
CN110032957B (en) * | 2019-03-27 | 2023-10-17 | 长春理工大学 | Gesture spatial domain matching method based on skeleton node information |
CN110175566B (en) * | 2019-05-27 | 2022-12-23 | 大连理工大学 | Hand posture estimation system and method based on RGBD fusion network |
CN110717385A (en) * | 2019-08-30 | 2020-01-21 | 西安文理学院 | Dynamic gesture recognition method |
CN113657346A (en) * | 2021-08-31 | 2021-11-16 | 深圳市比一比网络科技有限公司 | Driver action recognition method based on combination of target detection and key point detection |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923669A (en) * | 2008-07-18 | 2010-12-22 | 史迪芬·凯斯 | Intelligent adaptive design |
CN103914132A (en) * | 2013-01-07 | 2014-07-09 | 富士通株式会社 | Method and system for recognizing gestures based on fingers |
CN103926999A (en) * | 2013-01-16 | 2014-07-16 | 株式会社理光 | Palm opening and closing gesture recognition method and device and man-machine interaction method and device |
Non-Patent Citations (2)
- Stergios Poularakis et al., "Finger detection and hand posture recognition based on depth information," IEEE International Conference on Acoustics, Speech and Signal Processing, May 2014, pp. 1-5. *
- Liu Yang et al., "Traffic police gesture recognition based on Kinect skeleton information," Computer Engineering and Applications, No. 3, March 2015, pp. 157-161. *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200221