CN107463873B - Real-time gesture analysis and evaluation method and system based on RGBD depth sensor - Google Patents

Real-time gesture analysis and evaluation method and system based on RGBD depth sensor

Info

Publication number: CN107463873B (application CN201710523575.5A)
Authority: CN (China)
Prior art keywords: palm, node, frame, initial image, image
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201710523575.5A
Other languages: Chinese (zh)
Other versions: CN107463873A
Inventors: 梁华刚, 易生, 孙凯, 李怀德
Current assignee: Changan University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Changan University
Application filed by Changan University; priority and filing date: 2017-06-30
Publication of application CN107463873A: 2017-12-12
Grant of CN107463873B: 2020-02-21

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time gesture analysis and evaluation method and system based on an RGBD depth sensor, comprising a static gesture recognition and evaluation system for the train driver's palm and a dynamic gesture recognition and evaluation system for the train driver's arm. The static palm gesture recognition and evaluation system comprises a palm center position determining module, a palm region image extracting module, a denoising module and a gesture recognition and evaluation module; the dynamic arm gesture recognition and evaluation system comprises an arm skeleton node motion sequence extraction module, a dynamic gesture optimal matching module and an arm dynamic gesture evaluation module. The method is strongly robust to environmental background and illumination, and adopts a palm-node-based gesture pixel search when detecting the palm gesture, which improves the palm gesture detection. The method and system can monitor the driver's gestures in real time to ensure the running safety of the train, avoid manual monitoring of the train driver's gestures, and reduce the consumption of human resources.

Description

Real-time gesture analysis and evaluation method and system based on RGBD depth sensor
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a real-time gesture analysis and evaluation method and system based on an RGBD depth sensor.
Background
Gesture recognition, as one of the key technologies of future human-computer interaction systems, has important research value and broad application prospects. Traditional gesture recognition methods usually perform gesture detection and recognition on an input two-dimensional image. However, such methods are sensitive to the input image: detection and recognition work well when the background is simple and ambient light has little influence, but degrade sharply when the background is complex and the lighting varies widely, which limits their range of application. In recent years, to overcome these defects of the traditional methods, three-dimensional image sensing devices have become increasingly popular. Such devices acquire not only an RGB image but also depth data for the image, so the influence of complex backgrounds, illumination changes and the like on gesture recognition can be avoided.
At present, gestures are widely used in the traffic field. For example, a train driver must perform prescribed gestures confirming instrument checks before, during and after a run to ensure the safe running of the train and prevent accidents, and a traffic police officer keeps road traffic safe and smooth through a series of gestures. However, the working environments of train drivers, traffic police and similar personnel are complex and lighting varies greatly, so the traditional methods cannot effectively recognize and evaluate their gestures. Moreover, the standard driver gestures are numerous, including both dynamic arm gestures and palm-region gestures, which further increases the difficulty of gesture recognition. The traditional gesture recognition method involves two steps: first detecting the palm and arm in a two-dimensional image, and then recognizing the detected gesture. Generally speaking, the quality of detection directly affects the recognition result, and the background environment and illumination usually interfere with detection; detecting palms and arms in two-dimensional images is a major difficulty. The traditional approach trains a classifier on a large number of gesture samples, but human hands are complex and deformable, and gestures are diverse, ambiguous and vary over time, so it is difficult to train an ideal gesture classifier. Traditional gesture recognition methods therefore cannot be applied to gesture recognition for drivers and similar personnel.
Disclosure of Invention
The invention aims to provide a real-time gesture analysis and evaluation method and system based on an RGBD depth sensor that solve the problem of detecting palms and arms under complex background and illumination conditions and can analyze and evaluate the various gestures of workers such as train drivers and traffic police, with broad application prospects.
The technical scheme adopted by the invention is as follows.
a real-time palm gesture analysis and evaluation method based on an RGBD depth sensor comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows:

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

the width W and the height H of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
step 3, performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
step 4, recognizing the palm area gesture in the T frames of denoised palm area images through a neural network, and obtaining the score P_palm of the recognized palm area gesture through formula (1);

formula (1) appears as an image in the original and is not recoverable here; it computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
A real-time arm gesture analysis and evaluation method based on an RGBD depth sensor comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
step 2, finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found in step 2 to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples.
A real-time gesture analysis and evaluation method based on an RGBD depth sensor comprises the real-time palm gesture analysis and evaluation method and the real-time arm gesture analysis and evaluation method described above.
A real-time palm gesture analysis and evaluation system based on an RGBD depth sensor comprises:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows:

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

the width W and the height H of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
a gesture recognition and evaluation module: used for recognizing the palm area gesture in the T frames of denoised palm area images through a neural network and obtaining the score P_palm of the recognized palm area gesture through formula (1);
formula (1) appears as an image in the original and is not recoverable here; it computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
A real-time arm gesture analysis and evaluation system based on an RGBD depth sensor comprises:
an arm skeleton node motion sequence extraction module, used for acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
a dynamic gesture optimal matching module, used for finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found by the dynamic gesture optimal matching module to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples.
A real-time gesture analysis and evaluation system based on an RGBD depth sensor comprises the real-time palm gesture analysis and evaluation system and the real-time arm gesture analysis and evaluation system described above.
The invention has the following advantages.
First, the RGBD depth sensor provides depth data for the two-dimensional image, so key data such as the palm and the arm can be extracted well by the corresponding algorithms. Second, the invention can recognize the driver's palm gesture and arm dynamic gesture simultaneously, evaluate the normativity of the driver's gestures according to the output of the recognition algorithms, and give scores for the palm gesture and the arm dynamic gesture; it can not only monitor the driver's gestures in real time to ensure the running safety of the train, but also avoid manual monitoring of the train driver's gestures and reduce the consumption of human resources.
Drawings
FIG. 1 is a flow chart of train driver palm area gesture recognition and evaluation;
FIG. 2 is a schematic view of nodes of a train driver's palm and arm;
FIG. 3 is a flow chart of train driver arm dynamic gesture recognition and evaluation;
FIG. 4 is a diagram of an application scenario of the present invention.
Detailed Description
Train driver gestures usually include palm region gestures and arm dynamic gestures; the recognition and evaluation processes for the palm region gesture and the arm dynamic gesture are described in detail below.
Example 1
A real-time palm gesture analysis and evaluation method based on an RGBD depth sensor is characterized by comprising the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

as shown in fig. 2, the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
as shown in fig. 4, the palm node coordinate P is in a coordinate system with the center of the RGBD sensor as the origin, the horizontal direction as the X axis, the vertical direction as the Y axis, and the direction of the sensor pointing to the driver as the Z axis;
Because the RGBD sensor is prone to node drift when tracking human skeleton nodes, the measured distance between the palm and the sensor deviates from its true value; to reduce this deviation, the initial palm node needs to be corrected as above to obtain the accurate coordinates of the palm node P.
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
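As a concrete illustration of steps 11 and 12, the correction can be written in a few lines. This is a minimal sketch under stated assumptions: the depth image is taken to be already binarized so that white (non-zero) pixels mark hand candidates, the circle radius is computed from the 3-D node positions, and the function and argument names are illustrative, not from the patent.

```python
import numpy as np

def correct_palm_node(p1, p_wrist, xs, ys, zs):
    """Average the white pixels inside the region circle (center: the
    initial palm node P1; radius: the distance from P1 to the wrist
    node) to obtain the corrected palm node coordinate P.

    xs, ys, zs: 1-D arrays with the coordinates of the white pixels
    of the binarized image (zs = distance to the RGBD sensor)."""
    p1 = np.asarray(p1, dtype=float)
    radius = np.linalg.norm(p1 - np.asarray(p_wrist, dtype=float))
    # Keep only the white pixels falling inside the region circle.
    in_circle = (xs - p1[0]) ** 2 + (ys - p1[1]) ** 2 <= radius ** 2
    if not np.any(in_circle):
        return p1                      # no support: keep the raw node
    # P = (mean x_i, mean y_i, mean z_i) over the M pixels in the circle.
    return np.array([xs[in_circle].mean(),
                     ys[in_circle].mean(),
                     zs[in_circle].mean()])
```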
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows (a code sketch of this search is given after step 22 below):

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

W and H of the rectangular search region cannot be set too small, lest changes in the size of the gesture region cause incomplete gesture detection; the width and the height of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
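Under the reconstruction of formula (2) above, the palm pixel search of step 21 can be sketched as follows. The patent describes an iterative search growing S_k out of S_{k-1}; this sketch makes a single vectorized pass over the whole W x H rectangle, which yields the same final set once every pixel of the rectangle has been examined. The function name, image layout and millimetre units are assumptions.

```python
import numpy as np

def extract_palm_region(depth, palm_uv, d_p, W, H, threshold=30):
    """Return a binary palm mask: pixels of the W x H rectangle
    centered on the palm node whose distance to the sensor differs
    from the palm node distance d_p by at most `threshold`
    (25 <= threshold <= 35 in the patent)."""
    u, v = palm_uv                      # palm node column and row
    rows, cols = depth.shape
    left, right = max(u - W // 2, 0), min(u + W // 2, cols)
    top, bottom = max(v - H // 2, 0), min(v + H // 2, rows)

    mask = np.zeros(depth.shape, dtype=np.uint8)
    window = depth[top:bottom, left:right]
    # Formula (2): keep g_i with abs(d_p - d_i) <= threshold.
    mask[top:bottom, left:right] = np.abs(window - d_p) <= threshold
    return mask * 255
```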
step 3, performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
The palm area image usually contains noise, including burrs along the gesture edges and holes inside the image. To obtain a more accurate gesture image, dilation and erosion operations are needed: erosion removes the burrs on the edge of the binary gesture image and the scattered noise points, and dilation fills the holes inside the image.
Step 4, recognizing the palm area gesture in the T frames of denoised palm area images through a neural network, and obtaining the score P_palm of the recognized palm area gesture through formula (1);
formula (1) appears as an image in the original and is not recoverable here; it computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
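Because formula (1) is only an image in the published text, the following encodes one plausible reading, offered strictly as an assumption: the score is the rounded percentage of the T frames whose network output equals the standard gesture label.

```python
def palm_score(outputs, standard_label):
    """Hypothetical reading of formula (1): P_palm as the rounded
    percentage of per-frame neural-network outputs that match the
    standard gesture label (the exact formula is not recoverable)."""
    T = len(outputs)
    hits = sum(1 for o in outputs if o == standard_label)
    return round(100.0 * hits / T)
```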
Example 2
A real-time arm gesture analysis and evaluation method based on an RGBD depth sensor, as shown in FIG. 3, comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
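Formula (3) and the sequences D_sn follow directly from the skeleton stream. In the sketch below, the (T, 5, 3) array layout (palm, wrist, elbow, shoulder, shoulder center per frame) is an assumption about storage, not part of the patent.

```python
import numpy as np

def arm_motion_sequences(joints):
    """Compute D_sn of formula (3) for all frames.

    joints: float array of shape (T, 5, 3) holding, per frame, the
            3-D coordinates of the palm, wrist, elbow and shoulder
            nodes (n = 1..4) and the shoulder center node (index 4).
    Returns shape (4, T): row n-1 is the sequence D_sn over frames.
    """
    nodes = joints[:, :4, :]    # P_n^t, n = 1..4
    center = joints[:, 4:5, :]  # P_s^t, broadcast against the nodes
    return np.linalg.norm(nodes - center, axis=2).T
```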
step 2, finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
In this embodiment, the DTW algorithm is used to handle the different lengths of the two motion sequences. Let the motion sequence of a driver standard dynamic gesture sample be D_a = (D_a^1, D_a^2, ..., D_a^{T'}) and the motion sequence of the arm skeleton nodes in the initial images be D_sn = (D_sn^1, D_sn^2, ..., D_sn^T), and let the point pair relationship between the two sequences be φ(k) = (φ_s(k), φ_a(k)), wherein 1 ≤ φ_s(k) ≤ T, 1 ≤ φ_a(k) ≤ T' and max(T, T') ≤ k ≤ T + T'. The DTW algorithm aims to find the optimal point pair relationship φ(k) between the two sequences such that the sum of the distances between corresponding points, DTW(D_a, D_sn), is minimal:

$$\mathrm{DTW}(D_a,D_{sn})=\min_{\phi}\sum_{k}\left|D_{sn}^{\phi_s(k)}-D_a^{\phi_a(k)}\right|$$
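A standard dynamic-programming implementation of this DTW distance might look as follows; using the absolute difference as the point-to-point cost matches the scalar sequences D_sn and is otherwise an assumption.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """DTW distance between two 1-D motion sequences of possibly
    different lengths, with absolute difference as the local cost."""
    Ta, Tb = len(seq_a), len(seq_b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[Ta, Tb])
```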
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found in step 2 to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples.
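Reusing dtw_distance from the sketch above, α and the matching of step 2 can be computed as below. Because formula (4) itself is an image in the original, the final mapping from the best DTW distance to P_arm is a hypothetical stand-in and is marked as such in the code.

```python
import itertools
import numpy as np

def alpha_of_library(library):
    """alpha: average DTW distance over all ordered pairs (a, b),
    a != b, of standard gesture sequences in the library."""
    dists = [dtw_distance(a, b)
             for a, b in itertools.permutations(library, 2)]
    return float(np.mean(dists))

def arm_score(d_sn, library):
    """Match the observed sequence against the standard library and
    map the best DTW distance to a score in [0, 100]. The mapping
    below is NOT formula (4), which is unrecoverable; it is only an
    illustrative monotone-decreasing stand-in."""
    best = min(dtw_distance(d_sn, d_a) for d_a in library)
    alpha = alpha_of_library(library)
    return round(100.0 * alpha / (alpha + best))  # hypothetical mapping
```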
Example 3
This embodiment provides, on the basis of Embodiments 1 and 2, a real-time gesture analysis and evaluation method based on an RGBD depth sensor, which comprises the real-time palm gesture analysis and evaluation method provided in Embodiment 1 and the real-time arm gesture analysis and evaluation method provided in Embodiment 2. The embodiment can recognize the driver's palm gesture and arm dynamic gesture simultaneously, evaluate the normativity of the driver's gestures according to the output of the recognition algorithms, and give scores for the palm gesture and the arm dynamic gesture; it can supervise the driver's gestures in real time to ensure the running safety of the train, avoid manual gesture monitoring of the train driver, and reduce the consumption of human resources.
Example 4
The embodiment provides a static gesture recognition and evaluation system for train driver palms, as shown in fig. 1, including:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

as shown in fig. 2, the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
as shown in fig. 4, the palm node coordinate P is in a coordinate system with the center of the RGBD sensor as the origin, the horizontal direction as the X axis, the vertical direction as the Y axis, and the direction of the sensor pointing to the driver as the Z axis;
Because the RGBD sensor is prone to node drift when tracking human skeleton nodes, the measured distance between the palm and the sensor deviates from its true value; to reduce this deviation, the initial palm node needs to be corrected as above to obtain the accurate coordinates of the palm node P.
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows:

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

W and H of the rectangular search region cannot be set too small, lest changes in the size of the gesture region cause incomplete gesture detection; the width and the height of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
The palm area image usually contains noise, including burrs along the gesture edges and holes inside the image. To obtain a more accurate gesture image, dilation and erosion operations are needed: erosion removes the burrs on the edge of the binary gesture image and the scattered noise points, and dilation fills the holes inside the image.
A gesture recognition and evaluation module: used for recognizing the palm area gesture in the T frames of denoised palm area images through a neural network and obtaining the score P_palm of the recognized palm area gesture through formula (1);
formula (1) appears as an image in the original and is not recoverable here; it computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
Example 5
The embodiment provides a dynamic gesture recognition and evaluation system for train driver arms, as shown in fig. 3, including:
an arm skeleton node motion sequence extraction module, used for acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
a dynamic gesture optimal matching module, used for finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
In this embodiment, the DTW algorithm is used to handle the different lengths of the two motion sequences. Let the motion sequence of a driver standard dynamic gesture sample be D_a = (D_a^1, D_a^2, ..., D_a^{T'}) and the motion sequence of the arm skeleton nodes in the initial images be D_sn = (D_sn^1, D_sn^2, ..., D_sn^T), and let the point pair relationship between the two sequences be φ(k) = (φ_s(k), φ_a(k)), wherein 1 ≤ φ_s(k) ≤ T, 1 ≤ φ_a(k) ≤ T' and max(T, T') ≤ k ≤ T + T'. The DTW algorithm aims to find the optimal point pair relationship φ(k) between the two sequences such that the sum of the distances between corresponding points, DTW(D_a, D_sn), is minimal:

$$\mathrm{DTW}(D_a,D_{sn})=\min_{\phi}\sum_{k}\left|D_{sn}^{\phi_s(k)}-D_a^{\phi_a(k)}\right|$$
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found by the dynamic gesture optimal matching module to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples; D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) is the motion sequence of the arm skeleton nodes in the initial images.
Example 6
This embodiment provides, on the basis of Embodiments 4 and 5, a real-time gesture analysis and evaluation system based on an RGBD depth sensor, which comprises the real-time palm gesture analysis and evaluation system provided in Embodiment 4 and the real-time arm gesture analysis and evaluation system provided in Embodiment 5. The embodiment can recognize the driver's palm gesture and arm dynamic gesture simultaneously, evaluate the normativity of the driver's gestures according to the output of the recognition algorithms, and give scores for the palm gesture and the arm dynamic gesture; it can supervise the driver's gestures in real time to ensure the running safety of the train, avoid manual gesture monitoring of the train driver, and reduce the consumption of human resources.

Claims (4)

1. A real-time palm gesture analysis and evaluation method based on an RGBD depth sensor is characterized by comprising the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, wherein each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and determining the coordinates of the palm node of the T frames of initial images;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
step 2, extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image, wherein the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows:

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

the width W and the height H of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
step 3, performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
step 4, recognizing the palm area gesture in the T frames of denoised palm area images through a neural network, and obtaining the score P_palm of the recognized palm area gesture through formula (1);

formula (1) and the definition of its per-frame term appear as images in the original and are not recoverable here; formula (1) computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
2. A real-time gesture analysis and evaluation method based on an RGBD depth sensor, characterized by comprising the real-time palm gesture analysis and evaluation method of claim 1 and a real-time arm gesture analysis and evaluation method;
the real-time arm gesture analysis and evaluation method comprises the following steps:
step 1, acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
step 2, finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
step 3, obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found in step 2 to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples.
3. A real-time palm gesture analysis and evaluation system based on an RGBD depth sensor is characterized by comprising:
the device comprises a palm center position determining module, a palm image acquiring module and a palm image acquiring module, wherein the palm center position determining module is used for acquiring T frames of initial images in a period of time by using an RGBD sensor, each frame of initial image in the T frames of initial images comprises a palm node, a wrist node and an elbow node, and the palm node coordinates of the T frames of initial images are determined;
the method comprises the following steps:
step 11, selecting one frame of initial image from the T frames of initial images as the current frame initial image, and obtaining the coordinates of the palm node P in the current frame initial image from the initial palm node as

$$P=\left(\frac{1}{M}\sum_{i=1}^{M}x_i,\ \frac{1}{M}\sum_{i=1}^{M}y_i,\ \frac{1}{M}\sum_{i=1}^{M}z_i\right)$$

wherein M represents the number of white pixel points in the region circle, M is a natural number greater than or equal to 1, x_i represents the abscissa of the i-th pixel, y_i represents the ordinate of the i-th pixel, and z_i represents the distance from the i-th pixel point to the RGBD sensor;

the region circle is the circle whose center is the initial palm node P_1 and whose radius is the distance between the initial palm node and the wrist node;
the palm node coordinate P is in a coordinate system which takes the center of the RGBD sensor as an origin, takes the horizontal direction as an X axis, takes the vertical direction as a Y axis, and takes the direction of the sensor pointing to the driver as a Z axis;
step 12, repeating step 11 until each frame initial image in the T frame initial images is used as a current frame initial image, and obtaining palm node coordinates of the T frame initial images;
the palm region image extracting module is used for extracting a palm region image of the T frame initial image according to the palm node coordinates of the T frame initial image:
the method comprises the following steps:
step 21, selecting one frame of initial image from the T frames of initial images as the current initial image; the palm region image of the current initial image is extracted as follows:

taking the palm node of the current initial image as the center, searching for palm pixel points in a rectangular region with width W and height H, and putting the palm pixel points satisfying formula (2) into the palm pixel point set S_k, obtaining the current palm region image;

$$S_k=S_{k-1}\cup\left\{\,g_i\ \middle|\ \mathrm{abs}(d_p-d_i)\le\mathrm{threshold}\,\right\}\qquad(2)$$

in formula (2), k = 1, 2, ... indicates the number of the search, d_p represents the distance between the palm node of the current initial image and the RGBD sensor, g_i represents the i-th pixel point, d_i represents the distance between the i-th pixel point in the rectangular region and the RGBD sensor, abs(d_p - d_i) represents the absolute value of the difference between the distances from the palm node of the current initial image and from the i-th pixel point in the rectangular region to the RGBD sensor, threshold represents a threshold with 25 ≤ threshold ≤ 35, S_k represents the set of gesture pixel points searched for the k-th time, and S_{k-1} represents the set of gesture pixel points searched for the (k-1)-th time;

the width W and the height H of the rectangular search region are given by an expression in terms of (x_w, y_w) and (x_e, y_e) that appears as an image in the original and is not recoverable here, wherein (x_w, y_w) and (x_e, y_e) respectively represent the coordinates of the wrist node and the elbow node corresponding to the palm node of the current initial image;
step 22, repeating step 21 until the palm area images of all T frames of initial images are extracted;
a denoising module: used for performing dilation and erosion operations on each frame of palm area image among the palm area images of the T frames of initial images to obtain T frames of denoised palm area images;
a gesture recognition and evaluation module: used for recognizing the palm area gesture in the T frames of denoised palm area images through a neural network and obtaining the score P_palm of the recognized palm area gesture through formula (1);
formula (1) appears as an image in the original and is not recoverable here; it computes P_palm from the T per-frame outputs of the neural network, wherein inputting the t-th frame of the denoised palm region images into the neural network gives the t-th frame output, T is the total number of frames of the denoised palm region images, and round(·) denotes rounding to an integer.
4. A real-time gesture analysis and evaluation system based on an RGBD depth sensor, characterized by comprising the real-time palm gesture analysis and evaluation system of claim 3 and a real-time arm gesture analysis and evaluation system;
the real-time arm gesture analysis and evaluation system comprises:
an arm skeleton node motion sequence extraction module, used for acquiring T frames of initial images within a period of time by using an RGBD sensor, taking each frame in turn as the t-th frame initial image, and extracting the arm skeleton node motion sequence of the T frames of initial images;
the method comprises the following steps:
the t-th frame initial image comprises an initial palm node P_1^t, a wrist node P_2^t, an elbow node P_3^t, a shoulder node P_4^t and a shoulder center node P_s^t; the distance D_sn^t from node P_n^t to the shoulder center node P_s^t is obtained through formula (3):

$$D_{sn}^{t}=\sqrt{\left(x_n^t-x_s^t\right)^2+\left(y_n^t-y_s^t\right)^2+\left(z_n^t-z_s^t\right)^2}\qquad(3)$$

in formula (3), n = 1, 2, 3, 4 and t = 1, 2, ..., T; D_sn^t indicates the distance from the node P_n^t to the shoulder center node P_s^t in the t-th frame initial image; T is the total number of frames of initial images; x_n^t, y_n^t, z_n^t respectively represent the coordinate values of the palm node, the wrist node, the elbow node and the shoulder node in the t-th frame initial image; and x_s^t, y_s^t, z_s^t represent the coordinates of the shoulder center node in the t-th frame initial image;

obtaining the motion sequence D_sn = (D_sn^1, D_sn^2, ..., D_sn^t, ..., D_sn^T) of the arm skeleton nodes in the initial images, the arm skeleton nodes comprising the palm node, the wrist node, the elbow node and the shoulder node;
a dynamic gesture optimal matching module, used for finding, in the motion sequence library of the driver standard dynamic gesture samples, the motion sequence of the driver standard dynamic gesture sample whose sum of corresponding-point distances to the motion sequence of the arm skeleton nodes in the initial images is minimal;
an arm dynamic gesture evaluation module, used for obtaining the score P_arm of the arm dynamic gesture through formula (4);

formula (4) appears as an image in the original and is not recoverable here; it maps the minimal DTW distance found by the dynamic gesture optimal matching module to the score P_arm by means of the constant α;

in formula (4), α is the average of the DTW distances between the standard gesture sequence samples,

$$\alpha=\frac{1}{N(N-1)}\sum_{a=1}^{N}\sum_{\substack{b=1\\ b\neq a}}^{N}\mathrm{DTW}(D_a,D_b)$$

wherein D_a and D_b represent any motion sequences in the motion sequence library of the driver standard dynamic gesture samples, a = 1, 2, ..., N, b = 1, 2, ..., N, a ≠ b, and N is the total number of motion sequences in the motion sequence library of the driver standard dynamic gesture samples.
CN201710523575.5A 2017-06-30 2017-06-30 Real-time gesture analysis and evaluation method and system based on RGBD depth sensor Expired - Fee Related CN107463873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710523575.5A CN107463873B (en) 2017-06-30 2017-06-30 Real-time gesture analysis and evaluation method and system based on RGBD depth sensor


Publications (2)

Publication Number Publication Date
CN107463873A CN107463873A (en) 2017-12-12
CN107463873B (en) 2020-02-21

Family

ID=60546461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710523575.5A Expired - Fee Related CN107463873B (en) 2017-06-30 2017-06-30 Real-time gesture analysis and evaluation method and system based on RGBD depth sensor

Country Status (1)

Country Link
CN (1) CN107463873B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6962878B2 (en) * 2018-07-24 2021-11-05 本田技研工業株式会社 Operation assistance system and operation assistance method
CN110032957B (en) * 2019-03-27 2023-10-17 长春理工大学 Gesture spatial domain matching method based on skeleton node information
CN110175566B (en) * 2019-05-27 2022-12-23 大连理工大学 Hand posture estimation system and method based on RGBD fusion network
CN110717385A (en) * 2019-08-30 2020-01-21 西安文理学院 Dynamic gesture recognition method
CN113657346A (en) * 2021-08-31 2021-11-16 深圳市比一比网络科技有限公司 Driver action recognition method based on combination of target detection and key point detection


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923669A (en) * 2008-07-18 2010-12-22 史迪芬·凯斯 Intelligent adaptive design
CN103914132A (en) * 2013-01-07 2014-07-09 富士通株式会社 Method and system for recognizing gestures based on fingers
CN103926999A (en) * 2013-01-16 2014-07-16 株式会社理光 Palm opening and closing gesture recognition method and device and man-machine interaction method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Finger detection and hand posture recognition based on depth information; Stergios Poularakis et al.; IEEE International Conference on Acoustics, Speech and Signal Processing; May 2014; pp. 1-5 *
Traffic police gesture recognition based on Kinect skeleton information (基于Kinect骨架信息的交通警察手势识别); Liu Yang et al.; Computer Engineering and Applications (计算机工程与应用); March 2015, No. 3; pp. 157-161 *

Also Published As

Publication number Publication date
CN107463873A (en) 2017-12-12

Similar Documents

Publication Publication Date Title
CN107463873B (en) Real-time gesture analysis and evaluation method and system based on RGBD depth sensor
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
CN111611643B (en) Household vectorization data acquisition method and device, electronic equipment and storage medium
WO2019232894A1 (en) Complex scene-based human body key point detection system and method
CN109559330B (en) Visual tracking method and device for moving target, electronic equipment and storage medium
CN108256421A (en) A kind of dynamic gesture sequence real-time identification method, system and device
CN106934333B (en) Gesture recognition method and system
CN106600625A (en) Image processing method and device for detecting small-sized living thing
CN109145696B (en) Old people falling detection method and system based on deep learning
Rahman et al. Person identification using ear biometrics
Kalsh et al. Sign language recognition system
CN111914832B (en) SLAM method of RGB-D camera under dynamic scene
CN113608663B (en) Fingertip tracking method based on deep learning and K-curvature method
CN110796101A (en) Face recognition method and system of embedded platform
US9160986B2 (en) Device for monitoring surroundings of a vehicle
Chansri et al. Reliability and accuracy of Thai sign language recognition with Kinect sensor
CN115527269A (en) Intelligent human body posture image identification method and system
KR20190050551A (en) Apparatus and method for recognizing body motion based on depth map information
CN103426000A (en) Method for detecting static gesture fingertip
CN102609727A (en) Fire flame detection method based on dimensionless feature extraction
CN109492573A (en) A kind of pointer read method and device
CN117409386A (en) Garbage positioning method based on laser vision fusion
CN112381747A (en) Terahertz and visible light image registration method and device based on contour feature points
CN103093481A (en) Moving object detection method under static background based on watershed segmentation
CN111860084A (en) Image feature matching and positioning method and device and positioning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200221