CN112084898B - Assembly operation action recognition method based on static and dynamic separation - Google Patents

Assembly operation action recognition method based on static and dynamic separation

Info

Publication number
CN112084898B
Authority
CN
China
Prior art keywords
gesture
action
finger
value
recognition
Prior art date
Legal status
Active
Application number
CN202010863071.XA
Other languages
Chinese (zh)
Other versions
CN112084898A (en)
Inventor
刘永
杨明顺
高新勤
万鹏
李斌鹏
史晟睿
乔琦
王祥
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202010863071.XA priority Critical patent/CN112084898B/en
Publication of CN112084898A publication Critical patent/CN112084898A/en
Application granted granted Critical
Publication of CN112084898B publication Critical patent/CN112084898B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/29: Graphical models, e.g. Bayesian networks
    • G06F18/295: Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models


Abstract

The invention discloses an assembly operation action recognition method based on static and dynamic separation, which specifically comprises the following steps: collecting action gestures and dividing the raw data corresponding to the action gestures into training samples and recognition samples; extracting effective data segments from the collected raw data of the action gestures; calculating a feature threshold from the action gestures in the training samples and dividing the action gestures in the recognition samples according to this threshold; and inputting the feature values of the gesture-invariant and gesture-changing gestures in the divided recognition samples into a KNN recognition model and a GMM-HMM recognition model, respectively, for training, to obtain a recognition model for each gesture type. By analyzing the feature values of gesture-invariant and gesture-changing gestures, the method completes the extraction of the threshold, effectively distinguishes the gesture types according to the threshold, obtains separate recognition models for the two gesture types, and improves recognition accuracy and recognition speed.

Description

Assembly operation action recognition method based on static and dynamic separation
Technical Field
The invention belongs to the technical field of gesture motion recognition methods, and relates to an assembly operation motion recognition method based on static and dynamic separation.
Background
Gesture recognition is a key technology for realizing emerging intelligent human-machine interaction systems and virtual reality systems, and its research theories and methods have received wide attention at home and abroad.
Whether gesture recognition research is based on wearable devices or on computer vision, the recognition process comprises five steps: gesture data acquisition, gesture feature recognition, gesture tracking, gesture classification and gesture-to-command mapping. Research on methods such as gesture feature recognition, gesture tracking and gesture classification is the key to solving limb and gesture recognition problems and has received close attention from researchers. Algorithms such as hidden Markov models, support vector machines, neural networks and deep learning have become the main means of gesture recognition; with certain improvements, some of these algorithms reach recognition accuracies above 95%, laying a solid foundation for further research on human-machine collaboration.
Overall, the research results on gesture recognition are rich and comprehensive, but shortcomings remain. In data preprocessing and feature extraction, most studies directly set features for training, learning and recognition without analyzing the feature values, although differences in feature values can lead to different recognition results; as for gesture recognition methods, most studies adopt a single recognition scheme and rarely verify its effectiveness.
Disclosure of Invention
The invention aims to provide an assembly operation action recognition method based on static and dynamic separation, which analyzes the feature values of gesture-invariant and gesture-changing gestures to extract a threshold, effectively distinguishes the gesture types according to this threshold, obtains a recognition model for each of the two gesture types, and thereby improves recognition accuracy and recognition speed.
The technical scheme adopted by the invention is that the assembly operation action recognition method based on static and dynamic separation is implemented according to the following steps:
step 1, dividing a complete operation action into n action gestures, collecting M groups of original data for each action gesture, and dividing the original data corresponding to the action gestures into training samples and recognition samples;
step 2, extracting effective data segments from the original data of each action gesture acquired in the step 1;
step 3, calculating a characteristic threshold according to the action gestures in the training sample, and dividing the action gestures in the recognition sample into gesture-changing gestures and gesture-unchanged gestures according to the characteristic threshold;
step 4, inputting the characteristic value of the gesture with unchanged gesture in the recognition sample divided in the step 3 into a KNN recognition model for training to obtain a recognition model of the gesture with unchanged gesture;
and 5, inputting the characteristic values of the gesture in the gesture change type gesture in the recognition sample divided in the step 3 into a GMM-HMM recognition model for training to obtain a recognition model of the gesture change type gesture.
The present invention is also characterized in that,
the raw data of each action gesture in step 1 includes five finger tip coordinates of a human hand: f (F) ti (i=1, 2,3,4, 5), five heel joint coordinates of a human hand:finger length:Palm point coordinates: p, palm normal vector: h, pointing to the inner side of the palm, and pointing to the palm: f, the palm points to the direction of the finger, and the speed of the finger tips of five fingers: v (V) ti (i=1, 2,3,4, 5), palm rate: v (V) z
The extracting of the effective data segment in the step 2 specifically comprises the following steps:
setting a speed threshold V, respectively taking out an active segment interval with the speed value of palm center speed, thumb tip speed, index finger tip speed, middle finger tip speed, ring finger tip speed and little finger tip speed not smaller than the threshold V in continuous N frames, and taking the active segment interval in the original data as effective data to finish the extraction of the effective segment data, wherein M groups of the original data of the same action gesture correspondingly extract M groups of effective segment data.
The step 3 is specifically as follows:
step 3.1, dividing each action gesture in the training samples into a gesture-changing type and a gesture-invariant type according to its hand posture;
step 3.2, according to the effective segment data extracted in step 2, calculating for each frame of each effective data segment of each action gesture the upward pitch angle α_i of each finger plane, the finger opening β_i and the finger curvature μ_i as feature values, i = 1, 2, 3, 4, 5; the posture feature of the j-th frame of each action gesture is described by the 15-dimensional feature vector:
O_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5);
step 3.2, for the N frames of action gestures in the same effective data segment of the same action gesture in the training samples, taking the difference between the maximum value and the minimum value of each feature value to obtain a range value, using this range value as a gesture state-change feature value, and recording the resulting gesture state-change feature vector as:
C_m = (α_1c, ..., α_5c, β_1c, ..., β_5c, μ_1c, ..., μ_5c)
wherein m = 1, 2, ..., M; C_m denotes the gesture state-change feature vector corresponding to the m-th effective data segment of the same action gesture; α_1c, ..., α_5c respectively denote the range values of the upward pitch of each finger plane over the N frames of action gestures in the m-th effective data segment; β_1c, ..., β_5c respectively denote the range values of each finger opening over those N frames; and μ_1c, ..., μ_5c respectively denote the range values of each finger curvature over those N frames;
step 3.3, comparing each feature value across the M gesture state-change feature vectors of each action gesture to obtain, for each action gesture, the maximum value and the minimum value of the range of the upward pitch of each finger plane, of each finger opening and of each finger curvature;
step 3.4, assuming that the training samples divided in step 3.1 contain a gesture-changing action gestures and b gesture-invariant action gestures: for each feature, comparing the maxima of the range values of the upward pitch of each finger plane, of each finger opening and of each finger curvature of the b gesture-invariant action gestures and taking the largest of these b maxima; comparing the minima of the corresponding range values of the a gesture-changing action gestures and taking the smallest of these a minima; subtracting, for each feature value, the smallest of the a minima from the largest of the b maxima; taking the features for which this difference is smaller than 0 as threshold-distinguishing features, and taking the gesture range value corresponding to such a feature as the feature threshold;
and step 3.5, selecting the feature threshold corresponding to any threshold-distinguishing feature and calculating the range value of that feature for an action gesture in the recognition samples; if the range value is larger than the feature threshold, the action gesture of that sample is recognized as a gesture-changing gesture, and if it is smaller, it is recognized as a gesture-invariant gesture.
In step 3.2, α_1c, ..., α_5c are calculated as: α_ic = α_maxc - α_minc
wherein i = 1, 2, 3, 4, 5; α_maxc denotes the maximum upward pitch of the i-th finger plane over the N frames of action gestures in the same effective data segment, and α_minc denotes the corresponding minimum;
β_1c, ..., β_5c are calculated as: β_ic = β_maxc - β_minc
wherein β_maxc denotes the maximum opening of the i-th finger over the N frames of action gestures in the same effective data segment, and β_minc denotes the corresponding minimum;
μ_1c, ..., μ_5c are calculated as: μ_ic = μ_maxc - μ_minc
wherein μ_maxc denotes the maximum curvature of the i-th finger over the N frames of action gestures in the same effective data segment, and μ_minc denotes the corresponding minimum.
The upward pitch of the finger plane, the finger opening and the finger curvature are calculated from the hand data as follows: the finger opening β_i is the angle between finger i and finger i+1, and when i = 5, i+1 is taken as 1, so that β_5 is the angle between the little finger and the thumb; in the finger-curvature calculation, L_i is the finger length, i.e. the total length of the three phalanges from the fingertip to the finger-root joint.
The step 4 is specifically as follows:
the KNN algorithm is adopted as the static gesture recognition algorithm, a KNN recognition model is established, and the 15-dimensional feature vectors of the gesture-invariant gestures in the recognition samples divided in step 3, namely the posture feature of the j-th frame of an action gesture:
O_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5)
are extracted and then sent into the KNN model for training, obtaining the recognition model for gesture-invariant gestures.
The step 5 is specifically as follows:
a GMM-HMM recognition model is established, and the 19-dimensional feature vectors of the gesture-changing gestures in the recognition samples divided in step 3, namely the posture feature of the j-th frame of an action gesture:
g_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5, w_hj, w_fj, a_j, b_j)
are extracted and then sent into the GMM-HMM recognition model for training, obtaining the recognition model for gesture-changing gestures, wherein w_hj denotes the angle between the palm normal vector h_j of the current frame and the palm normal vector h_(j+1) of the adjacent frame; w_fj denotes the angle between the palm direction f_j of the current frame and the palm direction f_(j+1) of the adjacent frame; and a_j, b_j are the two feature values of the two-dimensional displacement feature.
w_hj is calculated as the angle between h_j and h_(j+1), and w_fj is calculated as the angle between f_j and f_(j+1).
the two-dimensional displacement characteristic determining method comprises the following steps:
removing Y-axis data in the palm coordinate P three-dimensional data, taking the palm coordinate of the previous frame of continuous data as an origin, and taking the palm coordinate P of the current frame as an origin xzj Projecting to an XOZ plane chain code disc to obtain P xzj The region number where the projection is located is used as the chain code of the current frameValue a j
Removing Z-axis data in the palm coordinate P three-dimensional data, taking the palm coordinate of the previous frame of continuous data as an origin, and taking the next frame of data P xyj Projecting to an XOY plane chain code disk to obtain P xyj The region number where the projection is located is used as the chain code value b of the current frame j To form two-dimensional displacement characteristics [ a ] j ,b j ];
The specific calculation is as follows:
where j=1, 2, …, N.
The beneficial effects of the invention are as follows: the hand description features are divided into hand displacement features, rotation features and posture features, and the raw data are divided into gesture-invariant and gesture-changing types, so that the state of the current hand and the change of the hand state within a continuous sequence can be described completely; the speed threshold segments the motion data effectively and completes the extraction of the effective segment data; by analyzing the feature values of the two gesture types, the feature categories and thresholds suitable for distinguishing the two different types of gestures are obtained, the extraction of the state-change threshold is completed, the test data are classified correctly, and the gesture types are distinguished effectively; the single-frame gesture recognition method (KNN) is used to model the data whose state does not change, the GMM-HMM modeling method is used to model the state-changing data, the construction of the static and dynamic separation recognition model is completed, and the recognition accuracy and recognition speed are improved.
Drawings
FIG. 1 is a flow chart of an overall method for identifying assembly operation actions based on static and dynamic separation in the invention;
FIG. 2 is a diagram depicting palm orientation in the method for identifying assembly work actions based on static and dynamic separation of the present invention;
FIG. 3 is a diagram of a chain code wheel in the method for identifying the assembly operation action based on static and dynamic separation;
FIG. 4 is a schematic diagram of the GMM-HMM algorithm recognition in the static and dynamic separation-based assembly operation action recognition method of the present invention;
FIG. 5 is a plot of x, y, z coordinates of data points of a human hand in a resting state in an example of an assembly operation action recognition method based on static and dynamic separation in accordance with the present invention;
FIG. 6 is a plot of x, y, z coordinates of hand data points processed by a window moving average method in an example of an assembly operation action recognition method based on static and dynamic separation in accordance with the present invention;
FIG. 7 is a principal component analysis chart of PCA dimension reduction data in an example of an assembly operation action recognition method based on static and dynamic separation;
FIG. 8 is a diagram of valid segment data extraction in an example of an assembly job action recognition method based on static and dynamic separation in accordance with the present invention;
FIG. 9 is a graph comparing the finger lift with the characteristic values of the fingertip angles of nine gestures in an example of the assembly operation action recognition method based on static and dynamic separation;
FIG. 10 is a graph comparing nine characteristic values of bending of a gesture finger in an example of an assembly operation action recognition method based on static and dynamic separation according to the present invention;
FIG. 11 is a graph of nine characteristic hand orientation changes in an example of an assembly job action recognition method based on static and dynamic separation according to the present invention;
FIG. 12 is a graph of nine direction angle quantization characteristic values in an example of the method for recognizing the assembly operation action based on static and dynamic separation according to the present invention;
FIG. 13 is a graph of nine gesture single feature range comparisons in an example of an assembly job action recognition method based on static and dynamic separation of the present invention;
FIG. 14 is a diagram of classification of test data in an example of an assembly job action recognition method based on static and dynamic separation in accordance with the present invention.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention relates to an assembly operation action recognition method based on static and dynamic separation, which is implemented according to the following steps, wherein the flow of the assembly operation action recognition method is shown in figure 1:
step 1, dividing a complete operation action into n action gestures, collecting M groups of original data by each action gesture, and dividing the original data corresponding to the action gesture into a training sample and an identification sample; wherein, the raw data of each action gesture comprises five finger tip coordinates of a human hand: f (F) ti (i=1, 2,3,4, 5), five heel joint coordinates of a human hand: f (F) bi (i=1, 2,3,4, 5), finger length: l (L) i (i=1, 2,3,4, 5), palm point coordinates: p, palm normal vector: h, pointing to the inner side of the palm, and pointing to the palm: f, the palm points to the direction of the finger, as shown in fig. 2, the speed of the finger tips of the five fingers: v (V) ti (i=1, 2,3,4, 5), palm rate: v (V) z
Step 2, extracting effective data segments of the original data of each action gesture acquired in the step 1, specifically:
setting a speed threshold V, respectively taking out an active segment interval with the speed value of not less than the threshold V in N continuous frames, namely the palm center speed, the thumb tip speed, the index finger tip speed, the middle finger tip speed, the ring finger tip speed and the little finger tip speed, and taking the active segment interval in the original data as effective data to finish the extraction of the effective segment data, wherein M groups of the original data of the same action gesture correspondingly extract M groups of effective segment data;
step 3, calculating a characteristic threshold according to the action gestures in the training sample, and dividing the action gestures in the recognition sample into gesture-changing gestures and gesture-unchanged gestures according to the characteristic threshold;
the method comprises the following steps:
step 3.1, dividing each action gesture in the training sample into a gesture change type and a gesture invariant type according to the gesture of each action gesture;
step 3.2, according to the effective segment data extracted in step 2, calculating for each frame of each effective data segment of each action gesture the upward pitch angle α_i of each finger plane, the finger opening β_i and the finger curvature μ_i as feature values, i = 1, 2, 3, 4, 5; the posture feature of the j-th frame of each action gesture is described by the 15-dimensional feature vector:
O_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5);
step 3.2, for the N frames of action gestures in the same effective data segment of the same action gesture in the training samples, taking the difference between the maximum value and the minimum value of each feature value to obtain a range value, using this range value as a gesture state-change feature value, and recording the resulting gesture state-change feature vector as:
C_m = (α_1c, ..., α_5c, β_1c, ..., β_5c, μ_1c, ..., μ_5c)
wherein m = 1, 2, ..., M; C_m denotes the gesture state-change feature vector corresponding to the m-th effective data segment of the same action gesture; α_1c, ..., α_5c respectively denote the range values of the upward pitch of each finger plane over the N frames of action gestures in the m-th effective data segment; β_1c, ..., β_5c respectively denote the range values of each finger opening over those N frames; and μ_1c, ..., μ_5c respectively denote the range values of each finger curvature over those N frames, wherein α_1c, ..., α_5c are calculated as:
α_ic = α_maxc - α_minc
wherein i = 1, 2, 3, 4, 5; α_maxc denotes the maximum upward pitch of the i-th finger plane over the N frames of action gestures in the same effective data segment, and α_minc denotes the corresponding minimum;
β_1c, ..., β_5c are calculated as: β_ic = β_maxc - β_minc
wherein β_maxc denotes the maximum opening of the i-th finger over the N frames of action gestures in the same effective data segment, and β_minc denotes the corresponding minimum;
μ_1c, ..., μ_5c are calculated as: μ_ic = μ_maxc - μ_minc
wherein μ_maxc denotes the maximum curvature of the i-th finger over the N frames of action gestures in the same effective data segment, and μ_minc denotes the corresponding minimum;
the upward pitch of the finger plane, the finger opening and the finger curvature are calculated from the hand data as follows: the finger opening β_i is the angle between finger i and finger i+1, and when i = 5, i+1 is taken as 1, so that β_5 is the angle between the little finger and the thumb; in the finger-curvature calculation, L_i is the finger length, i.e. the total length of the three phalanges from the fingertip to the finger-root joint;
step 3.3, comparing each feature value across the M gesture state-change feature vectors of each action gesture to obtain, for each action gesture, the maximum value and the minimum value of the range of the upward pitch of each finger plane, of each finger opening and of each finger curvature;
step 3.4, assuming that the training samples divided in step 3.1 contain a gesture-changing action gestures and b gesture-invariant action gestures: for each feature, comparing the maxima of the range values of the upward pitch of each finger plane, of each finger opening and of each finger curvature of the b gesture-invariant action gestures and taking the largest of these b maxima; comparing the minima of the corresponding range values of the a gesture-changing action gestures and taking the smallest of these a minima; subtracting, for each feature value, the smallest of the a minima from the largest of the b maxima; taking the features for which this difference is smaller than 0 as threshold-distinguishing features, and taking the gesture range value corresponding to such a feature as the feature threshold;
and step 3.5, selecting the feature threshold corresponding to any threshold-distinguishing feature and calculating the range value of that feature for an action gesture in the recognition samples; if the range value is larger than the feature threshold, the action gesture of that sample is recognized as a gesture-changing gesture, and if it is smaller, it is recognized as a gesture-invariant gesture.
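A sketch of steps 3.2 to 3.5 with illustrative helper names; taking the largest gesture-invariant range as the threshold value is an assumption consistent with the boundary line used in the example below:

import numpy as np

def range_vector(segment_features):
    # C_m: per-feature maximum minus minimum over the N frames of one effective
    # segment; segment_features has shape (N, 15).
    return segment_features.max(axis=0) - segment_features.min(axis=0)

def select_feature_thresholds(invariant_ranges, changing_ranges):
    # invariant_ranges, changing_ranges: (num_segments, 15) arrays of C_m vectors of
    # gesture-invariant and gesture-changing training gestures. A feature qualifies
    # when its largest invariant range is still smaller than its smallest changing
    # range; that largest invariant range is used as the feature threshold.
    inv_max = invariant_ranges.max(axis=0)
    chg_min = changing_ranges.min(axis=0)
    feature_ids = np.where(inv_max - chg_min < 0)[0]
    return feature_ids, inv_max[feature_ids]

def is_gesture_changing(sample_features, feature_id, threshold):
    # step 3.5: compare the sample's range for one selected feature with the threshold
    return range_vector(sample_features)[feature_id] > threshold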
Step 4, inputting the feature values of the gesture-invariant gestures in the recognition samples divided in step 3 into a KNN recognition model for training to obtain the recognition model for gesture-invariant gestures, specifically: the KNN algorithm is adopted as the static gesture recognition algorithm, a KNN recognition model is established, and the 15-dimensional feature vectors of the gesture-invariant gestures in the recognition samples divided in step 3, namely the posture feature of the j-th frame of an action gesture:
O_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5)
are extracted and then sent into the KNN model for training, obtaining the recognition model for gesture-invariant gestures;
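A minimal sketch with scikit-learn (the library choice is an assumption), taking single-frame 15-dimensional feature vectors as rows of X_train and gesture labels as y_train; K = 5 and Z-Score scaling follow the experiment section below:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_static_gesture_model(X_train, y_train, k=5):
    # X_train: (num_frames, 15) single-frame posture features O_j of gesture-invariant
    # gestures; y_train: gesture labels. Z-Score scaling is applied before KNN.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    model.fit(X_train, y_train)
    return model

# usage sketch: labels = train_static_gesture_model(X_train, y_train).predict(X_test)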
Step 5, inputting the feature values of the gesture-changing gestures in the recognition samples divided in step 3 into a GMM-HMM recognition model for training to obtain the recognition model for gesture-changing gestures; a schematic diagram of GMM-HMM recognition is shown in fig. 4. Specifically:
a GMM-HMM recognition model is established, and the 19-dimensional feature vectors of the gesture-changing gestures in the recognition samples divided in step 3, namely the posture feature of the j-th frame of an action gesture:
g_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5, w_hj, w_fj, a_j, b_j)
are extracted and then sent into the GMM-HMM recognition model for training, obtaining the recognition model for gesture-changing gestures, wherein w_hj is the angle between the palm normal vector h_j of the current frame and the palm normal vector h_(j+1) of the adjacent frame, w_fj is the angle between the palm direction f_j of the current frame and the palm direction f_(j+1) of the adjacent frame, and a_j, b_j are the two feature values of the two-dimensional displacement feature;
the two-dimensional displacement feature is determined as follows:
the Y-axis component of the palm-center coordinate P is removed, the palm-center coordinate of the previous frame of the continuous data is taken as the origin, the current-frame palm-center coordinate P_xzj is projected onto the XOZ-plane chain-code disc shown in fig. 3, which is divided into 16 equal sectors, and the number of the region in which the projection falls is taken as the chain-code value a_j of the current frame;
the Z-axis component of the palm-center coordinate P is removed, the palm-center coordinate of the previous frame of the continuous data is taken as the origin, the current-frame data P_xyj is projected onto the XOY-plane chain-code disc, and the number of the region in which the projection falls is taken as the chain-code value b_j of the current frame, forming the two-dimensional displacement feature [a_j, b_j], where j = 1, 2, ..., N.
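The angle formulas for w_hj and w_fj and the sector numbering of the 16-region chain-code disc of fig. 3 are not reproduced here; the sketch below assumes equal 22.5° sectors numbered counter-clockwise from the positive axis, with illustrative names throughout:

import numpy as np

def frame_angle(u, v):
    # w_hj / w_fj as the angle in degrees between direction vectors of adjacent frames
    cos_v = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_v, -1.0, 1.0)))

def chain_code(dx, dy, sectors=16):
    # number of the equal sector into which the planar displacement (dx, dy) falls
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    return int(angle // (360.0 / sectors))

def displacement_features(P_prev, P_cur):
    # two-dimensional displacement feature [a_j, b_j] from consecutive palm-center
    # coordinates P = (x, y, z): a_j from the XOZ projection (Y removed),
    # b_j from the XOY projection (Z removed)
    dx, dy, dz = P_cur - P_prev
    a_j = chain_code(dx, dz)
    b_j = chain_code(dx, dy)
    return a_j, b_j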
Examples
In the assembly operation scenario, a library of nine action gestures is obtained by analyzing a common bolt-assembly case and a more complex partial ECU-assembly case of manufacturing enterprises, as shown in Table 1.
Table 1 Hand-shape action library
1. Action classification
The nine summarized hand-shape actions are classified into gesture-changing and gesture-invariant types according to the three description features, namely the hand displacement features, the rotation features and the posture features; the specific classification is shown in Table 2.
TABLE 2 action classification
2. Data acquisition
Data acquisition is carried out according to the hand shapes of the defined operation action gesture library. Gesture data of 4 subjects are collected in the experiment; each subject performs the 9 gestures defined herein, with 50 samples per gesture. After acquisition, all sample data are integrated and stored in a CSV file. Of the 450 samples collected per person, 360 are used to train the model and the remaining 90 are used to test the accuracy of the model. 26 parameters, including the speed, position and time of the palm center and the five fingers, are collected through the Leap Motion. The names and meanings of the parameters collected in the experiment are shown in Table 3.
Table 3 collection parameter annotation table
3. Data preprocessing
First, features are extracted from the raw data; then a data dimensionality-reduction method is applied to address the redundancy of multi-feature data and reduce the data volume; finally, a data standardization method is used to improve the comparability of the data. The methods mainly adopted are data denoising, PCA dimensionality reduction and Z-Score standardization.
(1) Data denoising: as shown in FIG. 5, panels a-c of FIG. 5 are line plots of the three-dimensional x, y and z coordinate values of the thumb root over 1000 consecutive frames in a static state. As FIG. 5 shows, the values acquired by the device fluctuate considerably, are discontinuous and contain too many peaks, which does not match the actual hand-motion coordinates. The original values are therefore smoothed with a distance-detection-based K-neighbor method, which filters the raw data by window moving average and replaces the original value with the mean of consecutive data frames, completing the data preprocessing. With the K-neighbor parameter set to 20, the coordinate curves processed by the window moving average are shown in FIG. 6; the processed X, Y and Z coordinate data are closer to the actual situation and more continuous.
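A minimal sketch of the window moving average described above, assuming a window of k = 20 frames and same-length output (boundary handling is not specified in the text):

import numpy as np

def window_moving_average(coords, k=20):
    # coords: (num_frames, 3) x/y/z coordinate sequence; each value is replaced by
    # the mean of a window of k consecutive frames.
    kernel = np.ones(k) / k
    return np.column_stack(
        [np.convolve(coords[:, d], kernel, mode="same") for d in range(3)]
    )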
(2) PCA dimensionality reduction: for the 15-dimensional gesture-invariant hand posture data of this example, principal component analysis reduces the feature dimension from 15 to 5; the principal component analysis after dimensionality reduction is shown in FIG. 7.
(3) Z-Score standardization: since some parameters of the human hand expression model are angle values, the acquired data differ in magnitude between subjects because of individual differences of human hands when a new sample is encountered; the Z-Score method is used to reduce the recognition error caused by these individual differences.
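A sketch of steps (2) and (3) with scikit-learn; the library choice, the processing order and the exact settings are assumptions, with the component count of 5 taken from the example above:

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_and_standardize(X, n_components=5):
    # X: (num_samples, 15) posture features; PCA keeps 5 principal components,
    # then Z-Score standardization removes scale differences between subjects.
    X_reduced = PCA(n_components=n_components).fit_transform(X)
    return StandardScaler().fit_transform(X_reduced)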
4. Analysis and acquisition of hand state thresholds
(1) Segmentation and acquisition of effective operation segments: this example takes the data segments in which 20 consecutive frames exceed the speed threshold of 20000 as the effective data segments in the operation sequence, realizing the segmentation of continuous operation data. Taking the screwing action composed of the six action gestures hand extension, grabbing, moving, sleeving, rotating and releasing as an example, this example first obtains the speed values of the palm center and the five fingertips, takes out the 6 active segment intervals whose speed values exceed 20000 for 20 consecutive frames, and then takes the data of these intervals in the raw data as the effective data, completing the extraction of the effective segment data. As shown in FIG. 8, panel (a) shows the speeds of the five fingertips and the palm in the raw data of the screwing operation, panel (b) the length of each active segment interval after extraction, and panel (c) the speeds of the extracted active segments. FIG. 8 shows that the speed threshold segments the motion data effectively and completes the task of extracting the effective segment data.
(2) Action-threshold extraction based on hand-shape feature change: on the basis of the posture features, hand displacement features and hand rotation features, this example obtains through experiments the feature-parameter plots suitable for distinguishing gesture-invariant gestures from gesture-changing gestures, and compares the state-change feature values of the two gesture types in the training samples to obtain the feature values best suited as threshold-distinguishing feature categories. The 15-dimensional posture features of the nine actions summarized herein are classified, extracted and plotted according to the kind of feature value. The finger-to-finger angles of the nine actions are shown as f_11 to f_15 in FIG. 9 and the finger lift as f_31 to f_35; the abscissa is the time frame and the ordinate the angle value. In FIG. 9, panels a-f are the angle feature values of the six gestures hand extension, object movement, sleeving, pressing in, pulling out and prying, and panels g-i are the angle feature values of the three gestures grabbing, rotating and releasing; the first six are the gesture-invariant gestures defined in this example and the last three the gesture-changing gestures. In panels a-f of FIG. 9 the single-feature angle changes are small, indicating that the angle information changes little while the action gesture moves, whereas in panels g-i the single-feature angle changes are large, indicating that the angle information changes markedly during the movement. The finger curvature is shown in FIG. 10; compared with the angle information in FIG. 9, the curvature changes in the first six plots and in the last three plots of FIG. 10 are not large, with single-feature ranges within 0.4. Nevertheless, the comparison still shows that certain curvature feature values of grabbing, rotating and releasing change considerably more than those of the six gestures hand extension, object movement, sleeving, pressing in, pulling out and prying. In addition to the posture features, this example compares the hand-orientation change values and the quantized direction angles of the nine actions: the hand-orientation change feature is shown in FIG. 11 (abscissa: time frame; ordinate: angle value) and the quantized direction-angle feature in FIG. 12 (abscissa: time frame; ordinate: chain-code value). Relative to the 15-dimensional hand-state features, the differences in hand orientation and quantized direction angle between gesture-changing and gesture-invariant gestures are small, so it is difficult to distinguish the two gesture types by these two features. From the above analysis, the 15 posture features can tentatively be determined as suitable feature-threshold categories for distinguishing the two data types.
This example applies the range operation to the feature values of the collected 40×5 groups of training sample data and obtains the gesture state-change feature values by calculating the difference between the maximum and minimum values within the effective segment of the data. The range comparison of the posture features of gesture-invariant and gesture-changing gestures over the multiple groups of data is thus obtained; the comparison results are shown in Table 4.
Table 4 Range comparison of posture features of gesture-invariant and gesture-changing gestures
Features 1, 4, 7, 10 and 13 compared in the table are the range values of the upward pitch angle of the thumb, index finger, middle finger, ring finger and little finger respectively; features 2, 5, 8, 11 and 14 are the range values of the curvature ratio of the thumb, index finger, middle finger, ring finger and little finger; and features 3, 6, 9, 12 and 15 are the range values of the three-dimensional angles between thumb and index finger, index and middle finger, middle and ring finger, ring and little finger, and little finger and thumb. The maximum, average and minimum values in the table refer to the maximum, average and minimum of the feature range values of the gesture samples over the 4×40 = 160 groups of training samples. Direct inspection shows that the features of different gestures differ considerably and can characterize the gestures effectively.
To compare the above features better, the gesture range value at the maximum of the ranges of the first six gesture-invariant gestures is taken as the boundary line, and the maxima of the features of the first six gestures and the minima of the features of the last three gestures are selected for plotting; the single-feature comparison result is shown in FIG. 13.
For all data sets, this example takes the difference between the maximum value of the gesture-invariant gestures and the minimum value of the gesture-changing gestures and selects the features whose difference is less than 0 as the threshold-distinguishing features. The differences are shown in Table 5.
Table 5 Single-feature differences between gesture-invariant and gesture-changing gestures
From the above analysis, when the difference between the maximum single-feature range of the gesture-invariant gestures and the minimum single-feature range of the gesture-changing gestures is taken, features 1, 4 and 13 are found to have a gesture-changing range greater than the gesture-invariant range. Features 1, 4 and 13 are therefore used to distinguish the gesture-changing from the gesture-invariant data type, and the three feature thresholds are the gesture range values 38.15, 30.50 and 14.6 respectively.
The 40 groups of test-set data are tested with feature 1, feature 4 and feature 13; the test results are shown in FIG. 14. In the figure, the nine kinds of action data (hand extension, moving, sleeving, pressing in, pulling out, prying, grabbing, rotating and releasing) are spliced in order and the threshold test experiment is then performed. FIG. 14(a) and FIG. 14(b) are the difference plots for classifying the data by feature value 1 and feature value 4, and FIG. 14(c) is the data-classification plot for feature value 13. Panels (a) and (b) show that feature 1 and feature 4 distinguish the grabbing and releasing gestures of the gesture-changing type well from the gesture-invariant gestures; FIG. 14(c) shows that feature 13 distinguishes the gesture-changing rotation gesture well from the gesture-invariant gestures.
The above test results show that the adopted thresholds achieve 100% correct discrimination of the test data over the 4×10×9 = 360 test samples of this experiment. Thus, by combining the 3 feature thresholds, the gesture types can be distinguished effectively.
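A usage sketch of the three thresholds; the rule for combining them when the features disagree is not spelled out in the text and is assumed here to be "any threshold exceeded", and the mapping from feature numbers to column indices is also an assumption:

THRESHOLDS = {1: 38.15, 4: 30.50, 13: 14.60}   # feature number -> gesture range threshold

def classify_by_thresholds(sample_features):
    # sample_features: (N, 15) numpy array of posture features of one effective segment;
    # returns True if the sample is taken as gesture-changing, i.e. its range exceeds
    # the threshold for at least one of features 1, 4 and 13.
    ranges = sample_features.max(axis=0) - sample_features.min(axis=0)
    return any(ranges[f - 1] > t for f, t in THRESHOLDS.items())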
5. Recognition experiment and recognition result analysis
5.1 Gesture-invariant gesture recognition experiment
In the case experiment, the previously obtained data of the 6 gesture-invariant gestures are processed first: the training data are preprocessed, features are extracted and the dimensionality is reduced, and a single frame at any position within the effective data segment is taken as the input data. For gesture-invariant gesture recognition, there are 6×40×4 = 960 recognition samples in total, and 6×10×4 = 240 pieces of data are used to test the accuracy of the obtained model.
This example feeds the processed data into the KNN model. In building the KNN model, Z-Score is chosen for data processing and the parameter K = 5 is used to construct the recognition model. The results show that the model recognition rate reaches 100% with a time consumption of 0.62 ms, a large improvement in both recognition accuracy and recognition speed.
5.2 continuous operation gesture recognition experiments
The experiment is carried out on the data of the 3 gesture-changing gestures; a 3-channel GMM-HMM recognition model is established and the recognition rate is obtained on the test sample data. Unlike the KNN model for gesture-invariant recognition, the training samples of the GMM-HMM model add 4 features, namely the hand orientation-angle features and the hand-motion chain-code features, to the operation-gesture features of the gesture-changing gestures, so the hand-state features increase from 15 to 19 dimensions.
When the GMM-HMM model is trained, one action sample is one data source, so the training and test data total 3×40×4 = 480 pieces, of which 3×10×4 are used to test the gesture-changing 3-channel GMM-HMM model. For the acquired data, a 3-channel GMM-HMM recognition model is established in the experiment; the experimental results are shown in Table 6, with a recognition rate of 83.3% and a time consumption of 2.85 s.
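The text does not name a GMM-HMM implementation; the sketch below uses hmmlearn's GMMHMM (an assumption), fits one model per gesture-changing gesture as in the 3-channel setup and classifies a test sequence by the highest log-likelihood; the state and mixture counts are illustrative:

import numpy as np
from hmmlearn.hmm import GMMHMM

def train_gmm_hmm_models(sequences_per_gesture, n_states=3, n_mix=2):
    # sequences_per_gesture: dict mapping a gesture label to a list of (N_i, 19)
    # feature sequences g_j; one GMM-HMM is fitted per gesture-changing gesture.
    models = {}
    for label, seqs in sequences_per_gesture.items():
        X = np.concatenate(seqs)
        lengths = [len(s) for s in seqs]
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify_sequence(models, sequence):
    # assign a test sequence to the gesture whose model gives the highest log-likelihood
    return max(models, key=lambda label: models[label].score(sequence))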
Table 6 Recognition rate and recognition time of complex actions using GMM-HMM
Algorithm type | Recognition object | Data processing | Recognition rate | Run time
GMM-HMM | Three gesture-changing gestures | Normalization | 83.3% | 2.85 s
A comprehensive analysis of the static and dynamic separation recognition model gives the comprehensive recognition rate: P = P_i × (P_j × 6 + P_k × 3) / 9, wherein P_i is the threshold discrimination accuracy, P_j the gesture-invariant recognition rate and P_k the gesture-changing recognition rate; and the comprehensive recognition speed: T = (T_i × 6 + T_j × 3) / 9, wherein T_i is the recognition time of the six gesture-invariant gestures and T_j the recognition time of the three gesture-changing gestures. The comprehensive recognition rate and speed of the static and dynamic separation recognition model, calculated from the recognition speeds and recognition rates, are shown in Table 7.
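Substituting the figures reported above (threshold discrimination P_i = 100%, gesture-invariant recognition rate P_j = 100%, gesture-changing recognition rate P_k = 83.3%, T_i = 0.62 ms, T_j = 2.85 s = 2850 ms) as a check gives
P = 1.00 × (1.00 × 6 + 0.833 × 3) / 9 ≈ 0.944 and
T = (0.62 × 6 + 2850 × 3) / 9 ≈ 950 ms,
which matches the comprehensive recognition speed in Table 7 and is close to the reported comprehensive recognition rate of 94.33% (the exact value depends on the unrounded recognition rates).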
TABLE 7 comprehensive recognition rate and speedometer for static and dynamic separation recognition model
As can be seen from the table, the comprehensive recognition rate of the KNN-GMM-HMM static and dynamic separation recognition model over the nine gestures of this experiment is 94.33% and the comprehensive recognition speed is only 950 ms, which shows that the recognition model based on static and dynamic separation is effective.
The method can be used in assembly operation gesture recognition processes with a time sequence and has the following beneficial effects: 1) the hand description features are divided into hand displacement features, rotation features and posture features, and the raw data are divided into gesture-invariant and gesture-changing types, so that the state of the current hand and the change of the hand state within a continuous sequence can be described completely; 2) the speed threshold is used to segment the motion data effectively and complete the extraction of the effective segment data; 3) by analyzing the feature values of the two gesture types, the feature categories and thresholds suitable for distinguishing the two different types of gestures are obtained, the extraction of the state-change threshold is completed, the test data are classified correctly, and the gesture types are distinguished effectively; 4) for recognition, the single-frame gesture recognition method (KNN) is used to model the data whose state does not change and the GMM-HMM modeling method is used to model the state-changing data, completing the construction of the static and dynamic separation recognition model and improving recognition accuracy and recognition speed.

Claims (7)

1. The assembling operation action recognition method based on static and dynamic separation is characterized by comprising the following steps of:
step 1, dividing a complete operation action into n action gestures, collecting M groups of original data for each action gesture, and dividing the original data corresponding to the action gestures into training samples and recognition samples;
step 2, extracting effective data segments from the original data of each action gesture acquired in the step 1;
step 3, calculating a characteristic threshold according to the action gestures in the training sample, and dividing the action gestures in the recognition sample into gesture-changing gestures and gesture-unchanged gestures according to the characteristic threshold;
step 4, inputting the characteristic value of the gesture with unchanged gesture in the recognition sample divided in the step 3 into a KNN recognition model for training to obtain a recognition model of the gesture with unchanged gesture;
step 5, inputting the characteristic values of the gesture in the gesture change type gesture in the recognition sample divided in the step 3 into a GMM-HMM recognition model for training to obtain a recognition model of the gesture change type gesture;
the raw data of each action gesture in the step 1 includes five finger tip coordinates of a human hand: f (F) ti (i=1, 2,3,4, 5), five heel joint coordinates of a human hand: f (F) bi (i=1, 2,3,4, 5), finger length: l (L) i (i=1, 2,3,4, 5), palm point coordinates: p, palm normal vector: h, pointing to the inner side of the palm, and pointing to the palm: f, the palm points to the direction of the finger, and the speed of the finger tips of five fingers: v (V) ti (i=1, 2,3,4, 5), palm rate: v (V) z
The extracting of the effective data segment in the step 2 specifically includes:
setting a speed threshold V, taking out the active-segment intervals in which the palm-center speed and the thumb, index, middle, ring and little fingertip speeds are all not smaller than the threshold V for N consecutive frames, and taking these active-segment intervals in the original data as the effective data, thereby completing the extraction of the effective segment data, wherein the M groups of original data of the same action gesture yield M corresponding groups of effective segment data;
the step 3 specifically comprises the following steps:
step 3.1, dividing each action gesture in the training sample into a gesture change type and a gesture invariant type according to the gesture of each action gesture;
step 3.2, according to the effective segment data extracted in step 2, calculating for each frame of each effective data segment of each action gesture the upward pitch angle α_i of each finger plane, the finger opening β_i and the finger curvature μ_i as feature values, i = 1, 2, 3, 4, 5; the posture feature of the j-th frame of each action gesture is described by the 15-dimensional feature vector:
O_j = f_j(α_1, ..., α_5, β_1, ..., β_5, μ_1, ..., μ_5);
step 3.2, for the N frames of action gestures in the same effective data segment of the same action gesture in the training samples, taking the difference between the maximum value and the minimum value of each feature value to obtain a range value, using this range value as a gesture state-change feature value, and recording the resulting gesture state-change feature vector as:
C_m = (α_1c, ..., α_5c, β_1c, ..., β_5c, μ_1c, ..., μ_5c)
wherein m = 1, 2, ..., M; C_m denotes the gesture state-change feature vector corresponding to the m-th effective data segment of the same action gesture; α_1c, ..., α_5c respectively denote the range values of the upward pitch of each finger plane over the N frames of action gestures in the m-th effective data segment; β_1c, ..., β_5c respectively denote the range values of each finger opening over those N frames; and μ_1c, ..., μ_5c respectively denote the range values of each finger curvature over those N frames;
step 3.3, comparing each feature value across the M gesture state-change feature vectors of each action gesture to obtain, for each action gesture, the maximum value and the minimum value of the range of the upward pitch of each finger plane, of each finger opening and of each finger curvature;
step 3.4, assuming that the training samples divided in step 3.1 contain a gesture-changing action gestures and b gesture-invariant action gestures: for each feature, comparing the maxima of the range values of the upward pitch of each finger plane, of each finger opening and of each finger curvature of the b gesture-invariant action gestures and taking the largest of these b maxima; comparing the minima of the corresponding range values of the a gesture-changing action gestures and taking the smallest of these a minima; subtracting, for each feature value, the smallest of the a minima from the largest of the b maxima; taking the features for which this difference is smaller than 0 as threshold-distinguishing features, and taking the gesture range value corresponding to such a feature as the feature threshold;
and 3.5, selecting a feature threshold corresponding to any feature, calculating a very poor value corresponding to the feature of the action gesture in the recognition sample, if the feature threshold is large, recognizing the action gesture corresponding to the sample as a gesture change type gesture, and if the feature threshold is small, recognizing the action gesture corresponding to the sample as a gesture non-change type gesture.
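The 15-dimensional per-frame descriptor of step 3.2 can be illustrated with a short sketch; the function name and the example values below are assumptions for illustration only, not part of the claim.

```python
import numpy as np

def frame_feature_vector(alpha, beta, mu):
    """Assemble the 15-dimensional per-frame descriptor
    O_j = (alpha_1..alpha_5, beta_1..beta_5, mu_1..mu_5).

    alpha, beta, mu: length-5 sequences holding, for fingers i = 1..5,
    the finger-plane upward pitch, the finger opening degree and the
    finger curvature of the current frame.
    """
    alpha, beta, mu = (np.asarray(x, dtype=float) for x in (alpha, beta, mu))
    assert alpha.shape == beta.shape == mu.shape == (5,)
    return np.concatenate([alpha, beta, mu])  # shape (15,)

# Example frame: pitch, opening and curvature values for the five fingers.
O_j = frame_feature_vector([10, 12, 15, 14, 9],
                           [20, 18, 16, 15, 30],
                           [0.2, 0.3, 0.25, 0.28, 0.1])
print(O_j.shape)  # (15,)
```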
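Likewise, the gesture state change feature vector C_m of step 3.2 is simply the per-feature range over the N frames of one valid data segment. A minimal sketch, assuming the frames are already stacked into an (N, 15) array:

```python
import numpy as np

def gesture_change_vector(frames):
    """Gesture state change feature vector C_m of one valid data segment.

    frames: array of shape (N, 15) holding the descriptor O_j of every
    frame in the segment.  Each entry of C_m is the range (maximum minus
    minimum) of the corresponding feature over the N frames.
    """
    frames = np.asarray(frames, dtype=float)
    return frames.max(axis=0) - frames.min(axis=0)  # shape (15,)

# Hypothetical segment of N = 6 frames.
C_m = gesture_change_vector(np.random.rand(6, 15))
```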
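Steps 3.3 to 3.5 keep only the features whose ranges for gesture-invariant gestures stay strictly below the ranges for gesture-change gestures, and then split recognition samples by a threshold on such a feature. A minimal sketch, assuming the range vectors of both groups are available as arrays and taking the invariant-side maximum as the feature threshold (the exact threshold value intended by the claim is an interpretation here):

```python
import numpy as np

def select_threshold_features(change_ranges, invariant_ranges):
    """Find threshold-distinguishing features (steps 3.3-3.4).

    change_ranges:    (a, 15) array of range vectors C_m of the a
                      gesture-change action gestures.
    invariant_ranges: (b, 15) array of range vectors C_m of the b
                      gesture-invariant action gestures.
    A feature qualifies when the largest invariant-side range minus the
    smallest change-side range is below 0; the invariant-side maximum is
    used as that feature's threshold.
    """
    inv_max = np.max(invariant_ranges, axis=0)  # largest of the b maxima
    chg_min = np.min(change_ranges, axis=0)     # smallest of the a minima
    separable = (inv_max - chg_min) < 0
    thresholds = np.where(separable, inv_max, np.nan)
    return separable, thresholds

def classify_gesture(sample_range, separable, thresholds):
    """Step 3.5: label one recognition sample by any distinguishing feature."""
    k = int(np.flatnonzero(separable)[0])
    return "gesture-change" if sample_range[k] > thresholds[k] else "gesture-invariant"

separable, thresholds = select_threshold_features(np.random.rand(3, 15) + 1.0,
                                                  np.random.rand(4, 15) * 0.3)
print(classify_gesture(np.random.rand(15) * 0.2, separable, thresholds))
```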
2. The static and dynamic separation-based assembly operation action recognition method according to claim 1, characterized in that in step 3.2, α_1c, …, α_5c are calculated as:

α_ic = α_maxc − α_minc

where i = 1, 2, 3, 4, 5; α_maxc denotes the maximum value and α_minc the minimum value of the upward pitch of the i-th finger plane over the N frames of action gestures in the same valid data segment;

β_1c, …, β_5c are calculated as:

β_ic = β_maxc − β_minc

where β_maxc denotes the maximum value and β_minc the minimum value of the opening degree of the i-th finger over the N frames of action gestures in the same valid data segment;

and μ_1c, …, μ_5c are calculated as:

μ_ic = μ_maxc − μ_minc

where μ_maxc denotes the maximum value and μ_minc the minimum value of the curvature of the i-th finger over the N frames of action gestures in the same valid data segment.
3. The static and dynamic separation-based assembly operation action recognition method according to claim 2, wherein the upward pitch of the finger plane is calculated as follows:

the finger opening degree is calculated as follows:

when i = 5, let i + 1 = 1, so that β_5 is the angle between the thumb and the little finger;

the finger curvature is calculated as follows:

where L_i, the finger length, is the total length of the three phalanx segments from the fingertip to the finger root joint.
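The patent's own formulas for these three features are not reproduced in the text above, so the sketch below only shows one plausible reading: the opening degree as the angle between adjacent finger direction vectors with the i = 5 → 1 wrap described in the claim, and a curvature measure built from the finger length L_i. Both definitions, and all names, are assumptions.

```python
import numpy as np

def finger_opening_angles(directions):
    """Angle between adjacent finger direction vectors, in degrees.

    directions: array (5, 3) of finger direction vectors ordered from
    thumb to little finger.  With the i = 5 -> 1 wrap, beta_5 is the
    angle between the little finger and the thumb.
    """
    d = np.asarray(directions, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    nxt = np.roll(d, -1, axis=0)  # finger i+1, wrapping 5 -> 1
    cos = np.clip(np.einsum("ij,ij->i", d, nxt), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def finger_curvature(tip, root, length):
    """One plausible curvature measure: 1 minus the straight tip-to-root
    distance divided by the finger length L_i (sum of the three phalanx
    segments); near 0 for an extended finger, larger when bent."""
    return 1.0 - np.linalg.norm(np.asarray(tip) - np.asarray(root)) / length

beta = finger_opening_angles([[0.2, 1, 0], [0.1, 1, 0], [0, 1, 0],
                              [-0.1, 1, 0], [-0.3, 1, 0]])
mu_1 = finger_curvature(tip=(0.0, 6.5, 0.0), root=(0.0, 0.0, 0.0), length=7.5)
```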
4. The static and dynamic separation-based assembly operation action recognition method according to claim 2, wherein step 4 is specifically:

adopting the KNN algorithm as the static gesture recognition algorithm, establishing a KNN recognition model, and extracting the 15-dimensional feature vectors of the gesture-invariant type gestures in the recognition samples divided in step 3, i.e. the j-th frame of each action gesture is described by the 15-dimensional feature vector:

O_j = f_j(α_1, …, α_5, β_1, …, β_5, μ_1, …, μ_5)

which is then fed into the KNN model for training to obtain the recognition model for gesture-invariant type gestures.
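A minimal sketch of this static branch using scikit-learn's k-nearest-neighbour classifier; the value of k, the label set and the random data stand in for the real per-frame descriptors and are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# X: one 15-dimensional descriptor O_j per frame of the gesture-invariant
# action gestures; y: the action-gesture label of each frame.
X = np.random.rand(200, 15)
y = np.random.randint(0, 4, size=200)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

# A new frame of an invariant-type gesture is labelled by its nearest
# training descriptors.
predicted_label = knn.predict(np.random.rand(1, 15))
```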
5. The static and dynamic separation-based assembly operation action recognition method according to claim 2, wherein step 5 is specifically:

establishing a GMM-HMM recognition model, and extracting the 19-dimensional feature vectors of the gesture-change type gestures in the recognition samples divided in step 3, i.e. the j-th frame gesture feature of each action gesture:

g_j = f_j(α_1, …, α_5, β_1, …, β_5, μ_1, …, μ_5, w_hj, w_fj, a_j, b_j)

which is then fed into the GMM-HMM recognition model for training to obtain the recognition model for gesture-change type gestures, where w_hj denotes the angle between the palm center direction h_j of the current frame and the palm center direction h_{j+1} of the previous frame; w_fj denotes the angle between the palm direction f_j of the current frame and the palm direction f_{j+1} of the previous frame; and a_j, b_j are the two feature values of the two-dimensional displacement feature.
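A minimal sketch of this dynamic branch using hmmlearn's GMMHMM, training one model per gesture-change action gesture and scoring a test sequence against each; the gesture names, the state and mixture counts and the random data are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_gesture_model(sequences, n_states=4, n_mix=2):
    """sequences: list of (N_k, 19) arrays of per-frame feature vectors g_j."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

# One GMM-HMM per (hypothetical) gesture-change action gesture.
models = {name: train_gesture_model([np.random.rand(30, 19) for _ in range(10)])
          for name in ("insert", "screw", "grasp")}

# A new sequence is assigned to the model with the highest log-likelihood.
test_seq = np.random.rand(30, 19)
best = max(models, key=lambda name: models[name].score(test_seq))
```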
6. The static and dynamic separation-based assembly operation action recognition method according to claim 5, wherein w_hj is calculated as follows:

and w_fj is calculated as follows:
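Since the formulas for w_hj and w_fj are not reproduced above, the sketch below only assumes the usual dot-product angle between the palm direction vectors of adjacent frames; the vector values are placeholders.

```python
import numpy as np

def frame_angle(v_curr, v_next):
    """Angle (degrees) between the palm direction vector of the current
    frame and that of the adjacent frame, used for both w_hj (palm-center
    direction h) and w_fj (palm direction f)."""
    v_curr = np.asarray(v_curr, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    cos = np.dot(v_curr, v_next) / (np.linalg.norm(v_curr) * np.linalg.norm(v_next))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

w_hj = frame_angle((0.0, 1.0, 0.0), (0.1, 0.95, 0.0))   # palm-center directions
w_fj = frame_angle((1.0, 0.0, 0.0), (0.98, 0.05, 0.0))  # palm directions
```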
7. The static and dynamic separation-based assembly operation action recognition method according to claim 5, wherein the two-dimensional displacement feature is determined as follows:

removing the Y-axis data from the three-dimensional palm center coordinate P, taking the palm center coordinate of the previous frame of the continuous data as the origin, projecting the current-frame palm center coordinate P_xzj onto the XOZ-plane chain code disc, and taking the number of the region in which the projection of P_xzj falls as the chain code value a_j of the current frame;

removing the Z-axis data from the three-dimensional palm center coordinate P, taking the palm center coordinate of the previous frame of the continuous data as the origin, projecting the current-frame data P_xyj onto the XOY-plane chain code disc, and taking the number of the region in which the projection of P_xyj falls as the chain code value b_j of the current frame, so as to form the two-dimensional displacement feature [a_j, b_j];

the specific calculation is as follows:

where j = 1, 2, …, N.
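A minimal sketch of this two-dimensional displacement feature, assuming a chain code disc with eight equal angular regions numbered from 0 (the claim does not fix the number or numbering of regions here); names and values are illustrative.

```python
import numpy as np

def chain_code(dx, dy, n_regions=8):
    """Region number of a planar displacement on a chain code disc divided
    into n_regions equal angular sectors."""
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    return int(angle // (2 * np.pi / n_regions))

def displacement_features(palm_prev, palm_curr):
    """Two-dimensional displacement feature [a_j, b_j] of the current frame.

    palm_prev, palm_curr: (x, y, z) palm-center coordinates of the previous
    and current frames.  a_j comes from the XOZ projection (Y removed),
    b_j from the XOY projection (Z removed), each taken relative to the
    previous frame's palm center.
    """
    dx, dy, dz = (np.asarray(palm_curr, dtype=float)
                  - np.asarray(palm_prev, dtype=float))
    a_j = chain_code(dx, dz)  # XOZ plane
    b_j = chain_code(dx, dy)  # XOY plane
    return [a_j, b_j]

print(displacement_features((0.0, 0.0, 0.0), (1.0, 0.5, -0.2)))
```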
CN202010863071.XA 2020-08-25 2020-08-25 Assembly operation action recognition method based on static and dynamic separation Active CN112084898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010863071.XA CN112084898B (en) 2020-08-25 2020-08-25 Assembly operation action recognition method based on static and dynamic separation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010863071.XA CN112084898B (en) 2020-08-25 2020-08-25 Assembly operation action recognition method based on static and dynamic separation

Publications (2)

Publication Number Publication Date
CN112084898A CN112084898A (en) 2020-12-15
CN112084898B true CN112084898B (en) 2024-02-09

Family

ID=73728585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010863071.XA Active CN112084898B (en) 2020-08-25 2020-08-25 Assembly operation action recognition method based on static and dynamic separation

Country Status (1)

Country Link
CN (1) CN112084898B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111844B (en) * 2021-04-28 2022-02-15 中德(珠海)人工智能研究院有限公司 Operation posture evaluation method and device, local terminal and readable storage medium
CN113282167B (en) * 2021-05-08 2023-06-27 青岛小鸟看看科技有限公司 Interaction method and device of head-mounted display equipment and head-mounted display equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634415A (en) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 It is a kind of for controlling the gesture identification control method of analog quantity
CN109993073A (en) * 2019-03-14 2019-07-09 北京工业大学 A kind of complicated dynamic gesture identification method based on Leap Motion
CN110837792A (en) * 2019-11-04 2020-02-25 东南大学 Three-dimensional gesture recognition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620024B2 (en) * 2010-09-17 2013-12-31 Sony Corporation System and method for dynamic gesture recognition using geometric classification
US10488939B2 (en) * 2017-04-20 2019-11-26 Microsoft Technology Licensing, Llc Gesture recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634415A (en) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 It is a kind of for controlling the gesture identification control method of analog quantity
CN109993073A (en) * 2019-03-14 2019-07-09 北京工业大学 A kind of complicated dynamic gesture identification method based on Leap Motion
CN110837792A (en) * 2019-11-04 2020-02-25 东南大学 Three-dimensional gesture recognition method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yang Wenlu; Qiao Haili; Xie Hong; Xia Bin. Gesture recognition based on Leap Motion and support vector machine. Transducer and Microsystem Technologies, 2018, No. 05, full text. *
Chen Guoliang; Ge Kaikai; Li Conghao. Complex dynamic gesture recognition based on multi-feature HMM fusion. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2018, No. 12, full text. *

Also Published As

Publication number Publication date
CN112084898A (en) 2020-12-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant