CN112527118A - Head posture recognition method based on dynamic time warping - Google Patents


Publication number
CN112527118A
CN112527118A (application CN202011485090.XA)
Authority
CN
China
Prior art keywords
head
data
time
action
template
Prior art date
Legal status
Granted
Application number
CN202011485090.XA
Other languages
Chinese (zh)
Other versions
CN112527118B (en)
Inventor
李淮周
王宏
李森
曹祥红
胡海燕
武东辉
温书沛
吴彦福
李晓彬
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202011485090.XA priority Critical patent/CN112527118B/en
Publication of CN112527118A publication Critical patent/CN112527118A/en
Application granted granted Critical
Publication of CN112527118B publication Critical patent/CN112527118B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012: Head tracking input arrangements
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 9/00: Measuring inclination, e.g. by clinometers, by levels
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00: Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F 2203/01: Indexing scheme relating to G06F3/01
    • G06F 2203/011: Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Abstract

The invention provides a head posture recognition method based on dynamic time warping, which comprises the following steps: acquiring characteristic data of the acceleration and angular velocity of the head action posture in the X, Y and Z directions through an inertial sensor fixed on the head, and storing the characteristic data in a data set; preprocessing the data in the data set, detecting the start time and end time of head movements, and extracting the movement intervals of the head movements; constructing head action templates; calculating a warping path from the detected head action data and the obtained head action template data; and taking the standard-template head action type corresponding to the minimum warping-path distance DTW as the head action type of the data to be identified. The head motion type of a test subject can be accurately estimated from the acceleration and angular velocity information measured by the inertial sensor, effectively improving the recognition accuracy of human head motion; moreover, the method is inexpensive, requires little data processing, responds quickly and recognizes accurately.

Description

Head posture recognition method based on dynamic time warping
Technical Field
The invention relates to the technical field of pattern recognition, in particular to a head posture recognition method based on dynamic time warping.
Background
With the development of artificial intelligence technology, human production and lifestyles have changed greatly, yet the traditional keyboard-and-mouse input mode cannot meet the needs of all people, for example people with impaired upper limbs. The development of motion recognition technology based on head posture has therefore attracted extensive attention from researchers.
The devices used for head pose calculation fall into two categories. One category is based on wearable inertial sensors: for example, the invention patent with publication number CN103076045B provides a head posture sensing device and method, and the invention patent CN105943052A provides a fatigue-driving detection method and device based on the deflection angle; these methods offer high precision and good real-time performance, but the user must wear an inertial sensor, and they focus on posture estimation without providing a head motion recognition technique. The other category is based on machine vision: for example, the method of Tan et al. in the invention patent CN102737235A, a head posture estimation method based on depth information and color images, estimates the head posture through a camera or a depth camera.
Dynamic Time Warping (DTW) is a dynamic-programming-based method widely used in the fields of speech and gesture recognition. The algorithm warps data along the time axis, stretching or shortening a time series to achieve better alignment, which improves accuracy and robustness. Head movements vary in duration with personal habits and the current state of the user, making head motion recognition a typical unequal-length time-series recognition problem.
Disclosure of Invention
Aiming at the technical problems of the large computation load and low accuracy of existing head gesture recognition, the invention provides a head posture recognition method based on dynamic time warping: different head actions are recognized by evaluating, with the DTW method, the warping-path distance between the time series of each action and a standard template; the data processing load is small and the recognition accuracy is high.
In order to achieve the purpose, the technical scheme of the invention is realized as follows: a head posture recognition method based on dynamic time warping comprises the following steps:
step S1: data acquisition: acquiring characteristic data of acceleration and angular velocity of the head action posture in the X direction, the Y direction and the Z direction through an inertial sensor fixed on the head, and storing the characteristic data in a data set;
step S2: end point detection of head motion: preprocessing data in the data set, detecting the start time and the end time of head movement according to preprocessed head inertia data and angular speed information, and extracting a movement interval of the head movement;
step S3: calculating a head action time series template: constructing acceleration and angular velocity head motion templates in the X direction, the Y direction and the Z direction according to the head motion data detected by the end point detection in the step S2 and the related motion labels;
step S4: calculating a warping path: the test set in the data set calculates a warping path from the head motion data detected in step S2 and the head motion template data obtained in step S3;
step S5: judging the head action type: the standard-template head action type corresponding to the minimum warping-path distance DTW is the head action type of the data to be identified.
The inertial sensor is mounted on a temple of a pair of glasses, close to the front of the head. When data are collected, the subject wears the glasses, sits on a stool, and naturally performs the head action postures of nodding, pitching, shaking left, shaking right, turning left and turning right. The format of the data set is [data, label], where data is a 6-dimensional matrix containing the accelerations and angular velocities of the sensor's x, y and z axes, whose length is not fixed across labels, and label is a type variable corresponding to the 6 head action types.
The preprocessing method in step S2 includes:
step S21: data normalization
y'(t)=arctan(x(t))*2/π (1)
Where y' (t) is normalized data, and x (t) is acceleration or angular velocity data collected by the inertial sensor.
Step S22: sliding median filtering
y(t) = median( y'(t - (l-1)/2 : t + (l-1)/2) ),  l = 2n - 1 (odd)
y(t) = median( y'(t - l/2 : t + l/2 - 1) ),  l = 2n (even)     (2)
wherein l is the median-filter window length, n being a natural number; median() is the median function; y(t) is the median of the data within the sliding window; y'(t - (l-1)/2 : t + (l-1)/2) and y'(t - l/2 : t + l/2 - 1) denote the windows of length l over the normalized data for odd and even l, respectively.
The method for detecting the end point of the head action in step S2 includes: determining the start time of the head action as:
t_start = min{ t | ang(t) > ang_min }     (3)
wherein
ang(t) = sqrt( ang_x(t)^2 + ang_y(t)^2 + ang_z(t)^2 )
is an overall description of the angular-velocity change in each direction at time t and reflects the overall degree of change of the head-action angle; ang_x(t), ang_y(t) and ang_z(t) denote the angular-velocity components in the X, Y and Z directions of the three-dimensional coordinate axes, respectively; ang_min is the threshold for the start of a head action; t_start is the start time of the head movement;
determining a head action end time:
t_end = min{ t | sum(ang([t - t_min, t)) < ang_min) = t_min * fs }     (4)
wherein sum(ang([t - t_min, t)) < ang_min) counts the samples of ang(t) in the time interval [t - t_min, t) that fall below the threshold ang_min; t_min is the minimum duration of a head movement; fs is the sampling frequency of the sensor; if, after a head movement starts, the values of all sampling points within the minimum duration are below the threshold ang_min, the head movement is considered finished, and the finish time is t_end;
Judging the validity of the head action:
(t_end - t_start > t_min) and (t_end - t_start < t_max): there is a head action;
wherein t_min is the minimum time for which the head movement lasts; t_max is the maximum time for which the head movement lasts.
Head action data of 26 subjects are acquired and randomly divided into a training set and a test set, with 18 subjects in the training set and 8 in the test set.
If the data processed in step S2 belong to the training set, or are individually collected template data, the head movement time-series template is calculated through step S3; if they belong to the test set, or are data collected in real time, the head motion type of the motion sequence is judged by its DTW values through steps S4 and S5.
The implementation method of the step S3 is as follows:
step S31: from the time series of the head actions extracted in step S2, each action time series and its label are obtained according to the set thresholds;
step S32: for one head action in the training set, let a group of data in the time series be S_a = {s_1, s_2, ..., s_a}; S_a is a 6 x a matrix whose row vectors correspond to the accelerations and angular velocities in the X, Y and Z directions and whose column vectors correspond to the head-motion features; the total time-series set of the training set is S = {S_a, S_b, ..., S_n}, where n is the number of actions in the training set and a, b, ..., k denote the lengths of the sequences S_a, S_b, ..., S_n;
step S33: let the sequence-length vector be S_len = {a, b, ..., n}; then the template time-series length is T_len = median(S_len), where median() is the median function;
step S34: let the standard template of a head action be T_i, where i = 1, 2, ..., 6, corresponding to the six head-motion types; T_i is a 6 x x matrix whose row vectors correspond to the accelerations and angular velocities in the X, Y and Z directions and whose column vectors correspond to the head-motion features, the length x being determined by the data lengths in the training set; T_ik is obtained by the mean formula
T_ik = ( Σ_j S_jk ) / ( Σ_j binary(S_jk) )
and the first T_len samples of T_ik are taken as the standard-template time series of the action, where T_ik denotes the kth row of data in the ith action template and S_jk denotes the kth row of data of the jth subject's action; because the action durations of S_jk differ between subjects, S_jk is binarized to {1, 0} with the binary() function, so that the number of elements present at each position can be counted;
step S35: repeating steps S32 and S33 yields the standard templates of the other action types.
The implementation method of the step S4 is as follows:
step S41: calculating a distance matrix D: let the time series of a head action in the test set be S = {s_1, s_2, ..., s_n}; the template data to be matched, whose time series is the standard template, is T = {t_1, t_2, ..., t_m}; the Euclidean distance between any two points is
d(s_i, t_j) = sqrt( Σ_{k=1}^{6} (s_ik - t_jk)^2 )
wherein s_i is the ith column vector of the time series S; t_j is the jth column vector of the standard template T; s_ik is the kth-row element of the ith column vector of S; t_jk is the kth-row element of the jth column vector of T; computing all combinations forms an n x m distance matrix D; the problem is thereby converted into finding, by dynamic programming, the shortest path from the start point D(1,1) to the end point D(n,m).
Step S42: let regular path W be { W ═ W1,w2,w3,…,wyIn which weRepresents the distance between a point in the time series S and the standard template T, y is the warping path length, range: y is more than or equal to max (m, n) and less than or equal to m + n; obtaining an optimal planning path according to the constraint conditions of the regular path;
step S43: the optimal warping path is solved by the dynamic-programming idea of the cumulative distance.
The steps S41, S42, and S43 are repeated to calculate DTW values between the head movement time series S and the movement time series of the 6 standard templates, respectively.
The warping path must satisfy the following constraints:
boundary condition: the path runs from the start point w_1 = D(1,1) to the end point w_y = D(n,m);
continuity: if w_{e-1} = D(a, b), the next point of the path w_e = D(a', b') must satisfy |a - a'| ≤ 1 and |b - b'| ≤ 1, i.e. matching cannot skip over points;
monotonicity: if w_{e-1} = D(a, b), the next point of the path w_e = D(a', b') must satisfy a' - a ≥ 0 and b' - b ≥ 0, i.e. the points on the warping path W must advance monotonically in time;
thus, from w_{e-1} = D(a, b) there are only three possible next points: D(a+1, b), D(a+1, b+1) and D(a, b+1). The optimal warping path is then:
DTW(S, T) = min_W Σ_{e=1}^{y} w_e     (6)
the cumulative distance is:
r(e, f) = d(s_e, t_f) + min{ r(e-1, f), r(e-1, f-1), r(e, f-1) };
wherein e = 1, 2, 3, ..., n; f = 1, 2, 3, ..., m; s_e denotes the eth column vector of the matrix S to be detected; t_f denotes the fth column vector of a head action in the template matrix T; r(e, f) is the cumulative distance.
The invention has the following beneficial effects. The invention provides a head-motion endpoint-detection and head-motion recognition method based on dynamic time warping: an inertial sensor placed at the temple of a pair of glasses or at the ear collects the accelerations, angular velocities and related data of head motion in the X, Y and Z directions; the resultant angular velocity is computed from the collected angular-velocity data, endpoints are detected automatically with a threshold method, and abnormal data are removed. The invention supports generating an individual-dependent head action template or importing an empirical head action template; for the automatically detected head-action test data, the dynamic-time-warping path distance DTW to each template is calculated, and the action type is determined by the minimum DTW value. The head motion type of a test subject, such as nodding, pitching, shaking left or shaking right, can be accurately estimated from the acceleration and angular-velocity information measured by the inertial sensor, effectively improving the recognition accuracy of human head motion. Compared with head-action recognition techniques based on cameras and depth cameras, the invention is inexpensive, requires little data processing, responds quickly and recognizes accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of head motion gesture types according to the present invention.
FIG. 3 is a time series template of the head movements of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a head pose recognition method based on dynamic time warping includes the following steps:
step S1: data acquisition: the characteristic data of the acceleration and the angular velocity of the head action posture in the X direction, the Y direction and the Z direction are collected through an inertial sensor fixed on the head and stored in a data set.
The head posture change is sensed by the inertial sensor; the collected data are the accelerations and angular velocities in the X, Y and Z directions, i.e. 6 kinds of characteristic data. The inertial sensor is mounted on a glasses temple close to the front of the head, as shown in the central front view of FIG. 2. When data are collected, the subject wears the glasses, sits on a stool, and naturally performs the head actions of nodding, pitching, shaking left, shaking right, turning left and turning right, shown in the surrounding views of FIG. 2.
To verify the effectiveness of the proposed method, head motion data of 26 subjects were collected and randomly divided into a training set of 18 subjects and a test set of 8 subjects. The data set format is [data, label], where data is a 6-dimensional matrix containing the accelerations and angular velocities of the sensor's x, y and z axes, whose length is not fixed across labels, and label is a type variable corresponding to the 6 head action types.
Step S2: end point detection of head motion: preprocessing data in the data set, detecting the start time and the end time of head movement according to preprocessed head inertia data and angular velocity information, and extracting the movement interval of the head movement, wherein the steps are as follows:
step S21: data normalization
y'(t)=arctan(x(t))*2/π (1)
Where y' (t) is normalized data, and x (t) is acceleration or angular velocity data collected by the inertial sensor.
Step S22: sliding median filtering
y(t) = median( y'(t - (l-1)/2 : t + (l-1)/2) ),  l = 2n - 1 (odd)
y(t) = median( y'(t - l/2 : t + l/2 - 1) ),  l = 2n (even)     (2)
wherein l is the median-filter window length, n being a natural number; median() is the median function; y(t) is the median of the data within the sliding window; y'(t - (l-1)/2 : t + (l-1)/2) and y'(t - l/2 : t + l/2 - 1) denote the windows of length l over the normalized data for odd and even l, respectively. The sliding median filter reduces the salt-and-pepper noise of the inertial sensor, lowering the chance of misjudgment in subsequent action recognition and endpoint detection.
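A sketch of the sliding median filter of Eq. (2); clamping the window at the sequence edges is our assumption, since the patent does not specify boundary handling:

```python
import statistics

def median_filter(y, l):
    """Sliding median of window length l (Eq. (2)).

    Odd l = 2n-1 uses the window y'[t-(l-1)/2 : t+(l-1)/2]; even l = 2n
    uses y'[t-l/2 : t+l/2-1]. Windows are clamped at the edges.
    """
    if l % 2 == 1:                       # odd window, centred on t
        lo_off = hi_off = (l - 1) // 2
    else:                                # even window, one sample short on the right
        lo_off, hi_off = l // 2, l // 2 - 1
    out = []
    for t in range(len(y)):
        lo = max(0, t - lo_off)
        hi = min(len(y), t + hi_off + 1)
        out.append(statistics.median(y[lo:hi]))
    return out
```

A single spike such as [1, 100, 1, 1, 1] filtered with l = 3 is suppressed to 1 at its centre, which is exactly the salt-and-pepper rejection described above.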
Step S23: determining the start time of the head action as:
t_start = min{ t | ang(t) > ang_min }     (3)
wherein
ang(t) = sqrt( ang_x(t)^2 + ang_y(t)^2 + ang_z(t)^2 )
is an overall description of the angular-velocity change in each direction at time t and reflects the overall degree of change of the head-action angle; ang_x(t), ang_y(t) and ang_z(t) denote the angular-velocity components in the X, Y and Z directions of the three-dimensional coordinate axes, respectively; ang_min is the threshold for the start of a head action; t_start is the start time of the head movement.
Step S24: determining a head action end time:
t_end = min{ t | sum(ang([t - t_min, t)) < ang_min) = t_min * fs }     (4)
wherein sum(ang([t - t_min, t)) < ang_min) counts the samples of ang(t) in the time interval [t - t_min, t) that fall below the threshold ang_min; t_min is the minimum duration of a head movement; fs is the sampling frequency of the sensor. If, after a head movement starts, the values of all sampling points within the minimum duration are below the threshold ang_min, the head movement is considered finished, and the finish time is t_end.
Step S25: judging the validity of the head action:
(t_end - t_start > t_min) and (t_end - t_start < t_max): there is a head action     (5)
wherein t_min is the minimum duration of a head action, used to eliminate spike noise in the waveform; t_max is the maximum duration of a head action, used to reject actions of abnormal or incomplete duration. This completes the extraction of the head motion data. If the data belong to the training set, or are individually collected template data, the head-movement time-series template can be calculated through step S3; if they belong to the test set, or are data collected in real time, the head motion type of the motion sequence is judged by its DTW values through steps S4 and S5.
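Steps S23 to S25 together amount to a threshold detector on the resultant angular velocity ang(t). The following is a sketch using the reference threshold values of step S31 as defaults; the function name, the sample-index return convention, and returning None on failure are our choices, not part of the patent:

```python
def detect_motion(ang, ang_min=0.2, t_min=0.6, t_max=3.0, fs=100):
    """Endpoint detection on the resultant angular velocity (Eqs. (3)-(5)).

    Returns (t_start, t_end) as sample indices, or None when no valid
    head action is found.
    """
    n_min = int(t_min * fs)              # minimum duration in samples
    n_max = int(t_max * fs)              # maximum duration in samples
    t_start = None
    for t, a in enumerate(ang):
        if t_start is None:
            if a > ang_min:              # Eq. (3): first sample above threshold
                t_start = t
        else:
            # Eq. (4): all samples in the trailing t_min window below threshold
            window = ang[max(t - n_min, 0):t]
            if len(window) == n_min and all(v < ang_min for v in window):
                duration = t - t_start
                # Eq. (5): validity check on the action duration
                return (t_start, t) if n_min < duration < n_max else None
    return None
```

For a synthetic burst of motion (1.2 s of angular velocity above threshold, quiet before and after), the detector returns the start index and the index at which the quiet t_min window completes.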
Step S3: calculating a head action time series template: constructing acceleration and angular velocity head motion templates in the X direction, the Y direction and the Z direction according to the head motion data detected by the end point detection in the step S2 and the related motion labels, and comprising the following steps:
Step S31: the time series of the head actions are extracted by the endpoint-detection method described in step S2 of the invention; the reference threshold values are l = 0.2 s, ang_min = 0.2 rad/s, t_min = 0.6 s, t_max = 3 s and fs = 100 Hz; each action time series and its label are thus obtained.
Step S32: for example, one head movement in the training set is taken as an example, a group of data in the time sequence is Sa={s1,s2,…,saA is the length of the group of data, SaThe matrix is a 6X a matrix, and the row vectors of the matrix respectively correspond to the acceleration and the angular velocity in the X direction, the Y direction and the Z direction; the column vectors correspond to head motion features. The total time series set of the training set is S ═ Sa,Sb,…,SnN is the number of the actions in the training set; a, b, …, k represent the length of the sequences located, respectively.
Step S33: let the sequence length vector be SlenWhen the length of the template time sequence is T, { a, b, …, n }, the length of the template time sequence is Tlen=median(Slen) Wherein mean () is the median function.
Step S34: let the standard template of the head action be TiWherein, i is 1,2, …,6, corresponding to six head motion types; the matrix is a 6X matrix, and the row vectors of the matrix respectively correspond to the acceleration and the angular velocity in the X direction, the Y direction and the Z direction; the column vectors correspond to head motion features and the length x is determined according to the length of the data in the training set. Can be calculated by mean value formula
Figure BDA0002838857660000081
To obtain Tik,TikFront T oflenThe data is used as a standard template time sequence of the action, wherein TikRepresenting the kth line of data in the ith action template; sjkA kth line of data representing a jth object action type; due to SjkThe duration of the action is not equal among the testers, and the pair S of the binary () function is usedjkBinarization {1, 0} is performed to calculate the number of same-position elements.
Step S35: repeating steps S32 and S33 yields the standard templates of the other action types, as shown in FIG. 3. In FIG. 3, acc_x(t), acc_y(t), acc_z(t) and acc(t) denote the accelerations in the X, Y and Z directions and the resultant acceleration, respectively; ang_x(t), ang_y(t), ang_z(t) and ang(t) denote the angular velocities in the X, Y and Z directions and the resultant angular velocity.
The user may also, prompted by system voice, perform the corresponding head motions several times, and the individual-dependent head-action time-series template is then calculated according to steps S31, S32, S33 and S34.
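The template construction of steps S32 to S34 can be sketched for a single data channel (one row of the 6-row matrix); restricting to one channel, and averaging only over the sequences that still have data at each position (which is what the binary() count achieves), are simplifications of ours:

```python
import statistics

def build_template(sequences):
    """Build one action's standard-template channel (steps S32-S34).

    The template length T_len is the median of the sequence lengths;
    each template sample T_ik is the mean over the sequences that still
    have data at position k (the binary() mask in the text).
    """
    t_len = int(statistics.median([len(s) for s in sequences]))
    template = []
    for k in range(t_len):
        vals = [s[k] for s in sequences if k < len(s)]  # binary(S_jk) selects these
        template.append(sum(vals) / len(vals))
    return template
```

For three recordings of lengths 3, 2 and 4, the template length is the median, 3, and the shorter recording simply stops contributing past its end.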
Step S4: calculating the warping path: the test set calculates the warping path from the head motion data detected in step S2 and the head motion template data obtained in step S3, with the following detailed steps:
Step S41: calculate the distance matrix D. Let the time series of a head action in the test set be S = {s_1, s_2, ..., s_n}, a 6 x n matrix; the template data to be matched, whose time series is the standard template, is T = {t_1, t_2, ..., t_m}, a 6 x m matrix; the Euclidean distance between any two points is
d(s_i, t_j) = sqrt( Σ_{k=1}^{6} (s_ik - t_jk)^2 )
wherein s_i is the ith column vector of the matrix S; t_j is the jth column vector of the matrix T; s_ik is the kth-row element of the ith column vector of S; t_jk is the kth-row element of the jth column vector of T. Computing all combinations forms an n x m distance matrix D. The similarity problem between the two time series is thereby converted into finding, by dynamic programming, the shortest path from the start point D(1,1) to the end point D(n,m), i.e. the warping path, denoted W.
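Step S41 can be sketched as follows; sequences are represented as lists of column vectors, and as a simplification of ours the dimensionality is taken from the column length rather than fixed at 6:

```python
import math

def distance_matrix(S, T):
    """n x m matrix of Euclidean distances d(s_i, t_j) between the
    column vectors of the test sequence S and the template T."""
    return [[math.sqrt(sum((si[k] - tj[k]) ** 2 for k in range(len(si))))
             for tj in T]
            for si in S]
```

Each entry D[i][j] is the pointwise cost that the warping path of step S42 then accumulates.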
Step S42: let regular path W be { W ═ W1,w2,w3,…,wyIn which weRepresents the distance between a point in the time series S and the standard template T, y is the warping path length, range: y is more than or equal to max (m, n) and less than or equal to m + n. And the following constraints need to be satisfied:
boundary condition: the path runs from the start point w_1 = D(1,1) to the end point w_y = D(n,m);
continuity: if w_{e-1} = D(a, b), the next point of the path w_e = D(a', b') must satisfy |a - a'| ≤ 1 and |b - b'| ≤ 1, i.e. matching cannot skip over points;
monotonicity: if w_{e-1} = D(a, b), the next point of the path w_e = D(a', b') must satisfy a' - a ≥ 0 and b' - b ≥ 0, i.e. the points on W must advance monotonically in time;
thus, from w_{e-1} = D(a, b) there are only three possible next points: D(a+1, b), D(a+1, b+1) and D(a, b+1). The optimal warping path is then:
DTW(S, T) = min_W Σ_{e=1}^{y} w_e     (6)
Step S43: to solve for the optimal warping path, i.e. Eq. (6), the dynamic-programming idea of the cumulative distance is used; the cumulative-distance formula is defined as
r(e, f) = d(s_e, t_f) + min{ r(e-1, f), r(e-1, f-1), r(e, f-1) }     (7)
wherein e = 1, 2, 3, ..., n; f = 1, 2, 3, ..., m; s_e denotes the eth column vector of the matrix S to be detected; t_f denotes the fth column vector of a head action in the template matrix T; r(e, f) is the cumulative distance, which is in fact a recurrence relation. The optimal warping-path distance of the two time series S and T is thus DTW(S, T) = r(n, m), which solves the similarity-measurement problem caused by unequal time-series lengths and misaligned feature positions.
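The cumulative-distance recursion of Eq. (7) can be sketched over a precomputed distance matrix D; seeding out-of-range neighbours with infinity is an implementation choice of ours:

```python
import math

def dtw(D):
    """DTW(S, T) = r(n, m) via r(e, f) = d(s_e, t_f) +
    min{r(e-1, f), r(e-1, f-1), r(e, f-1)} (Eq. (7))."""
    n, m = len(D), len(D[0])
    r = [[math.inf] * m for _ in range(n)]
    r[0][0] = D[0][0]
    for e in range(n):
        for f in range(m):
            if e == 0 and f == 0:
                continue
            prev = min(r[e - 1][f] if e > 0 else math.inf,
                       r[e - 1][f - 1] if e > 0 and f > 0 else math.inf,
                       r[e][f - 1] if f > 0 else math.inf)
            r[e][f] = D[e][f] + prev
    return r[n - 1][m - 1]
```

For two well-aligned sequences the zero-cost diagonal of D yields a warping distance of 0, the minimum possible value.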
The steps S41, S42, and S43 are repeated to calculate DTW values between the head movement time series S and the movement time series of the 6 standard templates, respectively.
Step S5: judging the head action type: the standard-template head action type corresponding to the minimum warping-path distance DTW is the head action type of the data to be identified.
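Step S5 then reduces to an argmin over the six template distances; the label names below are illustrative, not taken from the patent:

```python
def classify(dtw_values, labels):
    """Return the label of the template with the smallest DTW distance
    (step S5)."""
    i = min(range(len(dtw_values)), key=dtw_values.__getitem__)
    return labels[i]
```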
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. A head posture recognition method based on dynamic time warping is characterized by comprising the following steps:
step S1: data acquisition: acquiring characteristic data of acceleration and angular velocity of the head action posture in the X direction, the Y direction and the Z direction through an inertial sensor fixed on the head, and storing the characteristic data in a data set;
step S2: end point detection of head motion: preprocessing data in the data set, detecting the start time and the end time of head movement according to preprocessed head inertia data and angular speed information, and extracting a movement interval of the head movement;
step S3: calculating a head action time series template: constructing acceleration and angular velocity head motion templates in the X direction, the Y direction and the Z direction according to the head motion data detected by the end point detection in the step S2 and the related motion labels;
step S4: calculating a regular path: calculating a normalized path by the test set in the data set through the head motion data detected in the step S2 and the head motion template data obtained in the step S3;
step S5: judging the head action type: and the standard template head action type corresponding to the minimum value of the regular path DTW is the head action type of the data to be identified.
2. The head posture recognition method based on dynamic time warping according to claim 1, wherein the inertial sensor is mounted on a glasses temple close to the front of the head, and when data are collected, the subject wears the glasses, sits on a stool, and naturally performs the head actions of nodding, pitching, shaking left, shaking right, turning left and turning right; the format of the data set is [data, label], wherein data is a 6-dimensional matrix containing the accelerations and angular velocities of the sensor's x, y and z axes, whose length is not fixed across labels, and label is a type variable corresponding to the 6 head action types.
3. The method for recognizing head pose based on dynamic time warping as claimed in claim 1, wherein the preprocessing in step S2 comprises:
step S21: data normalization
y'(t) = arctan(x(t)) · 2/π   (1)
where y'(t) is the normalized data and x(t) is the acceleration or angular velocity data collected by the inertial sensor.
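As a minimal illustration of formula (1) (NumPy; the function name `arctan_normalize` is hypothetical), the normalization squashes each raw sample into the open interval (−1, 1):

```python
import numpy as np

def arctan_normalize(x):
    """Formula (1): y'(t) = arctan(x(t)) * 2 / pi.
    Maps raw acceleration / angular-velocity samples into (-1, 1)."""
    return np.arctan(np.asarray(x, dtype=float)) * 2.0 / np.pi
```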
Step S22: sliding median filtering
y(t) = median(y'(t − (l−1)/2 : t + (l−1)/2)),  l = 2N − 1 (odd window)
y(t) = median(y'(t − l/2 : t + l/2 − 1)),      l = 2N (even window)   (2)
where l is the median filter window length and N is a natural number; median() is the median function; y(t) is the median value within the sliding window; y'(t − (l−1)/2 : t + (l−1)/2) and y'(t − l/2 : t + l/2 − 1) denote the length-l segments of the normalized data for odd and even l, respectively.
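A sketch of the sliding median filter of formula (2) (NumPy; edge handling by clipping the window at the array boundaries is an assumption, as the claim does not specify it):

```python
import numpy as np

def sliding_median(y, l):
    """Median filter with window length l; odd l = 2N-1 centres the window
    at t, even l = 2N uses samples t-l/2 .. t+l/2-1 (edges are clipped)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for t in range(len(y)):
        if l % 2:            # odd window: t-(l-1)/2 .. t+(l-1)/2
            lo, hi = t - (l - 1) // 2, t + (l - 1) // 2 + 1
        else:                # even window: t-l/2 .. t+l/2-1
            lo, hi = t - l // 2, t + l // 2
        out[t] = np.median(y[max(lo, 0):hi])
    return out
```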
4. The head pose recognition method based on dynamic time warping as claimed in claim 1 or 3, wherein the method of detecting the end point of the head action in step S2 is: determining the start time of the head action as:
ang(t) = sqrt(ang_x(t)² + ang_y(t)² + ang_z(t)²)
t_start = min{ t : ang(t) ≥ ang_min }   (3)
where ang(t) is the overall description of the angular velocity change in each direction at time t, reflecting the overall degree of angular change of the head action; ang_x(t), ang_y(t) and ang_z(t) are the angular velocity components along the X, Y and Z axes of the three-dimensional coordinate system; ang_min is the threshold for head action onset; t_start is the start time of the head action;
determining a head action end time:
t_end = min{ t > t_start : sum(ang([t − t_min, t)) < ang_min) = t_min · fs }   (4)
where sum(ang([t − t_min, t)) < ang_min) counts the sampling points in the interval [t − t_min, t) at which ang(t) is less than the threshold ang_min; t_min is the minimum duration of a head action; fs is the sampling frequency of the sensor; if, after the head action starts, the values at all sampling points within the minimum duration are less than ang_min, the head action is considered finished, and the finish time is t_end.
Judging the validity of the head action:
(t_end − t_start > t_min) and (t_end − t_start < t_max) ⇒ a head action exists;
where t_min is the minimum duration of a head action and t_max is the maximum duration of a head action.
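The endpoint detection of claim 4 can be sketched as follows (NumPy; the function and parameter names are hypothetical, and returning `None` for an invalid action is an assumed convention):

```python
import numpy as np

def detect_endpoints(gyro, ang_min, t_min, t_max, fs):
    """Endpoint detection sketch. gyro: (n, 3) array of angular velocities;
    ang_min: onset threshold; t_min / t_max: min / max action duration in
    seconds; fs: sampling frequency. Returns (t_start, t_end) or None."""
    ang = np.linalg.norm(gyro, axis=1)        # overall angular change ang(t)
    n_min = int(t_min * fs)                   # samples in the minimum duration
    above = np.flatnonzero(ang >= ang_min)
    if above.size == 0:
        return None                           # no onset detected
    t_start = int(above[0])
    t_end = None
    for t in range(t_start + n_min, len(ang)):
        # all samples in the trailing window [t - n_min, t) below threshold
        if np.all(ang[t - n_min:t] < ang_min):
            t_end = t
            break
    if t_end is None:
        t_end = len(ang)
    dur = (t_end - t_start) / fs
    return (t_start, t_end) if t_min < dur < t_max else None
```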
5. The head posture recognition method based on dynamic time warping as claimed in claim 4, wherein head action data of 26 subjects are collected and randomly divided into a training set of 18 subjects and a test set of 8 subjects;
if the data processed in step S2 belong to the training set (or to subject-dependent template acquisition), the head action time-series template is calculated via step S3; if they belong to the test set or are collected in real time, the head action type of the action sequence is judged from the DTW values via steps S4 and S5.
6. The head pose recognition method based on dynamic time warping as claimed in claim 5, wherein the step S3 is implemented by:
step S31: according to the time series of the head actions extracted in the step S2, each action time series and the label thereof are obtained according to the set threshold;
step S32: for one head action type in the training set, let a group of data in the time sequence be S_a = {s_1, s_2, …, s_a}; S_a is a 6×a matrix whose row vectors correspond to the accelerations and angular velocities in the X, Y and Z directions and whose column vectors correspond to head action features; the total time-series set of the training set is S = {S_a, S_b, …, S_n}, where n is the number of actions in the training set and a, b, …, n denote the lengths of the sequences S_a, S_b, …, S_n;
step S33: let the sequence-length vector be S_len = {a, b, …, n}; the template time-series length is T_len = median(S_len), where median() is the median function;
step S34: let the standard template of a head action be T_i, where i = 1, 2, …, 6 corresponds to the six head action types; T_i is a 6×x matrix whose row vectors correspond to the accelerations and angular velocities in the X, Y and Z directions and whose column vectors correspond to head action features; the length x is determined by the data lengths in the training set; T_ik is obtained by the mean-value formula
T_ik = (Σ_j S_jk) / (Σ_j binarize(S_jk))   (5)
and the first T_len columns of T_ik are taken as the standard template time series of the action, where T_ik denotes the k-th row of data in the i-th action template and S_jk denotes the k-th row of data of the j-th subject's action of this type; because the duration of S_jk differs between subjects, the binarize() function maps S_jk to {1, 0} so that the number of elements present at each position can be counted;
step S34: repeating the steps S32, S33, standard templates of other action types can be obtained.
7. The head pose recognition method based on dynamic time warping as claimed in claim 5, wherein the step S4 is implemented by:
step S41: calculating the distance matrix D: let the time series of a head action in the test set be S = {s_1, s_2, …, s_n} and the standard template time series to be matched be T = {t_1, t_2, …, t_m}; the Euclidean distance between any two points is
d(s_i, t_j) = sqrt( Σ_{k=1}^{6} (s_ik − t_jk)² )   (6)
where s_i is the i-th column vector of the time series S; t_j is the j-th column vector of the standard template T; s_ik is the k-th row element of the i-th column vector of S; t_jk is the k-th row element of the j-th column vector of T; computing all pairs yields an n×m distance matrix D, and the problem becomes finding the shortest path from the start point D(1,1) to the end point D(n,m) by dynamic programming.
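Step S41 can be sketched as a vectorized pairwise-distance computation (NumPy; `distance_matrix` is a hypothetical name):

```python
import numpy as np

def distance_matrix(S, T):
    """Step S41 sketch: D[i, j] is the Euclidean distance between the
    i-th 6-D column of the test sequence S (6 x n) and the j-th 6-D
    column of the template T (6 x m), per formula (6)."""
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float)
    diff = S[:, :, None] - T[:, None, :]      # shape (6, n, m)
    return np.sqrt((diff ** 2).sum(axis=0))   # shape (n, m)
```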
Step S42: let the warping path be W = {w_1, w_2, w_3, …, w_y}, where w_e denotes the distance between a point of the time series S and a point of the standard template T, and y is the warping path length with max(m, n) ≤ y ≤ m + n; the optimal warping path is obtained under the warping-path constraints;
step S43: the optimal warping path is solved by the dynamic programming idea of cumulative distance.
Steps S41, S42 and S43 are repeated to calculate the DTW values between the head action time series S and the action time series of each of the 6 standard templates.
8. The method for head pose recognition based on dynamic time warping as claimed in claim 7, wherein the warping path needs to satisfy the following constraints:
boundary condition: the path runs from the start point w_1 = D(1,1) to the end point w_y = D(n,m);
continuity: if w_(e−1) = D(a, b), the next point of the path w_e = D(a', b') must satisfy |a − a'| ≤ 1 and |b − b'| ≤ 1, i.e., the match cannot skip over any point;
monotonicity: if w_(e−1) = D(a, b), the next point of the path w_e = D(a', b') must satisfy a' − a ≥ 0 and b' − b ≥ 0, i.e., the points on the warping path W must advance monotonically with time;
thus, from w_(e−1) = D(a, b) there are only three possible next points: D(a+1, b), D(a+1, b+1) and D(a, b+1); the optimal warping path is then
DTW(S, T) = min_W Σ_{e=1}^{y} w_e   (7)
the cumulative distance is:
r(e, f) = d(s_e, t_f) + min{ r(e−1, f), r(e−1, f−1), r(e, f−1) };
where e = 1, 2, 3, …, n; f = 1, 2, 3, …, m; s_e denotes the e-th column vector of the matrix S to be detected; t_f denotes the f-th column vector of a head action in the template matrix T; r(e, f) is the cumulative distance.
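The cumulative-distance recursion of claim 8 and the minimum-DTW decision of step S5 can be sketched as follows (NumPy; `dtw` and `classify` are hypothetical names operating on precomputed distance matrices):

```python
import numpy as np

def dtw(D):
    """Cumulative-distance DP over an n x m distance matrix D:
    r(e, f) = d(e, f) + min(r(e-1, f), r(e-1, f-1), r(e, f-1))."""
    n, m = D.shape
    r = np.full((n, m), np.inf)
    r[0, 0] = D[0, 0]
    for e in range(n):
        for f in range(m):
            if e == 0 and f == 0:
                continue
            prev = min(r[e - 1, f] if e else np.inf,
                       r[e - 1, f - 1] if e and f else np.inf,
                       r[e, f - 1] if f else np.inf)
            r[e, f] = D[e, f] + prev
    return r[-1, -1]

def classify(D_list):
    """Step S5: the template with the smallest DTW value wins.
    D_list holds one distance matrix per standard template."""
    return int(np.argmin([dtw(D) for D in D_list]))
```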
CN202011485090.XA 2020-12-16 2020-12-16 Head posture recognition method based on dynamic time warping Active CN112527118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011485090.XA CN112527118B (en) 2020-12-16 2020-12-16 Head posture recognition method based on dynamic time warping


Publications (2)

Publication Number Publication Date
CN112527118A true CN112527118A (en) 2021-03-19
CN112527118B CN112527118B (en) 2022-11-25

Family

ID=75000581



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731307A (en) * 2013-12-20 2015-06-24 孙伯元 Somatic action identifying method and man-machine interaction device
CN105184325A (en) * 2015-09-23 2015-12-23 歌尔声学股份有限公司 Human body action recognition method and mobile intelligent terminal
WO2017050140A1 (en) * 2015-09-23 2017-03-30 歌尔股份有限公司 Method for recognizing a human motion, method for recognizing a user action and smart terminal
CN109091150A (en) * 2017-11-29 2018-12-28 惠州市德赛工业研究院有限公司 Recognition methods, sleep quality appraisal procedure and the intelligent wearable device that body of sleeping moves
CN110348275A (en) * 2018-04-08 2019-10-18 中兴通讯股份有限公司 Gesture identification method, device, smart machine and computer readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EDUARD WALL et al.: "Online nod detection in human-robot interaction", 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) *
LIU Yan et al.: "A Human Action Recognition Method Using 3D Skeleton Segment Representation", Journal of Chinese Computer Systems *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158833A (en) * 2021-03-31 2021-07-23 电子科技大学 Unmanned vehicle control command method based on human body posture
CN113158833B (en) * 2021-03-31 2023-04-07 电子科技大学 Unmanned vehicle control command method based on human body posture


Similar Documents

Publication Publication Date Title
CN110711374B (en) Multi-modal dance action evaluation method
Devanne et al. 3-d human action recognition by shape analysis of motion trajectories on riemannian manifold
US7881524B2 (en) Information processing apparatus and information processing method
US8824802B2 (en) Method and system for gesture recognition
Frolova et al. Most probable longest common subsequence for recognition of gesture character input
CN107908288A (en) A kind of quick human motion recognition method towards human-computer interaction
Bao et al. Dynamic hand gesture recognition based on SURF tracking
Jensen et al. Classification of kinematic swimming data with emphasis on resource consumption
CN106971130A (en) A kind of gesture identification method using face as reference
CN105740780A (en) Method and device for human face in-vivo detection
Kang et al. A study on performance evaluation of fingerprint sensors
CN105740779A (en) Method and device for human face in-vivo detection
CN111998829B (en) Method for judging read-write posture based on sensor
CN116269355B (en) Safety monitoring system based on figure gesture recognition
CN112801859A (en) Cosmetic mirror system with cosmetic guiding function
CN107358646A (en) A kind of fatigue detecting system and method based on machine vision
CN112527118B (en) Head posture recognition method based on dynamic time warping
CN111680660A (en) Human behavior detection method based on multi-source heterogeneous data stream
Li et al. Posture recognition technology based on kinect
CN106970701A (en) A kind of gesture changes recognition methods
CN113724838B (en) Emotion identification system based on big data
Chai et al. Automatic gait recognition using dynamic variance features
JP2013003706A (en) Facial-expression recognition device, method, and program
Razzaq et al. Unskem: unobtrusive skeletal-based emotion recognition for user experience
CN116311497A (en) Tunnel worker abnormal behavior detection method and system based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant