CN112464808A - Rope skipping posture and number identification method based on computer vision - Google Patents

Rope skipping posture and number identification method based on computer vision

Info

Publication number
CN112464808A
Authority
CN
China
Prior art keywords
rope skipping
key points
coordinates
key
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011352942.8A
Other languages
Chinese (zh)
Other versions
CN112464808B (en)
Inventor
杨婷
肖利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Original Assignee
Hangzhou Shufeng Technology Co ltd
Chengdu Ruima Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shufeng Technology Co ltd, Chengdu Ruima Technology Co ltd filed Critical Hangzhou Shufeng Technology Co ltd
Priority to CN202011352942.8A priority Critical patent/CN112464808B/en
Publication of CN112464808A publication Critical patent/CN112464808A/en
Application granted granted Critical
Publication of CN112464808B publication Critical patent/CN112464808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Abstract

The invention discloses a rope skipping posture and number identification method based on computer vision. A skeleton sequence topological graph built from each feature-extracted frame is trained with an ST-GCN model; softmax is applied to the features extracted by the ST-GCN network to output an action classification and its confidence, and if the classification is rope skipping, the rope skipping actions are counted. The cosine similarity between the standard action feature vector and the current action feature vector is then computed, and when it exceeds a preset threshold the current action is judged to be standard. The invention can judge whether the rope skipping posture is correct by using a GCN together with cosine similarity, and can count the number of rope skips by using a maximum/minimum value method.

Description

Rope skipping posture and number identification method based on computer vision
Technical Field
The invention relates to the field of human body action analysis, in particular to a rope skipping posture and number identification method based on computer vision.
Background
Rope skipping provides aerobic and anaerobic exercise at the same time, works muscles in many parts of the body, trains coordination and balance, and strengthens cardiopulmonary function. Long-term patterned rope skipping training can effectively improve the bone density of adolescents and markedly enhance physical qualities such as strength and explosive power. Rope skipping posture recognition means recognizing the skipping posture, comparing it with the standard posture, and correcting wrong postures. Rope skipping counting means counting jumps for the different skipping styles.
The method adopted uses computer vision: a human key point recognition algorithm identifies human key points on consecutive video frames, the key points are fed into a GCN to recognize rope skipping actions and postures, and wrong actions are flagged by comparison against the standard posture, avoiding the harm an improper skipping style causes to the body. The number of skips is further counted in real time from the pattern of change of the key points. This eliminates the defects of manual counting (limited concentration), counting ropes (strong dependence on the rope hardware), and audio-based counting (high demands on the surrounding environment).
The GCN algorithm preserves translation invariance on data with a non-Euclidean structure, and its core is based on the spectral decomposition of the Laplacian matrix. A GCN is also a neural network layer, with the following layer-to-layer propagation rule:
H^(l+1) = δ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) )

where Ã = A + I, I is the identity matrix, D̃ is the degree matrix of Ã, H is the feature matrix of each layer (for the input layer, H = X), W^(l) is the layer's weight matrix, and δ is a nonlinear activation function.
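As a concrete illustration, here is a minimal NumPy sketch of this propagation rule (the function and variable names, and the choice of tanh as the activation, are ours, not the patent's):

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One GCN propagation step: H' = delta(D^-1/2 (A+I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, f_in) node features,
    W: (f_in, f_out) learnable weights."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops: A + I
    d = A_tilde.sum(axis=1)                     # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return activation(A_hat @ H @ W)
```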
The human key points and their adjacency relations naturally form a topological structure. Compared with an ordinary CNN, a GCN can effectively extract the spatial features of such a topological graph. A given set of human actions (not only rope skipping actions) is strongly related to the relations between the key points, and a GCN can effectively extract the distinguishing features of the key point topologies of different actions, so applying a GCN to human key point features yields higher accuracy.
Disclosure of Invention
The invention aims to provide a rope skipping posture and number recognition method based on computer vision that can judge whether the rope skipping posture is correct by using a GCN together with cosine similarity, and can count the number of rope skips by using a maximum/minimum value method, thereby solving the problems above.
The technical scheme adopted by the invention is as follows:
a rope skipping posture and number identification method based on computer vision comprises the following steps of:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points as 0, 1, … and 17 in sequence;
step S3: processing the coordinate data of the key points, normalizing the key point coordinates to the range [-0.5, 0.5];
step S4: constructing a skeleton sequence graph structure from the T key frames and the coordinates and coordinate confidences of the key points in each frame. Denote the spatio-temporal graph of the skeleton sequence by G = (V, E), with node set V = {v_ti | t = 1, ..., T; i = 0, ..., n-1}; the feature vector of the ith node of the tth frame is F(v_ti) = (x_ti, y_ti, score_ti), where (x_ti, y_ti) are the processed coordinates of the ith key point and score_ti is the confidence of those coordinates;

two edge sets are constructed: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, where H is the set of naturally connected human joints, and the temporal edges E_F = {v_ti v_(t+1)i}, where v_ti v_(t+1)i connects the same key point in two consecutive frames. The input dimension of the network is (N, C, V, T), where N is the batch size, C = 18 is the number of key points, V = 3 carries the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames.
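By way of illustration, a sketch of this data layout and spatial edge set follows (the joint-pair list H below is the usual 18-point OpenPose-style skeleton, an assumption on our part; the patent itself only numbers the joints):

```python
import numpy as np

N_JOINTS, N_CHANNELS = 18, 3

# Assumed naturally connected joint pairs H for an 18-point skeleton.
H = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
     (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13),
     (0, 14), (0, 15), (14, 16), (15, 17)]

def spatial_adjacency(n=N_JOINTS):
    """Adjacency matrix encoding the spatial edges E_S."""
    A = np.zeros((n, n))
    for i, j in H:
        A[i, j] = A[j, i] = 1.0
    return A

def build_input(keypoints):
    """keypoints: (T, 18, 3) array of (x, y, score) per frame.
    Returns one sample shaped (C, V, T) = (18, 3, T); stacking N
    samples gives the (N, C, V, T) tensor described above."""
    return keypoints.transpose(1, 2, 0)
```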
Step S5: extracting features with a spatial graph convolution and a temporal convolution. The spatial graph convolution is

f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

where the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is the cardinality of the corresponding subset, p(v_ti, v_tj) is the sampling function with p(v_ti, v_tj) = v_tj, and the neighbor set is B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, where d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into subsets, each with a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, ..., K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));

extending this spatial model into the time domain, the neighbor set of the spatio-temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q - t| ≤ ⌊Γ/2⌋}

where Γ is the size of the temporal convolution kernel, and the weight equation is obtained by extending the label map to

l_ST(v_qj) = l_ti(v_tj) + (q - t + ⌊Γ/2⌋) × K.
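For concreteness, a simplified NumPy sketch of this labeled spatial graph convolution follows (the per-label weights W′(k) and the label matrix are supplied by the caller; all names are illustrative, not the patent's exact pipeline):

```python
import numpy as np

def spatial_graph_conv(A, X, weights, labels):
    """A: (n, n) adjacency with self-loops, X: (n, f_in) features,
    weights: list of K matrices W'(k), each (f_in, f_out),
    labels: (n, n) integer array with labels[i, j] = l_ti(v_tj)."""
    n, f_out = A.shape[0], weights[0].shape[1]
    out = np.zeros((n, f_out))
    for k, Wk in enumerate(weights):
        Ak = A * (labels == k)                            # neighbors with label k
        Z = np.maximum(Ak.sum(axis=1, keepdims=True), 1)  # subset cardinality Z
        out += (Ak / Z) @ X @ Wk                          # normalized contribution
    return out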
Step S6: training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model;
step S7: applying softmax to the features extracted by the ST-GCN network and outputting the action classification and its confidence; if the output classification is rope skipping, counting the rope skipping actions;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_1,0, y_2,0, ..., y_T,0), (y_1,1, y_2,1, ..., y_T,1), ..., (y_1,17, y_2,17, ..., y_T,17)}, where y_t,i is the y coordinate of the ith node of the tth frame; each element of S is smoothed by averaging every n consecutive values, giving a new set S1;

for each element of S1, the set of maxima is computed and denoted Ma, and the set of minima is denoted Mi. Extrema are determined as follows: if a_i is a maximum, then a_i > a_(i-1) and a_i ≥ a_(i+1); if a_i is a minimum, then a_i < a_(i-1) and a_i ≤ a_(i+1);

the lengths of the elements of Ma and Mi are counted into a dictionary D, whose keys are the lengths and whose values are the number of occurrences of each length; the values of D are sorted from large to small, and the key with the largest value, minus 1, is the number of rope skips;
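A Python sketch of this counting scheme (the smoothing width n and the trajectory layout are illustrative; the patent fixes neither):

```python
import numpy as np

def smooth(y, n=5):
    """Average every n consecutive values (simple moving average)."""
    return np.convolve(y, np.ones(n) / n, mode="valid")

def count_extrema(y):
    """Indices of maxima/minima per the patent's extremum definition."""
    maxima = [i for i in range(1, len(y) - 1)
              if y[i] > y[i - 1] and y[i] >= y[i + 1]]
    minima = [i for i in range(1, len(y) - 1)
              if y[i] < y[i - 1] and y[i] <= y[i + 1]]
    return maxima, minima

def count_jumps(S):
    """S: list of 18 per-keypoint y-coordinate trajectories.
    Tally extremum counts across keypoints; the most frequent count,
    minus 1, is taken as the number of jumps."""
    counts = {}
    for y in S:
        ma, mi = count_extrema(smooth(np.asarray(y, dtype=float)))
        for length in (len(ma), len(mi)):
            counts[length] = counts.get(length, 0) + 1
    best = max(counts, key=counts.get)   # key with the largest value
    return best - 1
```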
step S9: constructing a vector of skeleton angles for computing cosine similarity; the feature vector compares angles An(a)(b)-(c)(d), where An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X = (x_1, ..., x_m) and the current action feature vector Y = (y_1, ..., y_m),

cos θ = (Σ_i x_i · y_i) / (√(Σ_i x_i²) · √(Σ_i y_i²))

where x_i is the ith component of X and y_i is the ith component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
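A minimal sketch of this similarity test (the threshold is left as a required parameter: the patent treats it as a preset value and quotes 0.68 only for the variance-based variant described below):

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(theta) between feature vectors x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def is_standard_action(current_angles, standard_angles, threshold):
    return cosine_similarity(current_angles, standard_angles) > threshold
```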
To better implement the present solution, further, the feature vector in step S9 is specifically the following set of angles: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}. The 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
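A sketch of computing this angle set from the 18 key point coordinates (function and constant names are ours):

```python
import numpy as np

def angle(keypoints, a, b, c, d):
    """An(a)(b)-(c)(d): angle in radians between bone vector a->b and
    bone vector c->d. keypoints: (18, 2) array of (x, y)."""
    v1 = keypoints[b] - keypoints[a]
    v2 = keypoints[d] - keypoints[c]
    cos_t = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))

# The 15 angle pairs of S_A, using the key point numbering above.
S_A_PAIRS = [(0, 1, 1, 5), (0, 1, 1, 2), (1, 2, 1, 5), (1, 2, 2, 3),
             (2, 3, 3, 4), (1, 5, 5, 6), (5, 6, 6, 7), (1, 2, 1, 8),
             (1, 5, 1, 11), (1, 8, 8, 9), (8, 9, 9, 10), (1, 11, 11, 12),
             (11, 12, 12, 13), (2, 8, 2, 3), (5, 11, 5, 6)]

def angle_feature_vector(keypoints):
    return np.array([angle(keypoints, *p) for p in S_A_PAIRS])
```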
In order to better implement the present solution, further, the similarity in step S10 may also be scored by taking the cosine of each feature vector angle and computing the variance or standard deviation between these cosines and the cosines of the preset feature vector angles; when the variance is used, the preset threshold is generally 0.68.
In order to better implement the present solution, in step S6 the method for training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model is: first, batch normalization is applied to the input data; the data then passes through 9 ST-GCN units followed by global pooling, yielding a 256-dimensional feature vector for each sequence; finally, a SoftMax function classifies it to obtain the final label;

each ST-GCN unit adopts a ResNet structure; the first three layers output 64 channels, the middle three 128 channels, and the last three 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;

training uses SGD with a learning rate of 0.01, decayed by 10% every 20 epochs.
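A sketch of this training schedule under a PyTorch assumption (the patent names no framework; the stand-in model and placeholder loss are ours):

```python
import torch

model = torch.nn.Linear(256, 2)  # stand-in for the real ST-GCN network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# 10% decay every 20 epochs, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9)

for epoch in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 256)).sum()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # multiplies lr by 0.9 every 20 epochs
```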
In order to better implement the present solution, further, the method for normalizing the key point coordinates to the range [-0.5, 0.5] in step S3 is specifically: each key point position (x0, y0) is processed into new key point coordinates (x, y) by
x = (x0 / w) - 0.5, y = (y0 / h) - 0.5
where w is the image width and h is the image height.
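In code, this normalization is a one-liner per axis (a trivial sketch):

```python
def normalize_keypoint(x0, y0, w, h):
    """Map pixel coordinates (x0, y0) in a w-by-h image to [-0.5, 0.5]."""
    return x0 / w - 0.5, y0 / h - 0.5
```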
Many prior art solutions use the variance or standard deviation of coordinates directly to judge whether the rope skipping action is standard. The standard deviation reflects the dispersion of a data set, while the variance measures the deviation of a random variable from its mathematical expectation (mean). In the prior art, these two indicators are combined into a score for the human key points and a threshold is set on that score to judge accuracy; this targets a single subject. The present solution instead uses cosine similarity, which evaluates the similarity of two vectors by the cosine of the angle between them, and thus compares two subjects. Furthermore, in rope skipping, whether an action is standard is strongly related to the angles between the limbs of the body and only weakly related to coordinate position, i.e., to jump height or to any left-right drift during skipping. Using the limb angles as the vectors for computing cosine similarity therefore compares the current action with the standard action more effectively, and so judges whether the current action is standard.
In addition, a key point sequence of N video frames is fed to the behavior recognition algorithm with a sliding window. Suppose the input video sequence has N frames, the sliding window size is n frames, and the step size is s frames. Starting from frame 0, the n frames inside the window are fed into the network; the window is then moved back by s frames to produce the next n-frame key point sequence, and so on along the video sequence, each window serving as one input to the behavior recognition network. The behavior recognition network is a spatio-temporal graph convolutional network: at the spatial level a GCN (graph convolutional network) extracts local spatial information, and at the temporal level a TCN (temporal convolutional network) implicitly learns temporal dynamics. That is, for each key point, not only its spatial neighbors but also its temporal neighbors are considered, extending the notion of neighborhood into the spatio-temporal dimension, so the key points are recognized more accurately.
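A sketch of the sliding-window batching just described (names and defaults illustrative):

```python
def sliding_windows(keypoint_seq, n=32, s=8):
    """keypoint_seq: list of per-frame key point arrays (length N).
    Yields consecutive n-frame windows, advancing s frames each time;
    each window is one input to the behavior recognition network."""
    for start in range(0, len(keypoint_seq) - n + 1, s):
        yield keypoint_seq[start:start + n]
```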
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. the rope skipping posture and number identification method based on computer vision can judge whether the rope skipping posture is correct or not by using a GCN and cosine similarity algorithm;
2. the rope skipping posture and number identification method based on computer vision uses a maximum value or minimum value method to calculate the number of skipping ropes, and is convenient and rapid.
Drawings
In order to more clearly illustrate the technical solution, the drawings needed to be used in the embodiments are briefly described below, and it should be understood that, for those skilled in the art, other related drawings can be obtained according to the drawings without creative efforts, wherein:
FIG. 1 is a diagram of the ST-GCN network architecture of the present invention;
FIG. 2 is a schematic diagram of 18 key points of the human body of the present invention;
FIG. 3 is a schematic flow diagram of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and therefore should not be considered as a limitation to the scope of protection. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The present invention will be described in detail with reference to fig. 1 to 3.
Example 1
A rope skipping posture and number recognition method based on computer vision is disclosed, as shown in fig. 3, and comprises the following steps which are carried out in sequence:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points as 0, 1, … and 17 in sequence;
step S3: processing the coordinate data of the key points, normalizing the key point coordinates to the range [-0.5, 0.5];
step S4: constructing a skeleton sequence graph structure from the T key frames and the coordinates and coordinate confidences of the key points in each frame. Denote the spatio-temporal graph of the skeleton sequence by G = (V, E), with node set V = {v_ti | t = 1, ..., T; i = 0, ..., n-1}; the feature vector of the ith node of the tth frame is F(v_ti) = (x_ti, y_ti, score_ti), where (x_ti, y_ti) are the processed coordinates of the ith key point and score_ti is the confidence of those coordinates;

two edge sets are constructed: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, where H is the set of naturally connected human joints, and the temporal edges E_F = {v_ti v_(t+1)i}, where v_ti v_(t+1)i connects the same key point in two consecutive frames. The input dimension of the network is (N, C, V, T), where N is the batch size, C = 18 is the number of key points, V = 3 carries the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames.
Step S5: extracting features with a spatial graph convolution and a temporal convolution. The spatial graph convolution is

f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

where the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is the cardinality of the corresponding subset, p(v_ti, v_tj) is the sampling function with p(v_ti, v_tj) = v_tj, and the neighbor set is B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, where d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into subsets, each with a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, ..., K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));

extending this spatial model into the time domain, the neighbor set of the spatio-temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q - t| ≤ ⌊Γ/2⌋}

where Γ is the size of the temporal convolution kernel, and the weight equation is obtained by extending the label map to

l_ST(v_qj) = l_ti(v_tj) + (q - t + ⌊Γ/2⌋) × K.
Step S6: training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model;
step S7: applying softmax to the features extracted by the ST-GCN network and outputting the action classification and its confidence; if the output classification is rope skipping, counting the rope skipping actions;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_1,0, y_2,0, ..., y_T,0), (y_1,1, y_2,1, ..., y_T,1), ..., (y_1,17, y_2,17, ..., y_T,17)}, where y_t,i is the y coordinate of the ith node of the tth frame; each element of S is smoothed by averaging every n consecutive values, giving a new set S1;

for each element of S1, the set of maxima is computed and denoted Ma, and the set of minima is denoted Mi. Extrema are determined as follows: if a_i is a maximum, then a_i > a_(i-1) and a_i ≥ a_(i+1); if a_i is a minimum, then a_i < a_(i-1) and a_i ≤ a_(i+1);

the lengths of the elements of Ma and Mi are counted into a dictionary D, whose keys are the lengths and whose values are the number of occurrences of each length; the values of D are sorted from large to small, and the key with the largest value, minus 1, is the number of rope skips;
step S9: constructing a vector of skeleton angles for computing cosine similarity; the feature vector compares angles An(a)(b)-(c)(d), where An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X = (x_1, ..., x_m) and the current action feature vector Y = (y_1, ..., y_m),

cos θ = (Σ_i x_i · y_i) / (√(Σ_i x_i²) · √(Σ_i y_i²))

where x_i is the ith component of X and y_i is the ith component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
Example 2
The feature vector in step S9 is specifically the following set of angles: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}. The 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
Further, the similarity in step S10 may also be scored by taking the cosine of each feature vector angle and computing the variance or standard deviation between these cosines and the cosines of the preset feature vector angles; when the variance is used, the preset threshold is generally 0.68.
Further, the method in step S6 for training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model is: first, batch normalization is applied to the input data; the data then passes through 9 ST-GCN units followed by global pooling, yielding a 256-dimensional feature vector for each sequence; finally, a SoftMax function classifies it to obtain the final label;

each ST-GCN unit adopts a ResNet structure; the first three layers output 64 channels, the middle three 128 channels, and the last three 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;

training uses SGD with a learning rate of 0.01, decayed by 10% every 20 epochs.
Further, the method for normalizing the key point coordinates to the range [-0.5, 0.5] in step S3 is specifically: each key point position (x0, y0) is processed into new key point coordinates (x, y) by
x = (x0 / w) - 0.5, y = (y0 / h) - 0.5
where w is the image width and h is the image height.
Other parts of this embodiment are the same as those of embodiment 1, and thus are not described again.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (5)

1. A rope skipping posture and number identification method based on computer vision is characterized in that: comprises the following steps which are carried out in sequence:
step S1: collecting a video of a human body rope skipping;
step S2: detecting key points of a human body in each frame of a video to obtain the coordinate positions and the confidence degrees of the coordinates of 18 key points of the human body in each frame, and numbering the key points as 0, 1, … and 17 in sequence;
step S3: processing the coordinate data of the key points, normalizing the key point coordinates to the range [-0.5, 0.5];
step S4: constructing a skeleton sequence graph structure from the T key frames and the coordinates and coordinate confidences of the key points in each frame. Denote the spatio-temporal graph of the skeleton sequence by G = (V, E), with node set V = {v_ti | t = 1, ..., T; i = 0, ..., n-1}; the feature vector of the ith node of the tth frame is F(v_ti) = (x_ti, y_ti, score_ti), where (x_ti, y_ti) are the processed coordinates of the ith key point and score_ti is the confidence of those coordinates;

two edge sets are constructed: the spatial structure edges E_S = {v_ti v_tj | (i, j) ∈ H}, where H is the set of naturally connected human joints, and the temporal edges E_F = {v_ti v_(t+1)i}, where v_ti v_(t+1)i connects the same key point in two consecutive frames. The input dimension of the network is (N, C, V, T), where N is the batch size, C = 18 is the number of key points, V = 3 carries the processed coordinates and confidence (x, y, score) of each key point, and T is the number of key frames.
Step S5: extracting features with a spatial graph convolution and a temporal convolution. The spatial graph convolution is

f_out(v_ti) = Σ_{v_tj ∈ B(v_ti)} (1 / Z_ti(v_tj)) · f_in(p(v_ti, v_tj)) · w(v_ti, v_tj)

where the normalization term Z_ti(v_tj) = |{v_tk | l_ti(v_tk) = l_ti(v_tj)}| is the cardinality of the corresponding subset, p(v_ti, v_tj) is the sampling function with p(v_ti, v_tj) = v_tj, and the neighbor set is B(v_ti) = {v_tj | d(v_tj, v_ti) ≤ D}, where d(v_tj, v_ti) is the shortest distance from v_tj to v_ti;

the neighbors obtained by the sampling function p(v_ti, v_tj) are divided into subsets, each with a numeric label; a neighbor is mapped to its subset label by l_ti: B(v_ti) → {0, ..., K-1}, and the weight equation is w(v_ti, v_tj) = w′(l_ti(v_tj));

extending this spatial model into the time domain, the neighbor set of the spatio-temporal graph convolution becomes

B(v_ti) = {v_qj | d(v_tj, v_ti) ≤ K, |q - t| ≤ ⌊Γ/2⌋}

where Γ is the size of the temporal convolution kernel, and the weight equation is obtained by extending the label map to

l_ST(v_qj) = l_ti(v_tj) + (q - t + ⌊Γ/2⌋) × K.
Step S6: training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model;
step S7: applying softmax to the features extracted by the ST-GCN network and outputting the action classification and its confidence; if the output classification is rope skipping, counting the rope skipping actions;
step S8: the set of y coordinates of the 18 human key points over the start-to-end period of the rope skipping action is S = {(y_1,0, y_2,0, ..., y_T,0), (y_1,1, y_2,1, ..., y_T,1), ..., (y_1,17, y_2,17, ..., y_T,17)}, where y_t,i is the y coordinate of the ith node of the tth frame; each element of S is smoothed by averaging every n consecutive values, giving a new set S1;

for each element of S1, the set of maxima is computed and denoted Ma, and the set of minima is denoted Mi. Extrema are determined as follows: if a_i is a maximum, then a_i > a_(i-1) and a_i ≥ a_(i+1); if a_i is a minimum, then a_i < a_(i-1) and a_i ≤ a_(i+1);

the lengths of the elements of Ma and Mi are counted into a dictionary D, whose keys are the lengths and whose values are the number of occurrences of each length; the values of D are sorted from large to small, and the key with the largest value, minus 1, is the number of rope skips;
step S9: constructing a vector of skeleton angles for computing cosine similarity; the feature vector compares angles An(a)(b)-(c)(d), where An(a)(b)-(c)(d) denotes the angle between the vector from key point a to key point b and the vector from key point c to key point d;
step S10: computing the cosine similarity between the standard action feature vector X = (x_1, ..., x_m) and the current action feature vector Y = (y_1, ..., y_m),

cos θ = (Σ_i x_i · y_i) / (√(Σ_i x_i²) · √(Σ_i y_i²))

where x_i is the ith component of X and y_i is the ith component of Y; when the cosine similarity is greater than a preset threshold, the current action is judged to be a standard action.
2. The rope skipping posture and number recognition method based on computer vision according to claim 1, characterized in that: the feature vector in step S9 is specifically the following set of angles: S_A = {An(0)(1)-(1)(5), An(0)(1)-(1)(2), An(1)(2)-(1)(5), An(1)(2)-(2)(3), An(2)(3)-(3)(4), An(1)(5)-(5)(6), An(5)(6)-(6)(7), An(1)(2)-(1)(8), An(1)(5)-(1)(11), An(1)(8)-(8)(9), An(8)(9)-(9)(10), An(1)(11)-(11)(12), An(11)(12)-(12)(13), An(2)(8)-(2)(3), An(5)(11)-(5)(6)}, where the 18 key points are numbered: nose 0, neck 1, right shoulder 2, right elbow 3, right wrist 4, left shoulder 5, left elbow 6, left wrist 7, right hip 8, right knee 9, right ankle 10, left hip 11, left knee 12, left ankle 13, right eye 14, left eye 15, right ear 16, left ear 17.
3. The rope skipping posture and number recognition method based on computer vision according to claim 1, characterized in that: the similarity in step S10 is computed by taking the cosine of each feature vector angle and computing the variance or standard deviation between these cosines and the cosines of the preset feature vector angles.
4. The rope skipping posture and number recognition method based on computer vision according to claim 1, characterized in that: the method in step S6 for training the skeleton sequence topological graph structure of each feature-extracted frame with the ST-GCN model is: first, batch normalization is applied to the input data; the data then passes through 9 ST-GCN units followed by global pooling, yielding a 256-dimensional feature vector for each sequence; finally, a SoftMax function classifies it to obtain the final label;

each ST-GCN unit adopts a ResNet structure; the first three layers output 64 channels, the middle three 128 channels, and the last three 256 channels; after each ST-GCN unit, features are randomly dropped out with probability 0.5, and the strides of the 4th and 7th temporal convolution layers are set to 2;

training uses SGD with a learning rate of 0.01, decayed by 10% every 20 epochs.
5. The rope skipping posture and number recognition method based on computer vision according to claim 1, characterized in that: the method for normalizing the key point coordinates to the range [-0.5, 0.5] in step S3 is specifically: each key point position (x0, y0) is processed into new key point coordinates (x, y) by
x = (x0 / w) - 0.5, y = (y0 / h) - 0.5
where w is the image width and h is the image height.
CN202011352942.8A 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision Active CN112464808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352942.8A CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011352942.8A CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Publications (2)

Publication Number Publication Date
CN112464808A true CN112464808A (en) 2021-03-09
CN112464808B CN112464808B (en) 2022-12-16

Family

ID=74808944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352942.8A Active CN112464808B (en) 2020-11-26 2020-11-26 Rope skipping gesture and number identification method based on computer vision

Country Status (1)

Country Link
CN (1) CN112464808B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113137983A (en) * 2021-04-30 2021-07-20 深圳市恒星物联科技有限公司 Self-learning manhole cover posture monitoring method and monitoring system
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN114343618A (en) * 2021-12-20 2022-04-15 中科视语(北京)科技有限公司 Training motion detection method and device
CN114360060A (en) * 2021-12-31 2022-04-15 北京航空航天大学杭州创新研究院 Human body action recognition counting method
CN114764946A (en) * 2021-09-18 2022-07-19 北京甲板智慧科技有限公司 Action counting method and system based on time sequence standardization and intelligent terminal
CN115100745A (en) * 2022-07-05 2022-09-23 北京甲板智慧科技有限公司 Swin transform model-based motion real-time counting method and system
CN115205750A (en) * 2022-07-05 2022-10-18 北京甲板智慧科技有限公司 Motion real-time counting method and system based on deep learning model
CN115937989A (en) * 2023-01-19 2023-04-07 苏州市优凡文化科技有限公司 Scaling processing-based online education intelligent analysis system and method
CN117079192A (en) * 2023-10-12 2023-11-17 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded
CN117253290A (en) * 2023-10-13 2023-12-19 景色智慧(北京)信息科技有限公司 Rope skipping counting implementation method and device based on yolopose model and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140169623A1 (en) * 2012-12-19 2014-06-19 Microsoft Corporation Action recognition based on depth maps
US20150186713A1 (en) * 2013-12-31 2015-07-02 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition
WO2016042039A1 (en) * 2014-09-16 2016-03-24 Foundation For Research And Technology - Hellas (Forth) Gesture recognition apparatuses, methods and systems for human-machine interaction
CN109758716A (en) * 2019-03-26 2019-05-17 林叶蓁 A kind of rope skipping method of counting based on acoustic information
CN109876416A (en) * 2019-03-26 2019-06-14 浙江大学 A kind of rope skipping method of counting based on image information
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110245593A (en) * 2019-06-03 2019-09-17 浙江理工大学 A kind of images of gestures extraction method of key frame based on image similarity
WO2020015076A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN110991340A (en) * 2019-12-03 2020-04-10 郑州大学 Human body action analysis method based on image compression
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN111985579A (en) * 2020-09-04 2020-11-24 王宗亚 Double-person diving synchronism analysis method based on camera cooperation and three-dimensional skeleton estimation

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140169623A1 (en) * 2012-12-19 2014-06-19 Microsoft Corporation Action recognition based on depth maps
US20150186713A1 (en) * 2013-12-31 2015-07-02 Konica Minolta Laboratory U.S.A., Inc. Method and system for emotion and behavior recognition
WO2016042039A1 (en) * 2014-09-16 2016-03-24 Foundation For Research And Technology - Hellas (Forth) Gesture recognition apparatuses, methods and systems for human-machine interaction
WO2020015076A1 (en) * 2018-07-18 2020-01-23 平安科技(深圳)有限公司 Facial image comparison method and apparatus, computer device, and storage medium
WO2020134478A1 (en) * 2018-12-29 2020-07-02 北京灵汐科技有限公司 Face recognition method, feature extraction model training method and device thereof
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN109758716A (en) * 2019-03-26 2019-05-17 林叶蓁 A kind of rope skipping method of counting based on acoustic information
CN109876416A (en) * 2019-03-26 2019-06-14 浙江大学 A kind of rope skipping method of counting based on image information
CN110245593A (en) * 2019-06-03 2019-09-17 浙江理工大学 A kind of images of gestures extraction method of key frame based on image similarity
CN110796110A (en) * 2019-11-05 2020-02-14 西安电子科技大学 Human behavior identification method and system based on graph convolution network
CN110991340A (en) * 2019-12-03 2020-04-10 郑州大学 Human body action analysis method based on image compression
CN111881731A (en) * 2020-05-19 2020-11-03 广东国链科技股份有限公司 Behavior recognition method, system, device and medium based on human skeleton
CN111814719A (en) * 2020-07-17 2020-10-23 江南大学 Skeleton behavior identification method based on 3D space-time diagram convolution
CN111985579A (en) * 2020-09-04 2020-11-24 王宗亚 Double-person diving synchronism analysis method based on camera cooperation and three-dimensional skeleton estimation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SONG S et al.: "An end-to-end spatio-temporal attention model for human action recognition from skeleton data", Proceedings of the Association for the Advance of Artificial Intelligence. *
QIAO Qingwei (谯庆伟): "Human action recognition fusing dual spatio-temporal network streams and an attention mechanism" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology. *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113137983A (en) * 2021-04-30 2021-07-20 深圳市恒星物联科技有限公司 Self-learning manhole cover posture monitoring method and monitoring system
CN113137983B (en) * 2021-04-30 2023-08-22 深圳市恒星物联科技有限公司 Self-learning well lid posture monitoring method and monitoring system
CN114764946A (en) * 2021-09-18 2022-07-19 北京甲板智慧科技有限公司 Action counting method and system based on time sequence standardization and intelligent terminal
CN114764946B (en) * 2021-09-18 2023-08-11 北京甲板智慧科技有限公司 Action counting method and system based on time sequence standardization and intelligent terminal
CN113893517A (en) * 2021-11-22 2022-01-07 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN113893517B (en) * 2021-11-22 2022-06-17 动者科技(杭州)有限责任公司 Rope skipping true and false judgment method and system based on difference frame method
CN114343618A (en) * 2021-12-20 2022-04-15 中科视语(北京)科技有限公司 Training motion detection method and device
CN114360060B (en) * 2021-12-31 2024-04-09 北京航空航天大学杭州创新研究院 Human body action recognition and counting method
CN114360060A (en) * 2021-12-31 2022-04-15 北京航空航天大学杭州创新研究院 Human body action recognition counting method
CN115100745A (en) * 2022-07-05 2022-09-23 北京甲板智慧科技有限公司 Swin transform model-based motion real-time counting method and system
CN115205750A (en) * 2022-07-05 2022-10-18 北京甲板智慧科技有限公司 Motion real-time counting method and system based on deep learning model
CN115937989A (en) * 2023-01-19 2023-04-07 苏州市优凡文化科技有限公司 Scaling processing-based online education intelligent analysis system and method
CN115937989B (en) * 2023-01-19 2023-09-22 山东领峰教育科技集团有限公司 Online education intelligent analysis system and method based on scaling processing
CN117079192A (en) * 2023-10-12 2023-11-17 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded
CN117079192B (en) * 2023-10-12 2024-01-02 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded
CN117253290A (en) * 2023-10-13 2023-12-19 景色智慧(北京)信息科技有限公司 Rope skipping counting implementation method and device based on yolopose model and storage medium

Also Published As

Publication number Publication date
CN112464808B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN112464808B (en) Rope skipping gesture and number identification method based on computer vision
CN111709409B (en) Face living body detection method, device, equipment and medium
CN113496217B (en) Method for identifying human face micro expression in video image sequence
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN109543602B (en) Pedestrian re-identification method based on multi-view image feature decomposition
CN109344692B (en) Motion quality evaluation method and system
CN110575663B (en) Physical education auxiliary training method based on artificial intelligence
CN112287891B (en) Method for evaluating learning concentration through video based on expression behavior feature extraction
CN109902565B (en) Multi-feature fusion human behavior recognition method
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN105184767A (en) Moving human body attitude similarity measuring method
CN111401105B (en) Video expression recognition method, device and equipment
KR20190061538A (en) Method and apparatus of recognizing motion pattern base on combination of multi-model
CN111259759B (en) Cross-database micro-expression recognition method and device based on domain selection migration regression
CN106599833A (en) Field adaptation and manifold distance measurement-based human face identification method
CN113569805A (en) Action recognition method and device, electronic equipment and storage medium
Tereikovskyi et al. The method of semantic image segmentation using neural networks
Szankin et al. Influence of thermal imagery resolution on accuracy of deep learning based face recognition
Zhang et al. Auxiliary decision support model of sports training based on association rules
Mardiyah et al. Developing deep learning architecture for image classification using convolutional neural network (CNN) algorithm in forest and field images
Iosifidis et al. Human action recognition based on bag of features and multi-view neural networks
CN114581991A (en) Behavior attitude identification method based on dynamic perception of facial expressions
Marais et al. Investigating signer-independent sign language recognition on the lsa64 dataset
CN114898464A (en) Lightweight accurate finger language intelligent algorithm identification method based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant