CN112446313A - Volleyball action recognition method based on improved dynamic time warping algorithm - Google Patents
- Publication number
- CN112446313A (application number CN202011306032.6A)
- Authority
- CN
- China
- Prior art keywords
- human body
- point
- volleyball
- key point
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, comprising the following steps: first, acquiring a real-time action video of a volleyball movement; second, performing pose estimation and target detection on the action video and on a standard volleyball action video to obtain the time sequence of each human body key point in both videos; and finally, aligning the sequences and calculating the distance between them with an improved dynamic time warping algorithm, and judging the accuracy of the volleyball action according to that distance. The invention automates the judgment of whether a volleyball action is standard, greatly improves efficiency, saves manpower, material and financial resources, and lightens the workload of teachers. The method is simple to implement, clear in concept, of good economic value, and worth popularizing. The invention combines research in the field of artificial intelligence with actual volleyball testing, puts the technology into practical use, and helps promote the integration of industry and research.
Description
Technical Field
The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, and belongs to the field of artificial intelligence.
Background
In recent years, with the rapid development of volleyball and the emphasis China places on the sport, volleyball teaching and training have become increasingly important, and many schools now offer volleyball courses.
In a volleyball test, a teacher assigns scores according to how closely each student's action matches the standard. This consumes a large amount of time and manpower and increases the teacher's workload, and manual scoring can also affect the fairness of the test to some degree. If this process could be automated with artificial intelligence techniques, scoring would become more efficient, manpower, material and financial resources would be saved, and the fairness of volleyball examinations would improve.
At present, volleyball action recognition omits a target detection stage, which reduces the accuracy of action judgment, and it relies on the traditional dynamic time warping algorithm, which reduces recognition efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a volleyball action recognition method based on an improved dynamic time warping algorithm.
Interpretation of terms:
1. As shown in fig. 4, the human body has 18 key points: 0 nose, 1 neck, 2 right shoulder, 3 right elbow, 4 right wrist, 5 left shoulder, 6 left elbow, 7 left wrist, 8 right hip, 9 right knee, 10 right ankle, 11 left hip, 12 left knee, 13 left ankle, 14 right eye, 15 left eye, 16 right ear, and 17 left ear.
2. The pose estimation network OpenPose is a human body pose recognition project: an open source library developed by Carnegie Mellon University (CMU) in the United States, based on convolutional neural networks and supervised learning. It can estimate the pose of human body actions, finger motions and the like, has excellent robustness, and is the first real-time multi-person two-dimensional pose estimation application based on deep learning in the world. The OpenPose network structure is shown in fig. 1 and mainly comprises two branches: one predicts confidence maps and detects the positions of key points, and the other detects the valid connections among the key points. Each branch has t stages of successive refinement, and each stage fuses the feature maps. The position confidence maps and the connections in a picture are extracted by the convolutional neural network, and the outputs of the two branches are combined through inference and graph matching to output the pose of each person.
3. The target detection network YOLOv3 is a classical target detection method that uses a neural network to output the detection result directly, namely the position and the class probability of the detection frame, so it is fast and can run in real time. The YOLOv3 network structure is shown in fig. 2. The input of the network is a picture, and the network has 3 output layers, which produce outputs on feature maps of different sizes. Darknet53 is the backbone of the network, and DBL is the basic component of YOLOv3, with the corresponding structure shown in FIG. 1, where Conv is a convolutional layer, BN is Batch Normalization, and Leaky ReLU is an activation function. YOLOv3 adopts a residual structure, which allows the network to be deeper.
The technical scheme of the invention is as follows:
A volleyball action recognition method based on an improved dynamic time warping algorithm comprises the following steps:
first, acquiring a real-time action video of a volleyball movement through a camera;
second, performing pose estimation and target detection on the acquired action video and on the standard volleyball action video respectively, to obtain the coordinates of the human body key points;
third, obtaining, from the coordinates of the human body key points, the time sequence of each human body key point in the action video and the time sequence of each human body key point in the standard volleyball action video;
and finally, aligning the time sequences of the human body key points in the action video with those in the standard volleyball action video and calculating the distance between them through an improved dynamic time warping algorithm, and judging the accuracy of the volleyball action in the action video.
According to the present invention, preferably, the pose estimation and target detection processing comprises the following steps:
inputting each frame of the action video into the pose estimation network OpenPose for pose estimation, and outputting the position coordinates of the 18 human body key points in the image; at the same time, inputting each frame of the action video into the target detection network YOLOv3 for target detection, and outputting the position coordinates of the human body detection frame.
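The per-frame processing described above can be sketched as follows. This is a minimal sketch rather than the patent's implementation: `estimate_pose` and `detect_person` are hypothetical callables standing in for OpenPose and YOLOv3 inference, and the box layout `(x_min, y_min, x_max, y_max)` is an assumption.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)

NUM_KEYPOINTS = 18  # key points 0 (nose) to 17 (left ear), as listed in the text


def process_frames(
    frames: Iterable[Any],
    estimate_pose: Callable[[Any], Sequence[Point]],  # hypothetical OpenPose wrapper
    detect_person: Callable[[Any], Box],              # hypothetical YOLOv3 wrapper
) -> List[Tuple[List[Point], Box]]:
    """Run both networks on every frame and pair their outputs per frame."""
    results = []
    for index, frame in enumerate(frames):
        keypoints = list(estimate_pose(frame))
        if len(keypoints) != NUM_KEYPOINTS:
            raise ValueError(f"frame {index}: expected {NUM_KEYPOINTS} key points")
        results.append((keypoints, detect_person(frame)))
    return results
```

Passing the two networks in as callables keeps the sketch independent of any particular OpenPose or YOLOv3 binding.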
Rectangle ABCD is the human body detection frame. The position coordinates of the human body key points in the image are converted into position coordinates within the detection frame according to the position of the frame. For example, the converted coordinates of key point 8 are calculated as $x = \frac{x_8 - x_D}{x_C - x_D}$, $y = \frac{y_8 - y_D}{y_A - y_D}$, where the subscripted x and y are the horizontal and vertical coordinates of the corresponding point in the image, and the (0,0) point is at the upper left corner of the image. This conversion normalizes the key point coordinates, so that differences in body size, in camera distance during recording, and in the position of the person in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable. After conversion, the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to obtain a new coordinate, which represents the direction of change of the key point. If the video has a frames in total, a-1 new coordinates are formed for each human body key point. Similarly, a standard volleyball action video with b frames yields b-1 new coordinates for each human body key point.
According to the present invention, preferably, acquiring the time sequence of the human body key points in the action video, where the human body detection frame obtained after pose estimation and target detection is rectangle ABCD, comprises the following steps:
A. converting the position coordinates of the human body key points in the image into position coordinates within the human body detection frame; for any key point t of the 18 human body key points, the position coordinates within the detection frame are given by formula (I):
$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$
in formula (I), $0 \leq t \leq 17$; x and y are the new horizontal and vertical coordinates, both ranging from 0 to 1; $x_D$ and $y_D$ are the abscissa and ordinate of point D of the human body detection frame in the original image; $x_C$ is the abscissa of point C and $y_A$ is the ordinate of point A of the human body detection frame in the original image; $x_t$ and $y_t$ are the abscissa and ordinate of human body key point t in the original image; and the upper left corner of the original image is the (0,0) point;
this conversion normalizes the key point coordinates, so that differences in body size, in camera distance during recording, and in the position of the person in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable.
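The conversion in step A can be sketched as below. This is a sketch under stated assumptions: the function name `normalize_keypoint` is hypothetical, D is taken as the corner that serves as the origin of the new frame, C as the corner sharing a horizontal edge with D, and A as the corner sharing a vertical edge with D, which is the reading suggested by the symbol definitions.

```python
from typing import Tuple

Point = Tuple[float, float]


def normalize_keypoint(pt: Point, a: Point, c: Point, d: Point) -> Point:
    """Map a key point from image pixels to [0, 1] coordinates inside box ABCD.

    Assumed corner roles: D is the origin corner, C shares D's horizontal
    edge, and A shares D's vertical edge.
    """
    x_t, y_t = pt
    x_d, y_d = d
    width = c[0] - x_d   # x_C - x_D
    height = a[1] - y_d  # y_A - y_D (negative when A is above D in image coordinates)
    if width == 0 or height == 0:
        raise ValueError("degenerate detection frame")
    return ((x_t - x_d) / width, (y_t - y_d) / height)
```

For a box reported as `(x_min, y_min, x_max, y_max)`, one consistent choice is A = `(x_min, y_min)`, D = `(x_min, y_max)` and C = `(x_max, y_max)`. The result is independent of the box size and position, which is the purpose of the normalization.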
B. subtracting the position coordinates of each human body key point within the detection frame in the previous frame from its position coordinates in the next frame to obtain a new human body key point coordinate, which represents the direction of change of the human body key point;
assuming that the acquired action video has a frames, steps A to B yield a-1 new coordinates for each human body key point, and arranging these a-1 coordinates in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length a-1;
assuming that the standard volleyball action video has b frames, steps A to B yield b-1 new coordinates for each human body key point, and arranging these b-1 coordinates in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length b-1.
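Step B amounts to a first-order difference over the normalized positions of one key point. A minimal sketch follows; the function name `displacement_sequence` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def displacement_sequence(positions: Sequence[Point]) -> List[Point]:
    """Return next-frame minus previous-frame coordinates for one key point.

    A video with a frames gives a positions and therefore a - 1 displacements.
    """
    return [
        (x_next - x_prev, y_next - y_prev)
        for (x_prev, y_prev), (x_next, y_next) in zip(positions, positions[1:])
    ]
```

Applying this to each of the 18 key points gives 18 time sequences per video, each one element shorter than the frame count.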
According to the present invention, preferably, aligning the sequences and calculating the distance between them through the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action according to that distance, comprises the following steps:
after the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a dynamic time warping (DTW) algorithm is used to align them. However, DTW suffers from high time complexity and pathological warping. An existing improved algorithm is shown in fig. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence, and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides. The slopes of OE and BD are 0.5 and the slopes of OD and BE are 2; point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can therefore be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC needs to be calculated. From these slopes, the area of the parallelogram is $\frac{(2n-m)(2m-n)}{3}$.
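The parallelogram constraint of the existing improved algorithm can be sketched as a membership test plus an area formula. This is an illustrative sketch with hypothetical function names. The area expression (2n - m)(2m - n)/3 follows from intersecting the four lines with the stated slopes 0.5 and 2, and is meaningful only when n/2 <= m <= 2n.

```python
def parallelogram_area(n: float, m: float) -> float:
    """Area of parallelogram OEBD inside the n-by-m rectangle OABC."""
    return (2 * n - m) * (2 * m - n) / 3


def in_parallelogram(x: float, y: float, n: float, m: float) -> bool:
    """True if (x, y) lies inside OEBD, with O = (0, 0) and B = (n, m)."""
    above_oe = y >= 0.5 * x            # OE: slope 0.5 through O
    below_od = y <= 2 * x              # OD: slope 2 through O
    above_be = y >= m + 2 * (x - n)    # BE: slope 2 through B
    below_bd = y <= m + 0.5 * (x - n)  # BD: slope 0.5 through B
    return above_oe and below_od and above_be and below_bd
```

Only cells of the distance matrix that pass this test would need to be evaluated.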
C. Constructing a rectangular coordinate system in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the human body key point time sequence of the standard volleyball action, and OC = m is the length of the human body key point time sequence of the acquired action. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on sides OA, AB, BC and OC respectively, with OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The search range of the warping path is limited to the polygon OFGBHI, whose area is $mn - 2 \cdot \frac{1}{2}(0.9n)(0.9m) = 0.19mn$. The difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is therefore $\frac{(2n-m)(2m-n)}{3} - 0.19mn$. A standard volleyball action takes at least 3 s to complete, the frame rate of the collected video is 25 frames/s, and the difference between the frame counts of the student's test action and the standard volleyball action (that is, the difference between the two time sequence lengths) does not exceed 20 frames. From these conditions, $n \geq 75$ and $|n - m| \leq 20$, it can be derived that $\frac{(2n-m)(2m-n)}{3} - 0.19mn > 0$. The search range of the improved dynamic time warping algorithm of the present invention is therefore smaller than that of the existing improved algorithm, and the algorithm takes less time.
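The area of the polygon OFGBHI can be checked numerically from its vertices. The sketch below uses the shoelace formula, with hypothetical helper names and the vertex coordinates implied by OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon given in boundary order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


def search_polygon(n: float, m: float, margin: float = 0.1) -> List[Point]:
    """Vertices O, F, G, B, H, I of the search region in the n-by-m rectangle."""
    return [
        (0.0, 0.0),             # O
        (margin * n, 0.0),      # F on OA
        (n, (1 - margin) * m),  # G on AB
        (n, m),                 # B
        ((1 - margin) * n, m),  # H on BC
        (0.0, margin * m),      # I on OC
    ]
```

With `margin = 0.1` the two removed corner triangles each have legs 0.9n and 0.9m, so the polygon covers 19% of the rectangle regardless of n and m.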
The accumulated distance of the original dynamic time warping algorithm is calculated as $D(i,j) = r_{ij} + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, where D(i, j) is the cumulative Euclidean distance from the starting point to the point (i, j), and $r_{ij}$ is the Euclidean distance between point i of one sequence and point j of the other.
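For reference, the classic accumulated-distance recurrence with Euclidean distance can be sketched as follows. This is the textbook form of dynamic time warping, assumed here to be what the text calls the original algorithm; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(seq_a: Sequence[Point], seq_b: Sequence[Point]) -> float:
    """Classic DTW: D(i, j) = r_ij + min(D(i-1, j), D(i, j-1), D(i-1, j-1))."""
    n, m = len(seq_a), len(seq_b)
    # Row and column 0 are padding so that index (i, j) maps to points i-1, j-1.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance
            D[i][j] = r + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Every one of the n × m cells is evaluated here, which is the cost that the restricted search regions are meant to reduce.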
D. The accumulated distance of the improved dynamic time warping algorithm is calculated as shown in formula (II):
$$D(i,j) = r_{ij} + \min\{D(i-1,j),\ D(i,j-1),\ 0.9\,D(i-1,j-1)\} \tag{II}$$
in formula (II), D(i, j) is the accumulated cosine distance from the starting point to the point (i, j); the starting point is the first point of both the human body key point time sequence of the standard volleyball action video and that of the action video; the point (i, j) pairs the i-th point of the human body key point time sequence of the standard volleyball action video (length b-1) with the j-th point of the human body key point time sequence of the action video (length a-1); and $r_{ij}$ is the cosine distance between these two points;
restricting the calculation of formula (II) to the polygon OFGBHI limits the search range; the area of the polygon determines the search range and running time of the algorithm, and thus the operating efficiency of the dynamic time warping algorithm. The cosine distance captures the difference between volleyball actions and is better suited to this scenario. The coefficient 0.9 in 0.9D(i-1, j-1) controls how far the warping path deviates from the diagonal, which makes the warping path more reasonable.
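The pieces described in steps C and D can be combined into one sketch. Several details are assumptions rather than statements from the text: the recurrence is taken to be the classic one with the diagonal term scaled by 0.9, the polygon OFGBHI is expressed as the band |j/m - i/n| <= 0.1 (the strip between lines FG and IH, both of slope m/n), cosine distance is taken as one minus cosine similarity, and zero-length displacement vectors are handled by a convention chosen here. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def cosine_distance(u: Vec, v: Vec) -> float:
    """1 - cosine similarity, clamped at 0 against rounding error."""
    norm_u, norm_v = math.hypot(*u), math.hypot(*v)
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: two motionless steps match, otherwise distance 1.
        return 0.0 if norm_u == norm_v else 1.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)
    return max(0.0, 1.0 - cos)


def improved_dtw(
    standard: Sequence[Vec],
    action: Sequence[Vec],
    diag_weight: float = 0.9,
    margin: float = 0.1,
) -> float:
    """Band-limited DTW with cosine distance and a weighted diagonal step."""
    n, m = len(standard), len(action)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip cells outside the band |j/m - i/n| <= margin.
            if abs(j * n - i * m) > margin * n * m:
                continue
            r = cosine_distance(standard[i - 1], action[j - 1])
            D[i][j] = r + min(D[i - 1][j], D[i][j - 1], diag_weight * D[i - 1][j - 1])
    return D[n][m]
```

For very short sequences the band may contain no connected path, in which case the function returns infinity. Sequences of the length discussed in the text (dozens of frames) leave the band several cells wide.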
E. After the accumulated distance of each human body key point is calculated, the accumulated distances are summed and averaged to give the final accumulated distance, and the similarity between the two sequences is judged from its size: the smaller the distance, the higher the similarity. When the final accumulated distance is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be a standard volleyball action.
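Step E reduces to a mean and a threshold comparison. A minimal sketch follows; the function name `judge_action` is hypothetical and the default threshold is the 0.5 stated in the text.

```python
from typing import Sequence, Tuple


def judge_action(per_keypoint_distances: Sequence[float], threshold: float = 0.5) -> Tuple[float, bool]:
    """Average the per-key-point accumulated distances and apply the threshold.

    Returns (final accumulated distance, is_standard).
    """
    if not per_keypoint_distances:
        raise ValueError("at least one key point distance is required")
    final_distance = sum(per_keypoint_distances) / len(per_keypoint_distances)
    return final_distance, final_distance < threshold
```

The comparison is strict, so a final distance of exactly 0.5 is not judged to be a standard action.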
The invention has the beneficial effects that:
1. The invention automates the judgment of standard volleyball actions, frees up manpower, greatly improves efficiency, saves material and financial resources, and lightens the workload of teachers. The method is simple to implement, clear in concept, of good economic value, and worth popularizing.
2. The invention combines research in the field of artificial intelligence with actual volleyball testing, puts the technology into practical use, and helps promote the integration of industry and research.
3. The invention improves the original dynamic time warping algorithm, shortens the operation time, is simpler than the existing improved algorithm, and improves the operation efficiency of the algorithm.
Drawings
FIG. 1 is a schematic diagram of the network structure of the pose estimation network OpenPose;
FIG. 2 is a schematic diagram of the network structure of the target detection network YOLOv3;
FIG. 3 is a schematic flow chart of the volleyball action recognition method based on an improved dynamic time warping algorithm according to the present invention;
FIG. 4 is a schematic diagram of the 18 key points of the human body according to the present invention;
FIG. 5 is a schematic diagram of the human body key points and the human body detection frame according to the present invention;
FIG. 6 is a schematic diagram of the existing improved dynamic time warping algorithm;
FIG. 7 is a schematic diagram of the improved dynamic time warping algorithm according to the present invention;
FIG. 8 is a schematic diagram of human body key point sequences in a standard action video and a volleyball test action video according to the present invention;
FIG. 9 is a graph showing the alignment of the two sequences of FIG. 8 according to the present invention;
FIG. 10 is a schematic diagram of a frame image in video 1 according to the embodiment;
FIG. 11 is a schematic diagram of a frame image in video 2 according to the embodiment;
FIG. 12 is a schematic diagram of a frame image in video 3 according to the embodiment.
Detailed Description
The present invention will be further described by way of examples, but not limited thereto, with reference to the accompanying drawings.
Example 1
A volleyball action recognition method based on an improved dynamic time warping algorithm, as shown in FIG. 3, comprises the following steps:
first, acquiring a real-time action video of a volleyball movement through a camera;
second, performing pose estimation and target detection on the acquired action video and on the standard volleyball action video respectively, to obtain the coordinates of the human body key points;
third, obtaining, from the coordinates of the human body key points, the time sequence of each human body key point in the action video and the time sequence of each human body key point in the standard volleyball action video;
and finally, aligning the time sequences of the human body key points in the action video with those in the standard volleyball action video and calculating the distance between them through an improved dynamic time warping algorithm, and judging the accuracy of the volleyball action in the action video.
Example 2
A volleyball action recognition method based on an improved dynamic time warping algorithm is characterized in that:
the pose estimation and target detection processing comprises the following steps:
inputting each frame of the action video into the pose estimation network OpenPose for pose estimation, and outputting the position coordinates of the 18 human body key points in the image, the 18 human body key points being shown in fig. 4; at the same time, inputting each frame of the action video into the target detection network YOLOv3 for target detection, and outputting the position coordinates of the human body detection frame.
The result after pose estimation and target detection is shown in fig. 5, where rectangle ABCD is the human body detection frame. The position coordinates of the human body key points in the image are converted into position coordinates within the detection frame according to the position of the frame. For example, the converted coordinates of key point 8 are calculated as $x = \frac{x_8 - x_D}{x_C - x_D}$, $y = \frac{y_8 - y_D}{y_A - y_D}$, where the subscripted x and y are the horizontal and vertical coordinates of the corresponding point in the image, and the (0,0) point is at the upper left corner of the image. This conversion normalizes the key point coordinates, so that differences in body size, in camera distance during recording, and in the position of the person in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable. After conversion, the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to obtain a new coordinate, which represents the direction of change of the key point. If the video has a frames in total, a-1 new coordinates are formed for each human body key point. Similarly, a standard volleyball action video with b frames yields b-1 new coordinates for each human body key point.
Acquiring the time sequence of the human body key points in the action video, where the human body detection frame obtained after pose estimation and target detection is rectangle ABCD, comprises the following steps:
A. converting the position coordinates of the human body key points in the image into position coordinates within the human body detection frame; for any key point t of the 18 human body key points, the position coordinates within the detection frame are given by formula (I):
$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$
in formula (I), $0 \leq t \leq 17$; x and y are the new horizontal and vertical coordinates, both ranging from 0 to 1; $x_D$ and $y_D$ are the abscissa and ordinate of point D of the human body detection frame in the original image; $x_C$ is the abscissa of point C and $y_A$ is the ordinate of point A of the human body detection frame in the original image; $x_t$ and $y_t$ are the abscissa and ordinate of human body key point t in the original image; and the upper left corner of the original image is the (0,0) point;
this conversion normalizes the key point coordinates, so that differences in body size, in camera distance during recording, and in the position of the person in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable.
B. subtracting the position coordinates of each human body key point within the detection frame in the previous frame from its position coordinates in the next frame to obtain a new human body key point coordinate, which represents the direction of change of the human body key point;
assuming that the acquired action video has a frames, steps A to B yield a-1 new coordinates for each human body key point, and arranging these a-1 coordinates in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length a-1;
assuming that the standard volleyball action video has b frames, steps A to B yield b-1 new coordinates for each human body key point, and arranging these b-1 coordinates in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length b-1.
Aligning the sequences and calculating the distance between them through the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action according to that distance, comprises the following steps:
after the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a dynamic time warping (DTW) algorithm is used to align them. However, DTW suffers from high time complexity and pathological warping. An existing improved algorithm is shown in fig. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence, and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides. The slopes of OE and BD are 0.5 and the slopes of OD and BE are 2; point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can therefore be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC needs to be calculated. From these slopes, the area of the parallelogram is $\frac{(2n-m)(2m-n)}{3}$.
The improved dynamic time warping algorithm of the present invention is shown in FIG. 7.
C. Constructing a rectangular coordinate system, wherein OA and OC lie on the x axis and the y axis respectively; OA = n represents the length of the human body key point time sequence of the standard volleyball action, and OC = m represents the length of the human body key point time sequence of the acquired action. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on the sides OA, AB, BC and OC respectively, with OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The search range of the warping path is limited to the polygon OFGBHI, whose area is $S_{OFGBHI} = nm - 2 \cdot \frac{1}{2}(0.9n)(0.9m) = 0.19nm$. The difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is therefore $\frac{(2n-m)(2m-n)}{3} - 0.19nm$. Completing a standard volleyball action takes at least 3 s and the video is collected at 25 frames/s, so a standard action spans at least 75 frames; the number of video frames of the student examination action differs from that of the standard volleyball action (namely, the two time sequences differ in length) by no more than 20 frames, i.e. $|n - m| \le 20$. Under these conditions the difference is positive, which shows that the search range of the improved dynamic time warping algorithm of the present invention is smaller than that of the existing improved dynamic time warping algorithm and that it takes less time.
The accumulated distance of the original dynamic time warping algorithm is calculated as $D(i,j) = r_{ij} + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, where D(i, j) is the accumulated Euclidean distance from the starting point to the point (i, j), and $r_{ij}$ is the Euclidean distance between point i and point j of the two sequences.
D. The improved dynamic time warping algorithm calculates the accumulated distance according to formula (II):
in formula (II), D(i, j) is the accumulated cosine distance from the starting point to the point (i, j); the starting point is the first point of both the human body key point time sequence of the standard volleyball action video and the human body key point time sequence of the action video; the point (i, j) refers to the i-th point of the human body key point time sequence of the standard volleyball action video paired with the j-th point of the human body key point time sequence of the action video; $r_{ij}$ is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, which has a length of a-1, and the j-th point of the human body key point time sequence of the action video, which has a length of b-1;
the calculation of formula (II) is restricted to the polygon OFGBHI, and the area of the polygon determines the search range and running time of the algorithm and thus the efficiency of the dynamic time warping algorithm. The cosine distance can represent the difference between volleyball actions and is better suited to this scenario, and the coefficient 0.9 in the term 0.9D(i-1, j-1) controls how far the warping path deviates toward the diagonal, making the warping path more reasonable. The two sequences are shown in FIG. 8, and the alignment of the two sequences after the corresponding sequence distance has been calculated is shown in FIG. 9;
E. After the accumulated distance of each human body key point has been calculated, the accumulated distances are summed and averaged to give the final accumulated distance, and the similarity between the two sequences is judged from it: the smaller the distance, the higher the similarity. When the final accumulated distance D(i, j) is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be a standard volleyball action.
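Steps C to E can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: formula (II) is not reproduced in this text, so the recurrence is assumed to be $D(i,j) = r_{ij} + \min\{D(i-1,j),\ D(i,j-1),\ 0.9\,D(i-1,j-1)\}$, which is the simplest form containing the term 0.9D(i-1, j-1). The function names, the mapping of indices onto the polygon OFGBHI (normalised coordinates with a band of half-width 0.1 around the diagonal), and the handling of zero-length displacement vectors are likewise assumptions.

```python
import math

def cosine_distance(p, q):
    """1 - cosine similarity of two 2-D displacement vectors.
    Assumption: two zero vectors count as identical, one zero vector as orthogonal."""
    norm_p, norm_q = math.hypot(*p), math.hypot(*q)
    if norm_p == 0.0 and norm_q == 0.0:
        return 0.0
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0
    cos = (p[0] * q[0] + p[1] * q[1]) / (norm_p * norm_q)
    return 1.0 - max(-1.0, min(1.0, cos))

def banded_dtw(standard, action, margin=0.1, diag_weight=0.9):
    """Accumulated cosine distance, searched only inside the polygon OFGBHI.
    Returns math.inf if the band admits no monotone path (very short inputs)."""
    n, m = len(standard), len(action)
    inf = math.inf
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        u = i / (n - 1) if n > 1 else 0.0
        for j in range(m):
            v = j / (m - 1) if m > 1 else 0.0
            if abs(v - u) > margin + 1e-12:
                continue  # cell lies outside OFGBHI, so it is never evaluated
            r = cosine_distance(standard[i], action[j])
            if i == 0 and j == 0:
                acc[i][j] = r
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                diag_weight * acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = r + best
    return acc[n - 1][m - 1]

def is_standard_action(standard_keypoints, action_keypoints, threshold=0.5):
    """Average the per-key-point distances and compare with the 0.5 threshold."""
    distances = [banded_dtw(s, a) for s, a in zip(standard_keypoints, action_keypoints)]
    return sum(distances) / len(distances) < threshold
```

Each argument of `is_standard_action` is a list with one displacement sequence per human body key point (18 in the text). The diagonal weight below 1 makes a diagonal step cheaper than a horizontal or vertical one, which is how the sketch biases the path toward the diagonal.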
FIG. 6 is a schematic diagram of the existing improved dynamic time warping algorithm. A rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n represents the length of the standard volleyball action time sequence, and OC = m represents the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, wherein the slopes of OE and BD are 0.5, the slopes of OD and BE are 2, point D is the intersection of OD and BD, and point E is the intersection of OE and BE.
FIG. 7 is a schematic diagram of the improved dynamic time warping algorithm of the present invention. A rectangular coordinate system is first constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n represents the length of the standard volleyball action time sequence, and OC = m represents the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on the sides OA, AB, BC and OC respectively, such that:
OF=0.1AO
OI=0.1CO
BH=0.1BC
BG=0.1BA
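With these four points, the edges FG and IH are both parallel to the diagonal OB, so the polygon OFGBHI is the band $|y - \frac{m}{n}x| \le 0.1m$ inside the rectangle. The sketch below builds the corresponding boolean mask over an n-by-m distance matrix. How matrix indices map onto the continuous figure is not stated in the text; the sketch assumes index i maps to i/(n-1) and index j to j/(m-1), so that the first and last cells always lie inside the band. The function name `ofgbhi_mask` is hypothetical.

```python
def ofgbhi_mask(n, m, margin=0.1):
    """Boolean n-by-m grid: True where cell (i, j) lies inside polygon OFGBHI.
    Assumes normalised coordinates u = i/(n-1), v = j/(m-1), and n, m >= 2."""
    if n < 2 or m < 2:
        raise ValueError("both sequences need at least two points")
    eps = 1e-12  # tolerance for floating-point rounding on the band edge
    return [
        [abs(j / (m - 1) - i / (n - 1)) <= margin + eps for j in range(m)]
        for i in range(n)
    ]

def mask_fraction(mask):
    """Share of cells that fall inside the search region."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)
```

For sequences of realistic length, `mask_fraction(ofgbhi_mask(n, m))` comes out close to 0.19, the ratio of the polygon's area to that of the rectangle OABC.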
FIG. 4 shows the distribution of the human body key points; the human body has 18 key points. The positions of the human body key points are obtained by deep-learning-based posture estimation, for which OpenPose is used. Processing each frame of the video with posture estimation gives the positions of all human body key points of the student in each frame. At the same time, target detection is performed on the video using YOLOv3. The results of target detection and the human body key points are shown in FIG. 5, in which rectangle ABCD is the detection frame of the human body. According to the position of the human body detection frame, the position coordinates of the human body key points in the image are converted into position coordinates within the human body detection frame; for example, the converted coordinates of the 8th human body key point are calculated from its image coordinates and the corner coordinates of the detection frame, where x and y are the horizontal and vertical coordinates of a point in the image and the point (0,0) is at the upper left corner of the image. This conversion normalizes the key point coordinates and avoids inaccurate human body key points and inaccurate action scores caused by differences in body size, in the distance of the camera when the video is collected, and in the position of the human body in the video, so that the result is more reasonable. Because the score of a volleyball action test is calculated according to whether the direction of the student's volleyball action changes correctly, after the key point conversion is completed the key point coordinates of the previous frame are subtracted from those of the next frame to obtain new coordinates, which represent the change in direction of the key point.
If the video has a frames in total, the new coordinates of each human body key point form a time sequence with a length of a-1, and the 18 key points form 18 time sequences.
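The conversion and differencing described above can be sketched as follows. The exact conversion formula is not reproduced in this text, so the sketch assumes the usual box normalisation (offset from the box origin divided by box width or height), which keeps the converted coordinates between 0 and 1. Because the corner labelling of rectangle ABCD cannot be read from the text alone, the detection frame is passed as hypothetical `(x_left, y_top, x_right, y_bottom)` edges, and the vertical axis is left pointing downward as in the image.

```python
def normalize_keypoints(keypoints, box):
    """Convert image coordinates to coordinates relative to the detection frame.
    keypoints: list of (x, y) in pixels; box: (x_left, y_top, x_right, y_bottom)."""
    x_left, y_top, x_right, y_bottom = box
    width = x_right - x_left
    height = y_bottom - y_top
    return [((x - x_left) / width, (y - y_top) / height) for x, y in keypoints]

def displacement_sequences(frames):
    """frames: one list of normalised (x, y) key points per video frame.
    Returns one sequence per key point, each holding len(frames) - 1 vectors
    obtained by subtracting the previous frame from the next frame."""
    num_keypoints = len(frames[0])
    sequences = [[] for _ in range(num_keypoints)]
    for prev, nxt in zip(frames, frames[1:]):
        for k in range(num_keypoints):
            sequences[k].append((nxt[k][0] - prev[k][0], nxt[k][1] - prev[k][1]))
    return sequences
```

A video with a frames and 18 key points per frame therefore yields 18 sequences of length a-1, matching the count given in the text.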
After the time sequences of the human body key points are obtained, the similarity between the volleyball test action video and the standard volleyball action video needs to be compared. Because the time sequences differ in length, a dynamic time warping (DTW) algorithm is used to align them; however, the DTW algorithm suffers from high time complexity and pathological warping. The existing improved algorithm is shown in FIG. 6: the search range of the warping path is limited to the parallelogram OEBD, so that not all positions of the original distance matrix OABC need to be calculated, where n and m are the lengths of the standard volleyball action sequence and the student volleyball examination action sequence respectively, and the slopes of OE and OD are 0.5 and 2 respectively. From the coordinates in the figure, the area of the parallelogram is $S_{OEBD} = \frac{(2n-m)(2m-n)}{3}$. FIG. 7 shows the improved dynamic time warping algorithm of the present invention, in which OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The present invention limits the search range of the warping path to the polygon OFGBHI, whose area is $S_{OFGBHI} = 0.19nm$. The difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is therefore $\frac{(2n-m)(2m-n)}{3} - 0.19nm$. Completing a standard volleyball action takes at least 3 s and the video is collected at 25 frames/s, so a standard action spans at least 75 frames; the number of video frames of the student examination action differs from that of the standard volleyball action by no more than 20 frames, i.e. $|n - m| \le 20$. Under these conditions the difference is positive, which shows that the search range of the improved dynamic time warping algorithm of the present invention is smaller than that of the existing improved dynamic time warping algorithm and that it takes less time.
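The area comparison can be checked numerically. The sketch below derives the vertices of both regions from the construction described in the text (lines of slope 0.5 and 2 through O and B for the parallelogram, and the four 0.1 offsets for the polygon), computes their areas with the shoelace formula in exact rational arithmetic, and evaluates the difference over the stated range of lengths. The function names are hypothetical, and treating the stated frame counts directly as the sequence lengths n and m is a simplifying assumption.

```python
from fractions import Fraction

def shoelace_area(vertices):
    """Area of a simple polygon from its vertices listed in order."""
    total = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def parallelogram_oebd(n, m):
    """Vertices O, E, B, D with O = (0, 0) and B = (n, m)."""
    n, m = Fraction(n), Fraction(m)
    e = ((4 * n - 2 * m) / 3, (2 * n - m) / 3)  # OE (slope 1/2) meets BE (slope 2)
    d = ((2 * m - n) / 3, (4 * m - 2 * n) / 3)  # OD (slope 2) meets BD (slope 1/2)
    return [(Fraction(0), Fraction(0)), e, (n, m), d]

def polygon_ofgbhi(n, m):
    """Vertices O, F, G, B, H, I with OF = 0.1 OA, OI = 0.1 OC, BH = 0.1 BC, BG = 0.1 BA."""
    n, m = Fraction(n), Fraction(m)
    zero, tenth = Fraction(0), Fraction(1, 10)
    return [
        (zero, zero),
        (tenth * n, zero),      # F on OA
        (n, m - tenth * m),     # G on AB
        (n, m),                 # B
        (n - tenth * n, m),     # H on BC
        (zero, tenth * m),      # I on OC
    ]

def area_difference(n, m):
    """Parallelogram area minus polygon area; positive means OFGBHI is smaller."""
    return shoelace_area(parallelogram_oebd(n, m)) - shoelace_area(polygon_ofgbhi(n, m))

if __name__ == "__main__":
    for n, m in [(75, 55), (75, 75), (75, 95), (150, 140)]:
        print(n, m, float(area_difference(n, m)))
```

The parallelogram construction is only meaningful while $n/2 \le m \le 2n$, which always holds under the stated limits on n and |n - m|.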
The accumulated distance of the original dynamic time warping algorithm is calculated as $D(i,j) = r_{ij} + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, where D(i, j) is the accumulated Euclidean distance from the starting point to the point (i, j), and $r_{ij}$ is the Euclidean distance between point i and point j of the two sequences. In the accumulated distance formula of the improved dynamic time warping algorithm of the present invention, D(i, j) is the accumulated cosine distance from the starting point to the point (i, j), and $r_{ij}$ is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, which has a length of a-1, and the j-th point of the human body key point time sequence of the action video, which has a length of b-1. The cosine distance is better suited to this scenario, and the coefficient 0.9 in the term 0.9D(i-1, j-1) controls how far the warping path deviates toward the diagonal, making the warping path more reasonable. The two sequences are shown in FIG. 8, and the alignment of the two sequences after the corresponding sequence distance has been calculated is shown in FIG. 9. The similarity between the two sequences is then judged from the distance: the smaller the distance, the higher the similarity and the more standard the action; conversely, the larger the distance, the lower the similarity and the less standard the action. After the accumulated distance of each human body key point has been calculated, the accumulated distances are summed and averaged to give the final accumulated distance; when it is less than 0.5 the action is judged to be a standard volleyball action, and otherwise it is not. Table 1 compares the computation times of the existing improved algorithm and the improved algorithm of the present invention; it can be seen that the present invention does shorten the computation time of the existing improved dynamic time warping algorithm.
TABLE 1
Table 2 shows how standard the volleyball actions are, as calculated automatically by the method of the present invention for the volleyball examinations of different students.
TABLE 2
In Table 2, FIG. 10 shows a frame from video 1, FIG. 11 shows a frame from video 2, and FIG. 12 shows a frame from video 3.
Claims (6)
1. A volleyball action recognition method based on an improved dynamic time warping algorithm is characterized by comprising the following steps:
firstly, acquiring a real-time action video of volleyball movement;
secondly, respectively carrying out posture estimation and target detection processing on the obtained action video and the standard volleyball action video to obtain the coordinates of key points of the human body;
thirdly, acquiring a time sequence of the human body key points in the action video and a time sequence of each human body key point in the standard volleyball action video according to the obtained coordinates of the human body key points;
and finally, aligning and calculating the distance between the time sequence of the key points of the human body in the action video and the time sequence of each key point of the human body in the standard volleyball action video through an improved dynamic time warping algorithm, and judging the accuracy of volleyball action in the action video.
2. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the posture estimation and target detection processing comprises the following steps:
inputting each frame of image in the action video into a posture estimation network OpenPose for posture estimation, and outputting position coordinates of 18 human key points in the image; meanwhile, each frame of image in the motion video is input into the target detection network YOLOv3 for target detection, and the position coordinates of the human body detection frame are output.
3. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein, with the human body detection frame obtained after the posture estimation and target detection processing denoted as rectangle ABCD, obtaining the time sequence of the human body key points in the action video comprises the following steps:
A. converting the position coordinates of the human key points in the image into the position coordinates of the human key points in the human detection frame according to the human detection frame;
B. subtracting the position coordinate of the human body key point of the previous frame from the position coordinate of the human body key point of the next frame in the human body detection frame to obtain a new human body key point coordinate, wherein the coordinate represents the direction change condition of the human body key point;
assuming that the obtained action video has a frames of images, a-1 new human body key point coordinates are obtained for each human body key point after the processing of steps A to B, and these a-1 new coordinates are arranged in time order to obtain a human body key point time sequence with a length of a-1;
assuming that the standard volleyball action video has b frames of images, b-1 new human body key point coordinates are obtained for each human body key point after the processing of steps A to B, and these b-1 new coordinates are arranged in time order to obtain a human body key point time sequence with a length of b-1.
4. The volleyball action recognition method based on the improved dynamic time warping algorithm according to claim 3, wherein converting the position coordinates of the human body key points in the image into the position coordinates of the human body key points in the human body detection frame according to the human body detection frame means: the calculation formula of the position coordinates of any key point t in the 18 human body key points in the human body detection frame is shown as the formula (I):
in formula (I), $0 \le t \le 17$; x and y are the new horizontal and vertical coordinates, both ranging from 0 to 1; $x_D$ and $y_D$ are respectively the abscissa and the ordinate of point D of the human body detection frame in the original image; $x_C$ and $y_A$ are respectively the abscissa of point C and the ordinate of point A of the human body detection frame in the original image; $x_t$ and $y_t$ are respectively the abscissa and the ordinate of human body key point t in the original image; and the upper left corner of the original image is the point (0,0).
5. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein aligning the sequences and calculating the distance between them through the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action according to the distance between the sequences, comprises the following steps:
C. constructing a rectangular coordinate system, wherein OA and OC lie on the x axis and the y axis respectively; OA = n represents the length of the human body key point time sequence of the standard volleyball action, and OC = m represents the length of the human body key point time sequence of the acquired action; a rectangle OABC is constructed with OA and OC as sides, points F, G, H and I lie on the sides OA, AB, BC and OC respectively, with OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA; the search range of the warping path is limited to the polygon OFGBHI, whose area is $S_{OFGBHI} = 0.19nm$;
D. The improved dynamic time warping algorithm calculates the accumulated distance according to formula (II):
in formula (II), D(i, j) is the accumulated cosine distance from the starting point to the point (i, j); the starting point is the first point of both the human body key point time sequence of the standard volleyball action video and the human body key point time sequence of the action video; the point (i, j) refers to the i-th point of the human body key point time sequence of the standard volleyball action video paired with the j-th point of the human body key point time sequence of the action video; $r_{ij}$ is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, which has a length of a-1, and the j-th point of the human body key point time sequence of the action video, which has a length of b-1;
E. After the accumulated distance of each human body key point has been calculated, the accumulated distances are summed and averaged to give the final accumulated distance; when the final accumulated distance D(i, j) is less than 0.5, the action is judged to be a standard volleyball action, and otherwise it is judged not to be a standard volleyball action.
6. A volleyball motion recognition method based on an improved dynamic time warping algorithm according to any one of claims 1-5, wherein a real-time motion video of volleyball motion is obtained by a camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011306032.6A CN112446313A (en) | 2020-11-20 | 2020-11-20 | Volleyball action recognition method based on improved dynamic time warping algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112446313A true CN112446313A (en) | 2021-03-05 |
Family
ID=74737537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011306032.6A Pending CN112446313A (en) | 2020-11-20 | 2020-11-20 | Volleyball action recognition method based on improved dynamic time warping algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112446313A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104504731A (en) * | 2014-12-19 | 2015-04-08 | 西安理工大学 | Human motion synthesis method based on motion diagram |
CN107358171A (en) * | 2017-06-22 | 2017-11-17 | 华中师范大学 | A kind of gesture identification method based on COS distance and dynamic time warping |
CN109460702A (en) * | 2018-09-14 | 2019-03-12 | 华南理工大学 | Passenger's abnormal behaviour recognition methods based on human skeleton sequence |
CN109934881A (en) * | 2017-12-19 | 2019-06-25 | 华为技术有限公司 | Image encoding method, the method for action recognition and computer equipment |
CN111626137A (en) * | 2020-04-29 | 2020-09-04 | 平安国际智慧城市科技股份有限公司 | Video-based motion evaluation method and device, computer equipment and storage medium |
CN111860196A (en) * | 2020-06-24 | 2020-10-30 | 富泰华工业(深圳)有限公司 | Hand operation action scoring device and method and computer readable storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113095248A (en) * | 2021-04-19 | 2021-07-09 | 中国石油大学(华东) | Technical action correction method for badminton |
CN113095248B (en) * | 2021-04-19 | 2022-10-25 | 中国石油大学(华东) | Technical action correcting method for badminton |
CN113268626A (en) * | 2021-05-26 | 2021-08-17 | 中国人民武装警察部队特种警察学院 | Data processing method and device, electronic equipment and storage medium |
CN113268626B (en) * | 2021-05-26 | 2024-04-26 | 中国人民武装警察部队特种警察学院 | Data processing method, device, electronic equipment and storage medium |
CN113409374A (en) * | 2021-07-12 | 2021-09-17 | 东南大学 | Character video alignment method based on motion registration |
CN113409374B (en) * | 2021-07-12 | 2024-05-10 | 东南大学 | Character video alignment method based on action registration |
CN116821713A (en) * | 2023-08-31 | 2023-09-29 | 山东大学 | Shock insulation efficiency evaluation method and system based on multivariable dynamic time warping algorithm |
CN116821713B (en) * | 2023-08-31 | 2023-11-24 | 山东大学 | Shock insulation efficiency evaluation method and system based on multivariable dynamic time warping algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210305 |