CN112446313A - Volleyball action recognition method based on improved dynamic time warping algorithm - Google Patents


Publication number
CN112446313A
Authority
CN
China
Prior art keywords
human body
point
volleyball
key point
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011306032.6A
Other languages
Chinese (zh)
Inventor
周斌
吕传栋
周洪超
张艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202011306032.6A
Publication of CN112446313A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, comprising the following steps: first, acquiring a real-time action video of a volleyball movement; second, performing pose estimation and target detection on the action video and on a standard volleyball action video to obtain a time sequence of the human body key points in each video; and finally, aligning the sequences and calculating the distance between them with an improved dynamic time warping algorithm, and judging the accuracy of the volleyball action from that distance. The invention automates the judgment of whether a volleyball action is standard, which improves efficiency, saves manpower, material and financial resources, and lightens teachers' workload. The method is simple to implement, clear in concept, of good economic value, and worth popularizing. The invention combines research in the field of artificial intelligence with actual volleyball testing, puts the technology into practical use, and helps promote the integration of industry and research.

Description

Volleyball action recognition method based on improved dynamic time warping algorithm
Technical Field
The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, and belongs to the field of artificial intelligence.
Background
In recent years, with the rapid development of volleyball and the emphasis China places on the sport, volleyball teaching and training have become increasingly important, and many schools now offer volleyball courses.
In a volleyball test, a teacher assigns scores according to how closely each student's action matches the standard. This process consumes a large amount of time and manpower and adds to the teacher's workload, and manual scoring can also affect the fairness of the test to some degree. If the process can be automated with artificial intelligence techniques, scoring becomes more efficient, manpower, material and financial resources are saved, and the fairness of the volleyball test improves.
At present, volleyball action recognition methods do not include a target detection stage, which reduces the accuracy of action judgment, and they rely on the traditional dynamic time warping algorithm, which reduces recognition efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a volleyball action recognition method based on an improved dynamic time warping algorithm.
Interpretation of terms:
1. As shown in FIG. 4, the human body has 18 key points: 0 nose, 1 neck, 2 right shoulder, 3 right elbow, 4 right wrist, 5 left shoulder, 6 left elbow, 7 left wrist, 8 right hip, 9 right knee, 10 right ankle, 11 left hip, 12 left knee, 13 left ankle, 14 right eye, 15 left eye, 16 right ear, and 17 left ear.
2. The pose estimation network OpenPose is an open-source human pose recognition library developed by Carnegie Mellon University (CMU) in the United States, based on convolutional neural networks and supervised learning. It can estimate the pose of human body movements, finger movements and the like, is highly robust, and was the first deep-learning-based real-time multi-person two-dimensional pose estimation application in the world. The OpenPose network structure is shown in FIG. 1 and consists mainly of two branches: one predicts confidence maps to detect the positions of key points, and the other detects the valid connections between key points. Each branch has t successively refined stages, and each stage fuses the feature maps. A convolutional neural network extracts the position confidence maps and the connections from the picture, and the two are combined through inference and graph matching to output the pose of each person.
3. The target detection network YOLOv3 is a classical target detection method. It uses a neural network to output the detection result directly, namely the position and class probability of each detection frame, so it is fast enough to run in real time. The YOLOv3 network structure is shown in FIG. 2: the input of the network is a picture, and the network has 3 output layers, which produce outputs on feature maps of different sizes. Darknet53 is the backbone of the network and DBL is the basic component of YOLOv3 (the corresponding structure is shown in FIG. 1), where Conv is a convolutional layer, BN is Batch Normalization, and Leaky ReLU is an activation function. YOLOv3 adopts a residual structure so that the network can be made deeper.
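The key point numbering in term 1 can be captured as a small lookup table, which is convenient when indexing pose estimation output. This is only an illustrative sketch: the names `KEYPOINT_NAMES` and `KEYPOINT_INDEX` are hypothetical and do not come from the patent or from any library.

```python
# Hypothetical lookup table for the 18 key points listed in term 1.
# The order follows the numbering given in the text (0 nose ... 17 left ear).
KEYPOINT_NAMES = (
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
)

# Reverse mapping from name to index, e.g. KEYPOINT_INDEX["right_hip"] is 8.
KEYPOINT_INDEX = {name: idx for idx, name in enumerate(KEYPOINT_NAMES)}
```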
The technical scheme of the invention is as follows:
A volleyball action recognition method based on an improved dynamic time warping algorithm comprises the following steps:
first, acquiring a real-time action video of a volleyball movement through a camera;
second, performing pose estimation and target detection on the acquired action video and on the standard volleyball action video, respectively, to obtain the coordinates of the human body key points;
third, obtaining, from those coordinates, a time sequence for each human body key point in the action video and a time sequence for each human body key point in the standard volleyball action video;
finally, aligning the key point time sequences of the action video with those of the standard volleyball action video and calculating the distance between them with the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action in the action video.
Preferably according to the invention, the pose estimation and target detection processing comprises the following steps:
each frame of the action video is input into the pose estimation network OpenPose, which outputs the position coordinates of the 18 human body key points in the image; at the same time, each frame of the action video is input into the target detection network YOLOv3, which outputs the position coordinates of the human body detection frame.
The rectangle ABCD is the human body detection frame. The position coordinates of the human body key points in the image are converted into position coordinates within the detection frame according to the position of the frame. For example, the converted coordinates of the 8th key point of the human body are
$$x = \frac{x_8 - x_D}{x_C - x_D}, \qquad y = \frac{y_8 - y_D}{y_A - y_D}$$
Here the subscripted x and y are the horizontal and vertical coordinates of each point in the image, whose (0,0) point is at the upper left corner. This conversion normalizes the key point coordinates, so that the size of the human body, the distance of the camera during capture and the position of the human body in the video do not distort the detected key points or the calculated action score. After conversion, the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to obtain a new coordinate that represents the direction of change of the key point. If the video has a frames in total, a - 1 new coordinates are formed for each human body key point. Similarly, a standard volleyball action video with b frames forms b - 1 new coordinates for each human body key point.
Preferably according to the invention, the time sequence of the human body key points in the action video is obtained as follows, where the rectangle ABCD denotes the human body detection frame obtained by the pose estimation and target detection processing:
A. converting the position coordinates of the human body key points in the image into their position coordinates within the human body detection frame; for any key point t of the 18 human body key points, the position coordinates within the detection frame are given by formula (I):
$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$
in formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, each ranging from 0 to 1; x_D and y_D are the abscissa and ordinate of point D of the human body detection frame in the original image; x_C is the abscissa of point C and y_A is the ordinate of point A of the detection frame in the original image; x_t and y_t are the abscissa and ordinate of key point t in the original image; and the upper left corner of the original image is the (0,0) point;
this conversion normalizes the key point coordinates, so that the size of the human body, the distance of the camera during capture and the position of the human body in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable.
B. Subtracting the position coordinates of each human body key point in the previous frame from its position coordinates in the next frame, both taken within the human body detection frame, to obtain a new human body key point coordinate that represents the direction of change of the key point;
assuming the acquired action video has a frames, steps A to B yield a - 1 new coordinates for each human body key point; arranging them in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length a - 1;
assuming the standard volleyball action video has b frames, steps A to B yield b - 1 new coordinates for each human body key point; arranging them in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length b - 1.
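Steps A and B can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes formula (I) normalizes each key point relative to corner D of the detection frame, with DC as the horizontal side and DA as the vertical side, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_keypoint(pt: Point, a: Point, c: Point, d: Point) -> Point:
    """Step A: map an image-space key point into detection-frame coordinates.

    Assumption: x is measured from corner D along side DC and y from
    corner D along side DA, so both results fall in [0, 1] for points
    inside the frame.
    """
    x_t, y_t = pt
    x = (x_t - d[0]) / (c[0] - d[0])
    y = (y_t - d[1]) / (a[1] - d[1])
    return (x, y)


def displacement_sequence(points: Sequence[Point]) -> List[Point]:
    """Step B: subtract each frame's coordinates from the next frame's.

    A track of a frames yields a sequence of a - 1 displacement vectors.
    """
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

For a frame with D = (100, 50), C = (300, 50) and A = (100, 450), a key point at (200, 250) maps to (0.5, 0.5), the centre of the frame.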
Preferably according to the invention, aligning the sequences and calculating the distance between them with the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action from that distance, comprises the following steps:
after the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a dynamic time warping (DTW) algorithm is used to align them. DTW, however, suffers from high time complexity and pathological warping. An existing improved algorithm is shown in FIG. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis, respectively; OA = n is the length of the standard volleyball action time sequence, and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides. The slopes of OE and BD are 0.5 and the slopes of OD and BE are 2; point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can therefore be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC has to be calculated. From these slopes, the area of the parallelogram is
$$S_{OEBD} = \frac{(2n - m)(2m - n)}{3}$$
C. Constructing a rectangular coordinate system in which OA and OC lie on the x axis and the y axis, respectively; OA = n is the length of the human body key point time sequence of the standard volleyball action, and OC = m is the length of the human body key point time sequence of the acquired action; a rectangle OABC is constructed with OA and OC as sides; points F, G, H and I lie on sides OA, AB, BC and OC, respectively, with OF = 0.1·AO, OI = 0.1·CO, BH = 0.1·BC and BG = 0.1·BA; the search range of the warping path is limited to the polygon OFGBHI, whose area is
$$S_{OFGBHI} = mn - 2 \cdot \tfrac{1}{2}(0.9n)(0.9m) = 0.19\,mn$$
Thus the difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is
$$\frac{(2n - m)(2m - n)}{3} - 0.19\,mn = \frac{0.43\,mn - 2(n - m)^2}{3}$$
A standard volleyball action takes at least 3 s to complete and the video is captured at 25 frames/s, and the number of video frames of the student's test action differs from that of the standard volleyball action (that is, the lengths of the two time sequences differ) by no more than 20 frames. Under these conditions,
$$n \ge 75, \qquad |n - m| \le 20$$
from which it follows that
$$\frac{0.43\,mn - 2(n - m)^2}{3} > 0$$
This shows that the search range of the improved dynamic time warping algorithm of the invention is smaller than that of the existing improved dynamic time warping algorithm, so it takes less time.
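The area comparison can be checked numerically. The sketch below builds both search regions from the geometry described in the text (slopes 0.5 and 2 for the parallelogram OEBD, and the 0.1 offsets for the polygon OFGBHI) and measures them with the shoelace formula. The function names are hypothetical, and the vertex coordinates are derived here from that description rather than quoted from the patent.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shoelace_area(vertices: List[Point]) -> float:
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def parallelogram_oebd(n: float, m: float) -> List[Point]:
    """Vertices O, E, B, D; E and D follow from the slopes 0.5 and 2."""
    e = ((4 * n - 2 * m) / 3, (2 * n - m) / 3)  # OE slope 0.5, BE slope 2
    d = ((2 * m - n) / 3, (4 * m - 2 * n) / 3)  # OD slope 2, BD slope 0.5
    return [(0.0, 0.0), e, (n, m), d]


def polygon_ofgbhi(n: float, m: float) -> List[Point]:
    """Vertices O, F, G, B, H, I with OF = 0.1*OA, BG = 0.1*BA, and so on."""
    return [
        (0.0, 0.0), (0.1 * n, 0.0), (n, 0.9 * m),
        (n, m), (0.9 * n, m), (0.0, 0.1 * m),
    ]


if __name__ == "__main__":
    n, m = 75, 60  # example lengths: 3 s at 25 frames/s, and 15 frames shorter
    print(shoelace_area(parallelogram_oebd(n, m)))  # matches (2n-m)(2m-n)/3
    print(shoelace_area(polygon_ofgbhi(n, m)))      # matches 0.19*m*n
```

For every pair with n ≥ 75 and |n - m| ≤ 20, the polygon area comes out smaller than the parallelogram area, which agrees with the claim that the proposed window evaluates fewer cells.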
The accumulated distance of the original dynamic time warping algorithm is calculated as
$$D(i, j) = r_{ij} + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}$$
where D(i, j) is the accumulated Euclidean distance from the starting point to the point (i, j), and r_ij is the Euclidean distance between point i of one sequence and point j of the other.
D. Calculating the accumulated distance with the improved dynamic time warping algorithm according to formula (II):
$$D(i, j) = r_{ij} + \min\{D(i-1, j),\; D(i, j-1),\; 0.9\,D(i-1, j-1)\} \tag{II}$$
in formula (II), D(i, j) is the accumulated cosine distance from the starting point to the point (i, j); the starting point is the first point of both the human body key point time sequence of the standard volleyball action video and that of the action video; the point (i, j) pairs the i-th point of the standard volleyball action video's key point time sequence with the j-th point of the action video's key point time sequence; and r_ij is the cosine distance between those two points, the two sequences having lengths a - 1 and b - 1;
the polygon OFGBHI bounds the search range over which formula (II) is evaluated, and the area of the polygon determines the search range and running time of the algorithm, and therefore the efficiency of the dynamic time warping algorithm. The cosine distance can represent the difference between volleyball actions and is better suited to this scenario, and the coefficient 0.9 in 0.9·D(i-1, j-1) controls how strongly the warping path is drawn toward the diagonal, which makes the warping path more reasonable.
E. After the accumulated distance of each human body key point has been calculated, the accumulated distances are summed and averaged to give the final accumulated distance, and the similarity between the two sequences is judged from its size: the smaller the distance, the higher the similarity. When the final accumulated distance D(i, j) is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be.
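Steps C to E can be sketched together as follows. This is a hedged illustration under several assumptions that the text does not settle: the recurrence is taken to be r_ij plus the minimum of D(i-1, j), D(i, j-1) and 0.9·D(i-1, j-1); the polygon OFGBHI is expressed as the band |x/n - y/m| ≤ 0.1, which follows from the lines FG and IH; array indices are mapped onto that band by i/(n-1) and j/(m-1); and a zero displacement vector, for which the cosine is undefined, is given distance 0 against another zero vector and 1 otherwise. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def cosine_distance(u: Vec, v: Vec) -> float:
    """1 - cos(angle between u and v); zero-vector handling is an assumption."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0 if nu == nv else 1.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return 1.0 - max(-1.0, min(1.0, cos))


def in_window(i: int, j: int, n: int, m: int, band: float = 0.1) -> bool:
    """True if cell (i, j) lies in the band standing in for polygon OFGBHI."""
    u = i / (n - 1) if n > 1 else 0.0
    v = j / (m - 1) if m > 1 else 0.0
    return abs(u - v) <= band + 1e-12


def improved_dtw(standard: Sequence[Vec], action: Sequence[Vec],
                 band: float = 0.1, diag_weight: float = 0.9) -> float:
    """Accumulated cosine distance restricted to the window.

    Returns math.inf when no warping path fits inside the window.
    """
    n, m = len(standard), len(action)
    D: List[List[float]] = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if not in_window(i, j, n, m, band):
                continue  # cells outside the window are never evaluated
            r = cosine_distance(standard[i], action[j])
            if i == 0 and j == 0:
                D[i][j] = r
                continue
            best = math.inf
            if i > 0:
                best = min(best, D[i - 1][j])
            if j > 0:
                best = min(best, D[i][j - 1])
            if i > 0 and j > 0:
                best = min(best, diag_weight * D[i - 1][j - 1])
            D[i][j] = r + best
    return D[n - 1][m - 1]


def is_standard_action(standard_tracks: Sequence[Sequence[Vec]],
                       action_tracks: Sequence[Sequence[Vec]],
                       threshold: float = 0.5) -> bool:
    """Step E: average the per-key-point distances and compare with 0.5."""
    dists = [improved_dtw(s, a) for s, a in zip(standard_tracks, action_tracks)]
    return sum(dists) / len(dists) < threshold
```

Passing one displacement sequence per key point for each video, `is_standard_action` returns `True` when the averaged accumulated distance falls below the 0.5 threshold stated in step E.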
The invention has the beneficial effects that:
1. The invention automates the judgment of whether a volleyball action is standard, which frees up manpower, improves efficiency, saves manpower, material and financial resources, and lightens teachers' workload. The method is simple to implement, clear in concept, of good economic value, and worth popularizing.
2. The invention combines research in the field of artificial intelligence with actual volleyball testing, puts the technology into practical use, and helps promote the integration of industry and research.
3. The invention improves the original dynamic time warping algorithm: it shortens the running time, is simpler than the existing improved algorithm, and improves the operating efficiency of the algorithm.
Drawings
FIG. 1 is a schematic diagram of the network structure of the pose estimation network OpenPose;
FIG. 2 is a schematic diagram of the network structure of the target detection network YOLOv3;
FIG. 3 is a schematic flow chart of the volleyball action recognition method based on an improved dynamic time warping algorithm according to the present invention;
FIG. 4 is a schematic diagram of the 18 key points of the human body according to the present invention;
FIG. 5 is a schematic diagram of the human body key points and the human body detection frame after pose estimation and target detection according to the present invention;
FIG. 6 is a schematic diagram of the existing improved dynamic time warping algorithm;
FIG. 7 is a schematic diagram of the improved dynamic time warping algorithm according to the present invention;
FIG. 8 is a schematic diagram of the human body key point sequences in a standard action video and a volleyball test action video according to the present invention;
FIG. 9 is a diagram showing the alignment of the two sequences of FIG. 8 according to the present invention;
FIG. 10 is a schematic diagram of one frame of video 1 in the embodiment;
FIG. 11 is a schematic diagram of one frame of video 2 in the embodiment;
FIG. 12 is a schematic diagram of one frame of video 3 in the embodiment.
Detailed Description
The present invention will be further described by way of examples, but not limited thereto, with reference to the accompanying drawings.
Example 1
A volleyball action recognition method based on an improved dynamic time warping algorithm, as shown in FIG. 3, comprises the following steps:
first, acquiring a real-time action video of a volleyball movement through a camera;
second, performing pose estimation and target detection on the acquired action video and on the standard volleyball action video, respectively, to obtain the coordinates of the human body key points;
third, obtaining, from those coordinates, a time sequence for each human body key point in the action video and a time sequence for each human body key point in the standard volleyball action video;
finally, aligning the key point time sequences of the action video with those of the standard volleyball action video and calculating the distance between them with the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action in the action video.
Example 2
A volleyball action recognition method based on an improved dynamic time warping algorithm, characterized in that:
the pose estimation and target detection processing comprises the following steps:
each frame of the action video is input into the pose estimation network OpenPose, which outputs the position coordinates of the 18 human body key points in the image; the 18 key points of the human body are shown in FIG. 4; at the same time, each frame of the action video is input into the target detection network YOLOv3, which outputs the position coordinates of the human body detection frame.
The result of the pose estimation and target detection processing is shown in FIG. 5, where the rectangle ABCD is the human body detection frame. The position coordinates of the human body key points in the image are converted into position coordinates within the detection frame according to the position of the frame. For example, the converted coordinates of the 8th key point of the human body are
$$x = \frac{x_8 - x_D}{x_C - x_D}, \qquad y = \frac{y_8 - y_D}{y_A - y_D}$$
Here the subscripted x and y are the horizontal and vertical coordinates of each point in the image, whose (0,0) point is at the upper left corner. This conversion normalizes the key point coordinates, so that the size of the human body, the distance of the camera during capture and the position of the human body in the video do not distort the detected key points or the calculated action score. After conversion, the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to obtain a new coordinate that represents the direction of change of the key point. If the video has a frames in total, a - 1 new coordinates are formed for each human body key point. Similarly, a standard volleyball action video with b frames forms b - 1 new coordinates for each human body key point.
The time sequence of the human body key points in the action video is obtained as follows, where the rectangle ABCD denotes the human body detection frame obtained by the pose estimation and target detection processing:
A. converting the position coordinates of the human body key points in the image into their position coordinates within the human body detection frame; for any key point t of the 18 human body key points, the position coordinates within the detection frame are given by formula (I):
$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$
in formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, each ranging from 0 to 1; x_D and y_D are the abscissa and ordinate of point D of the human body detection frame in the original image; x_C is the abscissa of point C and y_A is the ordinate of point A of the detection frame in the original image; x_t and y_t are the abscissa and ordinate of key point t in the original image; and the upper left corner of the original image is the (0,0) point;
this conversion normalizes the key point coordinates, so that the size of the human body, the distance of the camera during capture and the position of the human body in the video do not distort the detected key points or the calculated action score, which makes the result more reasonable.
B. Subtracting the position coordinates of each human body key point in the previous frame from its position coordinates in the next frame, both taken within the human body detection frame, to obtain a new human body key point coordinate that represents the direction of change of the key point;
assuming the acquired action video has a frames, steps A to B yield a - 1 new coordinates for each human body key point; arranging them in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length a - 1;
assuming the standard volleyball action video has b frames, steps A to B yield b - 1 new coordinates for each human body key point; arranging them in time order (that is, in the order of the frames in the video) gives a human body key point time sequence of length b - 1.
Aligning the sequences and calculating the distance between them with the improved dynamic time warping algorithm, and judging the accuracy of the volleyball action from that distance, comprises the following steps:
after the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a dynamic time warping (DTW) algorithm is used to align them. DTW, however, suffers from high time complexity and pathological warping. An existing improved algorithm is shown in FIG. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis, respectively; OA = n is the length of the standard volleyball action time sequence, and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides. The slopes of OE and BD are 0.5 and the slopes of OD and BE are 2; point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can therefore be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC has to be calculated. From these slopes, the area of the parallelogram is
$$S_{OEBD} = \frac{(2n - m)(2m - n)}{3}$$
The improved dynamic time warping algorithm of the present invention is shown in FIG. 7.
C. Constructing a rectangular coordinate system in which OA and OC lie on the x axis and the y axis, respectively; OA = n is the length of the human body key point time sequence of the standard volleyball action, and OC = m is the length of the human body key point time sequence of the acquired action; a rectangle OABC is constructed with OA and OC as sides; points F, G, H and I lie on sides OA, AB, BC and OC, respectively, with OF = 0.1·AO, OI = 0.1·CO, BH = 0.1·BC and BG = 0.1·BA; the search range of the warping path is limited to the polygon OFGBHI, whose area is
$$S_{OFGBHI} = mn - 2 \cdot \tfrac{1}{2}(0.9n)(0.9m) = 0.19\,mn$$
Thus existingThe difference between the area of the improved algorithm and the improved algorithm of the present invention is
Figure BDA0002788370670000073
The time for finishing a standard volleyball action is at least 3s, the frame rate of the collected video is 25 frames/s, and the difference between the number of the video frames of the student examination action and the standard volleyball action (namely the difference between the time length sequences of the two) does not exceed 20 frames, so that the student examination action and the standard volleyball action are finished according to the existing conditions
Figure BDA0002788370670000081
Can be derived from
Figure BDA0002788370670000082
This proves that the improved dynamic time warping algorithm of the invention has a smaller search range than the existing improved dynamic time warping algorithm and therefore takes less time.
The cumulative distance calculation formula of the original dynamic time warping algorithm is
Figure BDA0002788370670000083
where D(i, j) is the cumulative Euclidean distance from the starting point to the point (i, j), and r_ij is the Euclidean distance between point i of one sequence and point j of the other.
D. The cumulative distance of the improved dynamic time warping algorithm is calculated by formula (II):
Figure BDA0002788370670000084
In formula (II), D(i, j) is the cumulative cosine distance from the starting point to the point (i, j). The starting point is the first point of both time sequences, namely the human body key point time sequence of the standard volleyball action video and the human body key point time sequence of the action video. The point (i, j) pairs the i-th point of the human body key point time sequence of the standard volleyball action video with the j-th point of the human body key point time sequence of the action video. r_ij is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, of length a-1, and the j-th point of the human body key point time sequence of the action video, of length b-1;
The calculation search range of formula (II) is limited to the polygon OFGBHI. The area of the polygon determines the search range and running time of the algorithm, and hence the operating efficiency of the dynamic time warping algorithm. The cosine distance can represent the difference between volleyball actions and is better suited to this scenario, and the coefficient 0.9 in 0.9D(i-1, j-1) controls how strongly the warping path is drawn toward the diagonal, making the warping path more reasonable. The two sequences are shown in FIG. 8, and their alignment after the corresponding sequence distance is calculated is shown in FIG. 9;
E. After the cumulative distance of each human body key point is calculated, the cumulative distances are averaged to give the final cumulative distance, and the similarity between the two sequences is judged from its size: the smaller the distance, the higher the similarity. When the final cumulative distance D(i, j) is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be a standard volleyball action.
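Steps C to E can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the image of formula (II) is not reproduced in this text, so the recurrence D(i, j) = r_ij + min{D(i-1, j), D(i, j-1), 0.9·D(i-1, j-1)} is assumed from the description of the coefficient 0.9; the polygon OFGBHI is approximated on the discrete grid by the test |i/n - j/m| ≤ 0.1; and the function names and the handling of zero-length displacement vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle) between two 2-D displacement vectors."""
    norm_u = math.hypot(u[0], u[1])
    norm_v = math.hypot(v[0], v[1])
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: two motionless points match; one motionless point does not.
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - (u[0] * v[0] + u[1] * v[1]) / (norm_u * norm_v)


def banded_dtw(standard, test, band=0.1, diag_weight=0.9):
    """Cumulative cosine distance restricted to a band around the diagonal."""
    n, m = len(standard), len(test)
    inf = math.inf
    # D[i][j] holds the cumulative distance to cell (i, j); row/column 0 are borders.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(i / n - j / m) > band + 1e-12:
                continue  # cell lies outside the (discretized) polygon OFGBHI
            # Assumed form of formula (II): the diagonal predecessor is weighted by 0.9.
            best = min(D[i - 1][j], D[i][j - 1], diag_weight * D[i - 1][j - 1])
            if best < inf:
                D[i][j] = cosine_distance(standard[i - 1], test[j - 1]) + best
    return D[n][m]


def is_standard_action(standard_series, test_series, threshold=0.5):
    """Average the per-key-point distances and compare with the 0.5 threshold."""
    distances = [banded_dtw(s, t) for s, t in zip(standard_series, test_series)]
    final = sum(distances) / len(distances)
    return final < threshold, final
```

For very short sequences of unequal length the discretized band can disconnect the start from the end, in which case `banded_dtw` returns infinity; at the sequence lengths discussed in this document (75 frames or more) this does not arise.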
FIG. 6 is a schematic diagram of the existing improved dynamic time warping algorithm. A rectangular coordinate system is constructed, with OA and OC on the x axis and the y axis respectively; OA = n represents the length of the standard volleyball action time sequence, and OC = m represents the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, where the slopes of OE and BD are 0.5, the slopes of OD and BE are 2, point D is the intersection of OD and BD, and point E is the intersection of OE and BE.
FIG. 7 is a schematic diagram of the improved dynamic time warping algorithm of the present invention. A rectangular coordinate system is constructed, with OA and OC on the x axis and the y axis respectively; OA = n represents the length of the standard volleyball action time sequence, and OC = m represents the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on sides OA, AB, BC and OC respectively, such that:
OF = 0.1AO
OI = 0.1CO
BH = 0.1BC
BG = 0.1BA
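With O at the origin, A = (n, 0) and C = (0, m), these four conditions place F at (0.1n, 0), G at (n, 0.9m), H at (0.9n, m) and I at (0, 0.1m). Edges FG and IH then both have slope m/n, so the polygon OFGBHI is the set of points in the rectangle with |y/m - x/n| ≤ 0.1. The short Python sketch below tests membership in that region; the function name and the small floating-point tolerance are illustrative assumptions.

```python
def in_search_region(x, y, n, m, band=0.1, eps=1e-12):
    """Return True if (x, y) lies inside polygon OFGBHI of the n-by-m rectangle."""
    if not (0 <= x <= n and 0 <= y <= m):
        return False  # outside rectangle OABC altogether
    # Between the parallel edges FG (offset -band) and IH (offset +band).
    return abs(y / m - x / n) <= band + eps
```

For example, with n = 100 and m = 80, the vertices F = (10, 0) and G = (100, 72) fall on the boundary and are accepted, while the rectangle corner A = (100, 0) is rejected.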
FIG. 4 shows the distribution of the human body key points; the human body has 18 key points. The positions of the human body key points are estimated by deep-learning pose estimation, for which OpenPose is used. Running pose estimation on every frame of the video gives the positions of all of the student's human body key points in each frame. Target detection is performed on the video at the same time, using YOLOv3. The results of target detection and key point estimation are shown in FIG. 5, where the rectangle ABCD is the human body detection frame. According to the position of the human body detection frame, the position coordinates of the human body key points in the image are converted into position coordinates within the human body frame. For example, the converted coordinates of the 8th human body key point are calculated as
Figure BDA0002788370670000091
Here x and y are the horizontal and vertical coordinates of a point in the image, and the point (0, 0) is at the upper left corner of the image. This conversion normalizes the key point coordinates, which avoids inaccurate key point positions and inaccurate action scores caused by differences in body size, in the distance of the camera when the video is collected, and in the position of the human body within the video, so the result is more reasonable. Because the score for a volleyball action test depends on whether the direction of the student's movement changes correctly, after the key point conversion is finished the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to obtain new coordinates, which represent the change in direction of the key point. If the video has a frames in total, the new coordinates of each human body key point form a time sequence of length a-1, and the 18 key points form 18 time sequences.
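The conversion and differencing described above can be sketched in Python as follows. The formula image is not reproduced in this text, so the normalization x = (x_t - x_D)/(x_C - x_D), y = (y_t - y_D)/(y_A - y_D) is an assumption inferred from the variable definitions given in claim 4; it also assumes that corner C shares a horizontal edge with D and corner A shares a vertical edge with D. The function names are hypothetical.

```python
def normalize_keypoint(point, corner_d, corner_c, corner_a):
    """Map an image-space key point into [0, 1] x [0, 1] detection-frame coordinates."""
    x_t, y_t = point
    x_d, y_d = corner_d
    x_c = corner_c[0]  # only the abscissa of C is used
    y_a = corner_a[1]  # only the ordinate of A is used
    return ((x_t - x_d) / (x_c - x_d), (y_t - y_d) / (y_a - y_d))


def direction_sequence(coords):
    """Turn a frames of normalized coordinates into a-1 frame-to-frame differences."""
    # Each new coordinate is (next frame) minus (previous frame).
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(coords, coords[1:])]
```

Applying `normalize_keypoint` to one key point in every frame and then passing the result to `direction_sequence` yields one of the 18 time sequences of length a-1.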
After the time sequences of the human body key points are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the two time sequences generally differ in length, a Dynamic Time Warping (DTW) algorithm is used to align them. The DTW algorithm, however, suffers from high time complexity and pathological warping. The existing improved algorithm is shown in FIG. 6: the search range of the warping path is limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC needs to be calculated. Here n and m are the lengths of the standard volleyball action sequence and the student volleyball examination action sequence respectively, and the slopes of OE and OD are 0.5 and 2 respectively. From the coordinates in the figure, the area of the parallelogram is readily calculated as
Figure BDA0002788370670000092
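Since the formula image is not reproduced in this text, the area can be re-derived from the stated geometry; the following is a worked derivation under that geometry (O at the origin, B = (n, m), and n/2 ≤ m ≤ 2n so that D and E lie inside the rectangle), not a transcription of the patent's formula. Intersecting y = 2x with the line of slope 0.5 through B gives D, intersecting y = 0.5x with the line of slope 2 through B gives E, and the area follows from the cross product of OE and OD:

```latex
\begin{aligned}
D &= \left(\frac{2m-n}{3},\ \frac{2(2m-n)}{3}\right), \qquad
E = \left(\frac{2(2n-m)}{3},\ \frac{2n-m}{3}\right),\\[4pt]
S_{OEBD} &= \left| x_E\,y_D - x_D\,y_E \right|
          = \frac{4(2n-m)(2m-n)}{9} - \frac{(2m-n)(2n-m)}{9}
          = \frac{(2m-n)(2n-m)}{3}.
\end{aligned}
```

As a check, for n = m the expression reduces to n²/3, one third of the full distance matrix.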
FIG. 7 shows the improved dynamic time warping algorithm of the present invention, in which OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The present invention limits the search range of the warping path to the polygon OFGBHI, whose area can be determined as
Figure BDA0002788370670000093
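The polygon area can be checked numerically with the shoelace formula. The Python sketch below assumes the vertex coordinates implied by the four stated ratios (O at the origin, A = (n, 0), C = (0, m)); under that assumption the area works out to 0.19·n·m, that is, the rectangle minus two corner triangles of area 0.405·n·m each. The function names are illustrative, and the closed form is derived here rather than copied from the formula image.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def ofgbhi_vertices(n, m):
    """Vertices O, F, G, B, H, I implied by OF=0.1AO, BG=0.1BA, BH=0.1BC, OI=0.1CO."""
    return [(0.0, 0.0), (0.1 * n, 0.0), (n, 0.9 * m),
            (n, m), (0.9 * n, m), (0.0, 0.1 * m)]
```

For n = 100 and m = 80, `polygon_area(ofgbhi_vertices(100, 80))` evaluates to 1520, which equals 0.19 × 100 × 80.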
Thus the difference between the area of the existing improved algorithm and the improved algorithm of the present invention is
Figure BDA0002788370670000094
Completing a standard volleyball action takes at least 3 s, the collected video has a frame rate of 25 frames/s, and the number of video frames of the student examination action differs from that of the standard volleyball action (that is, the lengths of the two time sequences differ) by no more than 20 frames. These conditions give
Figure BDA0002788370670000101
from which it can be derived that
Figure BDA0002788370670000102
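The claim can be checked by brute force under the stated conditions. The Python sketch below assumes closed-form areas worked out from the described geometry rather than taken from the formula images: (2m - n)(2n - m)/3 for the parallelogram OEBD and 0.19·n·m for the polygon OFGBHI. It also reads the conditions as n ≥ 75 (3 s at 25 frames/s) and |n - m| ≤ 20; the upper bound on n is an arbitrary choice for the search.

```python
def parallelogram_area(n, m):
    """Area of OEBD (slopes 0.5 and 2), valid for n/2 <= m <= 2n."""
    return (2 * m - n) * (2 * n - m) / 3.0


def band_polygon_area(n, m):
    """Area of OFGBHI when all four offsets are 0.1 of the side length."""
    return 0.19 * n * m


def min_area_gap(min_len=75, max_len=500, max_diff=20):
    """Smallest value of (parallelogram area - polygon area) over the allowed lengths."""
    return min(
        parallelogram_area(n, m) - band_polygon_area(n, m)
        for n in range(min_len, max_len + 1)
        for m in range(n - max_diff, n + max_diff + 1)
    )
```

The gap can be rewritten as (1/3 - 0.19)·n·m - (2/3)·(n - m)², which is why it stays positive once the sequences are long and their lengths are close; `min_area_gap()` returns a positive value over the range searched.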
This proves that the improved dynamic time warping algorithm of the invention has a smaller search range than the existing improved dynamic time warping algorithm and therefore takes less time.
The cumulative distance calculation formula of the original dynamic time warping algorithm is
Figure BDA0002788370670000103
where D(i, j) is the cumulative Euclidean distance from the starting point to the point (i, j), and r_ij is the Euclidean distance between point i of one sequence and point j of the other. The cumulative distance calculation formula of the improved dynamic time warping algorithm of the invention is
Figure BDA0002788370670000104
In this formula, D(i, j) is the cumulative cosine distance from the starting point to the point (i, j), and r_ij is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, of length a-1, and the j-th point of the human body key point time sequence of the action video, of length b-1. The cosine distance is better suited to this scenario, and the coefficient 0.9 in 0.9D(i-1, j-1) controls how strongly the warping path is drawn toward the diagonal, making the warping path more reasonable. The two sequences are shown in FIG. 8, and their alignment after the corresponding sequence distance is calculated is shown in FIG. 9. The similarity between the two sequences is then judged from the distance: the smaller the distance, the higher the similarity and the more standard the action; the larger the distance, the lower the similarity and the less standard the action. After the cumulative distance of each human body key point is calculated, the cumulative distances are averaged to give the final cumulative distance. When the final cumulative distance is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be. Table 1 compares the computation times of the existing improved algorithm and the improved algorithm of the present invention; it shows that the present invention shortens the computation time of the existing improved dynamic time warping algorithm.
TABLE 1
Figure BDA0002788370670000105
Table 2 shows how standard the volleyball actions are, as automatically calculated by the method of the present invention, for the volleyball examination videos of different students.
TABLE 2
Figure BDA0002788370670000106
Figure BDA0002788370670000111
In Table 2, FIG. 10 shows a frame from video 1, FIG. 11 shows a frame from video 2, and FIG. 12 shows a frame from video 3.

Claims (6)

1. A volleyball action recognition method based on an improved dynamic time warping algorithm is characterized by comprising the following steps:
firstly, acquiring a real-time action video of volleyball movement;
secondly, respectively carrying out posture estimation and target detection processing on the obtained action video and the standard volleyball action video to obtain the coordinates of key points of the human body;
thirdly, acquiring a time sequence of the human body key points in the action video and a time sequence of each human body key point in the standard volleyball action video according to the obtained coordinates of the human body key points;
and finally, aligning and calculating the distance between the time sequence of the key points of the human body in the action video and the time sequence of each key point of the human body in the standard volleyball action video through an improved dynamic time warping algorithm, and judging the accuracy of volleyball action in the action video.
2. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the posture estimation and target detection processing comprises the following steps:
inputting each frame of image in the action video into a posture estimation network OpenPose for posture estimation, and outputting position coordinates of 18 human key points in the image; meanwhile, each frame of image in the motion video is input into the target detection network YOLOv3 for target detection, and the position coordinates of the human body detection frame are output.
3. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the human body detection frame obtained after posture estimation and target detection processing is denoted as rectangle ABCD, and obtaining the time sequence of the human body key points in the action video comprises the following steps:
A. converting the position coordinates of the human key points in the image into the position coordinates of the human key points in the human detection frame according to the human detection frame;
B. subtracting the position coordinate of the human body key point of the previous frame from the position coordinate of the human body key point of the next frame in the human body detection frame to obtain a new human body key point coordinate, wherein the coordinate represents the direction change condition of the human body key point;
assuming that the obtained action video has a frames of images, a-1 new human body key point coordinates are obtained for each human body key point after the processing of steps A to B, and the a-1 new human body key point coordinates of each human body key point are arranged in time order to obtain a human body key point time sequence with the length of a-1;
assuming that the standard volleyball action video has b frames of images, b-1 new human body key point coordinates are obtained for each human body key point after the processing of steps A to B, and the b-1 new human body key point coordinates of each human body key point are arranged in time order to obtain a human body key point time sequence with the length of b-1.
4. The volleyball action recognition method based on the improved dynamic time warping algorithm according to claim 3, wherein converting the position coordinates of the human body key points in the image into the position coordinates of the human body key points in the human body detection frame according to the human body detection frame means: the calculation formula of the position coordinates of any key point t in the 18 human body key points in the human body detection frame is shown as the formula (I):
Figure FDA0002788370660000021
in formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, both of which range from 0 to 1; x_D and y_D are the abscissa and the ordinate of point D of the human body detection frame in the original image; x_C and y_A are the abscissa of point C and the ordinate of point A of the human body detection frame in the original image; x_t and y_t are the abscissa and the ordinate of human body key point t in the original image; and the upper left corner of the original image is the point (0, 0).
5. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the distance between the sequences is calculated by aligning through the improved dynamic time warping algorithm, and the volleyball action accuracy is judged according to the sequence distance between the sequences, comprising the following steps:
C. constructing a rectangular coordinate system, wherein OA and OC are on the x axis and the y axis of the rectangular coordinate system respectively; OA = n represents the length of the human body key point time sequence of the standard volleyball action; OC = m represents the length of the human body key point time sequence of the acquired action; a rectangle OABC is constructed with OA and OC as sides; points F, G, H and I lie on sides OA, AB, BC and OC respectively, with OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA; the search range of the warping path is limited to the polygon OFGBHI, whose area is found to be
Figure FDA0002788370660000022
D. the cumulative distance of the improved dynamic time warping algorithm is calculated by formula (II):
Figure FDA0002788370660000023
in formula (II), D(i, j) is the cumulative cosine distance from the starting point to the point (i, j); the starting point is the first point of both time sequences, namely the human body key point time sequence of the standard volleyball action video and the human body key point time sequence of the action video; the point (i, j) pairs the i-th point of the human body key point time sequence of the standard volleyball action video with the j-th point of the human body key point time sequence of the action video; r_ij is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video, of length a-1, and the j-th point of the human body key point time sequence of the action video, of length b-1;
E. after the cumulative distance of each human body key point is calculated, the cumulative distances are averaged to give the final cumulative distance; when the final cumulative distance D(i, j) is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged not to be a standard volleyball action.
6. A volleyball motion recognition method based on an improved dynamic time warping algorithm according to any one of claims 1-5, wherein a real-time motion video of volleyball motion is obtained by a camera.
CN202011306032.6A 2020-11-20 2020-11-20 Volleyball action recognition method based on improved dynamic time warping algorithm Pending CN112446313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011306032.6A CN112446313A (en) 2020-11-20 2020-11-20 Volleyball action recognition method based on improved dynamic time warping algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011306032.6A CN112446313A (en) 2020-11-20 2020-11-20 Volleyball action recognition method based on improved dynamic time warping algorithm

Publications (1)

Publication Number Publication Date
CN112446313A true CN112446313A (en) 2021-03-05

Family

ID=74737537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011306032.6A Pending CN112446313A (en) 2020-11-20 2020-11-20 Volleyball action recognition method based on improved dynamic time warping algorithm

Country Status (1)

Country Link
CN (1) CN112446313A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504731A (en) * 2014-12-19 2015-04-08 西安理工大学 Human motion synthesis method based on motion diagram
CN107358171A (en) * 2017-06-22 2017-11-17 华中师范大学 A kind of gesture identification method based on COS distance and dynamic time warping
CN109460702A (en) * 2018-09-14 2019-03-12 华南理工大学 Passenger's abnormal behaviour recognition methods based on human skeleton sequence
CN109934881A (en) * 2017-12-19 2019-06-25 华为技术有限公司 Image encoding method, the method for action recognition and computer equipment
CN111626137A (en) * 2020-04-29 2020-09-04 平安国际智慧城市科技股份有限公司 Video-based motion evaluation method and device, computer equipment and storage medium
CN111860196A (en) * 2020-06-24 2020-10-30 富泰华工业(深圳)有限公司 Hand operation action scoring device and method and computer readable storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095248A (en) * 2021-04-19 2021-07-09 中国石油大学(华东) Technical action correction method for badminton
CN113095248B (en) * 2021-04-19 2022-10-25 中国石油大学(华东) Technical action correcting method for badminton
CN113268626A (en) * 2021-05-26 2021-08-17 中国人民武装警察部队特种警察学院 Data processing method and device, electronic equipment and storage medium
CN113268626B (en) * 2021-05-26 2024-04-26 中国人民武装警察部队特种警察学院 Data processing method, device, electronic equipment and storage medium
CN113409374A (en) * 2021-07-12 2021-09-17 东南大学 Character video alignment method based on motion registration
CN113409374B (en) * 2021-07-12 2024-05-10 东南大学 Character video alignment method based on action registration
CN116821713A (en) * 2023-08-31 2023-09-29 山东大学 Shock insulation efficiency evaluation method and system based on multivariable dynamic time warping algorithm
CN116821713B (en) * 2023-08-31 2023-11-24 山东大学 Shock insulation efficiency evaluation method and system based on multivariable dynamic time warping algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210305