CN112446313A

CN112446313A - Volleyball action recognition method based on improved dynamic time warping algorithm

Info

Publication number: CN112446313A
Application number: CN202011306032.6A
Authority: CN
Inventors: 周斌; 吕传栋; 周洪超; 张艺
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2021-03-05

Abstract

The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, comprising the following steps: first, acquiring a real-time action video of a volleyball movement; second, performing pose estimation and target detection on the action video and on a standard volleyball action video to obtain a time sequence for each human body key point in both videos; and finally, aligning the sequences and calculating the distance between them with an improved dynamic time warping algorithm, and judging the accuracy of the volleyball action from that sequence distance. The invention automates the judgment of whether a volleyball action is standard, which improves efficiency, saves manpower, material and financial resources, and lightens teachers' workload. The method is simple to implement and clear in concept, has good economic value, and is suitable for wide application. The invention combines research in artificial intelligence with actual volleyball tests, puts the technology into practical use, and helps promote the integration of industry and research.

Description

Volleyball action recognition method based on improved dynamic time warping algorithm

Technical Field

The invention relates to a volleyball action recognition method based on an improved dynamic time warping algorithm, and belongs to the field of artificial intelligence.

Background

In recent years, with the rapid development of volleyball and the importance China attaches to the sport, volleyball teaching and training have become increasingly important, and many schools offer volleyball courses.

In a volleyball test, the teacher scores each student according to how standard the student's actions are. This process consumes a great deal of time and manpower and adds to the teacher's workload, and manual scoring can also affect the fairness of the test to some degree. If this process could be automated with artificial intelligence techniques, scoring would become more efficient, a large amount of manpower, material and financial resources would be saved, and the fairness of the volleyball examination would improve.

At present, volleyball action recognition methods do not include a target detection step, which reduces the accuracy of action judgment, and they use the traditional dynamic time warping algorithm, which reduces the efficiency of recognition.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a volleyball action recognition method based on an improved dynamic time warping algorithm.

Interpretation of terms:

1. As shown in fig. 4, the human body has 18 key points: 0 nose, 1 neck, 2 right shoulder, 3 right elbow, 4 right wrist, 5 left shoulder, 6 left elbow, 7 left wrist, 8 right hip, 9 right knee, 10 right ankle, 11 left hip, 12 left knee, 13 left ankle, 14 right eye, 15 left eye, 16 right ear, and 17 left ear.

2. The pose estimation network OpenPose is an open-source human pose recognition library developed by Carnegie Mellon University (CMU) in the United States, based on convolutional neural networks and supervised learning. It can estimate the pose of human body actions, finger motions and the like, is highly robust, and is the first deep-learning-based real-time multi-person two-dimensional pose estimation application in the world. The OpenPose network structure is shown in fig. 1 and mainly comprises two branches: one predicts confidence maps to detect the positions of key points, and the other detects valid connections between key points. Each branch has t progressively refined stages, and each stage fuses the feature maps. The position confidence maps and the connections in the picture are extracted by the convolutional neural network, and the two are combined through inference and graph matching to output the pose of each person.

3. The target detection network YOLOv3 is a classical target detection method. It uses a neural network to output the detection result directly, namely the position of each detection frame and its class probabilities, so it is fast enough for real-time use. The YOLOv3 network structure is shown in fig. 2: the input of the network is a picture, and the network has 3 output layers that make predictions on feature maps of different sizes. Darknet53 is the backbone of the network and DBL is the basic component of YOLOv3; the corresponding structure is shown in FIG. 1, where Conv is the convolutional layer, BN is Batch Normalization and Leaky ReLU is the activation function. YOLOv3 adopts a residual structure, which allows the network to be deeper.

The technical scheme of the invention is as follows:

a volleyball action recognition method based on an improved dynamic time warping algorithm comprises the following steps:

firstly, acquiring a real-time action video of volleyball movement through a camera;

secondly, respectively carrying out posture estimation and target detection processing on the obtained action video and the standard volleyball action video to obtain the coordinates of key points of the human body;

thirdly, acquiring a time sequence of the human body key points in the action video and a time sequence of each human body key point in the standard volleyball action video according to the obtained coordinates of the human body key points;

and finally, aligning and calculating the distance between the time sequence of the key points of the human body in the action video and the time sequence of each key point of the human body in the standard volleyball action video through an improved dynamic time warping algorithm, and judging the accuracy of volleyball action in the action video.

According to the present invention, preferably, the attitude estimation and target detection processing includes the steps of:

inputting each frame of image in the action video into a posture estimation network OpenPose for posture estimation, and outputting position coordinates of 18 human key points in the image; meanwhile, each frame of image in the motion video is input into the target detection network YOLOv3 for target detection, and the position coordinates of the human body detection frame are output.

The rectangle ABCD is the human body detection frame. According to the position of the detection frame, the position coordinates of the human body key points in the image are converted into position coordinates within the frame; for example, the converted coordinates of the 8th key point are obtained from formula (I) with t = 8.

Here x and y are the horizontal and vertical coordinates of a point in the image, with the (0,0) point at the upper left corner of the image. This conversion normalizes the key point coordinates, so that differences in body size, camera distance during recording, and position of the person in the video do not distort the detected key points or the computed action score, making the result more reasonable. After the conversion, the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to give a new coordinate that represents the change in direction of the key point. If the video has a frames in total, each human body key point yields a-1 new coordinates. Similarly, a standard volleyball action video with b frames yields b-1 new coordinates for each human body key point.

Preferably, according to the invention, the time sequence of the human body key points in the action video is acquired as follows, where the rectangle ABCD is the human body detection frame obtained from the pose estimation and target detection processing:

A. Converting the position coordinates of the human body key points in the image into position coordinates within the human body detection frame, wherein the position coordinates within the detection frame of any key point t among the 18 human body key points are calculated by formula (I):

$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$

in formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, each ranging from 0 to 1; x_D and y_D are the abscissa and ordinate of point D of the human body detection frame in the original image; x_C and y_A are the abscissa of point C and the ordinate of point A of the human body detection frame in the original image; x_t and y_t are the abscissa and ordinate of the human body key point t in the original image; and the upper left corner of the original image is the (0,0) point;

This conversion normalizes the key point coordinates, so that differences in body size, camera distance during recording, and position of the person in the video do not distort the detected key points or the computed action score, making the result more reasonable.
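The normalization in step A can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes formula (I) has the form x = (x_t - x_D)/(x_C - x_D), y = (y_t - y_D)/(y_A - y_D), which is what the symbol definitions suggest, and the names `Box` and `normalize_keypoint` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """The four detection-frame values used by formula (I) (hypothetical container)."""
    x_d: float  # abscissa of corner D in the original image
    y_d: float  # ordinate of corner D in the original image
    x_c: float  # abscissa of corner C in the original image
    y_a: float  # ordinate of corner A in the original image


def normalize_keypoint(x_t: float, y_t: float, box: Box) -> Tuple[float, float]:
    """Map the image coordinates of key point t to [0, 1] coordinates inside the frame."""
    width = box.x_c - box.x_d    # signed horizontal extent from D to C
    height = box.y_a - box.y_d   # signed vertical extent from D to A
    if width == 0 or height == 0:
        raise ValueError("degenerate detection frame")
    return (x_t - box.x_d) / width, (y_t - box.y_d) / height


# Assumed example frame: D at (100, 400), C at x = 300, A at y = 0.
box = Box(x_d=100.0, y_d=400.0, x_c=300.0, y_a=0.0)
centre = normalize_keypoint(200.0, 200.0, box)  # key point at the frame centre
```

Because the result depends only on the key point's position relative to the frame, the same pose gives the same coordinates wherever the person stands in the image and however large the person appears.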

B. Subtracting the position coordinate of the human body key point of the previous frame from the position coordinate of the human body key point of the next frame in the human body detection frame to obtain a new human body key point coordinate, wherein the coordinate represents the direction change condition of the human body key point;

assuming that the acquired action video has a frames of images, steps A to B yield a-1 new human body key point coordinates for each human body key point, and arranging these a-1 coordinates in time order (namely the order of the frames in the video) gives a human body key point time sequence of length a-1;

assuming that the standard volleyball action video has b frames of images, steps A to B yield b-1 new human body key point coordinates for each human body key point, and arranging these b-1 coordinates in time order (namely the order of the frames in the video) gives a human body key point time sequence of length b-1.
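Step B can be illustrated with a short sketch. It assumes one key point's normalized coordinates are already available as a list of (x, y) pairs, one per frame; the function name `displacement_series` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_series(track: List[Point]) -> List[Point]:
    """Subtract each frame's coordinates from the next frame's coordinates.

    A track of a frames gives a series of a - 1 displacement vectors.
    """
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


# Assumed toy track of one key point over four frames (normalized coordinates).
track = [(0.50, 0.50), (0.50, 0.25), (0.75, 0.25), (0.75, 0.50)]
series = displacement_series(track)
```

Running the same function on each of the 18 key points of a video gives the 18 time sequences that are later compared with those of the standard video.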

According to the invention, the distance between the sequences is aligned and calculated through an improved dynamic time warping algorithm, and the volleyball action accuracy is judged according to the sequence distance between the sequences, which comprises the following steps:

After the human body key point time sequences are obtained, the similarity between the action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a Dynamic Time Warping (DTW) algorithm is used to align them, but DTW suffers from high time complexity and pathological warping. The existing improved algorithm is shown in fig. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, wherein the slopes of OE and BD are 0.5, the slopes of OD and BE are 2, point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can thus be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC has to be calculated. From the slopes of OE and OD, the area of the parallelogram is

$$S_1 = \frac{(2n - m)(2m - n)}{3}$$

C. Constructing a rectangular coordinate system in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the human body key point time sequence of the standard volleyball action, and OC = m is the length of the human body key point time sequence of the acquired action. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on the sides OA, AB, BC and OC respectively, with OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The search range of the warping path is limited to the polygon OFGBHI, whose area is the area of the rectangle minus the two right triangles FAG and ICH, each with legs 0.9n and 0.9m:

$$S_2 = nm - 2 \cdot \tfrac{1}{2}(0.9n)(0.9m) = 0.19\,nm$$

The difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is therefore

$$S_1 - S_2 = \frac{(2n - m)(2m - n)}{3} - 0.19\,nm$$

A standard volleyball action takes at least 3 s to complete, the frame rate of the collected video is 25 frames/s, and the difference between the number of video frames of the student's examination action and that of the standard volleyball action (i.e. the difference between the lengths of the two time sequences) does not exceed 20 frames. Under these conditions

$$n \ge 75, \qquad |n - m| \le 20$$

from which it can be derived that

$$S_1 - S_2 > 0$$

This shows that the improved dynamic time warping algorithm of the invention has a smaller search range than the existing improved dynamic time warping algorithm and therefore takes less time.
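The area comparison can be checked numerically. The sketch below assumes the closed forms that follow from the geometry described for figs. 6 and 7: (2n - m)(2m - n)/3 for the parallelogram OEBD, and the rectangle minus two right triangles with legs 0.9n and 0.9m for the polygon OFGBHI. The function names and the upper bound `n_max` are illustrative choices, not taken from the patent.

```python
def parallelogram_area(n: float, m: float) -> float:
    """Area of parallelogram OEBD bounded by lines of slope 0.5 and 2 through O and B."""
    return (2 * n - m) * (2 * m - n) / 3


def hexagon_area(n: float, m: float) -> float:
    """Area of polygon OFGBHI: rectangle minus two right triangles with legs 0.9n and 0.9m."""
    return n * m - 0.9 * n * 0.9 * m


def hexagon_always_smaller(n_min: int = 75, n_max: int = 500, max_gap: int = 20) -> bool:
    """Check hexagon < parallelogram for every n >= n_min and |n - m| <= max_gap."""
    for n in range(n_min, n_max + 1):
        for m in range(n - max_gap, n + max_gap + 1):
            if hexagon_area(n, m) >= parallelogram_area(n, m):
                return False
    return True


smaller_everywhere = hexagon_always_smaller()
```

The check covers only the sampled range; the inequality for all larger n follows because the difference grows with n·m while the penalty term depends only on the bounded frame gap.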

The cumulative distance of the original dynamic time warping algorithm is calculated as

$$D(i, j) = r_{ij} + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$$

where D(i, j) is the cumulative Euclidean distance from the starting point to the point (i, j), and r_ij is the Euclidean distance between point i and point j of the two sequences.
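For reference, the original recurrence can be sketched as below. The boundary convention (D(0, 0) = 0 and infinity elsewhere on the border) is a common choice assumed here, and the function name `dtw_euclidean` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dtw_euclidean(a: List[Point], b: List[Point]) -> float:
    """Classic DTW: cumulative Euclidean distance over the full distance matrix."""
    n, m = len(a), len(b)
    # D[i][j] is the cumulative distance aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = math.dist(a[i - 1], b[j - 1])  # Euclidean distance r_ij
            D[i][j] = r + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# The second sequence repeats one point; warping absorbs the repeat.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
distance = dtw_euclidean(a, b)
```

Every one of the n × m cells is visited, which is the cost the restricted search ranges are meant to reduce.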

D. The cumulative distance of the improved dynamic time warping algorithm is calculated by formula (II):

$$D(i, j) = r_{ij} + \min\{D(i-1, j),\ D(i, j-1),\ 0.9\,D(i-1, j-1)\} \tag{II}$$

in formula (II), D(i, j) is the cumulative cosine distance from the starting point to the point (i, j); the starting point is the first point of the two time sequences, namely the human body key point time sequence of the standard volleyball action video and that of the action video; the point (i, j) pairs the i-th point of the human body key point time sequence of the standard volleyball action video with the j-th point of the human body key point time sequence of the action video; and r_ij is the cosine distance between those two points;

the search range of formula (II) is limited to the polygon OFGBHI, and the area of the polygon determines the search range and running time of the algorithm and hence the efficiency of the dynamic time warping algorithm. The cosine distance can represent the difference between volleyball actions and is therefore better suited to this scenario, and the coefficient 0.9 in 0.9D(i-1, j-1) controls how strongly the warping path is drawn towards the diagonal, making the warping path more reasonable.
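A minimal sketch of the improved algorithm follows, under stated assumptions. It takes formula (II) to be D(i, j) = r_ij + min{D(i-1, j), D(i, j-1), 0.9·D(i-1, j-1)}, takes the cosine distance to be 1 minus the cosine similarity, and expresses the polygon OFGBHI as the band |j - i·m/n| ≤ 0.1·m, which holds because the sides FG and IH are both parallel to the diagonal OB. The handling of zero-length vectors and all function names are choices made for this sketch, not taken from the patent.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def cosine_distance(u: Vec, v: Vec) -> float:
    """1 - cosine similarity; the zero-vector cases are assumptions of this sketch."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 and nv == 0.0:
        return 0.0  # assumed: two stationary frames count as identical
    if nu == 0.0 or nv == 0.0:
        return 1.0  # assumed: moving versus stationary counts as orthogonal
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return 1.0 - max(-1.0, min(1.0, cos))


def in_band(i: int, j: int, n: int, m: int, ratio: float = 0.1) -> bool:
    """True if cell (i, j) lies between lines FG and IH, i.e. inside polygon OFGBHI."""
    return abs(j - i * m / n) <= ratio * m


def improved_dtw(standard: List[Vec], action: List[Vec], diag: float = 0.9) -> float:
    """Cumulative cosine distance restricted to the band, with a weighted diagonal step."""
    n, m = len(standard), len(action)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if not in_band(i, j, n, m):
                continue  # cells outside the polygon are never evaluated
            r = cosine_distance(standard[i - 1], action[j - 1])
            D[i][j] = r + min(D[i - 1][j], D[i][j - 1], diag * D[i - 1][j - 1])
    return D[n][m]


# Same motion pattern at slightly different speeds: 20 versus 22 steps.
standard = [(1.0, 0.0)] * 10 + [(0.0, 1.0)] * 10
action = [(1.0, 0.0)] * 11 + [(0.0, 1.0)] * 11
same_motion = improved_dtw(standard, action)
```

For very short sequences of unequal length the band can be too narrow to contain a connected path, in which case this sketch returns infinity; with the sequence lengths discussed in the text (at least 75 frames) the band is several cells wide.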

E. After the cumulative distance of each human body key point is calculated, the cumulative distances are added and averaged to give the final cumulative distance, and the similarity between the two sequences is judged from its size: the smaller the distance, the higher the similarity. When the final cumulative distance is less than 0.5, the action is judged to be a standard volleyball action; otherwise it is judged to be non-standard.
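Step E reduces to an average and a threshold test, sketched below. The input is assumed to be the list of per-key-point cumulative distances (18 values in the method described); the function name `judge_action` is hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def judge_action(per_keypoint_distances: Sequence[float],
                 threshold: float = 0.5) -> Tuple[float, bool]:
    """Average the per-key-point cumulative distances and compare with the threshold.

    Returns (final cumulative distance, True if the action is judged standard).
    """
    if not per_keypoint_distances:
        raise ValueError("at least one key point distance is required")
    final = fmean(per_keypoint_distances)
    return final, final < threshold


# Assumed toy values for the 18 key points.
final_distance, is_standard = judge_action([0.2] * 9 + [0.6] * 9)
```

The comparison is strict, so a final cumulative distance of exactly 0.5 is judged non-standard, matching the "less than 0.5" wording.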

The invention has the beneficial effects that:

1. The invention automates the judgment of whether a volleyball action is standard, which frees up manpower, greatly improves efficiency, saves a large amount of manpower, material and financial resources, and lightens teachers' workload. The method is simple to implement and clear in concept, has good economic value, and is suitable for wide application.

2. The invention combines research in the field of artificial intelligence with actual volleyball tests, puts the technology into practical use, and helps promote the integration of industry and research.

3. The invention improves the original dynamic time warping algorithm, shortens the operation time, is simpler than the existing improved algorithm, and improves the operation efficiency of the algorithm.

Drawings

FIG. 1 is a schematic diagram of a network structure of an attitude estimation network OpenPose;

FIG. 2 is a schematic diagram of a network structure of the target detection network YOLOv 3;

FIG. 3 is a schematic flow chart of a volleyball action recognition method based on an improved dynamic time warping algorithm according to the present invention;

FIG. 4 is a schematic diagram of 18 key points of a human body according to the present invention;

FIG. 5 is a schematic diagram of human body for human body key point and target detection according to the present invention;

FIG. 6 is a schematic diagram of an improved algorithm for raw dynamic time warping;

FIG. 7 is a schematic diagram of an improved algorithm for dynamic time warping according to the present invention;

FIG. 8 is a schematic diagram of a human body key point sequence in a standard motion video and a volleyball test motion video according to the present invention;

FIG. 9 is a graph showing the alignment of two sequences of FIG. 8 according to the present invention;

FIG. 10 is a diagram illustrating an image of a frame in the video 1 according to the embodiment;

FIG. 11 is a diagram illustrating an image of a frame in the video 2 according to the embodiment;

fig. 12 is a schematic diagram of a frame image in the video 3 according to the embodiment.

Detailed Description

The present invention will be further described by way of examples, but not limited thereto, with reference to the accompanying drawings.

Example 1

A volleyball action recognition method based on an improved dynamic time warping algorithm is shown in FIG. 3, and comprises the following steps:

Example 2

A volleyball action recognition method based on an improved dynamic time warping algorithm is characterized in that:

the attitude estimation and target detection processing is carried out, and the method comprises the following steps:

inputting each frame of image in the action video into a posture estimation network OpenPose for posture estimation, and outputting position coordinates of 18 human key points in the image; the 18 key points of the human body are shown in figure 4; meanwhile, each frame of image in the motion video is input into the target detection network YOLOv3 for target detection, and the position coordinates of the human body detection frame are output.

The result of the pose estimation and target detection processing is shown in fig. 5, where the rectangle ABCD is the human body detection frame. According to the position of the detection frame, the position coordinates of the human body key points in the image are converted into position coordinates within the frame; for example, the converted coordinates of the 8th key point are obtained from formula (I) with t = 8.

The time sequence of the human body key points in the action video is acquired as follows, where the rectangle ABCD is the human body detection frame obtained from the pose estimation and target detection processing:

$$x = \frac{x_t - x_D}{x_C - x_D}, \qquad y = \frac{y_t - y_D}{y_A - y_D} \tag{I}$$

in formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, each ranging from 0 to 1; x_D and y_D are the abscissa and ordinate of point D of the human body detection frame in the original image; x_C and y_A are the abscissa of point C and the ordinate of point A of the human body detection frame in the original image; x_t and y_t are the abscissa and ordinate of the human body key point t in the original image; and the upper left corner of the original image is the (0,0) point;

The distance between the sequences is aligned and calculated through an improved dynamic time warping algorithm, and the volleyball action accuracy is judged according to the sequence distance between the sequences, and the method comprises the following steps:

After the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, a Dynamic Time Warping (DTW) algorithm is used to align them, but the DTW algorithm suffers from high time complexity and pathological warping. The existing improved algorithm is shown in fig. 6: a rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, wherein the slopes of OE and BD are 0.5, the slopes of OD and BE are 2, point D is the intersection of OD and BD, and point E is the intersection of OE and BE. The search range of the warping path can thus be limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC has to be calculated. From the slopes of OE and OD, the area of the parallelogram is

$$S_1 = \frac{(2n - m)(2m - n)}{3}$$

The improved dynamic time warping algorithm of the present invention is shown in FIG. 7. It limits the search range of the warping path to the polygon OFGBHI, whose area is

$$S_2 = 0.19\,nm$$

The difference between the search area of the existing improved algorithm and that of the improved algorithm of the present invention is therefore

$$S_1 - S_2 = \frac{(2n - m)(2m - n)}{3} - 0.19\,nm$$

A standard volleyball action takes at least 3 s to complete, the frame rate of the collected video is 25 frames/s, and the difference between the number of video frames of the student's examination action and that of the standard volleyball action (i.e. the difference between the lengths of the two time sequences) does not exceed 20 frames. Under these conditions

$$n \ge 75, \qquad |n - m| \le 20$$

from which it can be derived that

$$S_1 - S_2 > 0$$

The cumulative distance of the improved dynamic time warping algorithm is calculated by formula (II):

$$D(i, j) = r_{ij} + \min\{D(i-1, j),\ D(i, j-1),\ 0.9\,D(i-1, j-1)\} \tag{II}$$

in formula (II), D(i, j) is the cumulative cosine distance from the starting point to the point (i, j); the starting point is the first point of the two time sequences, namely the human body key point time sequence of the standard volleyball action video and that of the action video; the point (i, j) pairs the i-th point of the human body key point time sequence of the standard volleyball action video with the j-th point of the human body key point time sequence of the action video; and r_ij is the cosine distance between those two points;

the search range of formula (II) is limited to the polygon OFGBHI, and the area of the polygon determines the search range and running time of the algorithm and hence the efficiency of the dynamic time warping algorithm. The cosine distance can represent the difference between volleyball actions and is therefore better suited to this scenario, and the coefficient 0.9 in 0.9D(i-1, j-1) controls how strongly the warping path is drawn towards the diagonal, making the warping path more reasonable. The two sequences are shown in fig. 8, and the alignment of the two sequences after the corresponding sequence distance is calculated is shown in fig. 9;

FIG. 6 is a schematic diagram of the existing improved algorithm for dynamic time warping. A rectangular coordinate system is constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, wherein the slopes of OE and BD are 0.5, the slopes of OD and BE are 2, point D is the intersection of OD and BD, and point E is the intersection of OE and BE.

FIG. 7 is a schematic diagram of the improved algorithm for dynamic time warping according to the present invention. A rectangular coordinate system is first constructed in which OA and OC lie on the x axis and the y axis respectively; OA = n is the length of the standard volleyball action time sequence and OC = m is the length of the action video time sequence. A rectangle OABC is constructed with OA and OC as sides, and points F, G, H and I lie on the sides OA, AB, BC and OC respectively, such that:

OF＝0.1AO

OI＝0.1CO

BH＝0.1BC

BG＝0.1BA
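The effect of the two search ranges on a discrete distance matrix can be illustrated by counting the cells each one admits. The sketch assumes cell (i, j) is tested at the point (i, j) of the coordinate system, with i running along OA and j along OC; the four proportions above make FG and IH parallel to the diagonal OB, so the polygon OFGBHI is the band |j - i·m/n| ≤ 0.1·m. The function names and the sample lengths are illustrative.

```python
from typing import Callable


def in_hexagon(i: int, j: int, n: int, m: int, ratio: float = 0.1) -> bool:
    """Polygon OFGBHI: the strip between lines FG and IH around the diagonal OB."""
    return abs(j - i * m / n) <= ratio * m


def in_parallelogram(i: int, j: int, n: int, m: int) -> bool:
    """Parallelogram OEBD: between the slope-0.5 and slope-2 lines through O and through B."""
    from_o = 0.5 * i <= j <= 2 * i
    from_b = m + 2 * (i - n) <= j <= m + 0.5 * (i - n)
    return from_o and from_b


def count_cells(inside: Callable[[int, int, int, int], bool], n: int, m: int) -> int:
    """Number of matrix cells (1-based indices) that fall inside the search range."""
    return sum(1 for i in range(1, n + 1) for j in range(1, m + 1) if inside(i, j, n, m))


# Assumed sample lengths: 100 standard frames, 90 student frames.
n, m = 100, 90
hexagon_cells = count_cells(in_hexagon, n, m)
parallelogram_cells = count_cells(in_parallelogram, n, m)
```

Fewer admitted cells means fewer distance evaluations, which is where the shorter computation time of the band-shaped range comes from.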

The distribution of the human body key points is shown in fig. 4; the human body has 18 key points. The positions of the key points are obtained by deep-learning-based pose estimation, for which OpenPose is used. Processing each frame of the video with pose estimation gives the positions of all human body key points of the student in that frame. At the same time, target detection is performed on the video using YOLOv3. The results of target detection and key point estimation are shown in FIG. 5, where the rectangle ABCD is the human body detection frame. According to the position of the detection frame, the position coordinates of the human body key points in the image are converted into position coordinates within the frame; for example, the converted coordinates of the 8th key point are obtained from formula (I) with t = 8.

Here x and y are the horizontal and vertical coordinates of a point in the image, with the (0,0) point at the upper left corner of the image. This conversion normalizes the key point coordinates, so that differences in body size, camera distance during recording, and position of the person in the video do not distort the detected key points or the computed action score, making the result more reasonable. Because the score in a volleyball action test is calculated from the correctness of the change in direction of the student's action, after the key point conversion the coordinates of each key point in the previous frame are subtracted from its coordinates in the next frame to give a new coordinate that represents the change in direction of the key point. If the video has a frames in total, the new coordinates of each human body key point form a time sequence of length a-1, and the 18 key points form 18 time sequences.

After the human body key point time sequences are obtained, the similarity between the volleyball test action video and the standard volleyball action video must be compared. Because the time sequences differ in length, the Dynamic Time Warping (DTW) algorithm is used to align them, but the DTW algorithm suffers from high time complexity and pathological warping. The existing improved algorithm is shown in figure 6: the search range of the warping path is limited to the parallelogram OEBD, so that not every position of the original distance matrix OABC has to be calculated, where n and m are the lengths of the standard volleyball action sequence and the student volleyball examination action sequence respectively, and the slopes of OE and OD are 0.5 and 2 respectively. From the coordinates in the figure, the area of the parallelogram is

$$S_1 = \frac{(2n - m)(2m - n)}{3}$$

Fig. 7 shows the improved dynamic time warping algorithm of the present invention, in which OF = 0.1AO, OI = 0.1CO, BH = 0.1BC and BG = 0.1BA. The invention limits the search range of the warping path to the polygon OFGBHI, whose area is

$$S_2 = 0.19\,nm$$

A standard volleyball action takes at least 3 s to complete, the frame rate of the collected video is 25 frames/s, and the difference between the number of video frames of the student's examination action and that of the standard volleyball action (i.e. the difference between the lengths of the two time sequences) does not exceed 20 frames. Under these conditions

$$n \ge 75, \qquad |n - m| \le 20$$

from which it can be derived that

$$S_1 - S_2 = \frac{(2n - m)(2m - n)}{3} - 0.19\,nm > 0$$

The cumulative distance of the original dynamic time warping algorithm is calculated as

$$D(i, j) = r_{ij} + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$$

where D(i, j) is the cumulative Euclidean distance from the starting point to the point (i, j), and r_ij is the Euclidean distance between point i and point j of the two sequences. The cumulative distance of the improved dynamic time warping algorithm of the invention is calculated as

$$D(i, j) = r_{ij} + \min\{D(i-1, j),\ D(i, j-1),\ 0.9\,D(i-1, j-1)\}$$

where D(i, j) is the cumulative cosine distance from the starting point to the point (i, j), and r_ij is the cosine distance between the i-th point of the human body key point time sequence of the standard volleyball action video and the j-th point of the human body key point time sequence of the action video. The cosine distance is better suited to this scenario, and the coefficient 0.9 in 0.9D(i-1, j-1) controls how strongly the warping path is drawn towards the diagonal, making the warping path more reasonable. The two sequences are shown in fig. 8, and the alignment of the two sequences after the corresponding sequence distance is calculated is shown in fig. 9. The similarity between the two sequences is then judged from the distance: the smaller the distance, the higher the similarity and the more standard the action; conversely, the larger the distance, the lower the similarity and the less standard the action. After the cumulative distance of each human body key point is calculated, the cumulative distances are added and averaged to give the final cumulative distance; when it is less than 0.5, the action is judged to be a standard volleyball action, and otherwise it is not. Table 1 compares the computation times of the existing improved algorithm and the improved algorithm of the present invention; it shows that the invention does shorten the computation time of the existing improved dynamic time warping algorithm.

TABLE 1

Table 2 shows how standard the volleyball actions are, as automatically calculated by the method of the present invention for the volleyball examinations of different students.

TABLE 2

In Table 2, Fig. 10 shows one frame of video 1, Fig. 11 shows one frame of video 2, and Fig. 12 shows one frame of video 3.

Claims

1. A volleyball action recognition method based on an improved dynamic time warping algorithm is characterized by comprising the following steps:

firstly, acquiring a real-time action video of volleyball movement;

2. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the posture estimation and target detection processing comprises the following steps:

3. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein the human body detection frame obtained after the posture estimation and target detection processing is set as rectangle ABCD, and obtaining the time sequence of the human body key points in the action video comprises the following steps:

A. converting the position coordinates of the human key points in the image into the position coordinates of the human key points in the human detection frame according to the human detection frame;

assuming that the obtained action video has a frames of images, a-1 new human body key point coordinates of each human body key point are obtained after the processing of the steps A to B, and the a-1 new human body key point coordinates of each human body key point are arranged according to a time sequence to obtain a human body key point time sequence with the length of a-1;

assuming that the standard volleyball action video has b frames of images, b-1 new human body key point coordinates of each human body key point are obtained after the processing of the steps A to B, and the b-1 new human body key point coordinates of each human body key point are arranged according to a time sequence to obtain a human body key point time sequence with the length of b-1.

4. The volleyball action recognition method based on the improved dynamic time warping algorithm according to claim 3, wherein converting the position coordinates of the human body key points in the image into the position coordinates of the human body key points in the human body detection frame according to the human body detection frame means: the calculation formula of the position coordinates of any key point t in the 18 human body key points in the human body detection frame is shown as the formula (I):

in the formula (I), 0 ≤ t ≤ 17; x and y are the new horizontal and vertical coordinates, and the value ranges of x and y are both 0 to 1; x_D and y_D are respectively the abscissa and the ordinate, in the original image, of point D of the human body detection frame; x_C and y_A are respectively the abscissa of point C and the ordinate of point A, in the original image, of the human body detection frame; x_t and y_t are respectively the abscissa and the ordinate of the human body key point t in the original image; and the upper left corner of the original image is the position of the point (0,0).
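By way of illustration only, the coordinate conversion defined by these symbols can be sketched as follows. Formula (I) itself is not reproduced in this text, so the form used here, x = (x_t - x_D)/(x_C - x_D) and y = (y_t - y_D)/(y_A - y_D), is an assumption. It takes D as the corner of the detection frame that maps to (0, 0), C as the corner on the same horizontal edge as D, and A as the corner on the same vertical edge as D. The function name `normalize_keypoint` is hypothetical.

```python
def normalize_keypoint(x_t, y_t, x_D, y_D, x_C, y_A):
    """Map a key point from image coordinates into detection-frame coordinates.

    Assumed form of formula (I), not confirmed by the source:
        x = (x_t - x_D) / (x_C - x_D)
        y = (y_t - y_D) / (y_A - y_D)
    A key point inside the frame then has x and y in the range 0 to 1.
    """
    width = x_C - x_D
    height = y_A - y_D
    if width == 0 or height == 0:
        raise ValueError("detection frame has zero width or height")
    return (x_t - x_D) / width, (y_t - y_D) / height


# Example: a frame with corner D at (100, 200), 100 px wide and 200 px high.
x, y = normalize_keypoint(150, 300, x_D=100, y_D=200, x_C=200, y_A=400)
```

Dividing by the frame's width and height makes the coordinates independent of where the person stands in the image and how large they appear, which is consistent with the stated value range of 0 to 1.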

5. The volleyball action recognition method based on the improved dynamic time warping algorithm as claimed in claim 1, wherein aligning the sequences and calculating the distance between them by the improved dynamic time warping algorithm, and judging the volleyball action accuracy according to the distance between the sequences, comprises the following steps:

in formula (II), D(i, j) is the accumulated cosine distance from the starting point to the point (i, j); the starting point is the first point of the two time sequences, namely the human body key point time sequence of the standard volleyball action video and the human body key point time sequence of the action video; the point (i, j) refers to the i-th point in the human body key point time sequence of the standard volleyball action video with the length of a-1 and the j-th point in the human body key point time sequence of the action video with the length of b-1; r_ij is the cosine distance between the i-th point in the human body key point time sequence of the standard volleyball action video with the length of a-1 and the j-th point in the human body key point time sequence of the action video with the length of b-1;

E. after the accumulated distance of each human body key point is calculated, adding the accumulated distances and taking the average value as the final accumulated distance, and judging the action to be a standard volleyball action when the final accumulated distance is less than 0.5, and otherwise judging it not to be a standard volleyball action.
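As a non-limiting illustration of step E, the averaging and threshold judgment can be sketched as follows. The function name `is_standard_action` and the input list of accumulated distances, one per human body key point, are hypothetical. The threshold of 0.5 is taken from the text.

```python
def is_standard_action(per_keypoint_distances, threshold=0.5):
    """Average the accumulated distances of all key points and apply the threshold.

    Returns (judgment, final_distance). The judgment is True only when the
    final accumulated distance is strictly less than the threshold.
    """
    if not per_keypoint_distances:
        raise ValueError("at least one key point distance is required")
    final_distance = sum(per_keypoint_distances) / len(per_keypoint_distances)
    return final_distance < threshold, final_distance


# Example with three key points (illustrative values, not measured data).
judgment, final_distance = is_standard_action([0.2, 0.4, 0.6])
```

Because the comparison is strict, a final accumulated distance of exactly 0.5 is judged not to be a standard volleyball action.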

6. A volleyball motion recognition method based on an improved dynamic time warping algorithm according to any one of claims 1-5, wherein a real-time motion video of volleyball motion is obtained by a camera.