CN114550071A - Method, device and medium for automatically identifying and capturing track and field video action key frames - Google Patents


Info

Publication number
CN114550071A
Authority
CN
China
Prior art keywords
action, key, frame, frames, group
Prior art date
Legal status
Granted
Application number
CN202210280271.1A
Other languages
Chinese (zh)
Other versions
CN114550071B (en)
Inventor
林平
李瀚懿
Current Assignee
One Body Technology Co.,Ltd.
Original Assignee
Beijing Yiti Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yiti Technology Co ltd
Priority to CN202210280271.1A
Publication of CN114550071A
Application granted
Publication of CN114550071B
Legal status: Active

Abstract

The invention discloses a method, a device, and a medium for automatically identifying and capturing track and field video action key frames. The method comprises the following steps: automatically identifying and capturing four groups of action key frames based on a human body key frame skeleton posture model, a key action model, an action trend model, and a predefined position area model, respectively; for each key action, selecting from the four groups of action key frames in sequence to form a first set; removing out-of-range action key frames using the key frame confidence space; and outputting the final result of automatic key frame identification and capture according to the number of action key frames remaining in the first set. By automatically identifying and capturing action key frames, the method and the device improve the efficiency of identifying and capturing track and field video action key frames, and improve identification accuracy through the key frame confidence space and matching against key action semantics.

Description

Method, device and medium for automatically identifying and capturing track and field video action key frames
Technical Field
The invention relates to the technical field of video analysis, in particular to a method and a device for automatically identifying and capturing track and field video action key frames, and a computer readable medium.
Background
With the development of science and technology, more and more Artificial Intelligence (AI) analysis technologies are applied to track and field training. By calculating an athlete's running parameters, such as cadence, stride length, instantaneous speed, ground-contact time, and flight time, daily training can be guided scientifically and reasonably, improving the training effect.
AI analysis of track and field videos is based on the key actions of each event, such as the start and mid-race running of the 100m event; the hop, the step, and the jump, the peak of flight, and the landing of the triple jump; and key actions such as the start, hurdle clearance, flight, and landing of the 110m hurdles.
At present, the action key frames used in AI analysis of track and field videos are marked manually, which is inefficient.
In view of this, the manual marking of action key frames in existing track and field video AI analysis needs to be improved, so that key frames can be automatically identified and captured from track and field videos and the efficiency of AI analysis of key track and field actions improved.
Disclosure of Invention
In view of the above drawbacks, the technical problem to be solved by the present invention is to provide a method and a device for automatically identifying and capturing track and field video action key frames, so as to solve the current problem of low efficiency caused by manually marking action key frames.
Therefore, the invention provides a method for automatically identifying and capturing track and field video action key frames, which comprises the following steps:
automatically identifying and capturing a first group, a second group, a third group, and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model, and a predefined position area model, respectively;
for each key action preset for each sport event, selecting corresponding action key frames in sequence from the first, second, third, and fourth groups of action key frames to form a first set corresponding to that key action;
removing action key frames that fall outside the confidence space from the first set, using the key frame confidence space of each key action;
outputting the final result of automatic key frame identification and capture according to the number of action key frames remaining in the first set; wherein, if only one action key frame remains in the first set, that action key frame is the final result; otherwise, one action key frame is selected as the final result according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics, the key action semantics being obtained through deep learning of human body key frame skeleton postures and key actions.
In the above method, preferably, if a unique action key frame cannot be selected as the final result according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics, the frame numbers of all action key frames in the first set are averaged, and the action key frame whose frame number is closest to the average is taken as the final result.
In the above method, preferably, the automatic identification and capture of action key frames based on the key action model comprises the following steps:
presetting a key action model for each key action, the key action model comprising the motion direction of each key skeleton point in the key action, each defined as a preset motion direction;
judging the motion direction of each corresponding key skeleton point in the current video frame based on the difference between its position in the current video frame and its position in the next frame, each defined as a current motion direction;
and judging whether the current video frame is an action key frame or not by comparing whether the current motion direction is the same as the preset motion direction or not.
In the above method, preferably, auxiliary judgment is applied to the automatic identification and capture of action key frames based on the key action model, using the sport event, the height of the centroid of the human skeleton, and the position of the athlete.
In the above method, preferably, the automatic identification and capture of the action key frame based on the action trend includes the following steps:
presetting an action trend model comprising the possible directions of the centroid's movement track;
capturing the athlete's activity region from the track and field video with a rectangular box;
within the activity region, obtaining the centroid position of the athlete in each video frame using a human body skeleton posture algorithm;
based on the action trend model, using a seed optimization algorithm to obtain the seed distribution of the skeleton centroids of a preset number of track and field video frames along the possible track directions, and obtaining the athlete's action trend from that seed distribution; the initial starting point of the action trend is an action key frame.
In the above method, preferably, the automatic identification and capture of action key frames based on the predefined position area comprises the following steps:
establishing each key position area of each sport event through big data analysis of track and field videos;
and intercepting the clearest and most complete video frame within the key position area as an action key frame.
In the above-described method, it is preferable that,
the key actions of the 100m running event include: a start action and a mid-race running action;
the key actions of the triple jump event include: a start action and a take-off action, wherein the take-off action comprises the hop, the step, and the jump;
the key actions of the 110m hurdles event include: a hurdling action and an inter-hurdle running action, wherein the hurdling action includes attacking the hurdle, clearing the hurdle, and landing.
In the above method, preferably, the key action semantics corresponding to the skeleton information of each key action are obtained through deep learning of the key actions of each track and field event.
The invention also provides a device for automatically identifying and capturing track and field video action key frames, comprising:
the identification and capture module, used for automatically identifying and capturing a first group, a second group, a third group, and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model, and a predefined position area model, respectively;
the key frame collection module, used for selecting, for each key action preset for each sport event, corresponding action key frames in sequence from the first, second, third, and fourth groups of action key frames to form a first set corresponding to that key action;
the key frame removal module, used for removing action key frames that fall outside the confidence space from the first set, using the key frame confidence space of each key action;
the key frame output module, used for outputting the final result of automatic key frame identification and capture according to the number of action key frames remaining in the first set; wherein, if only one action key frame remains in the first set, that action key frame is the final result; otherwise, one action key frame is selected as the final result according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics, the key action semantics being obtained through deep learning of human body key frame skeleton postures and key actions.
In the above-described apparatus, it is preferable that,
the key action model comprises the motion direction of each key skeleton point in each key action, each defined as a preset motion direction;
the identification and capture module judges the motion direction of each corresponding key skeleton point in the current video frame based on the difference between its position in the current video frame and its position in the next frame, each defined as a current motion direction; and judges whether the current video frame is a key frame by comparing whether each current motion direction in the current video frame is the same as the corresponding preset motion direction in the key action model.
The present invention also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for automatic recognition and capture of track and field video action key frames.
According to the technical solutions above, the method, device, and computer readable medium for automatically identifying and capturing track and field video action key frames provided by the present invention solve the current problem of low efficiency caused by manually marking action key frames. Compared with the prior art, the invention has the following beneficial effects:
Firstly, action key frames are automatically identified and captured based on a human body key frame skeleton posture model, a key action model, an action trend model, and a predefined position area model, which improves efficiency.
Secondly, for each key action preset for each sport event, corresponding action key frames are selected in sequence from the first, second, third, and fourth groups of action key frames to form a first set corresponding to that key action; action key frames falling outside the confidence space are removed from the first set using the key frame confidence space of each key action; and the final result of automatic key frame identification and capture is output according to the number of action key frames remaining in the first set, improving the accuracy of action key frame identification.
Thirdly, the best-matching action key frame is selected as the final result according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics, further improving the accuracy of action key frame identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in their description are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that other drawings can be derived from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart of an automatic track and field video action key frame recognition and capture method provided by the present invention;
FIG. 2 is a schematic diagram of the key frame skeleton information corresponding to the start action of the 100m event in the present invention;
FIG. 3 is a schematic diagram of the key frame skeleton information corresponding to the mid-race running action of the 100m event in the present invention;
FIG. 4 is a schematic diagram of the key frame skeleton information corresponding to the hop action of the triple jump event in the present invention;
FIG. 5 is a schematic diagram of the key frame skeleton information corresponding to the step action of the triple jump event in the present invention;
FIG. 6 is a schematic diagram of the key frame skeleton information corresponding to the jump action of the triple jump event in the present invention;
FIG. 7 is a schematic diagram of the key frame skeleton information corresponding to the hurdle-attacking action of the hurdles event in the present invention;
FIG. 8 is a schematic diagram of the key frame skeleton information corresponding to the hurdle-clearing action of the hurdles event in the present invention;
FIG. 9 is a schematic diagram of the key frame skeleton information corresponding to the landing action of the hurdles event in the present invention;
FIG. 10 is a diagram of a key motion model in the present invention;
FIG. 11 is a diagram illustrating a preset action trend model according to the present invention;
FIG. 12 is a schematic diagram of a motion region for capturing a human image according to the present invention;
FIG. 13 is a schematic diagram of the seed distribution in each direction obtained using a seed optimization algorithm in the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the embodiments described below are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
The realization principle of the invention is as follows:
automatically identify and capture a first group, a second group, a third group, and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model, and a predefined position area model, respectively;
for each key action preset for each sport event, select corresponding action key frames in sequence from the first, second, third, and fourth groups of action key frames to form a first set corresponding to that key action;
remove action key frames that fall outside the confidence space from the first set, using the key frame confidence space of each key action;
and output the final result of automatic key frame identification and capture according to the number of action key frames remaining in the first set.
The scheme provided by the invention realizes automatic identification and capture of the track and field video action key frames, and improves the identification accuracy rate through key frame confidence space and key action semantic analysis.
In order to make the technical solution and implementation of the present invention more clearly explained and illustrated, several preferred embodiments for implementing the technical solution of the present invention are described below.
It should be noted that the terms of orientation such as "inside, outside", "front, back" and "left and right" are used herein as reference objects, and it is obvious that the use of the corresponding terms of orientation does not limit the scope of protection of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for automatically identifying and capturing a track and field video action key frame, which includes the following steps:
and step 110, automatically identifying and capturing a first group of action key frames of each key action from the track and field video based on the human body key frame skeleton posture model respectively. And automatically identifying and capturing a second group of action key frames of each key action from the track and field video based on the key action model. And automatically identifying and capturing a third group of action key frames of each key action from the track and field video based on the action trend model, and automatically identifying and capturing a fourth group of action key frames of each key action from the track and field video based on the predefined position area.
The name and frame number of each action key frame in each group are recorded, e.g., start key frame, 00012.
Step 120: for each key action preset for each sport event, select corresponding action key frames in sequence from the first, second, third, and fourth groups of action key frames to form a first set corresponding to that key action.
For example: start key frame → [ start key frame, 00012; start key frame, 00015; … ].
Step 130, using the key frame confidence space of each key action, removing action key frames beyond the confidence space range from the first set.
Step 140: output the final result of automatic action key frame identification and capture according to the number of action key frames remaining in the first set.
If only one action key frame remains for a key action in the first set, it is taken as the final result, and the process continues with automatic identification and capture of the next key action or ends; otherwise, step 150 is performed.
Step 150: select an action key frame as the final result according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics. The key action semantics are obtained through deep learning of human body key frame skeleton postures and key actions.
For example:
The key actions of the 100m running event include: a start action and a mid-race running action.
The key actions of the triple jump event include: a start action, which is similar to that of the 100m running event, and a take-off action, which includes the hop, the step, and the jump.
The key actions of the 110m hurdles event include: a hurdling action and an inter-hurdle running action, wherein the hurdling action includes attacking the hurdle, clearing the hurdle, and landing.
Key action semantics corresponding to the skeleton information of each key action are obtained through deep learning of the key actions of track and field events such as the 100m running event, the triple jump event, and the 110m hurdles event.
For example: fig. 2 corresponds to the start action of the 100m event, fig. 3 to the mid-race running action of the 100m event, fig. 4 to the hop action of the triple jump event, fig. 5 to the step action of the triple jump event, fig. 6 to the jump action of the triple jump event, fig. 7 to the hurdle-attacking action of the hurdles event, fig. 8 to the hurdle-clearing action of the hurdles event, and fig. 9 to the landing action of the hurdles event.
Deep learning of key action semantics can be implemented with existing techniques.
If only one action key frame conforms to the key action semantics, it is output as the final result, and automatic identification and capture ends. If more than one action key frame conforms to the key action semantics, the frame numbers of those action key frames are averaged, and the action key frame closest to the average is output as the final result, ending automatic identification and capture.
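As a minimal sketch of this tie-breaking rule (the order used to break exact ties between frames equidistant from the mean is an assumption; the patent does not specify it, and the function name is hypothetical):

```python
def pick_by_mean(frame_numbers):
    """Among candidate keyframes that all match the key-action semantics,
    return the frame number closest to the mean frame number.
    Exact ties are broken by the earlier frame (an assumption)."""
    mean = sum(frame_numbers) / len(frame_numbers)
    return min(frame_numbers, key=lambda f: (abs(f - mean), f))
```

For candidates 10, 12, and 20 the mean is 14, so frame 12 would be output.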
The automatic identification and capture based on the human body key frame skeleton posture model, the key action model, the action trend model, and the predefined position area are described in detail below.
First, automatic identification and capture of key frames based on deep learning of human skeleton postures: a key frame human skeleton posture model is obtained mainly by training on athletes' human skeleton postures with each key frame annotated, and the model is then used to automatically identify and capture key frames from an athlete's running video. This can be implemented with existing techniques.
Specifically, take the 100m start as an example.
First, a start-action key frame as shown in fig. 2 is obtained through deep learning based on human skeleton postures, and the angle of each joint in the skeleton posture is calculated from that key frame. Then an athlete's running video is input, and each frame of the video is compared with the start-action key frame joint angle by joint angle, with an angle tolerance that can be set to 1-1.5 degrees; video frames that satisfy the condition are added to the first group of action key frames.
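The joint-by-joint angle comparison can be sketched as follows (a minimal illustration: the 1.5-degree tolerance comes from the description, while the function names and the choice of keypoint triples per joint are hypothetical):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp against floating-point drift before taking acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def matches_reference(frame_angles, ref_angles, tol_deg=1.5):
    """True if every joint angle is within the tolerance of the reference keyframe."""
    return all(abs(fa - ra) <= tol_deg for fa, ra in zip(frame_angles, ref_angles))
```

A video frame whose joint angles all fall within the tolerance of the reference start-action keyframe would then be added to the first group.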
Secondly, as shown in fig. 10, automatically identifying and capturing action key frames based on key action analysis includes the following steps:
and presetting a key action model of each key action, wherein the key action model comprises the motion direction of each key skeleton point in each key action, and the motion direction is respectively defined as the preset motion direction. Wherein, the selection of key skeleton point includes: left, right hand, left, right elbow, left, right knee, and left, right foot.
And judging the motion direction of the corresponding key skeleton point in the current video frame based on the difference between the key skeleton point position in the current video frame and the key skeleton point position in the next frame, and respectively defining the motion direction as each current motion direction.
And judging whether the current video frame is a key frame or not by comparing whether each current motion direction in the current video frame is the same as each preset motion direction in each key action model or not.
The preset movement direction and the current movement direction both comprise a movement direction and an included angle with a horizontal line, and the included angle is provided with a tolerance of 1-2 degrees.
In this step, auxiliary judgment can be performed based on the sport event, the centroid height, and the athlete's position. For example, in the triple jump event, if the centroid height in the current video frame is low (more than 50% below the average centroid height) and the athlete is located in front of the sand pit, the athlete is in the take-off stage.
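The direction comparison in this step can be sketched as follows (the 2-degree tolerance is the upper end of the range stated above; the function names are hypothetical):

```python
import math

def motion_direction(p_curr, p_next):
    """Angle with the horizontal (degrees) of a key skeleton point's
    displacement between the current frame and the next frame."""
    return math.degrees(math.atan2(p_next[1] - p_curr[1], p_next[0] - p_curr[0]))

def direction_matches(curr_deg, preset_deg, tol_deg=2.0):
    """Compare a current motion direction with a preset one, modulo 360 degrees."""
    diff = abs(curr_deg - preset_deg) % 360.0
    return min(diff, 360.0 - diff) <= tol_deg

def is_action_keyframe(curr_points, next_points, preset_dirs, tol_deg=2.0):
    """The frame is a candidate keyframe only if every key skeleton point
    moves in its preset direction within the tolerance."""
    return all(
        direction_matches(motion_direction(c, n), d, tol_deg)
        for c, n, d in zip(curr_points, next_points, preset_dirs)
    )
```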
Thirdly, as shown in fig. 11, 12 and 13, the automatic identification and capturing of action key frames based on action trends includes the following steps:
and presetting an action trend model, wherein the action trend model is used for representing the possible moving track of the centroid, and as shown in fig. 11, the action trend model comprises a first direction, a second direction, a third direction, a fourth direction and a fifth direction, wherein the first direction is vertically upward, the second direction is inclined upwards by 45 degrees relative to the horizontal direction, the third direction is horizontal direction, the fourth direction is inclined downwards by 45 degrees relative to the horizontal direction, and the fifth direction is vertically downward.
As shown in fig. 12, a portrait session is captured from a sports video of an athlete using a rectangular box.
And in the portrait activity area, acquiring the centroid position of the athlete of each frame by using a skeleton algorithm.
As shown in fig. 13, based on the action trend model, a seed optimization algorithm is used to obtain a seed distribution in a corresponding direction, so as to accurately obtain an action trend of the athlete, where an initial starting point of the action trend is a key frame.
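Leaving the seed optimization itself aside, the five-direction trend model can be illustrated by classifying each centroid displacement into the nearest preset direction and locating where a sustained trend begins (a simplified sketch assuming a y-up coordinate system and left-to-right motion; the `min_run` threshold and direction labels are assumptions):

```python
import math

# The five preset directions of the action trend model, as angles with the horizontal.
TREND_DIRS = {"up": 90.0, "up45": 45.0, "level": 0.0, "down45": -45.0, "down": -90.0}

def classify_step(p0, p1):
    """Assign a centroid displacement to the nearest of the five preset directions."""
    ang = math.degrees(math.atan2(p1[1] - p0[1], p1[0] - p0[0]))
    return min(TREND_DIRS, key=lambda k: abs(TREND_DIRS[k] - ang))

def trend_start(centroids, direction, min_run=3):
    """Index of the frame where a run of at least `min_run` consecutive steps
    in `direction` begins (the initial starting point of the trend), or None."""
    run = 0
    for i in range(len(centroids) - 1):
        if classify_step(centroids[i], centroids[i + 1]) == direction:
            run += 1
            if run >= min_run:
                return i - min_run + 1
        else:
            run = 0
    return None
```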
Fourthly, automatic identification and capture of key frames at key positions based on the predefined position area method comprises the following steps:
Each key position area of each sport event is established through big data analysis; when the athlete enters a key position area, the clearest and most complete picture is automatically intercepted as a key frame.
In step 130, the specific steps of removing the action key frames beyond the confidence space range from the first set by using the key frame confidence space of each key action are as follows:
each key action is a set of key action frame sequences comprising a plurality of key action frames starting to ending from the key action, such as: the start key action includes a total of 50 frames of key action frames from crouch to upright. Each frame of key action frame comprises a plurality of human body key point coordinates, and the coordinates use the gravity center of the human body as the origin of coordinates.
For example, the following 17 key points are commonly used: 0: nose, 1: left eye, 2: right eye, 3: left ear, 4: right ear, 5: left shoulder, 6: right shoulder, 7: left elbow, 8: right elbow, 9: left wrist, 10: right wrist, 11: left crotch, 12: right crotch, 13: left knee, 14: right knee, 15: left ankle, 16: right ankle.
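The 17-point indexing above matches the common COCO keypoint order and can be written as a lookup table (the identifier names are illustrative; "crotch" in the machine translation is rendered as hip):

```python
# The 17 key points listed above, indexed 0-16 (COCO keypoint order).
KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
KP_INDEX = {name: i for i, name in enumerate(KEYPOINTS)}
```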
The first step: in the key action model, the 17 key points of each frame of the key action frame sequence are stored in a two-dimensional table, with frame numbers as rows and key points as columns. The key action model is a standard model selected and processed manually in advance; it can be cut from a competition video or a training video, or captured manually from a recorded broadcast.
Nose [302.12, 305.15, 310.23, 330.56, 210.45, 250.65, …]
Left eye [ … ]
Right eye [ … ]
Left ear [ … ]
Right ear [ … ]
Left shoulder [ … ]
Right shoulder [ … ]
Left elbow [ … ]
Right elbow [ … ]
Left wrist [ … ]
Right wrist [ … ]
Left hip [ … ]
Right hip [ … ]
Left knee [ … ]
Right knee [ … ]
Left ankle [ … ]
Right ankle [ … ]
The data for the other key points, similar to the nose data, is obtained by human skeleton posture recognition such as OpenPose. These exemplary values are omitted here, which does not affect the implementation of the technical solution.
The values in the above table may differ depending on the size of the key action model images, but because the center of gravity of the human body is used as the coordinate origin and the calculation uses differences, this has essentially no influence on the result.
The second step: obtain the coordinates of the 17 corresponding key points in each frame of the athlete's running video (the video to be analyzed). Using the length of the key action frame sequence as a sliding window, compute the coordinate difference of each corresponding key point in each frame within the window, and from it a score for each key point: for example, a difference of 0 scores 100, the x and y coordinates are weighted equally, and the score decreases by 1 for every 10 pixels of difference.
For example, the perfect score for the 17 key points in one frame of image is 1700 points. A confidence score can be preset according to the required accuracy of the action; for example, a frame whose total key point score is at least 85%, i.e. 1700 × 85% = 1445 points, is regarded as confident. A window thus produces a score curve over time (the window duration), and the range within which the curve fluctuates is the confidence space, for example fluctuating up and down by 5 points.
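The scoring rule can be sketched as follows (a minimal interpretation: averaging the x and y differences is an assumption, since the description only states that the two coordinates carry equal weight; the function names are hypothetical):

```python
def keypoint_score(ref_pt, obs_pt):
    """100 points minus 1 point per 10 pixels of difference; x and y weighted equally."""
    diff = (abs(ref_pt[0] - obs_pt[0]) + abs(ref_pt[1] - obs_pt[1])) / 2.0
    return max(0.0, 100.0 - diff / 10.0)

def frame_score(ref_frame, obs_frame):
    """Sum over the 17 key points; a perfect match scores 1700."""
    return sum(keypoint_score(r, o) for r, o in zip(ref_frame, obs_frame))

def is_confident(ref_frame, obs_frame, ratio=0.85):
    """The description's example threshold: 1700 x 85% = 1445 points."""
    return frame_score(ref_frame, obs_frame) >= 1700 * ratio
```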
The third step: by sliding the window over the sports video, obtain the video sequence lying within the confidence space, and record the frame numbers of its start frame and end frame.
The fourth step: for each action key frame identified in the first set, keep it if its frame number lies between the start and end frame numbers; otherwise, delete it from the first set.
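The fourth step amounts to a simple range filter over the first set (the (name, frame number) tuple layout is illustrative):

```python
def filter_by_confidence_window(candidates, start_frame, end_frame):
    """Keep only the (name, frame_number) candidates whose frame number
    lies within the confident sequence recorded in the third step."""
    return [(name, fn) for name, fn in candidates if start_frame <= fn <= end_frame]
```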
Based on the method, the invention also provides a device for automatically identifying and capturing the track and field video action key frames, which comprises the following steps:
the recognition and capture module, used for automatically recognizing and capturing the first, second, third and fourth groups of action key frames based on the human body key frame skeleton posture model, the key action model, the action trend model and the predefined position area model, respectively;
the key frame collection module is used for sequentially selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames according to each key action preset by each motion item to form a first set corresponding to each key action;
the key frame removing module is used for removing action key frames exceeding the range of the confidence space from the first set by utilizing the key frame confidence space of each key action;
the key frame output module is used for outputting the final result of automatic key frame identification and capture according to the number of the remaining action key frames in the first set; wherein if only one action key frame remains in the first set, the action key frame is the final result; otherwise, selecting one action key frame as a final result according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, wherein the key action semantics are obtained by deep learning of the skeleton posture and key actions of the human body key frame.
In this device, the automatic identification and capture based on the human body key frame skeleton posture model, the key action model, the action trend model and the predefined position area can each be carried out according to the method described above.
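A minimal sketch of how the collection, removal and output modules fit together is given below. The semantic match score is passed in as an opaque function, the confidence span is assumed to have been computed as described above, and the frame-number averaging fall-back of claim 2 is used when no unique best semantic match exists. All names here are illustrative assumptions, not the patented implementation.

```python
def select_final_keyframe(groups, confidence_span, match_score):
    """groups: candidate key-frame numbers from the four models (None if a
    model produced no candidate). confidence_span: (start, end) frame
    numbers of the confidence space. match_score: function scoring how
    well a frame's skeleton information matches the key action semantics."""
    # Key frame collection module: gather candidates from the four groups.
    first_set = [g for g in groups if g is not None]
    # Key frame removal module: drop frames outside the confidence space.
    start, end = confidence_span
    first_set = [k for k in first_set if start <= k <= end]
    # Key frame output module: decide by the number of remaining frames.
    if not first_set:
        return None
    if len(first_set) == 1:
        return first_set[0]                     # unique survivor is the result
    scores = [match_score(k) for k in first_set]
    best = max(scores)
    matched = [k for k, s in zip(first_set, scores) if s == best]
    if len(matched) == 1:                       # unique best semantic match
        return matched[0]
    # Claim 2 fall-back: take the frame closest to the average frame number.
    mean = sum(first_set) / len(first_set)
    return min(first_set, key=lambda k: abs(k - mean))
```

For example, with candidates 10, 12 and 50 and a confidence span of (0, 20), frame 50 is removed first; the semantic score then decides between 10 and 12, and only a tie falls through to the averaging rule.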
The method for automatically identifying and capturing the track and field video action key frames can be realized as a computer software program. For example, the present invention also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the above-described track and field video action key frame automatic identification and capture method.
As can be seen from the above description of the specific embodiments, compared with the prior art, the method, device and computer-readable medium for automatically identifying and capturing track and field video action key frames provided by the invention have the following advantages:
Firstly, action key frames are automatically identified and captured based on the human body key frame skeleton posture model, the key action model, the action trend model and the predefined position area model, which improves the efficiency of identifying and capturing track and field video action key frames.
Secondly, according to each key action preset for each sport event, corresponding action key frames are selected in sequence from the first, second, third and fourth groups of action key frames to form a first set corresponding to each key action; action key frames beyond the range of the confidence space are removed from the first set by using the key frame confidence space of each key action; and the final result of automatic key frame identification and capture is output according to the number of action key frames remaining in the first set. This improves the accuracy of action key frame identification.
Thirdly, according to the degree of matching between the skeleton information in each action key frame in the first set and the key action semantics, the best-matching action key frame is selected as the final result, further improving the accuracy of action key frame identification.
Fourthly, in the process of automatically identifying and capturing action key frames based on key action analysis, auxiliary judgment is carried out using the sport event, the height of the center of mass and the athlete's position, which reduces the identification error rate.
Fifthly, in the automatic identification and capture of action key frames based on the action trend, a seed optimization algorithm is used to obtain the seed distribution in the corresponding direction, so that the athlete's action trend is obtained accurately; this is the first application of the technique to the field of video capture, and it further improves the targeted identification of action key frames.
Sixthly, automatic identification and capture of action key frames based only on the human body key frame skeleton posture model gives poor recognition accuracy and requires a large amount of training data and a long modeling time. Therefore, in the scheme of the present application, on the basis of the action key frames obtained by automatic identification and capture based on the human body key frame skeleton posture model, action key frames are also obtained by automatic identification and capture based on the key action model, the action trend model and the predefined position area model, and the optimal action key frame is selected adaptively from the captured candidates by combining the confidence space, key action semantic matching, averaging and other means. This raises the recognition rate of action key frames to above 95% and improves the feasibility and reliability of the method in practical application.
Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The present invention is not limited to the above preferred embodiments; any structural changes made under the teaching of the present invention that are the same as or similar to the technical solutions of the present invention fall within the scope of protection of the present invention.

Claims (10)

1. A track and field video action key frame automatic identification and capture method is characterized by comprising the following steps:
automatically identifying and capturing a first group, a second group, a third group and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model, respectively;
selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset by each motion item to form a first set corresponding to each key action;
removing action key frames beyond the range of the confidence space from the first set by using the key frame confidence space of each key action;
outputting a final result of automatic identification and capture of the key frames according to the number of the residual action key frames in the first set; wherein if only one action key frame remains in the first set of each key action, the action key frame is taken as a final result; otherwise, selecting one action key frame as a final result according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, wherein the key action semantics are obtained by deep learning of the skeleton posture and key actions of the human body key frame.
2. The method according to claim 1, wherein if a unique action key frame cannot be selected as the final result according to the degree of matching between the skeleton information and the key action semantics in each action key frame in the first set of each key action, the frame numbers of all the action key frames in the first set are averaged, and the action key frame closest to the average value is taken as the final result.
3. The method of claim 1, wherein automatically identifying and grabbing motion key frames based on a key motion model comprises the steps of:
presetting a key action model of each key action, wherein the key action model comprises the movement direction of each key skeleton point in each key action and is respectively defined as the preset movement direction;
judging the motion direction of the corresponding key skeleton point in the current video frame based on the difference between the key skeleton point position in the current video frame and the key skeleton point position in the next frame, and respectively defining the motion direction as the current motion direction;
and judging whether the current video frame is an action key frame or not by comparing whether the current motion direction is the same as the preset motion direction or not.
4. The method of claim 3, wherein the automatic identification and capture of action key frames based on the key action model is assisted by auxiliary judgment based on the sport event, the height of the centroid of the human skeleton and the athlete's position.
5. The method of claim 1, wherein automatically identifying and grabbing motion key frames based on motion trends, comprises the steps of:
presetting an action trend model which comprises a track direction for representing the possible movement of the centroid;
capturing a portrait activity area from the track and field video by using a rectangular frame;
in the portrait activity area, acquiring the centroid position of the athlete in each frame of video image by using a human body skeleton posture algorithm;
based on the action trend model, the seed distribution of the centroids of the skeletons in the track direction of the possible movement in a preset number of track and field video frames is obtained by utilizing a seed optimization algorithm, the action trend of the athlete is obtained according to the seed distribution of the centroids in the track direction of the possible movement, and the initial starting point of the action trend is an action key frame.
6. The method of claim 1, wherein automatically identifying and grabbing action key frames based on predefined location areas comprises the steps of:
establishing each key position area of each motion item by performing big data analysis on the track and field video;
and intercepting the clearest and most complete video frame in the track and field video in the key position area as an action key frame.
7. The method of claim 3,
key actions for the 100-meter sprint event include: a starting action and a mid-race running action;
the key actions of the triple jump event comprise: a take-off action and a jumping action, wherein the jumping action comprises a hop (single-foot jump), a step (striding jump) and a jump;
the key actions for the 110-meter hurdles event include: a hurdle-clearing action and an inter-hurdle running action, wherein the hurdle-clearing action includes take-off at the hurdle, flight over the hurdle, and landing after the hurdle.
8. The method according to claim 7, wherein the key action semantics corresponding to the corresponding key action skeleton information are obtained through deep learning of the key actions of each track and field project.
9. An automatic identification and grabbing device for track and field video action key frames is characterized by comprising the following components:
the recognition and capture module is used for automatically recognizing and capturing a first group, a second group, a third group and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model;
the key frame collection module is used for sequentially selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames according to each key action preset by each motion item to form a first set corresponding to each key action;
the key frame removing module is used for removing action key frames exceeding the range of the confidence space from the first set by utilizing the key frame confidence space of each key action;
the key frame output module is used for outputting the final result of automatic key frame identification and capture according to the number of the remaining action key frames in the first set; wherein if only one action key frame remains in the first set, the action key frame is the final result; and otherwise, selecting one action key frame as a final result according to the matching degree of the skeleton information and the key action semantics in each action key frame in the first set, wherein the key action semantics are obtained by deeply learning the skeleton posture and the key action of the human body key frame.
10. A computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the track and field video action key frame automatic identification and grabbing method according to any one of claims 1 to 8.
CN202210280271.1A 2022-03-22 2022-03-22 Method, device and medium for automatically identifying and capturing track and field video action key frames Active CN114550071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210280271.1A CN114550071B (en) 2022-03-22 2022-03-22 Method, device and medium for automatically identifying and capturing track and field video action key frames

Publications (2)

Publication Number Publication Date
CN114550071A true CN114550071A (en) 2022-05-27
CN114550071B CN114550071B (en) 2022-07-19

Family

ID=81666472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210280271.1A Active CN114550071B (en) 2022-03-22 2022-03-22 Method, device and medium for automatically identifying and capturing track and field video action key frames

Country Status (1)

Country Link
CN (1) CN114550071B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115665359A (en) * 2022-10-09 2023-01-31 Xihua County Environmental Supervision Brigade Intelligent compression method for environmental monitoring data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012048362A (en) * 2010-08-25 2012-03-08 Kddi Corp Device and method for human body pose estimation, and computer program
CN103218824A (en) * 2012-12-24 2013-07-24 大连大学 Motion key frame extracting method based on distance curve amplitudes
CN113762133A (en) * 2021-09-01 2021-12-07 哈尔滨工业大学(威海) Self-weight fitness auxiliary coaching system, method and terminal based on human body posture recognition

Also Published As

Publication number Publication date
CN114550071B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN108256433B (en) Motion attitude assessment method and system
CN110705390A (en) Body posture recognition method and device based on LSTM and storage medium
Hu et al. Real-time human movement retrieval and assessment with kinect sensor
CN110674785A (en) Multi-person posture analysis method based on human body key point tracking
JP6082101B2 (en) Body motion scoring device, dance scoring device, karaoke device, and game device
Chaudhari et al. Yog-guru: Real-time yoga pose correction system using deep learning methods
KR102106135B1 (en) Apparatus and method for providing application service by using action recognition
CN112819852A (en) Evaluating gesture-based motion
CN110298218B (en) Interactive fitness device and interactive fitness system
Suzuki et al. Enhancement of gross-motor action recognition for children by CNN with OpenPose
CN114550071B (en) Method, device and medium for automatically identifying and capturing track and field video action key frames
CN115331314A (en) Exercise effect evaluation method and system based on APP screening function
Yang et al. Human exercise posture analysis based on pose estimation
WO2023108842A1 (en) Motion evaluation method and system based on fitness teaching training
Tang et al. Research on sports dance movement detection based on pose recognition
Yang et al. Research on face recognition sports intelligence training platform based on artificial intelligence
CN114049590A (en) Video-based ski-jump analysis method
CN110996178B (en) Intelligent interactive data acquisition system for table tennis game video
CN110070036B (en) Method and device for assisting exercise motion training and electronic equipment
CN116271757A (en) Auxiliary system and method for basketball practice based on AI technology
CN115497170A (en) Method for identifying and scoring formation type parachuting training action
JP7074727B2 (en) Sport behavior recognition devices, methods and programs
CN113255450A (en) Human motion rhythm comparison system and method based on attitude estimation
Kim et al. Implementation of golf swing analysis system based on swing trajectories analysis
Murthy et al. DiveNet: Dive Action Localization and Physical Pose Parameter Extraction for High Performance Training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220902

Address after: Room 2310, 23rd Floor, No. 24, Jianguomenwai Street, Chaoyang District, Beijing 100010

Patentee after: One Body Technology Co.,Ltd.

Address before: Room zt1009, science and technology building, No. 45, Zhaitang street, Mentougou District, Beijing 102300 (cluster registration)

Patentee before: Beijing Yiti Technology Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method, device and medium for automatic recognition and capture of motion key frames in track and field video

Effective date of registration: 20230112

Granted publication date: 20220719

Pledgee: Haidian Beijing science and technology enterprise financing Company limited by guarantee

Pledgor: One Body Technology Co.,Ltd.

Registration number: Y2023110000017