Disclosure of Invention
In view of the above drawbacks, the technical problem to be solved by the present invention is to provide a method and a device for automatically identifying and capturing action key frames of a track and field video, so as to solve the current problem of low efficiency caused by manually marking action key frames.
Therefore, the invention provides a method for automatically identifying and capturing a track and field video action key frame, which comprises the following steps:
automatically identifying and capturing a first group, a second group, a third group and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position region model respectively;
selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset by each motion item to form a first set corresponding to each key action;
removing action key frames beyond the range of the confidence space from the first set by using the key frame confidence space of each key action;
outputting a final result of automatic identification and capture of the key frames according to the number of the remaining action key frames in the first set; wherein if only one action key frame remains in the first set, the action key frame is the final result; otherwise, selecting one action key frame as a final result according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, wherein the key action semantics are obtained by deep learning of the skeleton posture and key actions of the human body key frame.
In the method, preferably, if a unique action key frame cannot be selected as a final result according to the matching degree between the skeleton information and the key action semantics in each action key frame in the first set, the frame numbers of all the action key frames in the first set are averaged to obtain an average value, and an action key frame closest to the average value is used as the final result.
In the above method, preferably, the automatic identification and capture of action key frames based on the key action model comprises the following steps:
presetting a key action model of each key action, wherein the key action model comprises the movement direction of each key skeleton point in each key action and is respectively defined as the preset movement direction;
judging the motion direction of the corresponding key skeleton point in the current video frame based on the difference between the key skeleton point position in the current video frame and the key skeleton point position in the next frame, and respectively defining the motion direction as the current motion direction;
and judging whether the current video frame is an action key frame or not by comparing whether the current motion direction is the same as the preset motion direction or not.
In the above method, preferably, auxiliary judgment of the automatic identification and capture of action key frames based on the key action model is performed using the sports event, the height of the center of mass of the human skeleton, and the position of the athlete.
In the above method, preferably, the automatic identification and capture of action key frames based on the action trend model includes the following steps:
presetting an action trend model, wherein the action trend model comprises the candidate track directions along which the centroid may move;
capturing a portrait activity area from the track and field video by using a rectangular frame;
in the portrait activity area, acquiring the centroid position of the athlete in each frame of video image by using a human body skeleton posture algorithm;
based on the action trend model, obtaining, by means of a seed optimization algorithm, the seed distribution of the skeleton centroids of a preset number of track and field video frames along the candidate moving track directions; obtaining the action trend of the athlete from the seed distribution of the centroids along those directions, wherein the initial starting point of the action trend is an action key frame.
In the above method, preferably, the automatic identification and capture of action key frames based on the predefined location area comprises the following steps:
establishing each key position area of each motion item by performing big data analysis on the track and field video;
and intercepting the clearest and most complete video frame in the track and field video in the key position area as an action key frame.
In the above-described method, it is preferable that,
the key actions of the 100-meter run event include: a start action and a mid-race running action;
the key actions of the triple jump event comprise: a run-up action and a take-off action, wherein the take-off action comprises a hop, a step, and a jump;
the key actions of the 110-meter hurdles event include: a hurdling action and an inter-hurdle running action, wherein the hurdling action includes attacking the hurdle, clearing the hurdle, and landing from the hurdle.
In the above method, preferably, the key action semantics corresponding to the corresponding key action skeleton information are obtained by deep learning of the key actions of each track and field project.
The invention also provides a device for automatically identifying and capturing track and field video action key frames, which comprises:
the recognition and capture module is used for automatically recognizing and capturing a first group, a second group, a third group and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model;
the key frame collection module is used for sequentially selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames according to each key action preset in each motion item to form a first set corresponding to each key action;
the key frame removing module is used for removing action key frames exceeding the range of the confidence space from the first set by utilizing the key frame confidence space of each key action;
the key frame output module is used for outputting the final result of automatic key frame identification and capture according to the number of the remaining action key frames in the first set; wherein if only one action key frame remains in the first set, the action key frame is the final result; otherwise, selecting one action key frame as a final result according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, wherein the key action semantics are obtained by deep learning of the skeleton posture and key actions of the human body key frame.
In the above-described apparatus, it is preferable that,
the key action model comprises the motion direction of each key skeleton point in each key action, which is respectively defined as a preset motion direction;
the recognition and capture module judges the motion direction of the corresponding key skeleton point in the current video frame based on the difference between the key skeleton point position in the current video frame and the key skeleton point position in the next frame, and respectively defines the motion direction as each current motion direction; and judging whether the current video frame is a key frame or not by comparing whether each current motion direction in the current video frame is the same as each preset motion direction in each key action model or not.
The invention also provides a computer readable medium on which a computer program is stored, which, when executed by a processor, implements the above-mentioned track and field video action key frame automatic identification and capture method.
According to the technical scheme, the method, the device and the computer readable medium for automatically identifying and capturing the track and field video action key frames solve the problem that the efficiency is low when the action key frames are manually marked at present. Compared with the prior art, the invention has the following beneficial effects:
firstly, based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model, action key frames are automatically identified and captured, and the efficiency is improved.
Secondly, selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset by each motion item to form a first set corresponding to each key action; removing action key frames beyond the range of the confidence space from the first set by using the key frame confidence space of each key action; and outputting the final result of automatic identification and capture of the key frames according to the number of the residual action key frames in the first set, thereby improving the accuracy of identification of the action key frames.
Thirdly, according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, one action key frame which is most matched is selected as a final result, and the accuracy of action key frame identification is further improved.
Detailed Description
The technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
The realization principle of the invention is as follows:
automatically identifying and capturing a first group, a second group, a third group and a fourth group of action key frames of each key action based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position region model;
selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset by each motion item to form a first set corresponding to each key action;
removing action key frames beyond the range of the confidence space from the first set by using the key frame confidence space of each key action;
and outputting the final result of automatic identification and capture of the key frames according to the number of the remaining action key frames in the first set.
The scheme provided by the invention realizes automatic identification and capture of the track and field video action key frames, and improves the identification accuracy rate through key frame confidence space and key action semantic analysis.
In order to make the technical solution and implementation of the present invention more clearly explained and illustrated, several preferred embodiments for implementing the technical solution of the present invention are described below.
It should be noted that orientation terms such as "inside, outside", "front, back", and "left, right" are used herein merely for ease of reference; obviously, the use of such orientation terms does not limit the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for automatically identifying and capturing a track and field video action key frame, which includes the following steps:
and step 110, automatically identifying and capturing a first group of action key frames of each key action from the track and field video based on the human body key frame skeleton posture model respectively. And automatically identifying and capturing a second group of action key frames of each key action from the track and field video based on the key action model. And automatically identifying and capturing a third group of action key frames of each key action from the track and field video based on the action trend model, and automatically identifying and capturing a fourth group of action key frames of each key action from the track and field video based on the predefined position area.
The name and frame number are recorded for each group of action key frames. E.g., start running keyframe, 00012.
And step 120, selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset in each motion item, and forming a first set corresponding to each key action.
For example: start running key frame [start running key frame, 00012; start running key frame, 00015; ……].
Step 130, using the key frame confidence space of each key action, removing action key frames beyond the confidence space range from the first set.
And step 140, outputting the final result of automatic identification and capture of the action key frames according to the number of the remaining action key frames in the first set.
If only one action key frame remains for a key action in the first set, that action key frame is taken as the final result, and the process continues with automatic identification and capture of the next key action, or ends; otherwise, step 150 is performed.
Step 150, selecting an action key frame as a final result according to the matching degree of the skeleton information in each action key frame in the first set and the key action semantics. The key action semantics are obtained by deep learning of the skeleton posture and key actions of the human key frame.
For example:
The key actions of the 100-meter run event include: a start action and a mid-race running action.
The key actions of the triple jump event comprise: a run-up action, which is similar to the running of the 100-meter event, and a take-off action, which includes a hop, a step, and a jump.
The key actions of the 110-meter hurdles event include: a hurdling action and an inter-hurdle running action, wherein the hurdling action includes attacking the hurdle, clearing the hurdle, and landing from the hurdle.
Key action semantics corresponding to the respective key action skeleton information are obtained through deep learning of the key actions of track and field events such as the 100-meter run, the triple jump, and the 110-meter hurdles.
For example: fig. 2 corresponds to the start action of the 100-meter event, fig. 3 to the mid-race running action of the 100-meter event, fig. 4 to the hop of the triple jump event, fig. 5 to the step of the triple jump event, fig. 6 to the jump of the triple jump event, fig. 7 to the hurdle attack action of the hurdles event, fig. 8 to the hurdle clearance action of the hurdles event, fig. 9 to the hurdle landing action of the hurdles event, and so on.
The deep learning of the key action semantics can be realized by adopting the prior art.
And if only one action key frame which accords with the key action semantics is available, outputting the action key frame as a final result, and finishing automatic identification and capture. And if more than one action key frame is consistent with the key action semantics, averaging the frame sequence numbers of the action key frames to obtain an average value, outputting the action key frame closest to the average value as a final result, and finishing automatic identification and capture.
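The tie-break rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation; the function name and the assumption that frame numbers are plain integers are ours.

```python
def pick_final_keyframe(frame_numbers):
    """Tie-break sketch: average the frame numbers of all remaining
    candidate key frames and return the one closest to that mean.
    `frame_numbers` is a hypothetical list of integer frame indices."""
    if len(frame_numbers) == 1:
        return frame_numbers[0]  # unique match: output directly
    mean = sum(frame_numbers) / len(frame_numbers)
    return min(frame_numbers, key=lambda n: abs(n - mean))
```

For instance, candidates with frame numbers 12, 15, and 30 average to 19, so frame 15 would be output.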
The automatic identification and capture based on the human body key frame skeleton posture model, the key action model, the action trend model, and the predefined position area are described in detail below.
First, key frames are automatically identified and captured based on deep learning of the human skeleton posture: a key frame human skeleton posture model is obtained mainly by training on athletes' human skeleton postures and setting each key frame, and the model is then used to automatically identify and capture key frames from the athlete's running video. This can be implemented using existing techniques.
Specifically, take the 100-meter start as an example.
First, a start action key frame as shown in fig. 2 is obtained through deep learning based on the human skeleton posture, and the angles of the joints in the skeleton posture are calculated from this key frame. Then, the athlete's running video is input, each frame of the video is compared with the start action key frame by comparing all joint angles with an angle tolerance of 1-1.5 degrees, and finally the video frames meeting the condition are added to the first group of action key frames.
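The joint-angle comparison can be sketched as follows. This is a minimal illustration under our own assumptions: joints are 2D points, and the angle at a joint is taken between the two limb vectors meeting there; the function names are illustrative.

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c, each an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

def matches_template(frame_angles, template_angles, tol=1.5):
    """A frame matches the learned start key frame when every joint angle
    lies within the tolerance (the text suggests 1-1.5 degrees)."""
    return all(abs(f - t) <= tol for f, t in zip(frame_angles, template_angles))
```

A frame whose joint angles all fall within the tolerance of the template's angles would be added to the first group of action key frames.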
Secondly, as shown in fig. 10, automatically identifying and capturing action key frames based on key action analysis includes the following steps:
and presetting a key action model of each key action, wherein the key action model comprises the motion direction of each key skeleton point in each key action, and the motion direction is respectively defined as the preset motion direction. Wherein, the selection of the key skeleton point comprises the following steps: left, right hand, left, right elbow, left, right knee, and left, right foot.
And judging the motion direction of the corresponding key skeleton point in the current video frame based on the difference between the key skeleton point position in the current video frame and the key skeleton point position in the next frame, and respectively defining the motion direction as each current motion direction.
And judging whether the current video frame is a key frame or not by comparing whether each current motion direction in the current video frame is the same as each preset motion direction in each key action model or not.
The preset movement direction and the current movement direction both comprise a movement direction and an included angle with a horizontal line, and the included angle is provided with a tolerance of 1-2 degrees.
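The direction comparison described in these steps can be sketched as follows. This is an assumed formulation: we represent a motion direction by its signed angle to the horizontal and flip the y-axis so that "up" is positive in image coordinates; the tolerance default follows the 1-2 degree range mentioned above.

```python
import math

def motion_direction(p_now, p_next):
    """Signed angle (degrees) of a key skeleton point's displacement relative
    to the horizontal, from the current frame to the next. Image y grows
    downward, so the sign is flipped to make upward motion positive."""
    dx = p_next[0] - p_now[0]
    dy = p_now[1] - p_next[1]  # flipped: positive means moving up on screen
    return math.degrees(math.atan2(dy, dx))

def direction_matches(current_deg, preset_deg, tol=2.0):
    """Compare a current motion direction with the preset one under the
    angle tolerance, wrapping around 360 degrees."""
    diff = abs(current_deg - preset_deg) % 360.0
    return min(diff, 360.0 - diff) <= tol
```

A video frame would be judged a key frame when `direction_matches` holds for every key skeleton point against the key action model's preset directions.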
In this step, auxiliary judgment can be made from the sports event, the centroid height, and the athlete's position. For example: in the triple jump event, a low centroid in the current video frame (more than 50% below the average centroid height) together with the athlete being located in front of the sand pit indicates that the jump stage is in progress.
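The triple-jump example above amounts to a simple predicate, sketched here under our own assumptions (the run-up proceeds in the +x direction toward a pit starting at `pit_x`; all names and thresholds are illustrative):

```python
def in_jump_phase(centroid_height, avg_height, athlete_x, pit_x):
    """Auxiliary check from the triple jump example: a centroid more than
    50% below its running average while the athlete is still in front of
    the sand pit suggests the jump stage is in progress."""
    return centroid_height < 0.5 * avg_height and athlete_x < pit_x
```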
Thirdly, as shown in fig. 11, 12 and 13, the automatic identification and capturing of action key frames based on action trends includes the following steps:
and presetting an action trend model, wherein the action trend model is used for representing the possible moving track of the centroid, and as shown in fig. 11, the action trend model comprises a first direction, a second direction, a third direction, a fourth direction and a fifth direction, wherein the first direction is vertically upward, the second direction is inclined upwards by 45 degrees relative to the horizontal direction, the third direction is horizontal direction, the fourth direction is inclined downwards by 45 degrees relative to the horizontal direction, and the fifth direction is vertically downward.
As shown in fig. 12, the athlete's activity region is captured from the sports video using a rectangular box.
Within this activity region, the centroid position of each frame is obtained using a human skeleton posture algorithm.
As shown in fig. 13, based on the action trend model, the seed distribution in the corresponding direction is obtained by using a seed optimization algorithm, so as to accurately obtain the action trend of the athlete, where the initial starting point of the action trend is a key frame.
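A simplified stand-in for the seed-distribution step can be sketched as follows. This is not the seed optimization algorithm itself; we merely bin each frame-to-frame centroid displacement into the nearest of the five model directions of fig. 11 and report the dominant bin, which conveys the idea of a distribution over candidate track directions.

```python
import math

# The five candidate track directions of the action trend model,
# as angles to the horizontal (up is positive).
DIRECTIONS = {"up": 90.0, "up45": 45.0, "horizontal": 0.0,
              "down45": -45.0, "down": -90.0}

def trend_from_centroids(centroids):
    """Bin each frame-to-frame centroid displacement into the nearest model
    direction and return the dominant bin. `centroids` is a list of (x, y)
    positions in image coordinates (y grows downward, hence the flip)."""
    counts = {name: 0 for name in DIRECTIONS}
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        angle = math.degrees(math.atan2(y0 - y1, x1 - x0))
        nearest = min(DIRECTIONS, key=lambda d: abs(DIRECTIONS[d] - angle))
        counts[nearest] += 1
    return max(counts, key=counts.get)
```

The frame at which the dominant direction first begins (the initial starting point of the trend) would then be taken as the action key frame.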
Fourthly, automatically identifying and capturing key frames at key positions based on the predefined position area method comprises the following steps:
and establishing each key position area of each sports item through big data analysis, and automatically intercepting the clearest and most complete picture as a key frame when the athlete enters the key position area.
In step 130, the specific steps of removing the action key frames beyond the confidence space range from the first set by using the key frame confidence space of each key action are as follows:
each key action is a set of key action frame sequences comprising a plurality of key action frames starting to ending from the key action, such as: the start key action includes a total of 50 frames of key action frames from crouch to upright. Each frame of key action frame comprises a plurality of human body key point coordinates, and the coordinates use the gravity center of the human body as the origin of coordinates.
For example, the following 17 key points are commonly used: 0: nose, 1: left eye, 2: right eye, 3: left ear, 4: right ear, 5: left shoulder, 6: right shoulder, 7: left elbow, 8: right elbow, 9: left wrist, 10: right wrist, 11: left hip, 12: right hip, 13: left knee, 14: right knee, 15: left ankle, 16: right ankle.
The first step: in the key action model, the 17 key points of each frame of a key action frame sequence are stored in a two-dimensional table, with frame numbers as rows and key points as columns. The key action model is a standard model that is manually selected and processed in advance; it can be cut from a match or training video, or manually cut from a recorded match broadcast.
Nose [302.12, 305.15, 310.23, 330.56, 210.45, 250.65 ……],
Left eye [……]
Right eye [……]
Left ear [……]
Right ear [……]
Left shoulder [……]
Right shoulder [……]
Left elbow [……]
Right elbow [……]
Left wrist [……]
Right wrist [……]
Left hip [……]
Right hip [……]
Left knee [……]
Right knee [……]
Left ankle [……]
Right ankle [……]
These data, similar to the nose data above, are obtained by OpenPose-like human skeleton posture recognition; the exemplary values are omitted here, which does not affect the implementation of the technical solution.
The numerical values in the above table may vary with the size of the key action model image, but since the human body's center of gravity serves as the coordinate origin and a difference-based calculation is used, this has essentially no influence on the calculation result.
The second step: the coordinates of the 17 corresponding key points are obtained for each frame of the athlete's running video (the video to be analyzed). Then, taking the length of the key action frame sequence as a window and translating this window, the coordinate difference of each corresponding key point within the window is calculated to score each key point: a difference of 0 scores 100, the x and y coordinates are weighted equally, and the score decreases by 1 for every 10 pixels of difference.
For example, the perfect score for the 17 key points of one frame is 1700 points. The confidence score can be preset according to the required action accuracy; for example, if the total key point score of a frame reaches at least 85%, i.e. 1700 × 85% = 1445 points, the frame is regarded as confident. A window thus generates a score curve over time (the window duration), and the curve's upper and lower fluctuation range is the confidence space, for example: the confidence space fluctuates up and down by 5 points.
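The scoring rule can be sketched as follows. One detail is underspecified in the text: how the x and y differences of a key point combine into one pixel difference. As an assumption, we average their absolute values; the function names are illustrative.

```python
def keypoint_score(ref, obs):
    """Score one key point against the model: 100 at zero difference,
    minus 1 per 10 pixels of difference, floored at 0. How x and y combine
    is not fully specified; here their absolute differences are averaged."""
    diff = (abs(ref[0] - obs[0]) + abs(ref[1] - obs[1])) / 2.0
    return max(0.0, 100.0 - diff / 10.0)

def frame_score(ref_points, obs_points):
    """Total score of one frame over the 17 key points (perfect: 1700)."""
    return sum(keypoint_score(r, o) for r, o in zip(ref_points, obs_points))

def is_confident(score, ratio=0.85, n_points=17):
    """A frame is trusted when its score reaches the preset confidence
    ratio, e.g. 1700 * 85% = 1445 points."""
    return score >= n_points * 100.0 * ratio
```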
The third step: the window is translated over the motion video to obtain the motion video sequence located within the confidence space, and the sequence numbers of its start and end frames are recorded.
The fourth step: each action key frame identified in the first set is judged; if its sequence number lies between the start and end frame numbers, it is kept, otherwise it is deleted from the first set.
Based on the above method, the invention also provides a device for automatically identifying and capturing track and field video action key frames, which comprises:
the recognition and capture module is used for automatically recognizing and capturing to obtain a first group, a second group, a third group and a fourth group of action key frames based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model;
the key frame collection module is used for sequentially selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames according to each key action preset by each motion item to form a first set corresponding to each key action;
the key frame removing module is used for removing action key frames exceeding the range of the confidence space from the first set by utilizing the key frame confidence space of each key action;
the key frame output module is used for outputting the final result of automatic key frame identification and capture according to the number of the remaining action key frames in the first set; wherein if only one action key frame remains in the first set, the action key frame is the final result; otherwise, selecting one action key frame as a final result according to the matching degree of skeleton information in each action key frame in the first set and key action semantics, wherein the key action semantics are obtained by deep learning of the skeleton posture and key actions of the human body key frame.
In the device, the automatic identification and capture based on the human body key frame skeleton posture model, the key action model, the action trend model, and the predefined position area can likewise be carried out in the manner described above for the method.
The method for automatically identifying and capturing the track and field video action key frames can be realized as a computer software program. For example, the present invention also provides a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the above-mentioned track and field video action key frame automatic identification and capture method.
With the above description of the specific embodiments, compared with the prior art, the method, the device and the computer readable medium for automatically identifying and capturing the track and field video action key frame provided by the invention have the following advantages:
firstly, based on a human body key frame skeleton posture model, a key action model, an action trend model and a predefined position area model, action key frames are automatically identified and captured, and the efficiency of identifying and capturing the action key frames of the track and field video is improved.
Secondly, selecting corresponding action key frames from the first group of action key frames, the second group of action key frames, the third group of action key frames and the fourth group of action key frames in sequence according to each key action preset by each motion item to form a first set corresponding to each key action; removing action key frames beyond the range of the confidence space from the first set by using the key frame confidence space of each key action; and outputting the final result of automatic identification and capture of the key frames according to the number of the residual action key frames in the first set, thereby improving the accuracy of identification of the action key frames.
Thirdly, according to the matching degree of skeleton information and key action semantics in each action key frame in the first set, one action key frame which is most matched is selected as a final result, and the accuracy of identifying the action key frames is further improved.
Fourth, in the process of automatically identifying and capturing action key frames based on key action analysis, auxiliary judgment through the sports event, the centroid height, and the athlete's position reduces the identification error rate.
Fifth, in the automatic identification and capture of action key frames based on the action trend, the seed distribution in each candidate direction is obtained using a seed optimization algorithm, so that the athlete's action trend is obtained accurately; applying this for the first time to the field of video capture further improves action key frame identification in a targeted manner.
Sixth, automatic identification and capture based on the human body key frame skeleton posture model alone has poor recognition accuracy and requires a large amount of training data and a long modeling time. The present scheme therefore supplements it with automatic identification and capture based on the key action model, the action trend model, and the predefined position area model, and adaptively selects the optimal action key frame from the captured candidates through a combination of the confidence space, key action semantic matching, and averaging. The resulting recognition rate of action key frames exceeds 95%, improving the feasibility and reliability of the method in practical applications.
Finally, it should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The present invention is not limited to the above-mentioned preferred embodiments, and any structural changes made under the teaching of the present invention shall fall within the scope of the present invention, which is similar or similar to the technical solutions of the present invention.