CN112396018B - Badminton player foul action recognition method combining multi-mode feature analysis and neural network - Google Patents

Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Info

Publication number
CN112396018B
Authority
CN
China
Prior art keywords
features
neural network
network
motion
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011364578.7A
Other languages
Chinese (zh)
Other versions
CN112396018A (en)
Inventor
张刚瀚
黄国恒
程良伦
张煜乾
陈泽炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011364578.7A priority Critical patent/CN112396018B/en
Publication of CN112396018A publication Critical patent/CN112396018A/en
Application granted granted Critical
Publication of CN112396018B publication Critical patent/CN112396018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention discloses a badminton player foul action recognition method combining multi-mode feature analysis and a neural network, which comprises the following steps: extracting character images, motion gesture sequences and optical flow data of athletes in real time; sending the character image into a space flow network of a double-flow network to obtain the space characteristics of the athlete; transmitting the motion gesture sequence as a directed graph into a multi-layer graph convolution neural network to obtain gesture space-time characteristics of the athlete during motion; extracting features of each frame of optical flow data through a convolutional neural network, and then sending the extracted features into a time relation network to obtain optical flow motion information features of athletes; and respectively pairing the obtained three features two by two to obtain three aggregation features, respectively sending the three aggregation features into a convolutional neural network to obtain three fusion features, weighting and fusing the three fusion features to obtain a final overall human body multi-modal fusion motion feature, and sending the final overall human body multi-modal fusion motion feature into a fully connected network to obtain a final action classification recognition result. The invention improves the accuracy of identifying the foul actions of the athlete.

Description

Badminton player foul action recognition method combining multi-mode feature analysis and neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a badminton player foul action recognition method combining multi-mode feature analysis and a neural network.
Background
The badminton competition rules contain clear provisions: the serve must conform to the standard actions, service may not be deliberately delayed, a player's standing position may not obstruct the opponent's view, the opponent's court may not be deliberately invaded during the match, and no action may be made to obstruct the opponent's attack or to distract the opponent's attention. However, these actions are sometimes subtle when they occur, so that the human eye cannot observe and judge them in detail; in addition, athletes' actions change frequently during the competition, so the referee may overlook certain foul actions, thereby affecting the fairness of the competition. With the progress of computer vision technology, machines can perform fine-grained analysis of videos and images, so a machine can be used in place of human eyes to perform behavior recognition on the athlete and judge whether the athlete's actions comply with the rules. Existing behavior recognition methods mainly comprise the dual-stream method and 3D convolution, and some methods also take the human body pose as input for behavior recognition. However, human behavior carries a great deal of uncertainty: the athlete's actions during a competition are complex, hand-swing actions may be inconspicuous, and complex actions may be mixed together, which can cause the system to misjudge the behavior. Moreover, when only data of a single modality is used, interaction between different kinds of information is lacking, making detailed analysis difficult.
In the prior art, the Chinese patent with publication number CN110705463A, published on January 17, 2020, discloses a video human behavior recognition method and system based on a multi-mode dual-stream 3D network, the method comprising: generating a depth dynamic image sequence DDIS from depth video; generating a pose evaluation map sequence PEMS from RGB video; inputting the depth dynamic image sequence and the pose evaluation map sequence into 3D convolutional neural networks respectively, and constructing a DDIS stream and a PEMS stream to obtain respective classification results; and fusing the obtained classification results to obtain a final behavior recognition result. This patent does not combine multi-feature information, has no feature fusion, and its recognition accuracy is low.
Disclosure of Invention
The invention provides a badminton player foul action recognition method combining multi-mode feature analysis and a neural network, in order to overcome the defects of the prior art that player foul action recognition lacks multi-feature information fusion and has low recognition accuracy.
The primary purpose of the invention is to solve the above technical problems, and the technical solution of the invention is as follows:
a badminton player foul action recognition method combining multi-mode feature analysis and neural network comprises the following steps:
s1: extracting character images, motion gesture sequences and optical flow data of athletes in real time;
s2: sending the character image into a space flow network of a double-flow network to obtain the space characteristics of the athlete;
s3: transmitting the motion gesture sequence as a directed graph into a multi-layer graph convolution neural network to obtain gesture space-time characteristics of the athlete during motion;
s4: extracting features of each frame of optical flow data through a convolutional neural network, and then sending the extracted features into a time relation network to obtain optical flow motion information features of athletes;
s5: pairing the three features obtained in the steps S1, S2 and S3 two by two to obtain three polymerization features;
s6: the three aggregation features are respectively sent into a convolutional neural network to obtain three fusion features;
s7: the three fusion features are weighted and fused to obtain the final overall human body multi-mode fusion motion feature;
s8: and sending the overall human body multi-mode fusion motion characteristics into a fully-connected network to obtain a final action classification recognition result.
Further, in step S1, the player character image is acquired through video frame capturing, the player motion gesture is acquired through OpenPose, and the player optical flow data are acquired through DenseFlow.
Further, step S2 is to send the character image into a spatial stream network of the dual stream network, and model the spatial information of the character image to obtain the character spatial characteristics of the athlete.
Further, step S3 is to transfer the motion gesture sequence as a directed graph into a multi-layer graph convolution neural network, and model the motion gesture sequence of the athlete to obtain the motion gesture space-time characteristics of the athlete.
Further, the motion gesture space-time features are obtained through a graph convolution operation, the graph convolution operation formula being:

$$f_{out}(v_{ti})=\sum_{v_{tj}\in B(v_{ti})}\frac{1}{Z_{ti}(v_{tj})}\,f_{in}(v_{tj})\cdot w\big(l_{ti}(v_{tj})\big)$$

wherein $v_{ti}$ and $v_{tj}$ represent joint points of the human body posture, $f_{in}$ and $f_{out}$ represent the input and output feature maps, $w$ and $w'$ represent the weights between the joint points and the weights after reconstruction, $l_{ti}(\cdot)$ means using the joint point $v_{ti}$ as center to assign digital labels to the other joint points, which digital labels depend on the shortest path between two joints, $Z_{ti}(\cdot)$ is a regularization term, and $B(v_{ti})=\{v_{tj}\mid d(v_{tj},v_{ti})\le D\}$, wherein $D$ is set to the constant 1 and $d(v_{tj},v_{ti})$ is the shortest path between the two joint points.
Further, the step S4 is a process of obtaining the athlete optical flow motion information feature:
modeling each frame of optical flow of the optical flow sequence by utilizing a ResNet base network in the convolutional neural network, and then fusing features of each modeled frame of optical flow;
sending the optical flow sequences with the characteristics fused into a time relation network to be grouped according to different frame numbers, and sequencing the serial numbers of the optical flows in each group from small to large;
modeling each group of optical flow sequences to obtain inter-frame time relation features, and then fusing the inter-frame time relation features of the same group to obtain inter-segment time relation features;
and adding all the time relation features to obtain the overall athlete optical flow motion information features containing temporal reasoning information.
Further, in step S6, the three kinds of aggregation features are respectively sent to the convolutional neural network, each aggregation feature contains a feature pair, and the convolutional neural network models and fuses each feature pair to obtain three kinds of fusion features.
Further, the fully-connected network performs action recognition and classification on the input human body multi-mode fusion motion features, and judges whether the athlete has committed a foul action.
Further, the process of acquiring the motion gesture sequence includes:
acquiring a moving image sequence, and passing each picture of the image sequence through a VGG19 network to obtain image characteristics;
respectively acquiring the confidence coefficient of the joint point and the affinity vector between the joint points of each joint point of the body of the athlete according to the image characteristics;
clustering the joint points by using the confidence coefficient of the joint points and the affinity vector among the joint points, and performing skeleton splicing to obtain the motion gesture sequence of the athlete.
Further, the character image is an RGB image.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
according to the invention, different characteristics are acquired through the object image, the motion gesture sequence and the optical flow data, and the characteristics are fused to perform action recognition in the fully connected network, so that the accuracy of foul recognition is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a diagram of a network architecture of the present invention.
FIG. 3 is a diagram of the time relation network of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Example 1
As shown in FIGS. 1-2, a badminton player foul action recognition method combining multi-mode feature analysis and a neural network comprises the following steps (a minimal end-to-end sketch of the pipeline is given after the step list):
s1: extracting character images, motion gesture sequences and optical flow data of athletes in real time;
s2: sending the character image into a space flow network of a double-flow network to obtain the space characteristics of the athlete;
s3: transmitting the motion gesture sequence as a directed graph into a multi-layer graph rolling neural network (GCN) to obtain gesture space-time characteristics of the athlete during the motion;
s4: extracting features of each frame of optical flow data through a convolutional neural network, and then sending the extracted features into a time relation network to obtain optical flow motion information features of athletes;
s5: pairing the three features obtained in the steps S1, S2 and S3 two by two to obtain three polymerization features;
s6: the three aggregation features are respectively sent into a convolutional neural network to obtain three fusion features;
s7: the three fusion features are weighted and fused to obtain the final overall human body multi-mode fusion motion feature;
s8: and sending the overall human body multi-mode fusion motion characteristics into a fully-connected network to obtain a final action classification recognition result.
Further, in step S1, the player character image is acquired through video frame capturing, the player motion gesture is acquired through OpenPose, and the player optical flow data are acquired through DenseFlow.
In a specific embodiment, a plurality of cameras can be arranged to shoot a badminton playing field at multiple angles, and a character image is obtained by intercepting video frames in real time.
The process for acquiring the motion gesture sequence comprises the following steps:
obtaining a moving image sequence, and passing each picture of the image sequence through a 10-layer VGG19 network to obtain the image features F;
acquiring, from the image features, the joint-point confidence maps $S_j$ of each joint point of the athlete's body and the part affinity vectors $L_c$ between the joint points, wherein j refers to the index of the joint point and c refers to the index number of the limb;
clustering the joint points by using the joint-point confidence and the affinity vectors between the joint points, and performing skeleton splicing to obtain the motion gesture sequence of the athlete, which is assembled into a pose tensor as sketched below.
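As a minimal sketch only, the per-frame keypoints produced by the above process can be stacked into a single pose tensor before being passed to the graph network. The detect_pose callable and the joint count of 18 are assumptions standing in for the OpenPose-style pipeline described above.

```python
import numpy as np

def build_pose_sequence(frames, detect_pose, num_joints=18):
    """Stack per-frame keypoints into a (C=3, T, V) pose tensor.

    detect_pose(frame) is a hypothetical OpenPose-style detector returning an
    array of shape (num_joints, 3): x, y and confidence for one athlete's joints.
    """
    poses = []
    for frame in frames:
        kpts = detect_pose(frame)
        if kpts is None:                       # keep the sequence length fixed on missed detections
            kpts = np.zeros((num_joints, 3), dtype=np.float32)
        poses.append(np.asarray(kpts, dtype=np.float32))
    seq = np.stack(poses, axis=0)              # (T, V, 3)
    return seq.transpose(2, 0, 1)              # (3, T, V) as expected by the graph network
```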
And (3) taking the acquired motion gesture sequence as a directed graph to be transmitted into a multi-layer graph convolution neural network, and modeling the motion gesture sequence of the athlete to obtain the motion gesture space-time characteristics of the athlete.
More specifically, the athlete's motion gesture sequence is regarded as a graph structure whose vertices are the joint points in each frame. The vertex set of the graph is expressed as $V=\{v_{ti}\mid t=1,\dots,T,\;i=1,\dots,N\}$. The edges of the graph are the connecting edges between the joint points within each frame and the connecting edges of corresponding joint points between adjacent frames; the set of intra-frame connecting edges is denoted $E_S=\{v_{ti}v_{tj}\mid (i,j)\in H\}$, and the set of inter-frame connecting edges of corresponding joint points is denoted $E_F=\{v_{ti}v_{(t+1)i}\}$. The skeleton sequence is then graph-convolved using the following formula:

$$f_{out}(v_{ti})=\sum_{v_{tj}\in B(v_{ti})}\frac{1}{Z_{ti}(v_{tj})}\,f_{in}(v_{tj})\cdot w\big(l_{ti}(v_{tj})\big)$$

where $v_{ti}$ and $v_{tj}$ represent joint points of the human body posture, $f_{in}$ and $f_{out}$ represent the input and output feature maps, $w$ and $w'$ represent the weights between the joint points and the weights after reconstruction, $l_{ti}(\cdot)$ means using the joint point $v_{ti}$ as center to assign digital labels to the other joint points, which digital labels depend on the shortest path between two joints, $Z_{ti}(\cdot)$ is a regularization term, and $B(v_{ti})=\{v_{tj}\mid d(v_{tj},v_{ti})\le D\}$, wherein $D$ is set to the constant 1 and $d(v_{tj},v_{ti})$ is the shortest path between the two joint points.
The invention is provided with 9 layers of spatio-temporal graph convolution units, and finally outputs the motion gesture space-time features PF of the athlete.
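A minimal sketch of one spatio-temporal graph convolution unit and the 9-layer stack is given below, assuming a fixed normalized adjacency matrix built from the skeleton edges; the channel widths, temporal kernel size and single-partition weighting are simplifying assumptions rather than the exact patented layer configuration.

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatio-temporal graph convolution unit (sketch).

    x: (B, C_in, T, V) pose features; A: (V, V) normalized adjacency matrix of the
    skeleton graph, playing the role of the neighbor set B(v_ti) in the formula above.
    """
    def __init__(self, c_in, c_out, t_kernel=9):
        super().__init__()
        self.gcn = nn.Conv2d(c_in, c_out, kernel_size=1)              # per-joint transform (the weights w)
        self.tcn = nn.Conv2d(c_out, c_out, kernel_size=(t_kernel, 1),
                             padding=(t_kernel // 2, 0))              # temporal convolution along T
        self.relu = nn.ReLU()

    def forward(self, x, A):
        x = self.gcn(x)                                   # (B, C_out, T, V)
        x = torch.einsum('bctv,vw->bctw', x, A)           # spatial aggregation over neighboring joints
        return self.relu(self.tcn(x))

class PoseGCN(nn.Module):
    """Nine stacked units followed by global pooling to produce the PF feature."""
    def __init__(self, adjacency, c_in=3, feat_dim=256):
        super().__init__()
        dims = [c_in, 64, 64, 64, 128, 128, 128, 256, 256, feat_dim]
        self.blocks = nn.ModuleList(STGCNBlock(dims[i], dims[i + 1]) for i in range(9))
        self.register_buffer('A', adjacency)              # (V, V) skeleton adjacency, precomputed

    def forward(self, x):                                  # x: (B, 3, T, V)
        for blk in self.blocks:
            x = blk(x, self.A)
        return x.mean(dim=[2, 3])                          # (B, feat_dim) motion gesture space-time features PF
```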
In the invention, the optical flow data are acquired through the DenseFlow toolkit. The specific flow is as follows: two pictures are input each time, the first, T(x, y), being the reference image and the second, I(x, y), the current image; the following objective function is then designed so that each pair of corresponding points on the two registered images is as similar as possible:

$$E(u,v)=\sum_{x,y}\psi\Big(\big(I(x+u(x,y),\,y+v(x,y))-T(x,y)\big)^{2}\Big)$$

where $u(x,y)$ and $v(x,y)$ are the offsets of each point on the image and $\psi(\cdot)$ is the error function. Minimizing this objective yields the optical flow sequence of the athlete during the competition.
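Since only the objective above is given for DenseFlow, the sketch below uses OpenCV's Farneback dense optical flow as a stand-in to show the per-frame-pair computation; the function choice and parameter values are assumptions, not the DenseFlow toolkit itself.

```python
import cv2
import numpy as np

def compute_flow_sequence(frames):
    """Dense optical flow between consecutive frames (Farneback stand-in for DenseFlow).

    For each frame pair (reference T, current I) it returns the per-pixel offsets
    (u, v), i.e. a flow field of shape (H, W, 2).
    """
    flows = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)
        prev = curr
    return np.stack(flows, axis=0)             # (T-1, H, W, 2) optical flow sequence
```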
After the optical flow sequence data of the athlete are obtained, the invention adopts the spatial stream network of the dual-stream architecture to extract spatial features from the image frames and, as shown in fig. 3, uses a Time Relation Network (TRN) in place of the temporal stream network to obtain the optical flow motion information features. The process for obtaining the athlete optical flow motion information features is as follows:
modeling each frame of optical flow of the optical flow sequence by utilizing a ResNet base network in a Convolutional Neural Network (CNN), and then fusing features of each modeled frame of optical flow;
sending the optical flow sequences with the integrated features into a time relation network to be grouped according to different frame numbers (particularly, grouping can be carried out according to a group of 2 frames, a group of 3 frames and a group of 4 frames), and sequencing the serial numbers of the optical flows in each group from small to large;
modeling each group of optical flow sequences to obtain inter-frame time relation features, and then fusing the inter-frame time relation features of the same group to obtain inter-segment time relation features;
and adding all the time relation features to obtain the optical flow motion information features of the integral athlete containing time reasoning information.
Taking 3 frames as an example, the 3-frame temporal relation can be expressed as:

$$T_3(V)=h_{\phi}\Big(\sum_{i<j<k} g_{\theta}\big(f_i,f_j,f_k\big)\Big)$$

where $h_{\phi}$ and $g_{\theta}$ are implemented using simple multi-layer perceptrons and $f_i$ denotes the feature of the $i$-th optical flow frame. The athlete optical flow motion information feature is then $MF(V)=T_2(V)+T_3(V)+T_4(V)$.
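The following is a minimal sketch of the time relation module following the formula above: each n-frame relation sums g_θ over ascending n-tuples of per-frame features and passes the result through h_φ, and the 2-, 3- and 4-frame relations are added to form MF(V). The feature dimension, hidden size and tuple subsampling are assumptions made for the sketch.

```python
import itertools
import torch
import torch.nn as nn

class TemporalRelation(nn.Module):
    """n-frame relation: T_n(V) = h_phi( sum over ascending n-tuples of g_theta(f_i, ..., f_k) )."""
    def __init__(self, n_frames, feat_dim=256, hidden=256, max_tuples=8):
        super().__init__()
        self.n = n_frames
        self.max_tuples = max_tuples                    # subsample tuples to keep the sum cheap
        self.g = nn.Sequential(nn.Linear(n_frames * feat_dim, hidden), nn.ReLU())
        self.h = nn.Linear(hidden, feat_dim)

    def forward(self, feats):                           # feats: (B, T, feat_dim) per-frame CNN features
        T = feats.size(1)
        tuples = list(itertools.combinations(range(T), self.n))[: self.max_tuples]
        acc = 0
        for idx in tuples:                              # combinations are already in ascending frame order
            acc = acc + self.g(torch.cat([feats[:, i] for i in idx], dim=1))
        return self.h(acc)                              # (B, feat_dim) inter-segment relation feature

class FlowTRN(nn.Module):
    """MF(V) = T_2(V) + T_3(V) + T_4(V), as in the description above."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.relations = nn.ModuleList(TemporalRelation(n, feat_dim) for n in (2, 3, 4))

    def forward(self, feats):                           # feats: (B, T, feat_dim), T >= 4 assumed
        return sum(rel(feats) for rel in self.relations)
```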
Further, in step S6, the three kinds of aggregation features are respectively sent to the convolutional neural network, each aggregation feature contains a feature pair, and the convolutional neural network models and fuses each feature pair to obtain three kinds of fusion features.
In a specific embodiment, the acquired human body spatial features SF, optical flow motion information features MF and motion gesture space-time features PF of the badminton athlete are paired and aggregated two by two to obtain three different aggregation features <SF, MF>, <PF, MF> and <PF, SF>. The three aggregation features are then input into CNN modules for modeling to obtain the fusion features Fusion1, Fusion2 and Fusion3 of the paired modal features. The three fusion features are weighted and fused to obtain the final human multi-modal fusion feature. This feature contains the fusion features of different combinations of the three modalities; information complementation is achieved among the different modalities, so the obtained information is richer and the feature is more robust.
further, the fully-connected network performs action recognition and classification on the input human body multi-mode fusion movement characteristics, and judges whether the mobilizer has a foul action or not.
The same or similar reference numerals correspond to the same or similar components;
the terms describing the positional relationship in the drawings are merely illustrative, and are not to be construed as limiting the present patent;
it is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (8)

1. A badminton player foul action recognition method combining multi-mode feature analysis and neural network is characterized by comprising the following steps:
s1: extracting character images, motion gesture sequences and optical flow data of athletes in real time;
s2: sending the character image into a space flow network of a double-flow network to obtain the space characteristics of the athlete;
s3: transmitting the motion gesture sequence as a directed graph into a multi-layer graph convolution neural network to obtain gesture space-time characteristics of the athlete during motion;
the motion gesture space-time features are obtained through a graph convolution operation, the graph convolution operation formula being:

$$f_{out}(v_{ti})=\sum_{v_{tj}\in B(v_{ti})}\frac{1}{Z_{ti}(v_{tj})}\,f_{in}(v_{tj})\cdot w\big(l_{ti}(v_{tj})\big)$$

wherein $v_{ti}$ and $v_{tj}$ represent joint points of the human body posture, $f_{in}$ and $f_{out}$ represent the input and output feature maps, $w$ and $w'$ represent the weights between the joint points and the weights after reconstruction, $l_{ti}(\cdot)$ means using the joint point $v_{ti}$ as center to assign digital labels to the other joint points, said digital labels depending on the shortest path between two joints, $Z_{ti}(\cdot)$ is a regularization term, and $B(v_{ti})=\{v_{tj}\mid d(v_{tj},v_{ti})\le D\}$, wherein $D$ is set to the constant 1 and $d(v_{tj},v_{ti})$ is the shortest path between the two joint points;
s4: extracting features of each frame of optical flow data through a convolutional neural network, and then sending the extracted features into a time relation network to obtain optical flow motion information features of athletes;
the process for obtaining the athlete optical flow movement information features is as follows:
modeling each frame of optical flow of the optical flow sequence by utilizing a ResNet base network in the convolutional neural network, and then fusing features of each modeled frame of optical flow;
sending the optical flow sequences with the characteristics fused into a time relation network to be grouped according to different frame numbers, and sequencing the serial numbers of the optical flows in each group from small to large;
modeling each group of optical flow sequences to obtain inter-frame time relation features, and then fusing the inter-frame time relation features of the same group to obtain inter-segment time relation features;
adding all the inter-segment time relation features to obtain the overall athlete optical flow movement information features containing time reasoning information;
s5: pairing the three features obtained in the steps S1, S2 and S3 two by two to obtain three polymerization features;
s6: the three aggregation features are respectively sent into a convolutional neural network to obtain three fusion features;
s7: the three fusion features are weighted and fused to obtain the final overall human body multi-mode fusion motion feature;
s8: and sending the overall human body multi-mode fusion motion characteristics into a fully-connected network to obtain a final action classification recognition result.
2. The method for identifying foul actions of badminton players by combining multi-mode feature analysis and neural networks according to claim 1, wherein in step S1, player character images are acquired through video frame capturing, player motion gestures are acquired through OpenPose, and player optical flow data are acquired through DenseFlow.
3. The method for identifying the foul actions of the badminton player by combining the multi-mode characteristic analysis and the neural network according to claim 1, wherein the step S2 is to send the character image into a space flow network of a double-flow network, and model the space information of the character image to obtain the character space characteristics of the player.
4. The method for identifying the foul actions of the badminton player by combining the multi-mode feature analysis and the neural network according to claim 1, wherein the step S3 is to transmit the motion gesture sequence as a directed graph into a multi-layer graph convolution neural network, and model the motion gesture sequence of the player to obtain the motion gesture space-time feature of the player.
5. The badminton player foul action recognition method combining multi-mode feature analysis and neural network according to claim 1, wherein in step S6, three kinds of aggregate features are respectively sent into a convolutional neural network, each aggregate feature contains a feature pair, and the convolutional neural network models and fuses each feature pair to obtain three kinds of fusion features.
6. The method for identifying the foul actions of the badminton player by combining the multi-mode feature analysis and the neural network according to claim 1, wherein the fully-connected network performs action recognition and classification on the input human body multi-mode fusion motion features to judge whether the athlete has committed a foul action.
7. The badminton player foul action recognition method combining multi-mode feature analysis and a neural network according to claim 1, wherein the process of acquiring the motion gesture sequence comprises:
acquiring a moving image sequence, and passing each picture of the image sequence through a VGG19 network to obtain image characteristics;
respectively acquiring the confidence coefficient of the joint point and the affinity vector between the joint points of each joint point of the body of the athlete according to the image characteristics;
clustering the joint points by using the confidence coefficient of the joint points and the affinity vector among the joint points, and performing skeleton splicing to obtain the motion gesture sequence of the athlete.
8. The badminton player foul action recognition method combining multi-mode feature analysis and a neural network according to claim 1, wherein the character image is an RGB image.
CN202011364578.7A 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network Active CN112396018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364578.7A CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011364578.7A CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Publications (2)

Publication Number Publication Date
CN112396018A CN112396018A (en) 2021-02-23
CN112396018B true CN112396018B (en) 2023-06-06

Family

ID=74605505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011364578.7A Active CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Country Status (1)

Country Link
CN (1) CN112396018B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device
CN113239897B (en) * 2021-06-16 2023-08-18 石家庄铁道大学 Human body action evaluation method based on space-time characteristic combination regression

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460707A (en) * 2018-10-08 2019-03-12 华南理工大学 A kind of multi-modal action identification method based on deep neural network
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
CN110892408A (en) * 2017-02-07 2020-03-17 迈恩德玛泽控股股份有限公司 Systems, methods, and apparatus for stereo vision and tracking
CN111259804A (en) * 2020-01-16 2020-06-09 合肥工业大学 Multi-mode fusion sign language recognition system and method based on graph convolution


Also Published As

Publication number Publication date
CN112396018A (en) 2021-02-23


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant