CN112396018A - Badminton player foul action recognition method combining multi-modal feature analysis and neural network


Info

Publication number
CN112396018A
Authority
CN
China
Prior art keywords
motion
neural network
network
optical flow
player
Prior art date
Legal status
Granted
Application number
CN202011364578.7A
Other languages
Chinese (zh)
Other versions
CN112396018B (en)
Inventor
张刚瀚
黄国恒
程良伦
张煜乾
陈泽炯
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011364578.7A
Publication of CN112396018A
Application granted
Publication of CN112396018B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/23: Recognition of whole body movements, e.g. for sport training
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a badminton player foul action recognition method combining multi-modal feature analysis and a neural network, which comprises the following steps: extracting player images, motion pose sequences and optical flow data of the players in real time; sending the player image into the spatial stream network of a two-stream network to obtain the player's spatial features; passing the motion pose sequence, as a directed graph, into a multi-layer graph convolutional neural network to obtain the player's pose spatio-temporal features during motion; extracting features from each frame of optical flow data through a convolutional neural network and then sending them into a temporal relation network to obtain the player's optical flow motion information features; pairing the three obtained features pairwise to obtain three aggregated features, sending each aggregated feature into a convolutional neural network to obtain three fusion features, performing weighted fusion on the fusion features to obtain the final overall human-body multi-modal fusion motion feature, and sending it into a fully connected network to obtain the final action classification and recognition result. The invention improves the accuracy of recognizing players' foul actions.

Description

Badminton player foul action recognition method combining multi-modal feature analysis and neural network
Technical Field
The invention relates to the technical field of image processing, and in particular to a badminton player foul action recognition method combining multi-modal feature analysis and a neural network.
Background
The badminton match rules clearly specify that the serve must conform to the standard action, that play must not be intentionally delayed, that a player's standing position must not block the opponent's line of sight, that a player must not intentionally intrude into the opponent's court area during the match, and that a player must not make gestures to obstruct the opponent's stroke or distract the opponent. However, these actions are sometimes quite subtle, so the human eye cannot observe and judge them reliably; in addition, players' movements change rapidly over the course of a match, so the umpire may miss certain foul actions, which affects the fairness of the match. With the progress of computer vision technology, machines can perform fine-grained analysis of videos and images, so a machine can be used in place of the human eye to recognize whether a player's match actions violate the rules, i.e., to recognize the player's actions and judge whether they conform to the regulations during the match. Current behavior recognition methods mainly use the two-stream approach and 3D convolution, and some perform behavior recognition from human pose input. However, because of the uncertainty of human behavior, players' movements during a match are complex: a swing may be inconspicuous, and complex movements may be intermixed, which can cause the system to misjudge the behavior; moreover, using data of a single modality lacks information interaction, which makes detailed analysis difficult.
In the prior art, Chinese patent publication No. CN110705463A, published on 17 January 2020, discloses a method and system for recognizing human behavior based on a multi-modal dual-stream 3D network, comprising: a depth dynamic image sequence (DDIS) generated from depth video; a pose evaluation map sequence (PEMS) generated from RGB video; the depth dynamic image sequence and the pose evaluation map sequence are respectively input into 3D convolutional neural networks to construct a DDIS stream and a PEMS stream and obtain their respective classification results; and the obtained classification results are fused to obtain the final behavior recognition result. That patent does not combine multi-feature information, performs no feature fusion, and has low recognition accuracy.
Disclosure of Invention
The invention provides a badminton player foul action recognition method combining multi-modal feature analysis and a neural network, aiming at overcoming the defects of the prior art, in which player foul action recognition lacks multi-feature information fusion and has low recognition accuracy.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
a badminton player foul action recognition method combining multi-modal feature analysis and a neural network comprises the following steps:
s1: extracting figure images, motion attitude sequences and optical flow data of athletes in real time;
s2: sending the figure image into a space flow network of a double-flow network to obtain spatial characteristics of an athlete;
s3: the motion attitude sequence is transmitted into a multilayer graph convolution neural network as a directed graph to obtain attitude space-time characteristics of the athlete during motion;
s4: extracting the characteristics of each frame of optical flow data through a convolutional neural network, and then sending the optical flow data into a time relation network to obtain the optical flow motion information characteristics of athletes;
s5: pairwise matching the three characteristics obtained in the steps S1, S2 and S3 respectively to obtain three polymerization characteristics;
s6: respectively sending the three aggregation characteristics into a convolutional neural network to obtain three fusion characteristics;
s7: weighting and fusing the three fusion characteristics to obtain final overall human body multi-modal fusion motion characteristics;
s8: and sending the multi-modal fusion motion characteristics of the whole human body into a full-connection network to obtain a final action classification recognition result.
Further, in step S1, the player images are obtained by capturing video frames, the players' motion poses are obtained with OpenPose, and the players' optical flow data is obtained with DenseFlow.
Further, in step S2, the player image is sent into the spatial stream network of the two-stream network, and the spatial information of the player image is modeled to obtain the player's spatial character features.
Further, in step S3, the motion pose sequence is passed, as a directed graph, into a multi-layer graph convolutional neural network, and the player's motion pose sequence is modeled to obtain the player's motion pose spatio-temporal features.
Further, the motion pose spatio-temporal features are obtained through a graph convolution operation, whose formula is as follows:

$$f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot \mathbf{w}(l_{ti}(v_{tj}))$$

where $v_{ti}$ and $v_{tj}$ denote joint points of the human pose; $f_{in}$ and $f_{out}$ denote the input and output feature maps; $\mathbf{w}$ and $\mathbf{w}'$ denote the weights between joint points and the weights from which they are reconstructed; $l_{ti}$ uses joint point $v_{ti}$ to assign to the other nodes a numeric label that depends on the shortest path between the two nodes; $Z_{ti}(\cdot)$ is a regularization term; and $B(v_{ti}) = \{v_{tj} \mid d(v_{tj}, v_{ti}) \le D\}$, where $D$ is set to the constant 1 and $d(v_{tj}, v_{ti})$ is the shortest path between the two joint points.
Further, the process of obtaining the player's optical flow motion information features in step S4 is as follows:
modeling each optical flow frame of the optical flow sequence with a ResNet baseline network in the convolutional neural network, and then fusing the features of the modeled optical flow frames;
sending the feature-fused optical flow sequence into a temporal relation network to be grouped by different frame counts, with the optical flow frames in each group sorted in ascending order of frame index;
modeling each group of optical flow sequences to obtain inter-frame temporal relation features, and then fusing the inter-frame temporal relation features of the same group to obtain inter-segment temporal relation features;
and adding the inter-segment temporal relation features of all the segments to obtain the player's overall optical flow motion information features containing temporal reasoning information.
Further, in step S6, the three aggregated features are respectively sent into the convolutional neural network; each aggregated feature contains one feature pair, and the convolutional neural network models and fuses each feature pair to obtain the three fusion features.
Furthermore, the fully connected network performs action recognition and classification on the input human-body multi-modal fusion motion feature, and judges whether the player has committed a foul action.
Further, the process of acquiring the motion pose sequence comprises:
acquiring a motion image sequence, and passing each picture of the image sequence through a VGG19 network to obtain image features;
obtaining, from the image features, the joint-point confidence of each joint point of the player's body and the affinity vectors between the joint points;
and clustering the joint points using the joint-point confidences and the affinity vectors between joint points, and performing skeleton assembly to obtain the player's motion pose sequence.
Further, the player image is an RGB image.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
the invention obtains different features from the player image, the motion pose sequence and the optical flow data, fuses these features, and then performs recognition with a fully connected network, which improves the accuracy of foul recognition.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a diagram of the network architecture of the present invention.
FIG. 3 is a diagram of a time relationship network architecture according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in figs. 1-2, a badminton player foul action recognition method combining multi-modal feature analysis and a neural network includes the following steps:
S1: extracting player images, motion pose sequences and optical flow data of the players in real time;
S2: sending the player image into the spatial stream network of a two-stream network to obtain the player's spatial features;
S3: passing the motion pose sequence, as a directed graph, into a multi-layer graph convolutional neural network (GCN) to obtain the player's pose spatio-temporal features during motion;
S4: extracting features from each frame of optical flow data through a convolutional neural network, and then sending them into a temporal relation network to obtain the player's optical flow motion information features;
S5: pairing the three features obtained in steps S2, S3 and S4 pairwise to obtain three aggregated features;
S6: sending the three aggregated features respectively into a convolutional neural network to obtain three fusion features;
S7: performing weighted fusion on the three fusion features to obtain the final overall human-body multi-modal fusion motion feature;
S8: sending the overall human-body multi-modal fusion motion feature into a fully connected network to obtain the final action classification and recognition result.
Further, in step S1, the player images are obtained by capturing video frames, the players' motion poses are obtained with OpenPose, and the players' optical flow data is obtained with DenseFlow.
In a specific embodiment, multiple cameras can be arranged to film the badminton court from multiple angles, and the player images are obtained by capturing video frames in real time. A minimal sketch of this capture step is given below.
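As an illustration of step S1, the sketch below captures frames with OpenCV and computes dense optical flow between consecutive frames. Farneback flow is used here as a stand-in for the DenseFlow toolkit, and the video source and frame budget are assumptions.

```python
import cv2

def capture_frames_and_flow(source=0, max_frames=64):
    """Grab player images and dense optical flow from one court camera.

    Farneback flow is a stand-in for DenseFlow; the patent's setup
    would run one such capture loop per camera angle."""
    cap = cv2.VideoCapture(source)
    frames, flows = [], []
    ok, prev = cap.read()
    if not ok:
        raise RuntimeError("cannot read from video source")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense flow between consecutive frames: a (u, v) vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        frames.append(frame)   # player image source (step S2)
        flows.append(flow)     # optical-flow data (step S4)
        prev_gray = gray
    cap.release()
    return frames, flows
```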
The process of acquiring the motion pose sequence comprises the following steps:
acquiring a motion image sequence, and passing each picture of the image sequence through the first 10 layers of a VGG19 network to obtain the image features F;
obtaining, from the image features, the joint-point confidence maps $S = (S_1, S_2, \ldots, S_J)$ for each joint point of the player's body and the affinity vector fields $L = (L_1, L_2, \ldots, L_C)$ between the joint points, where $j$ is the joint-point index and $c$ is the limb index;
and clustering the joint points using the joint-point confidence maps and the affinity vectors between joint points, and performing skeleton assembly to obtain the player's motion pose sequence. A minimal sketch of such a pose-estimation head is given below.
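The following is a minimal sketch of an OpenPose-style two-branch head for this step. The VGG19 cut-off (`features[:23]`, roughly the first 10 convolutional layers through conv4_2), the head widths and the joint/limb counts are assumptions, and the clustering and skeleton-assembly stage is omitted.

```python
import torch.nn as nn
from torchvision.models import vgg19

class PoseHead(nn.Module):
    """Predict joint confidence maps S and part-affinity fields L
    from VGG19 features (an OpenPose-style sketch, not the patent's
    exact network)."""
    def __init__(self, num_joints=18, num_limbs=19):
        super().__init__()
        # Roughly the first 10 conv layers of VGG19; the exact cut is
        # an assumption.
        self.backbone = vgg19(weights=None).features[:23]
        self.conf_head = nn.Sequential(        # S: one heat map per joint
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, num_joints, 1))
        self.paf_head = nn.Sequential(         # L: a 2-D vector field per limb
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 2 * num_limbs, 1))

    def forward(self, img):                    # img: (N, 3, H, W)
        feat = self.backbone(img)              # image features F
        return self.conf_head(feat), self.paf_head(feat)
```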
The obtained motion pose sequence is then passed, as a directed graph, into the multi-layer graph convolutional neural network, and the player's motion pose sequence is modeled to obtain the player's motion pose spatio-temporal features.
More specifically, the player's motion pose sequence is regarded as a graph structure. The vertices of the graph are the joint points in each frame, and the vertex set is $V = \{v_{ti} \mid t = 1, \ldots, T;\ i = 1, \ldots, N\}$. The edges of the graph are the connecting edges between the joint points within each frame and the connecting edges of corresponding joint points across frames: the set of intra-frame connecting edges is $E_S = \{v_{ti} v_{tj} \mid (i, j) \in H\}$, and the set of inter-frame connecting edges is $E_F = \{v_{ti} v_{(t+1)i}\}$. Graph convolution is then performed on the skeleton sequence using the following equation:

$$f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot \mathbf{w}(l_{ti}(v_{tj}))$$

where $v_{ti}$ and $v_{tj}$ denote joint points of the human pose; $f_{in}$ and $f_{out}$ denote the input and output feature maps; $\mathbf{w}$ and $\mathbf{w}'$ denote the weights between joint points and the weights from which they are reconstructed; $l_{ti}$ uses joint point $v_{ti}$ to assign to the other nodes a numeric label that depends on the shortest path between the two nodes; $Z_{ti}(\cdot)$ is a regularization term; and $B(v_{ti}) = \{v_{tj} \mid d(v_{tj}, v_{ti}) \le D\}$, where $D$ is set to the constant 1 and $d(v_{tj}, v_{ti})$ is the shortest path between the two joint points.
The invention stacks 9 layers of spatio-temporal graph convolution units and finally outputs the player's motion pose spatio-temporal features PF. A minimal sketch of one such layer is given below.
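The sketch below shows one spatial graph-convolution layer in this style. It folds the neighbour set $B(v_{ti})$ and the normalization $1/Z_{ti}$ into a degree-normalized adjacency matrix and uses a single label partition for simplicity; the patent's $l_{ti}$ labelling would split the adjacency into one matrix per label.

```python
import torch
import torch.nn as nn

class SpatialGraphConv(nn.Module):
    """One ST-GCN-style layer; x has shape (N, C, T, V) for N clips,
    C channels, T frames and V joints. `adjacency` is the (V, V) 0/1
    matrix of intra-frame skeleton edges."""
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        A = adjacency + torch.eye(adjacency.size(0))  # B(v_ti) includes v_ti (D = 1)
        deg = A.sum(dim=1)
        # Row normalization plays the role of 1/Z_ti(v_tj).
        self.register_buffer("A_norm", A / deg.unsqueeze(1))
        self.w = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # weight function w(.)

    def forward(self, x):
        x = self.w(x)  # apply joint-wise weights
        # Aggregate each joint's neighbours: sum over v_tj in B(v_ti).
        return torch.einsum("nctj,ij->ncti", x, self.A_norm)
```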
In the invention, the optical flow data is acquired with the DenseFlow toolkit. The specific procedure is as follows: for every two input pictures, the first picture $T(x, y)$ is the reference image and the second picture $I(x, y)$ is the current image; the following objective function is then designed so that each pair of corresponding points on the two registered images is as similar as possible:

$$E(u, v) = \sum_{x, y} \psi\!\left( \left| I(x + u(x, y),\, y + v(x, y)) - T(x, y) \right|^2 \right)$$

where $u(x, y)$ and $v(x, y)$ are the displacement of each point on the image and $\psi(x)$ is an error function. Minimizing this objective function yields the optical flow sequence data of the player during the match.
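For concreteness, the sketch below evaluates this objective for a candidate flow field (u, v). The Charbonnier penalty $\psi(x) = \sqrt{x + \varepsilon^2}$ is an assumed choice of error function; an actual DenseFlow-style tool minimizes such an energy with a coarse-to-fine variational solver rather than evaluating it directly.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def flow_objective(T_img, I_img, u, v, eps=1e-3):
    """E(u, v) = sum over pixels of psi(|I(x+u, y+v) - T(x, y)|^2),
    with psi the Charbonnier penalty (an assumed choice)."""
    h, w = T_img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Warp the current image I back by the flow (bilinear interpolation).
    warped = map_coordinates(I_img, [ys + v, xs + u], order=1, mode="nearest")
    diff2 = (warped - T_img) ** 2
    return np.sum(np.sqrt(diff2 + eps ** 2))  # psi applied pixel-wise, then summed
```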
After the player's optical flow sequence data is acquired, the invention adopts the spatial stream network of the two-stream architecture to extract spatial features from the image frames and, as shown in fig. 3, uses a temporal relation network (TRN) in place of the temporal stream network to obtain the optical flow motion information features. The process of obtaining the player's optical flow motion information features is as follows:
modeling each optical flow frame of the optical flow sequence with a ResNet baseline network in a convolutional neural network (CNN), and then fusing the features of the modeled optical flow frames;
sending the feature-fused optical flow sequence into the temporal relation network to be grouped by different frame counts (specifically, in groups of 2 frames, 3 frames and 4 frames), with the optical flow frames in each group sorted in ascending order of frame index;
modeling each group of optical flow sequences to obtain inter-frame temporal relation features, and then fusing the inter-frame temporal relation features of the same group to obtain inter-segment temporal relation features;
and adding the inter-segment temporal relation features of all the segments to obtain the player's overall optical flow motion information features containing temporal reasoning information.
Taking 3 frames as an example, the 3-frame temporal relation can be expressed as:

$$T_3(V) = h_{\phi}\Big( \sum_{i < j < k} g_{\theta}(f_i, f_j, f_k) \Big)$$

where $h_{\phi}$ and $g_{\theta}$ are implemented with simple multilayer perceptrons and $f_i$ denotes the feature of the $i$-th optical flow frame. The player's optical flow motion information feature is $MF(V) = T_2(V) + T_3(V) + T_4(V)$. A minimal sketch of this module is given below.
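The following is a minimal sketch of this multi-scale relation module. The per-frame features are assumed to come from the ResNet stage above; the MLP widths, the number of sampled frame tuples and the output dimension are assumptions.

```python
import itertools
import random
import torch.nn as nn

class TemporalRelation(nn.Module):
    """TRN-style module computing MF(V) = T2(V) + T3(V) + T4(V)."""
    def __init__(self, feat_dim=512, out_dim=256, scales=(2, 3, 4),
                 tuples_per_scale=8):
        super().__init__()
        self.scales = scales
        self.tuples_per_scale = tuples_per_scale  # subsampling keeps cost bounded
        self.g = nn.ModuleDict({  # g_theta^(k): relation over k ordered frames
            str(k): nn.Sequential(nn.Linear(k * feat_dim, 256), nn.ReLU())
            for k in scales})
        self.h = nn.ModuleDict({  # h_phi^(k): per-scale projection
            str(k): nn.Linear(256, out_dim) for k in scales})

    def forward(self, feats):  # feats: (N, T, feat_dim) per-frame flow features
        n, t, _ = feats.shape
        mf = 0.0
        for k in self.scales:
            # Ordered tuples i < j < ... (frame indices sorted ascending).
            combos = list(itertools.combinations(range(t), k))
            combos = random.sample(combos,
                                   min(self.tuples_per_scale, len(combos)))
            rel = sum(self.g[str(k)](feats[:, list(idx), :].reshape(n, -1))
                      for idx in combos)
            mf = mf + self.h[str(k)](rel)  # T_k(V)
        return mf                          # optical-flow feature MF
```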
Further, in step S6, the three aggregated features are respectively sent into the convolutional neural network; each aggregated feature contains one feature pair, and the convolutional neural network models and fuses each feature pair to obtain the three fusion features.
In a specific embodiment, the obtained human-body spatial features SF, optical flow motion information features MF and motion pose spatio-temporal features PF of the badminton player are paired pairwise and aggregated to obtain three different aggregated features: <SF, MF>, <PF, MF> and <PF, SF>. The three aggregated features are then input into CNN modules for modeling, yielding the fusion features Fusion1, Fusion2 and Fusion3 of the paired modal features. The three fusion features are then weighted and fused to obtain the final human-body multi-modal fusion feature. This feature contains the fusion features of different combinations of the three modalities, and the modalities complement one another, so the resulting information is richer and the feature is more robust. A minimal sketch of this pairwise fusion is given below.
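The following sketch shows this pairwise aggregation and weighted fusion over feature maps, assuming SF, PF and MF have each been brought to a common (N, C, H, W) feature-map shape beforehand; the channel width, the softmax over learnable weights and the two-class output are assumptions.

```python
import torch
import torch.nn as nn

class PairwiseFusion(nn.Module):
    """Fuse <SF,MF>, <PF,MF>, <PF,SF> with per-pair CNNs, then perform
    weighted fusion and fully connected classification (steps S5-S8)."""
    def __init__(self, ch=256, num_classes=2):
        super().__init__()
        self.pair_cnns = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten())
            for _ in range(3))
        # Fusion weights: learnable here, an assumption.
        self.alpha = nn.Parameter(torch.ones(3) / 3)
        self.fc = nn.Linear(ch, num_classes)  # fully connected classifier

    def forward(self, sf, pf, mf):            # each: (N, ch, H, W)
        pairs = [torch.cat(p, dim=1) for p in ((sf, mf), (pf, mf), (pf, sf))]
        fused = [cnn(p) for cnn, p in zip(self.pair_cnns, pairs)]  # Fusion1..3
        w = torch.softmax(self.alpha, dim=0)
        overall = sum(wi * fi for wi, fi in zip(w, fused))  # multi-modal feature
        return self.fc(overall)  # foul / non-foul classification
```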
furthermore, the full-connection network carries out action recognition and classification on the input human body multi-mode fusion motion characteristics, and judges whether the sportsman has foul actions.
The same or similar reference numerals correspond to the same or similar parts. The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent.
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A badminton player foul action recognition method combining multi-modal feature analysis and a neural network, characterized by comprising the following steps:
S1: extracting player images, motion pose sequences and optical flow data of the players in real time;
S2: sending the player image into the spatial stream network of a two-stream network to obtain the player's spatial features;
S3: passing the motion pose sequence, as a directed graph, into a multi-layer graph convolutional neural network to obtain the player's pose spatio-temporal features during motion;
S4: extracting features from each frame of optical flow data through a convolutional neural network, and then sending them into a temporal relation network to obtain the player's optical flow motion information features;
S5: pairing the three features obtained in steps S2, S3 and S4 pairwise to obtain three aggregated features;
S6: sending the three aggregated features respectively into a convolutional neural network to obtain three fusion features;
S7: performing weighted fusion on the three fusion features to obtain the final overall human-body multi-modal fusion motion feature;
S8: sending the overall human-body multi-modal fusion motion feature into a fully connected network to obtain the final action classification and recognition result.
2. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that in step S1, the player images are obtained by capturing video frames, the players' motion poses are obtained with OpenPose, and the players' optical flow data is obtained with DenseFlow.
3. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that in step S2, the player image is sent into the spatial stream network of the two-stream network to model the spatial information of the player image and obtain the player's spatial character features.
4. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that in step S3, the motion pose sequence is passed, as a directed graph, into a multi-layer graph convolutional neural network, and the player's motion pose sequence is modeled to obtain the player's motion pose spatio-temporal features.
5. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 4, characterized in that the motion pose spatio-temporal features are obtained through a graph convolution operation, whose formula is as follows:

$$f_{out}(v_{ti}) = \sum_{v_{tj} \in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})} f_{in}(v_{tj}) \cdot \mathbf{w}(l_{ti}(v_{tj}))$$

where $v_{ti}$ and $v_{tj}$ denote joint points of the human pose; $f_{in}$ and $f_{out}$ denote the input and output feature maps; $\mathbf{w}$ and $\mathbf{w}'$ denote the weights between joint points and the weights from which they are reconstructed; $l_{ti}$ uses joint point $v_{ti}$ to assign to the other nodes a numeric label that depends on the shortest path between the two nodes; $Z_{ti}(\cdot)$ is a regularization term; and $B(v_{ti}) = \{v_{tj} \mid d(v_{tj}, v_{ti}) \le D\}$, where $D$ is set to the constant 1 and $d(v_{tj}, v_{ti})$ is the shortest path between the two joint points.
6. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that the process of obtaining the player's optical flow motion information features in step S4 is as follows:
modeling each optical flow frame of the optical flow sequence with a ResNet baseline network in the convolutional neural network, and then fusing the features of the modeled optical flow frames;
sending the feature-fused optical flow sequence into a temporal relation network to be grouped by different frame counts, with the optical flow frames in each group sorted in ascending order of frame index;
modeling each group of optical flow sequences to obtain inter-frame temporal relation features, and then fusing the inter-frame temporal relation features of the same group to obtain inter-segment temporal relation features;
and adding the inter-segment temporal relation features of all the segments to obtain the player's overall optical flow motion information features containing temporal reasoning information.
7. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that in step S6, the three aggregated features are respectively sent into the convolutional neural network; each aggregated feature contains one feature pair, and the convolutional neural network models and fuses each feature pair to obtain the three fusion features.
8. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that the fully connected network performs action recognition and classification on the input human-body multi-modal fusion motion feature, to judge whether the player has committed a foul action.
9. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that the process of acquiring the motion pose sequence comprises:
acquiring a motion image sequence, and passing each picture of the image sequence through a VGG19 network to obtain image features;
obtaining, from the image features, the joint-point confidence of each joint point of the player's body and the affinity vectors between the joint points;
and clustering the joint points using the joint-point confidences and the affinity vectors between joint points, and performing skeleton assembly to obtain the player's motion pose sequence.
10. The badminton player foul action recognition method combining multi-modal feature analysis and a neural network according to claim 1, characterized in that the player image is an RGB image.
CN202011364578.7A 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network Active CN112396018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011364578.7A CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011364578.7A CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Publications (2)

Publication Number Publication Date
CN112396018A true CN112396018A (en) 2021-02-23
CN112396018B CN112396018B (en) 2023-06-06

Family

ID=74605505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011364578.7A Active CN112396018B (en) 2020-11-27 2020-11-27 Badminton player foul action recognition method combining multi-mode feature analysis and neural network

Country Status (1)

Country Link
CN (1) CN112396018B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device
CN113239897A (en) * 2021-06-16 2021-08-10 石家庄铁道大学 Human body action evaluation method based on space-time feature combination regression
CN117333947A (en) * 2023-10-18 2024-01-02 首都体育学院 Badminton action analysis method and system
CN117953588A (en) * 2024-03-26 2024-04-30 南昌航空大学 Badminton player action intelligent recognition method integrating scene information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460707A (en) * 2018-10-08 2019-03-12 华南理工大学 A kind of multi-modal action identification method based on deep neural network
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
CN110892408A (en) * 2017-02-07 2020-03-17 迈恩德玛泽控股股份有限公司 Systems, methods, and apparatus for stereo vision and tracking
CN111259804A (en) * 2020-01-16 2020-06-09 合肥工业大学 Multi-mode fusion sign language recognition system and method based on graph convolution

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110892408A (en) * 2017-02-07 2020-03-17 迈恩德玛泽控股股份有限公司 Systems, methods, and apparatus for stereo vision and tracking
CN109460707A (en) * 2018-10-08 2019-03-12 华南理工大学 A kind of multi-modal action identification method based on deep neural network
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame
CN110705463A (en) * 2019-09-29 2020-01-17 山东大学 Video human behavior recognition method and system based on multi-mode double-flow 3D network
CN111259804A (en) * 2020-01-16 2020-06-09 合肥工业大学 Multi-mode fusion sign language recognition system and method based on graph convolution

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239766A (en) * 2021-04-30 2021-08-10 复旦大学 Behavior recognition method based on deep neural network and intelligent alarm device
CN113239897A (en) * 2021-06-16 2021-08-10 石家庄铁道大学 Human body action evaluation method based on space-time feature combination regression
CN113239897B (en) * 2021-06-16 2023-08-18 石家庄铁道大学 Human body action evaluation method based on space-time characteristic combination regression
CN117333947A (en) * 2023-10-18 2024-01-02 首都体育学院 Badminton action analysis method and system
CN117333947B (en) * 2023-10-18 2024-05-10 首都体育学院 Badminton action analysis method and system
CN117953588A (en) * 2024-03-26 2024-04-30 南昌航空大学 Badminton player action intelligent recognition method integrating scene information

Also Published As

Publication number Publication date
CN112396018B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN112396018A (en) Badminton player foul action recognition method combining multi-modal feature analysis and neural network
Reddy et al. Tessetrack: End-to-end learnable multi-person articulated 3d pose tracking
Cai et al. Temporal hockey action recognition via pose and optical flows
CN110472612B (en) Human behavior recognition method and electronic equipment
CN111291885A (en) Near-infrared image generation method, network generation training method and device
CN112131908A (en) Action identification method and device based on double-flow network, storage medium and equipment
CN111767866B (en) Human body model creation method and device, electronic equipment and storage medium
CN112906604A (en) Behavior identification method, device and system based on skeleton and RGB frame fusion
CN111046734A (en) Multi-modal fusion sight line estimation method based on expansion convolution
CN109961039A (en) A kind of individual's goal video method for catching and system
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113221663A (en) Real-time sign language intelligent identification method, device and system
Zhang et al. Deep monocular 3d human pose estimation via cascaded dimension-lifting
Kumar et al. Human pose estimation using deep learning: review, methodologies, progress and future research directions
Hsieh et al. Online human action recognition using deep learning for indoor smart mobile robots
CN114093024A (en) Human body action recognition method, device, equipment and storage medium
US20220148296A1 (en) Method and system for symmetric recognition of handed activities
CN117152829A (en) Industrial boxing action recognition method of multi-view self-adaptive skeleton network
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
CN110096989A (en) Image processing method and device
Saif et al. Aggressive action estimation: a comprehensive review on neural network based human segmentation and action recognition
CN114897678A (en) Infant eye ground retina panoramic image generation, acquisition and feedback method and system
Lessa et al. SoccerKicks: a Dataset of 3D dead ball kicks reference movements for humanoid robots
Meng et al. 3D human pose estimation based on a fully connected neural network with adversarial learning prior knowledge
Ascenso Development of a non-invasive motion capture system for swimming biomechanics

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant