CN116469040B - Football player tracking method based on video and sensor perception fusion - Google Patents

Football player tracking method based on video and sensor perception fusion

Info

Publication number
CN116469040B
CN116469040B · CN202310685914.5A
Authority
CN
China
Prior art keywords
player
frame
degrees
video
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310685914.5A
Other languages
Chinese (zh)
Other versions
CN116469040A (en)
Inventor
廖频
韩翔宇
陈子扬
臧露奇
张震
肖江
闵卫东
韩清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202310685914.5A
Publication of CN116469040A
Application granted
Publication of CN116469040B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention provides a football player tracking method based on the fusion of video and sensor perception, which comprises the following steps. S1: detect the position of each player's helmet in every frame of a football game video with the YOLOv7 method; S2: align the video frames and the sensor data along the time dimension according to timestamp information; S3: determine the camera position orientation for the bottom-line and sideline video frames respectively; S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2; S5: complete player tracking with a DeepSORT method that uses enhanced feature extraction. The invention aligns data from the two dimensions of video frames and sensor records, resolves the assignment of player numbers from the helmets detected by YOLOv7 and the corresponding sensor frames, and completes player tracking with DeepSORT. The method is simple to implement, widely applicable, and effectively improves data association accuracy.

Description

Football player tracking method based on video and sensor perception fusion
Technical Field
The invention relates to the technical field of computer vision and perception fusion, in particular to a football player tracking method based on video and sensor perception fusion.
Background
In recent years, multi-target tracking has been widely applied to sports events, improving the efficiency of health monitoring for players involved in collisions. In multi-target tracking, data association remains a critical link: solving the data association problem depends on accurately matching player detections to motion trajectories. An incorrect association between a track and a detection can cause erroneous tracking or target loss and degrades the performance of the tracking system.
Common multi-target tracking approaches fall into three categories: methods based on traditional techniques, detection-based methods, and attention-based methods. Traditional multi-target tracking models the target or tracks hand-crafted target features; feature-matching methods are representative, extracting SURF features, Harris corners and other descriptors of the target and searching subsequent frames for the most similar feature information to localize the target. Detection-based multi-target tracking passes every video frame through a detection network to obtain detection results, crops the targets from all detection boxes, and then converts target tracking into a bipartite matching problem between consecutive frames; the targets of adjacent frames are associated by constructing a similarity matrix, and common detection-based trackers include the SORT and DeepSORT algorithms. Attention-based multi-target tracking introduces the Transformer attention mechanism into target tracking; it follows the multi-stage pipeline of detection-based tracking, performing target detection, feature extraction and temporal association, and further improves the tracking effect.
The above methods track multiple targets with a purely visual scheme. In complex sports scenes such as ball games, however, large-area collisions and occlusions occur frequently and the camera viewpoint changes often, which introduces considerable drift into player tracking and lowers its accuracy. In addition, with a purely visual tracking scheme it is difficult to match the real numbers of the players on the field with the player identities produced by the tracking algorithm.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a football player tracking method based on the fusion of video and sensor perception. The method makes full use of the two dimensions of information provided by the video frames and the sensors, does not require the frame rate of the video and that of the sensors to be identical, resolves the assignment of player numbers from the helmet detections produced by YOLOv7 and the corresponding sensor frames, and completes player tracking with DeepSORT. It is simple to implement, broadly applicable, effectively improves data association accuracy and improves the player tracking result.
In order to achieve the above purpose, the present invention provides the following technical solutions: a football player tracking method based on fusion of video and sensor perception, comprising the steps of:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result.
Preferably, the detection of player helmet positions in step S1 comprises the following specific steps:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field. The cropping proceeds as follows:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s,
Y'_min = Y_min + s,
X'_max = X_max - s,
Y'_max = Y_max - s,
then remove the detection boxes from S101 whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
Preferably, the dimension alignment in step S2 comprises the following specific steps:
S201: extract the information recorded by each player's helmet sensor, including the player's number, the player's position at each moment, whether the player is in a collision at the current moment, and the timestamp of the current moment;
S202: let the sampling frequency of the game video be f1 Hz and the sampling frequency of the sensors be f2 Hz, and compute the offset of every video frame and every sensor frame relative to the start of the game, specifically:
step one: obtain the sensor frame number snap_track and the video frame number snap_det at the start of the game, and record the timestamp t_begin of the game start;
step two: compute the offset track_est_i of each sensor frame:
track_est_i = (track_t_i - t_begin) / 1000 * f2 + snap_track,
and likewise the offset det_est_i of each video frame:
det_est_i = (det_t_i - t_begin) / 1000 * f1 + snap_det,
where track_t_i and det_t_i are the current timestamps of the sensor frame and the video frame, respectively;
step three: for the offset track_est_i of each sensor frame, select the video frame number det_i* with the smallest Euclidean distance between offsets as the matching frame, completing the alignment of the two data dimensions:
det_i* = argmin_{j in B} L(track_est_i, det_est_j),
where A and B are the sets of all sensor frame numbers and all video frame numbers respectively, det_est_j denotes the offset of each video frame, and L(.) is the Euclidean distance.
Preferably, the determination of the camera orientation for the video frames in step S3 comprises the following specific steps:
S301: extract the 60th frame of the video taken from the bottom-line viewing angle and use an OCR method to recognize and record the jersey numbers appearing in the frame;
S302: at the same time, sort the player numbers in the sensor information corresponding to that frame in ascending order of each player's horizontal coordinate, and record the sorted set of player numbers as sorted_tracking_players;
S303: take each number recognized by OCR in step S301 and match it against the sorted player number set obtained in step S302; if the current number equals the number at index i of the list, record the position i+1 in the list pos;
S304: if sum(pos) > len(pos) // 2, the current viewing angle is judged to be the home-team camera position; if sum(pos) < len(pos) // 2, it is judged to be the away-team camera position, where sum() denotes the sum of the elements of a list and len() denotes its length.
S305: randomly extracting video frame images under a sideline visual angle, converting the images in an RGB format into a single-channel gray level image, removing picture noise by Gaussian blur, and detecting straight line segments where landmark lines in the video frame images are located by using a Canny edge detection method.
S306: the image of the edge detection in step S305 is taken out, and a straight line segment and an end point coordinate set of the line segment exceeding 50 pixels in length { (x 1, y 1), (x 2, y 2) } are detected by using the hough straight line detection method and the detection result is recorded in the set B.
S307: randomly sampling T groups of line segments from the set B, and calculating slope according to coordinates of two ends of the line segments. The specific calculation method is as follows:
slope = (y2 - y1) / (x2 - x1),
if the slope of the line segment exceeding the T/2 group is greater than 0, the current slope is considered to be greater than 0, otherwise, the current slope is considered to be less than 0.
S308, acquiring the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalizing the coordinates of the players into a plane rectangular coordinate system, and calculating the horizontal coordinate centers x_center of all players.
S309, according to the slope of the straight line and the position distribution information of the player, acquiring a current viewing angle judgment symbol location, wherein the calculation method comprises the following steps:
wherein x_medium is the horizontal coordinate of the position distribution center of the player, if the position is greater than 0, the current machine position is judged to be the viewing angle of the main team, and if the position is less than 0, the current machine position is judged to be the viewing angle of the guest team.
Preferably, the matching of player positions in step S4 comprises the following specific steps:
S401: use an adaptive rotation scheme to correct the angular deviation introduced by camera movement, as follows:
taking the clockwise direction as positive, set the minimum rotation angle to -30°, the maximum rotation angle to 30° and the rotation step to 3°, giving the rotation angle set R = {-30°, -27°, -24°, -21°, -18°, -15°, -12°, -9°, -6°, -3°, 0°, 3°, 6°, 9°, 12°, 15°, 18°, 21°, 24°, 27°, 30°};
each time an angle θ is selected from R, the center-point coordinates (x_d, y_d) of each detected helmet are rotated by θ degrees with the rotation matrix to obtain the new center-point coordinates (x'_d, y'_d):
x'_d = x_d cos θ + y_d sin θ,
y'_d = -x_d sin θ + y_d cos θ;
S402: normalize the player coordinates from the sensor information and the corrected helmet center-point coordinates of each group from step S401, and compute the Euclidean distance between the two coordinate sets of each group;
S403: select the group with the smallest Euclidean distance as the best match best_match, and assign and record a number for every player:
best_match = argmin_{θ in R} Σ_{i in A} L(G(T_θ(det_i)), G(track_i)),
where L(.) denotes the Euclidean distance, T_θ denotes the rotation of the coordinates, G denotes the normalization of the coordinates to the interval [0,1], A is the set of helmet coordinates of all players in the current frame, track_i denotes the sensor coordinates of each player in the current frame, and det_i denotes the helmet detection result of each player in the current frame.
Preferably, the player tracking in step S5 comprises the following specific steps:
S501: input the matched player bounding-box information and the original video frame images into the DeepSORT network, and use the feature extraction network inside DeepSORT to extract the feature information within each detection box, obtaining for each player helmet the upper-left corner coordinates (left, top) and the height and width of the bounding box;
S502: expand the original predicted bounding box from a box tightly enclosing the helmet to a box containing the player's upper body, so that the expanded bounding box contains the features that distinguish different players,
where x, y are the center-point coordinates of the detection box and (new_left, new_top) and (new_right, new_bottom) are the upper-left and lower-right corners of the expanded bounding box,
and scale_w and scale_h are the width and height expansion coefficients respectively;
S503: crop the original video frame to the expanded bounding box to obtain an upper-body image of each player, and resize the image to 128 × 64 pixels as the input to the feature extractor of the DeepSORT network;
S504: associate the helmet targets of adjacent frames using appearance features and motion features; the two cues are fused, and a match is accepted if and only if the fused result is below a threshold;
S505: use Kalman filtering to complete the state prediction and update at each moment;
S506: apply the Hungarian algorithm to obtain the set of successfully matched pairs of player helmet targets and their corresponding numbered tracks, the set of unmatched helmet targets and the set of unmatched numbered tracks (see the sketch following this list);
S507: perform a secondary verification of the matched pairs obtained by the Hungarian algorithm: if the computed target similarity between a matched pair exceeds a given threshold, the pair is reset to the unmatched state, finally yielding the set of successfully matched targets, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S508: integrate the tracking results and complete the visualization on the original video.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a football player tracking method based on the fusion of video and sensor perception that makes full use of the two dimensions of information from the video frames and the sensors and does not require the camera frame rate and the sensor frame rate to be identical, which lowers the equipment requirements and helps promote post-collision health monitoring of football players.
2. The invention resolves the assignment of player numbers from the player helmets detected by YOLOv7 and the corresponding sensor frames and completes player tracking with DeepSORT: the helmet bounding boxes detected by YOLOv7 are matched with the sensor data through a rotation-alignment scheme to determine the player numbers; camera-position prediction methods for the bottom line and the sideline are designed to determine the camera orientation when no camera-position annotation is available; and player tracking is finally completed with a DeepSORT tracker using enhanced feature extraction. The method is simple to implement, broadly applicable, and effectively improves data association accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of frame cropping;
FIG. 3 is a flow chart of bottom-line camera position prediction;
FIG. 4 is a flow chart of sideline camera position prediction;
FIG. 5 is a schematic diagram of rotation matching.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a football player tracking method based on video and sensor perception fusion includes the following steps:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result.
Further, step S1 includes the following steps:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field; the cropping process is shown in fig. 2:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s,
Y'_min = Y_min + s,
X'_max = X_max - s,
Y'_max = Y_max - s,
then remove the detection boxes from S101 whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
S2: aligning the dimensions of the video frame and the sensor data according to the timestamp information:
s201: extracting information recorded by each contestant helmet sensor, wherein the information comprises the number of the current contestant, the position of the contestant at each moment, whether the contestant collides at the current moment and the timestamp information at the current moment;
s202: if the sampling frequency of the ball game video is f 1 Hz, sampling frequency of sensor f 2 Hz, respectively calculating the offset of each video recording frame and each sensor recording frame relative to the starting moment of the match, specifically:
step one: acquiring a sensor frame number snap_track and a video frame number snap_det at the starting moment of the game, and recording a time stamp t of the starting of the game begin
Step two: calculating an offset per frame track_est in a sensor i The specific calculation steps are as follows:
track_est i = (track_t i - t begin ) / 1000 * f 2 + snap_trac k,
likewise, the offset det_est of each of the video frames is calculated i
det_est i = (det_t i - t begin ) / 1000 * f 1 + snap_det,
Wherein track_t i And det_t i The current time stamps for the sensor frame and the video frame, respectively.
Step three: for track_est in each sensor frame i Selecting the frame number det of the video frame with the smallest offset Euclidean distance i * As a matching frame, the alignment operation of the dimensions of the data is completed, specifically as follows:
wherein A and B are respectively the sets of all sensor frame numbers and video frame numbers, det_est j And (3) representing the offset of each video frame, wherein L (·) is a calculation formula of Euclidean distance.
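The offset computation and nearest-offset matching above can be sketched as follows; the data layout (parallel lists of frame numbers and millisecond timestamps) is an assumption made for the example.

```python
from typing import Dict, List


def frame_offsets(timestamps_ms: List[int], t_begin_ms: int,
                  snap: int, freq_hz: float) -> List[float]:
    """offset_i = (t_i - t_begin) / 1000 * f + snap, as in S202."""
    return [(t - t_begin_ms) / 1000.0 * freq_hz + snap for t in timestamps_ms]


def align_sensor_to_video(track_est: List[float], track_ids: List[int],
                          det_est: List[float], det_ids: List[int]) -> Dict[int, int]:
    """For each sensor frame, pick the video frame whose offset is closest
    (the 1-D Euclidean distance is simply the absolute difference)."""
    matches = {}
    for tid, te in zip(track_ids, track_est):
        j = min(range(len(det_est)), key=lambda k: abs(det_est[k] - te))
        matches[tid] = det_ids[j]
    return matches
```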
S3: and respectively determining the machine orientation corresponding to the bottom line and the edge line video frames:
the flow of the bottom line machine position prediction is shown in fig. 3, and the steps are as follows:
s301: extracting a 60 th frame picture of the video under the bottom line visual angle, and identifying and recording numbers appearing in the video frame by using an OCR method;
s302: meanwhile, the number of the player in the sensor information corresponding to the frame is sorted in ascending order according to the horizontal coordinate of each player, and the sorted player number set is recorded as a sorted_tracking_players;
s303: the digits recognized by OCR in the step S301 are taken out and are respectively matched with the sorted player number sets obtained in the step S302, and if the current digits are the same as the numbers with the index of i in the list, the position i+1 is recorded in the set table pos;
s304: if sum (pos) > len (pos)// 2, determining the current viewing angle as the home agent; if sum (pos) < len (pos)// 2, determining the current viewing angle as the passenger train position; where sum () represents the sum of the addition of the set elements and len () represents the length of the set.
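A sketch of the bottom-line vote of S301-S304, assuming the OCR output is already available as a list of integers (the OCR step itself is not shown). The decision threshold is taken verbatim from S304; depending on the intended reading it may instead need to compare the average position in pos against the middle of the sorted roster.

```python
from typing import List


def bottom_line_view(ocr_numbers: List[int],
                     sorted_tracking_players: List[int]) -> str:
    """sorted_tracking_players: player numbers from the sensor frame, sorted in
    ascending order of horizontal coordinate (S302)."""
    pos = []
    for n in ocr_numbers:                        # S303: locate each OCR-read number
        if n in sorted_tracking_players:
            pos.append(sorted_tracking_players.index(n) + 1)
    if not pos:
        return "unknown"
    threshold = len(pos) // 2                    # comparison as stated in S304
    if sum(pos) > threshold:
        return "home"
    if sum(pos) < threshold:
        return "away"
    return "unknown"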
The flow of sideline camera position prediction is shown in fig. 4, and the steps are as follows:
S305: randomly extract a video frame taken from the sideline viewing angle, convert the RGB image to a single-channel grayscale image, remove image noise with Gaussian blur, and detect the edges of the field marking lines in the frame with the Canny edge detection method;
S306: take the edge image from step S305, detect the straight line segments longer than 50 pixels together with the endpoint coordinates {(x1, y1), (x2, y2)} of each segment using the Hough line detection method, and record the results in a set B;
S307: randomly sample T line segments from the set B and compute the slope of each from its two endpoint coordinates:
slope = (y2 - y1) / (x2 - x1),
if more than T/2 of the sampled segments have a slope greater than 0, the overall slope is taken to be greater than 0; otherwise it is taken to be less than 0;
S308: obtain the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalize the player coordinates into a plane rectangular coordinate system, and compute the horizontal-coordinate center x_center of all players;
S309: from the sign of the slope and the distribution of the player positions, compute the current viewing-angle decision value location,
where x_medium is the horizontal coordinate of the center of the player position distribution; if location > 0 the current camera position is judged to be the home-team viewing angle, and if location < 0 it is judged to be the away-team viewing angle.
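An OpenCV-based sketch of the sideline estimate of S305-S309 follows. Since the exact expression for location is not reproduced above, the final line combines the slope sign with the players' horizontal center relative to an assumed field midpoint; that form, the Canny/Hough thresholds and the variable names are assumptions.

```python
import random

import cv2
import numpy as np


def sideline_view(frame_bgr, player_xs, t_samples: int = 20) -> str:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)        # to grayscale (S305)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                  # suppress noise
    edges = cv2.Canny(blur, 50, 150)                          # field marking edges
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=50, maxLineGap=10)  # segments > 50 px (S306)
    if lines is None or len(player_xs) == 0:
        return "unknown"

    sample = random.sample(list(lines[:, 0]), min(t_samples, len(lines)))
    positive = sum(1 for x1, y1, x2, y2 in sample
                   if x2 != x1 and (y2 - y1) / (x2 - x1) > 0)
    slope_sign = 1.0 if positive > len(sample) / 2 else -1.0  # majority vote (S307)

    xs = np.asarray(player_xs, dtype=float)
    xs = (xs - xs.min()) / (xs.max() - xs.min() + 1e-9)       # normalize (S308)
    x_center = xs.mean()                                      # players' horizontal center
    x_medium = 0.5                                            # assumed field midpoint
    location = slope_sign * (x_center - x_medium)             # assumed form of S309
    return "home" if location > 0 else "away"
```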
S4: as shown in fig. 5, the player position and player number in the player helmet detection information and the sensor information are matched:
s401: by using a self-adaptive rotation mode, correcting the angle deviation introduced by the movement of the machine position, the specific process is as follows:
if the clockwise rotation direction is positive, setting the minimum rotation angle to be-30 degrees, setting the maximum rotation angle to be 30 degrees, setting the rotation step distance to be 3 degrees each time, and setting the rotation angle set R= { -30 degrees, -27 degrees, -24 degrees, -21 degrees, -18 degrees, -15 degrees, -12 degrees, -9 degrees, -6 degrees, -3 degrees, 0 degrees, 3 degrees, 6 degrees, 9 degrees, 12 degrees, 15 degrees, 18 degrees, 21 degrees, 24 degrees, 27 degrees and 30 degrees;
each time an angle θ is selected from R, the coordinates (x d , y d ) The rotation matrix is utilized to rotate theta degrees to obtain a new center point coordinate (x' d , y' d ):
S402: normalizing the coordinate information of the player in the sensor information and the coordinates of the central point of the helmet of each group of player corrected in the step S401 respectively, and calculating the Euclidean distance between the two coordinate sets of each group;
s403: selecting a group with the smallest Euclidean distance as the optimal matching best_match, and allocating numbers for each player and recording, wherein the method comprises the following specific steps:
wherein L (·) represents the Euclidean distance calculation formula, T θ Represents the rotation of the coordinates, G represents the normalization of the coordinates to the interval [0,1 ]]A is the set of helmet coordinate information of all players in the current frame, track i Sensor coordinates, det, representing each player in the current frame picture i And the detection result of the helmet of each player in the current frame picture is shown.
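A sketch of the adaptive rotation matching of S401-S403. The index-wise pairing of detections to sensor points and the sign convention of the rotation are assumptions; a full implementation would also solve the point-to-point assignment before summing distances.

```python
import numpy as np


def normalize(points: np.ndarray) -> np.ndarray:
    """Scale each coordinate axis to the interval [0, 1] (the operator G)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    return (points - mins) / (maxs - mins + 1e-9)


def rotate(points: np.ndarray, theta_deg: float) -> np.ndarray:
    """Apply the 2-D rotation matrix T_theta; clockwise is taken as positive
    here, which may need flipping depending on the image coordinate convention."""
    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), np.sin(t)],
                    [-np.sin(t), np.cos(t)]])
    return points @ rot.T


def best_rotation_match(det_centers: np.ndarray, sensor_xy: np.ndarray):
    """det_centers, sensor_xy: arrays of shape (n_players, 2), paired by index."""
    costs = {}
    for theta in range(-30, 31, 3):                  # R = {-30°, -27°, ..., 30°}
        d = normalize(rotate(det_centers, theta))
        s = normalize(sensor_xy)
        costs[theta] = float(np.linalg.norm(d - s, axis=1).sum())
    best_theta = min(costs, key=costs.get)           # smallest summed distance (S403)
    return best_theta, costs[best_theta]
```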
S5: finishing player tracking by using a deep SORT method to obtain a final result:
s501: inputting the matched player boundary frame information and the original video frame image into a deep SORT network, and extracting the characteristic information in a detection frame by utilizing a characteristic extraction network in the deep SORT network to obtain the left upper corner coordinate (left, top) of each player helmet, and the high height and wide width of the boundary frame;
s502: the original prediction bounding box is expanded from the boundary of the fully wrapped helmet to a bounding box containing the upper body of the player, such that the expanded bounding box contains the differential features between different players.
Where x, y are coordinates of a center point of the detection frame, and the expanded new coordinates of the upper left corner (new_left, new_top) and the lower right corner (new_right, new_bottom) of the bounding box are:
wherein scale_w and scale_h are respectively wide and high expansion coefficients;
s503, cutting out an original image of a video frame in the expanded bounding box to obtain an upper body image of each player, and adjusting the image size to 128 multiplied by 64 pixels to serve as a feature extractor input image of the deep SORT network;
s504: the helmet targets of two adjacent frames are associated with appearance features and motion features, fusion matching is carried out if the association is successful, and if and only if the fusion result is smaller than a threshold value, the matching is considered to be successful;
s505: using Kalman filtering to complete state prediction and updating at each moment;
s506: a Hungary algorithm is applied to obtain a successful matching pair set, an unmatched helmet target set and a matching number track set of the player helmet target and the corresponding number track;
s507: performing secondary verification on a successful matching pair set obtained by the Hungary algorithm, and updating the matching pair set to be in an unmatched state if the target similarity between the calculated matching pairs is larger than a given threshold value; finally obtaining a successful matching target set, an unmatched helmet target set and an unmatched numbering track set;
s508: integrating the tracking results and completing the visualization in the original video.
The football player tracking method based on the fusion of video and sensor perception provided by the invention addresses the poor accuracy of purely visual tracking schemes, improves the accuracy of football player tracking by fusing sensor data with monocular vision, and promotes the development of post-collision health monitoring for football players.
The foregoing describes only preferred embodiments of the present invention in specific detail and is not to be construed as limiting the scope of the invention. It should be noted that modifications, improvements and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and all such changes fall within the scope of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.

Claims (4)

1. A football player tracking method based on the fusion of video and sensor perception, characterized in that the method comprises the following steps:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result;
in step S3, the specific steps are as follows:
S301: extract the 60th frame of the video taken from the bottom-line viewing angle and use an OCR method to recognize and record the jersey numbers appearing in the frame;
S302: at the same time, sort the player numbers in the sensor information corresponding to that frame in ascending order of each player's horizontal coordinate, and record the sorted set of player numbers as sorted_tracking_players;
S303: take each number recognized by OCR in step S301 and match it against the sorted player number set obtained in step S302; if the current number equals the number at index i of the list, record the position i+1 in the list pos;
S304: if sum(pos) > len(pos) // 2, the current viewing angle is judged to be the home-team camera position; if sum(pos) < len(pos) // 2, it is judged to be the away-team camera position, where sum() denotes the sum of the elements of a list and len() denotes its length;
S305: randomly extract a video frame taken from the sideline viewing angle, convert the RGB image to a single-channel grayscale image, remove image noise with Gaussian blur, and detect the edges of the field marking lines in the frame with the Canny edge detection method;
S306: take the edge image from step S305, detect the straight line segments longer than 50 pixels together with the endpoint coordinates {(x1, y1), (x2, y2)} of each segment using the Hough line detection method, and record the results in a set B;
S307: randomly sample T line segments from the set B and compute the slope of each from its two endpoint coordinates:
slope = (y2 - y1) / (x2 - x1)
if more than T/2 of the sampled segments have a slope greater than 0, the overall slope is taken to be greater than 0; otherwise it is taken to be less than 0;
S308: obtain the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalize the player coordinates into a plane rectangular coordinate system, and compute the horizontal-coordinate center x_center of all players;
S309: from the sign of the slope and the distribution of the player positions, compute the current viewing-angle decision value location,
where x_medium is the horizontal coordinate of the center of the player position distribution; if location > 0 the current camera position is judged to be the home-team viewing angle, and if location < 0 it is judged to be the away-team viewing angle;
in step S4, the specific steps are as follows:
S401: use an adaptive rotation scheme to correct the angular deviation introduced by camera movement, as follows:
taking the clockwise direction as positive, set the minimum rotation angle to -30°, the maximum rotation angle to 30° and the rotation step to 3°, giving the rotation angle set R = {-30°, -27°, -24°, -21°, -18°, -15°, -12°, -9°, -6°, -3°, 0°, 3°, 6°, 9°, 12°, 15°, 18°, 21°, 24°, 27°, 30°};
each time an angle θ is selected from R, the center-point coordinates (x_d, y_d) of each detected helmet are rotated by θ degrees with the rotation matrix to obtain the new center-point coordinates (x'_d, y'_d):
x'_d = x_d cos θ + y_d sin θ,
y'_d = -x_d sin θ + y_d cos θ;
S402: normalize the player coordinates from the sensor information and the corrected helmet center-point coordinates of each group from step S401, and compute the Euclidean distance between the two coordinate sets of each group;
S403: select the group with the smallest Euclidean distance as the best match best_match, and assign and record a number for every player:
best_match = argmin_{θ in R} Σ_{i in A} L(G(T_θ(det_i)), G(track_i)),
where L(.) denotes the Euclidean distance, T_θ denotes the rotation of the coordinates, G denotes the normalization of the coordinates to the interval [0,1], A is the set of helmet coordinates of all players in the current frame, track_i denotes the sensor coordinates of each player in the current frame, and det_i denotes the helmet detection result of each player in the current frame.
2. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S1, the specific steps are as follows:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field; the cropping proceeds as follows:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s
Y'_min = Y_min + s
X'_max = X_max - s
Y'_max = Y_max - s
then remove the detection boxes whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
3. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S2, the specific steps are as follows:
S201: extract the information recorded by each player's helmet sensor, including the player's number, the player's position at each moment, whether the player is in a collision at the current moment, and the timestamp of the current moment;
S202: let the sampling frequency of the game video be f1 Hz and the sampling frequency of the sensors be f2 Hz, obtain the sensor frame number snap_track and the video frame number snap_det at the start of the game, and record the timestamp t_begin of the game start;
S203: compute the offset track_est_i of each sensor frame:
track_est_i = (track_t_i - t_begin) / 1000 * f2 + snap_track,
and likewise the offset det_est_i of each video frame:
det_est_i = (det_t_i - t_begin) / 1000 * f1 + snap_det,
where track_t_i and det_t_i are the current timestamps of the sensor frame and the video frame, respectively;
S204: for the offset track_est_i of each sensor frame, select the video frame number det_i* with the smallest Euclidean distance L between offsets as the matching frame, completing the alignment of the two data dimensions:
det_i* = argmin_{j in B} L(track_est_i, det_est_j),
where A and B are the sets of all sensor frame numbers and all video frame numbers respectively, det_est_j denotes the offset of each video frame, and L(.) is the Euclidean distance.
4. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S5, the specific steps are as follows:
S501: input the matched player bounding-box information and the original video frame images into the DeepSORT network, and use the feature extraction network inside DeepSORT to extract the feature information within each detection box, obtaining for each player helmet the upper-left corner coordinates (left, top) and the height and width of the bounding box;
S502: expand the original predicted bounding box from a box tightly enclosing the helmet to a box containing the player's upper body, so that the expanded bounding box contains the features that distinguish different players,
where x, y are the center-point coordinates of the detection box and (new_left, new_top) and (new_right, new_bottom) are the upper-left and lower-right corners of the expanded bounding box,
and scale_w and scale_h are the width and height expansion coefficients respectively;
S503: crop the original video frame to the expanded bounding box to obtain an upper-body image of each player, and resize the image to 128 × 64 pixels as the input to the feature extractor of the DeepSORT network;
S504: associate the helmet targets of adjacent frames using appearance features and motion features; the two cues are fused, and a match is accepted if and only if the fused result is below a threshold;
S505: use Kalman filtering to complete the state prediction and update at each moment;
S506: apply the Hungarian algorithm to obtain the set of successfully matched pairs of player helmet targets and their corresponding numbered tracks, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S507: perform a secondary verification of the matched pairs obtained by the Hungarian algorithm: if the computed target similarity between a matched pair exceeds a given threshold, the pair is reset to the unmatched state, finally yielding the set of successfully matched targets, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S508: integrate the tracking results and complete the visualization on the original video.
CN202310685914.5A 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion Active CN116469040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310685914.5A CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310685914.5A CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Publications (2)

Publication Number Publication Date
CN116469040A (en) 2023-07-21
CN116469040B (en) 2023-08-29

Family

ID=87181052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310685914.5A Active CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Country Status (1)

Country Link
CN (1) CN116469040B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000064088A (en) * 2000-08-21 2000-11-06 주진용 Analysis Broadcasting System And Method Of Sports Image
WO2015081303A1 (en) * 2013-11-26 2015-06-04 Double Blue Sports Analytics, Inc. Automated video tagging with aggregated performance metrics
CN109903312A (en) * 2019-01-25 2019-06-18 北京工业大学 A kind of football sportsman based on video multi-target tracking runs distance statistics method
CN111093781A (en) * 2017-09-29 2020-05-01 英特尔公司 Aligning sensor data with video
CN111104851A (en) * 2019-11-05 2020-05-05 新华智云科技有限公司 Method and system for automatically generating defense area at basketball goal moment
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN113688740A (en) * 2021-08-26 2021-11-23 燕山大学 Indoor posture detection method based on multi-sensor fusion vision
CN113780181A (en) * 2021-09-13 2021-12-10 浙江大学 Football match offside judgment method and device based on unmanned aerial vehicle and electronic equipment
CN114120168A (en) * 2021-10-15 2022-03-01 上海洛塔信息技术有限公司 Target running distance measuring and calculating method, system, equipment and storage medium
DE202022101862U1 (en) * 2022-04-07 2022-05-17 Aziz Makandar System for identifying players and tracking multiple targets using an extended Gaussian mixture model
CN115131821A (en) * 2022-06-29 2022-09-30 大连理工大学 Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN115731268A (en) * 2022-11-17 2023-03-03 东南大学 Unmanned aerial vehicle multi-target tracking method based on visual/millimeter wave radar information fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180193694A1 (en) * 2017-01-06 2018-07-12 Rick C. Bergman Rfid-based location identification in athletic equipment and athletic playing fields

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000064088A (en) * 2000-08-21 2000-11-06 주진용 Analysis Broadcasting System And Method Of Sports Image
WO2015081303A1 (en) * 2013-11-26 2015-06-04 Double Blue Sports Analytics, Inc. Automated video tagging with aggregated performance metrics
CN111093781A (en) * 2017-09-29 2020-05-01 英特尔公司 Aligning sensor data with video
CN109903312A (en) * 2019-01-25 2019-06-18 北京工业大学 A kind of football sportsman based on video multi-target tracking runs distance statistics method
CN111104851A (en) * 2019-11-05 2020-05-05 新华智云科技有限公司 Method and system for automatically generating defense area at basketball goal moment
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN113688740A (en) * 2021-08-26 2021-11-23 燕山大学 Indoor posture detection method based on multi-sensor fusion vision
CN113780181A (en) * 2021-09-13 2021-12-10 浙江大学 Football match offside judgment method and device based on unmanned aerial vehicle and electronic equipment
CN114120168A (en) * 2021-10-15 2022-03-01 上海洛塔信息技术有限公司 Target running distance measuring and calculating method, system, equipment and storage medium
DE202022101862U1 (en) * 2022-04-07 2022-05-17 Aziz Makandar System for identifying players and tracking multiple targets using an extended Gaussian mixture model
CN115131821A (en) * 2022-06-29 2022-09-30 大连理工大学 Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN115731268A (en) * 2022-11-17 2023-03-03 东南大学 Unmanned aerial vehicle multi-target tracking method based on visual/millimeter wave radar information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A player tracking algorithm with multi-feature adaptive fusion; 张晓伟; 刘弘; 孙玉灵; Computer Engineering (Issue 17); full text *

Also Published As

Publication number Publication date
CN116469040A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN110232311B (en) Method and device for segmenting hand image and computer equipment
CA2949844C (en) System and method for identifying, analyzing, and reporting on players in a game from video
CN109145708B (en) Pedestrian flow statistical method based on RGB and D information fusion
TWI679612B (en) Image tracking method
CN109145803B (en) Gesture recognition method and device, electronic equipment and computer readable storage medium
CN104091175B (en) A kind of insect automatic distinguishing method for image based on Kinect depth information acquiring technology
JPWO2014156733A1 (en) Number counting device and number counting method
CN107247942B (en) Tennis video event detection method integrating multi-mode features
CN104517101A (en) Game poker card recognition method based on pixel square difference matching
TW201541407A (en) Method for generating three-dimensional information from identifying two-dimensional images
CN111209820B (en) Face living body detection method, system, equipment and readable storage medium
WO2019172172A1 (en) Object tracker, object tracking method, and computer program
CN112287867A (en) Multi-camera human body action recognition method and device
CN116469040B (en) Football player tracking method based on video and sensor perception fusion
CN111914913A (en) Novel stereo matching optimization method
CN111160107A (en) Dynamic region detection method based on feature matching
CN106550229A (en) A kind of parallel panorama camera array multi-view image bearing calibration
CN111709954A (en) Calibration method of go robot vision system
CN107368826A (en) Method and apparatus for text detection
CN111275021A (en) Automatic football offside line scribing method based on computer vision
CN110322476B (en) Target tracking method for improving STC and SURF feature joint optimization
CN112016565A (en) Segmentation method for fuzzy numbers at account number of financial bill
CN116309780A (en) Water gauge water level identification method based on target detection
Lee et al. A study on sports player tracking based on video using deep learning
CN108076365B (en) Human body posture recognition device

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant