CN116469040B - Football player tracking method based on video and sensor perception fusion - Google Patents

Football player tracking method based on video and sensor perception fusion

Info

Publication number
CN116469040B
CN116469040B · CN202310685914.5A
Authority
CN
China
Prior art keywords
player
frame
degrees
video
sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310685914.5A
Other languages
Chinese (zh)
Other versions
CN116469040A (en)
Inventor
廖频
韩翔宇
陈子扬
臧露奇
张震
肖江
闵卫东
韩清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang University
Original Assignee
Nanchang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang University
Priority to CN202310685914.5A
Publication of CN116469040A
Application granted
Publication of CN116469040B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention provides a football player tracking method based on the fusion of video and sensor perception, which comprises the following steps. S1: detect the position of each player's helmet in every frame of a football game video with the YOLOv7 method; S2: align the video frames and the sensor data along the time dimension according to timestamp information; S3: determine the camera position orientation for the bottom-line and sideline video frames respectively; S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2; S5: complete player tracking with a DeepSORT method that uses enhanced feature extraction. The invention aligns data from the two dimensions of video frames and sensor records, resolves the assignment of player numbers from the helmets detected by YOLOv7 and the corresponding sensor frames, and completes player tracking with DeepSORT. The method is simple to implement, widely applicable, and effectively improves data association accuracy.

Description

Football player tracking method based on video and sensor perception fusion
Technical Field
The invention relates to the technical field of computer vision and perception fusion, in particular to a football player tracking method based on video and sensor perception fusion.
Background
In recent years, multi-target tracking has been widely applied to sports events, improving the efficiency of health monitoring for players involved in collisions. In multi-target tracking, data association remains a critical link: solving the data association problem depends on accurately matching player detections to motion trajectories. An incorrect association between a track and a detection can cause erroneous tracking or target loss and degrades the performance of the tracking system.
Common multi-target tracking approaches fall into three categories: methods based on traditional techniques, detection-based methods, and attention-based methods. Traditional multi-target tracking models the target or tracks hand-crafted target features; feature-matching methods are representative, extracting SURF features, Harris corners and other descriptors of the target and searching subsequent frames for the most similar feature information to localize the target. Detection-based multi-target tracking passes every video frame through a detection network to obtain detection results, crops the targets from all detection boxes, and then converts target tracking into a bipartite matching problem between consecutive frames; the targets of adjacent frames are associated by constructing a similarity matrix, and common detection-based trackers include the SORT and DeepSORT algorithms. Attention-based multi-target tracking introduces the Transformer attention mechanism into target tracking; it follows the multi-stage pipeline of detection-based tracking, performing target detection, feature extraction and temporal association, and further improves the tracking effect.
The above methods track multiple targets with a purely visual scheme. In complex sports scenes such as ball games, however, large-area collisions and occlusions occur frequently and the camera viewpoint changes often, which introduces considerable drift into player tracking and lowers its accuracy. In addition, with a purely visual tracking scheme it is difficult to match the real numbers of the players on the field with the player identities produced by the tracking algorithm.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a football player tracking method based on the fusion of video and sensor perception. The method makes full use of the two dimensions of information provided by the video frames and the sensors, does not require the frame rate of the video and that of the sensors to be identical, resolves the assignment of player numbers from the helmet detections produced by YOLOv7 and the corresponding sensor frames, and completes player tracking with DeepSORT. It is simple to implement, broadly applicable, effectively improves data association accuracy and improves the player tracking result.
In order to achieve the above purpose, the present invention provides the following technical solutions: a football player tracking method based on fusion of video and sensor perception, comprising the steps of:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result.
Preferably, the detection of player helmet positions in step S1 comprises the following specific steps:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field. The cropping proceeds as follows:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s,
Y'_min = Y_min + s,
X'_max = X_max - s,
Y'_max = Y_max - s,
then remove the detection boxes from S101 whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
Preferably, the dimension alignment in step S2 comprises the following specific steps:
S201: extract the information recorded by each player's helmet sensor, including the player's number, the player's position at each moment, whether the player is in a collision at the current moment, and the timestamp of the current moment;
S202: let the sampling frequency of the game video be f1 Hz and the sampling frequency of the sensors be f2 Hz, and compute the offset of every video frame and every sensor frame relative to the start of the game, specifically:
step one: obtain the sensor frame number snap_track and the video frame number snap_det at the start of the game, and record the timestamp t_begin of the game start;
step two: compute the offset track_est_i of each sensor frame:
track_est_i = (track_t_i - t_begin) / 1000 * f2 + snap_track,
and likewise the offset det_est_i of each video frame:
det_est_i = (det_t_i - t_begin) / 1000 * f1 + snap_det,
where track_t_i and det_t_i are the current timestamps of the sensor frame and the video frame, respectively;
step three: for the offset track_est_i of each sensor frame, select the video frame number det_i* with the smallest Euclidean distance between offsets as the matching frame, completing the alignment of the two data dimensions:
det_i* = argmin_{j in B} L(track_est_i, det_est_j),
where A and B are the sets of all sensor frame numbers and all video frame numbers respectively, det_est_j denotes the offset of each video frame, and L(.) is the Euclidean distance.
Preferably, the determination of the camera orientation for the video frames in step S3 comprises the following specific steps:
S301: extract the 60th frame of the video taken from the bottom-line viewing angle and use an OCR method to recognize and record the jersey numbers appearing in the frame;
S302: at the same time, sort the player numbers in the sensor information corresponding to that frame in ascending order of each player's horizontal coordinate, and record the sorted set of player numbers as sorted_tracking_players;
S303: take each number recognized by OCR in step S301 and match it against the sorted player number set obtained in step S302; if the current number equals the number at index i of the list, record the position i+1 in the list pos;
S304: if sum(pos) > len(pos) // 2, the current viewing angle is judged to be the home-team camera position; if sum(pos) < len(pos) // 2, it is judged to be the away-team camera position, where sum() denotes the sum of the elements of a list and len() denotes its length.
S305: randomly extracting video frame images under a sideline visual angle, converting the images in an RGB format into a single-channel gray level image, removing picture noise by Gaussian blur, and detecting straight line segments where landmark lines in the video frame images are located by using a Canny edge detection method.
S306: the image of the edge detection in step S305 is taken out, and a straight line segment and an end point coordinate set of the line segment exceeding 50 pixels in length { (x 1, y 1), (x 2, y 2) } are detected by using the hough straight line detection method and the detection result is recorded in the set B.
S307: randomly sampling T groups of line segments from the set B, and calculating slope according to coordinates of two ends of the line segments. The specific calculation method is as follows:
slope = (y2 - y1) / (x2 - x1),
if the slope of the line segment exceeding the T/2 group is greater than 0, the current slope is considered to be greater than 0, otherwise, the current slope is considered to be less than 0.
S308, acquiring the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalizing the coordinates of the players into a plane rectangular coordinate system, and calculating the horizontal coordinate centers x_center of all players.
S309, according to the slope of the straight line and the position distribution information of the player, acquiring a current viewing angle judgment symbol location, wherein the calculation method comprises the following steps:
wherein x_medium is the horizontal coordinate of the position distribution center of the player, if the position is greater than 0, the current machine position is judged to be the viewing angle of the main team, and if the position is less than 0, the current machine position is judged to be the viewing angle of the guest team.
Preferably, the matching of player positions in step S4 comprises the following specific steps:
S401: use an adaptive rotation scheme to correct the angular deviation introduced by camera movement, as follows:
taking the clockwise direction as positive, set the minimum rotation angle to -30°, the maximum rotation angle to 30° and the rotation step to 3°, giving the rotation angle set R = {-30°, -27°, -24°, -21°, -18°, -15°, -12°, -9°, -6°, -3°, 0°, 3°, 6°, 9°, 12°, 15°, 18°, 21°, 24°, 27°, 30°};
each time an angle θ is selected from R, the center-point coordinates (x_d, y_d) of each detected helmet are rotated by θ degrees with the rotation matrix to obtain the new center-point coordinates (x'_d, y'_d):
x'_d = x_d cos θ + y_d sin θ,
y'_d = -x_d sin θ + y_d cos θ;
S402: normalize the player coordinates from the sensor information and the corrected helmet center-point coordinates of each group from step S401, and compute the Euclidean distance between the two coordinate sets of each group;
S403: select the group with the smallest Euclidean distance as the best match best_match, and assign and record a number for every player:
best_match = argmin_{θ in R} Σ_{i in A} L(G(T_θ(det_i)), G(track_i)),
where L(.) denotes the Euclidean distance, T_θ denotes the rotation of the coordinates, G denotes the normalization of the coordinates to the interval [0,1], A is the set of helmet coordinates of all players in the current frame, track_i denotes the sensor coordinates of each player in the current frame, and det_i denotes the helmet detection result of each player in the current frame.
Preferably, the player tracking in step S5 comprises the following specific steps:
S501: input the matched player bounding-box information and the original video frame images into the DeepSORT network, and use the feature extraction network inside DeepSORT to extract the feature information within each detection box, obtaining for each player helmet the upper-left corner coordinates (left, top) and the height and width of the bounding box;
S502: expand the original predicted bounding box from a box tightly enclosing the helmet to a box containing the player's upper body, so that the expanded bounding box contains the features that distinguish different players,
where x, y are the center-point coordinates of the detection box and (new_left, new_top) and (new_right, new_bottom) are the upper-left and lower-right corners of the expanded bounding box,
and scale_w and scale_h are the width and height expansion coefficients respectively;
S503: crop the original video frame to the expanded bounding box to obtain an upper-body image of each player, and resize the image to 128 × 64 pixels as the input to the feature extractor of the DeepSORT network;
S504: associate the helmet targets of adjacent frames using appearance features and motion features; the two cues are fused, and a match is accepted if and only if the fused result is below a threshold;
S505: use Kalman filtering to complete the state prediction and update at each moment;
S506: apply the Hungarian algorithm to obtain the set of successfully matched pairs of player helmet targets and their corresponding numbered tracks, the set of unmatched helmet targets and the set of unmatched numbered tracks (see the sketch following this list);
S507: perform a secondary verification of the matched pairs obtained by the Hungarian algorithm: if the computed target similarity between a matched pair exceeds a given threshold, the pair is reset to the unmatched state, finally yielding the set of successfully matched targets, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S508: integrate the tracking results and complete the visualization on the original video.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a football player tracking method based on the fusion of video and sensor perception that makes full use of the two dimensions of information from the video frames and the sensors and does not require the camera frame rate and the sensor frame rate to be identical, which lowers the equipment requirements and helps promote post-collision health monitoring of football players.
2. The invention resolves the assignment of player numbers from the player helmets detected by YOLOv7 and the corresponding sensor frames and completes player tracking with DeepSORT: the helmet bounding boxes detected by YOLOv7 are matched with the sensor data through a rotation-alignment scheme to determine the player numbers; camera-position prediction methods for the bottom line and the sideline are designed to determine the camera orientation when no camera-position annotation is available; and player tracking is finally completed with a DeepSORT tracker using enhanced feature extraction. The method is simple to implement, broadly applicable, and effectively improves data association accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of frame cropping;
FIG. 3 is a flow chart of bottom-line camera position prediction;
FIG. 4 is a flow chart of sideline camera position prediction;
FIG. 5 is a schematic diagram of rotation matching.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a football player tracking method based on video and sensor perception fusion includes the following steps:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result.
Further, step S1 includes the following steps:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field; the cropping process is shown in fig. 2:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s,
Y'_min = Y_min + s,
X'_max = X_max - s,
Y'_max = Y_max - s,
then remove the detection boxes from S101 whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
S2: aligning the dimensions of the video frame and the sensor data according to the timestamp information:
s201: extracting information recorded by each contestant helmet sensor, wherein the information comprises the number of the current contestant, the position of the contestant at each moment, whether the contestant collides at the current moment and the timestamp information at the current moment;
s202: if the sampling frequency of the ball game video is f 1 Hz, sampling frequency of sensor f 2 Hz, respectively calculating the offset of each video recording frame and each sensor recording frame relative to the starting moment of the match, specifically:
step one: acquiring a sensor frame number snap_track and a video frame number snap_det at the starting moment of the game, and recording a time stamp t of the starting of the game begin
Step two: calculating an offset per frame track_est in a sensor i The specific calculation steps are as follows:
track_est i = (track_t i - t begin ) / 1000 * f 2 + snap_trac k,
likewise, the offset det_est of each of the video frames is calculated i
det_est i = (det_t i - t begin ) / 1000 * f 1 + snap_det,
Wherein track_t i And det_t i The current time stamps for the sensor frame and the video frame, respectively.
Step three: for track_est in each sensor frame i Selecting the frame number det of the video frame with the smallest offset Euclidean distance i * As a matching frame, the alignment operation of the dimensions of the data is completed, specifically as follows:
wherein A and B are respectively the sets of all sensor frame numbers and video frame numbers, det_est j And (3) representing the offset of each video frame, wherein L (·) is a calculation formula of Euclidean distance.
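The offset computation and nearest-offset matching above can be sketched as follows; the data layout (parallel lists of frame numbers and millisecond timestamps) is an assumption made for the example.

```python
from typing import Dict, List


def frame_offsets(timestamps_ms: List[int], t_begin_ms: int,
                  snap: int, freq_hz: float) -> List[float]:
    """offset_i = (t_i - t_begin) / 1000 * f + snap, as in S202."""
    return [(t - t_begin_ms) / 1000.0 * freq_hz + snap for t in timestamps_ms]


def align_sensor_to_video(track_est: List[float], track_ids: List[int],
                          det_est: List[float], det_ids: List[int]) -> Dict[int, int]:
    """For each sensor frame, pick the video frame whose offset is closest
    (the 1-D Euclidean distance is simply the absolute difference)."""
    matches = {}
    for tid, te in zip(track_ids, track_est):
        j = min(range(len(det_est)), key=lambda k: abs(det_est[k] - te))
        matches[tid] = det_ids[j]
    return matches
```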
S3: and respectively determining the machine orientation corresponding to the bottom line and the edge line video frames:
the flow of the bottom line machine position prediction is shown in fig. 3, and the steps are as follows:
s301: extracting a 60 th frame picture of the video under the bottom line visual angle, and identifying and recording numbers appearing in the video frame by using an OCR method;
s302: meanwhile, the number of the player in the sensor information corresponding to the frame is sorted in ascending order according to the horizontal coordinate of each player, and the sorted player number set is recorded as a sorted_tracking_players;
s303: the digits recognized by OCR in the step S301 are taken out and are respectively matched with the sorted player number sets obtained in the step S302, and if the current digits are the same as the numbers with the index of i in the list, the position i+1 is recorded in the set table pos;
s304: if sum (pos) > len (pos)// 2, determining the current viewing angle as the home agent; if sum (pos) < len (pos)// 2, determining the current viewing angle as the passenger train position; where sum () represents the sum of the addition of the set elements and len () represents the length of the set.
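A sketch of the bottom-line vote of S301-S304, assuming the OCR output is already available as a list of integers (the OCR step itself is not shown). The decision threshold is taken verbatim from S304; depending on the intended reading it may instead need to compare the average position in pos against the middle of the sorted roster.

```python
from typing import List


def bottom_line_view(ocr_numbers: List[int],
                     sorted_tracking_players: List[int]) -> str:
    """sorted_tracking_players: player numbers from the sensor frame, sorted in
    ascending order of horizontal coordinate (S302)."""
    pos = []
    for n in ocr_numbers:                        # S303: locate each OCR-read number
        if n in sorted_tracking_players:
            pos.append(sorted_tracking_players.index(n) + 1)
    if not pos:
        return "unknown"
    threshold = len(pos) // 2                    # comparison as stated in S304
    if sum(pos) > threshold:
        return "home"
    if sum(pos) < threshold:
        return "away"
    return "unknown"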
The flow of sideline camera position prediction is shown in fig. 4, and the steps are as follows:
S305: randomly extract a video frame taken from the sideline viewing angle, convert the RGB image to a single-channel grayscale image, remove image noise with Gaussian blur, and detect the edges of the field marking lines in the frame with the Canny edge detection method;
S306: take the edge image from step S305, detect the straight line segments longer than 50 pixels together with the endpoint coordinates {(x1, y1), (x2, y2)} of each segment using the Hough line detection method, and record the results in a set B;
S307: randomly sample T line segments from the set B and compute the slope of each from its two endpoint coordinates:
slope = (y2 - y1) / (x2 - x1),
if more than T/2 of the sampled segments have a slope greater than 0, the overall slope is taken to be greater than 0; otherwise it is taken to be less than 0;
S308: obtain the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalize the player coordinates into a plane rectangular coordinate system, and compute the horizontal-coordinate center x_center of all players;
S309: from the sign of the slope and the distribution of the player positions, compute the current viewing-angle decision value location,
where x_medium is the horizontal coordinate of the center of the player position distribution; if location > 0 the current camera position is judged to be the home-team viewing angle, and if location < 0 it is judged to be the away-team viewing angle.
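An OpenCV-based sketch of the sideline estimate of S305-S309 follows. Since the exact expression for location is not reproduced above, the final line combines the slope sign with the players' horizontal center relative to an assumed field midpoint; that form, the Canny/Hough thresholds and the variable names are assumptions.

```python
import random

import cv2
import numpy as np


def sideline_view(frame_bgr, player_xs, t_samples: int = 20) -> str:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)        # to grayscale (S305)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                  # suppress noise
    edges = cv2.Canny(blur, 50, 150)                          # field marking edges
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=50, maxLineGap=10)  # segments > 50 px (S306)
    if lines is None or len(player_xs) == 0:
        return "unknown"

    sample = random.sample(list(lines[:, 0]), min(t_samples, len(lines)))
    positive = sum(1 for x1, y1, x2, y2 in sample
                   if x2 != x1 and (y2 - y1) / (x2 - x1) > 0)
    slope_sign = 1.0 if positive > len(sample) / 2 else -1.0  # majority vote (S307)

    xs = np.asarray(player_xs, dtype=float)
    xs = (xs - xs.min()) / (xs.max() - xs.min() + 1e-9)       # normalize (S308)
    x_center = xs.mean()                                      # players' horizontal center
    x_medium = 0.5                                            # assumed field midpoint
    location = slope_sign * (x_center - x_medium)             # assumed form of S309
    return "home" if location > 0 else "away"
```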
S4: as shown in fig. 5, the player position and player number in the player helmet detection information and the sensor information are matched:
s401: by using a self-adaptive rotation mode, correcting the angle deviation introduced by the movement of the machine position, the specific process is as follows:
if the clockwise rotation direction is positive, setting the minimum rotation angle to be-30 degrees, setting the maximum rotation angle to be 30 degrees, setting the rotation step distance to be 3 degrees each time, and setting the rotation angle set R= { -30 degrees, -27 degrees, -24 degrees, -21 degrees, -18 degrees, -15 degrees, -12 degrees, -9 degrees, -6 degrees, -3 degrees, 0 degrees, 3 degrees, 6 degrees, 9 degrees, 12 degrees, 15 degrees, 18 degrees, 21 degrees, 24 degrees, 27 degrees and 30 degrees;
each time an angle θ is selected from R, the coordinates (x d , y d ) The rotation matrix is utilized to rotate theta degrees to obtain a new center point coordinate (x' d , y' d ):
S402: normalizing the coordinate information of the player in the sensor information and the coordinates of the central point of the helmet of each group of player corrected in the step S401 respectively, and calculating the Euclidean distance between the two coordinate sets of each group;
s403: selecting a group with the smallest Euclidean distance as the optimal matching best_match, and allocating numbers for each player and recording, wherein the method comprises the following specific steps:
wherein L (·) represents the Euclidean distance calculation formula, T θ Represents the rotation of the coordinates, G represents the normalization of the coordinates to the interval [0,1 ]]A is the set of helmet coordinate information of all players in the current frame, track i Sensor coordinates, det, representing each player in the current frame picture i And the detection result of the helmet of each player in the current frame picture is shown.
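A sketch of the adaptive rotation matching of S401-S403. The index-wise pairing of detections to sensor points and the sign convention of the rotation are assumptions; a full implementation would also solve the point-to-point assignment before summing distances.

```python
import numpy as np


def normalize(points: np.ndarray) -> np.ndarray:
    """Scale each coordinate axis to the interval [0, 1] (the operator G)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    return (points - mins) / (maxs - mins + 1e-9)


def rotate(points: np.ndarray, theta_deg: float) -> np.ndarray:
    """Apply the 2-D rotation matrix T_theta; clockwise is taken as positive
    here, which may need flipping depending on the image coordinate convention."""
    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), np.sin(t)],
                    [-np.sin(t), np.cos(t)]])
    return points @ rot.T


def best_rotation_match(det_centers: np.ndarray, sensor_xy: np.ndarray):
    """det_centers, sensor_xy: arrays of shape (n_players, 2), paired by index."""
    costs = {}
    for theta in range(-30, 31, 3):                  # R = {-30°, -27°, ..., 30°}
        d = normalize(rotate(det_centers, theta))
        s = normalize(sensor_xy)
        costs[theta] = float(np.linalg.norm(d - s, axis=1).sum())
    best_theta = min(costs, key=costs.get)           # smallest summed distance (S403)
    return best_theta, costs[best_theta]
```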
S5: finishing player tracking by using a deep SORT method to obtain a final result:
s501: inputting the matched player boundary frame information and the original video frame image into a deep SORT network, and extracting the characteristic information in a detection frame by utilizing a characteristic extraction network in the deep SORT network to obtain the left upper corner coordinate (left, top) of each player helmet, and the high height and wide width of the boundary frame;
s502: the original prediction bounding box is expanded from the boundary of the fully wrapped helmet to a bounding box containing the upper body of the player, such that the expanded bounding box contains the differential features between different players.
Where x, y are coordinates of a center point of the detection frame, and the expanded new coordinates of the upper left corner (new_left, new_top) and the lower right corner (new_right, new_bottom) of the bounding box are:
wherein scale_w and scale_h are respectively wide and high expansion coefficients;
s503, cutting out an original image of a video frame in the expanded bounding box to obtain an upper body image of each player, and adjusting the image size to 128 multiplied by 64 pixels to serve as a feature extractor input image of the deep SORT network;
s504: the helmet targets of two adjacent frames are associated with appearance features and motion features, fusion matching is carried out if the association is successful, and if and only if the fusion result is smaller than a threshold value, the matching is considered to be successful;
s505: using Kalman filtering to complete state prediction and updating at each moment;
s506: a Hungary algorithm is applied to obtain a successful matching pair set, an unmatched helmet target set and a matching number track set of the player helmet target and the corresponding number track;
s507: performing secondary verification on a successful matching pair set obtained by the Hungary algorithm, and updating the matching pair set to be in an unmatched state if the target similarity between the calculated matching pairs is larger than a given threshold value; finally obtaining a successful matching target set, an unmatched helmet target set and an unmatched numbering track set;
s508: integrating the tracking results and completing the visualization in the original video.
The football player tracking method based on the fusion of video and sensor perception provided by the invention addresses the poor accuracy of purely visual tracking schemes, improves the accuracy of football player tracking by fusing sensor data with monocular vision, and promotes the development of post-collision health monitoring for football players.
The foregoing describes only preferred embodiments of the present invention in specific detail and is not to be construed as limiting the scope of the invention. It should be noted that modifications, improvements and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and all such changes fall within the scope of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.

Claims (4)

1. A football player tracking method based on the fusion of video and sensor perception, characterized in that the method comprises the following steps:
S1: detect the position of each player's helmet in every frame of the football game video with the YOLOv7 method;
S2: align the video frames and the sensor data along the time dimension according to timestamp information;
S3: determine the camera position orientation for the bottom-line and sideline video frames respectively;
S4: match the player positions and player numbers between the helmet detections obtained in step S1 and the sensor information of step S2;
S5: complete player tracking with the DeepSORT method using enhanced feature extraction and obtain the final result;
in step S3, the specific steps are as follows:
S301: extract the 60th frame of the video taken from the bottom-line viewing angle and use an OCR method to recognize and record the jersey numbers appearing in the frame;
S302: at the same time, sort the player numbers in the sensor information corresponding to that frame in ascending order of each player's horizontal coordinate, and record the sorted set of player numbers as sorted_tracking_players;
S303: take each number recognized by OCR in step S301 and match it against the sorted player number set obtained in step S302; if the current number equals the number at index i of the list, record the position i+1 in the list pos;
S304: if sum(pos) > len(pos) // 2, the current viewing angle is judged to be the home-team camera position; if sum(pos) < len(pos) // 2, it is judged to be the away-team camera position, where sum() denotes the sum of the elements of a list and len() denotes its length;
S305: randomly extract a video frame taken from the sideline viewing angle, convert the RGB image to a single-channel grayscale image, remove image noise with Gaussian blur, and detect the edges of the field marking lines in the frame with the Canny edge detection method;
S306: take the edge image from step S305, detect the straight line segments longer than 50 pixels together with the endpoint coordinates {(x1, y1), (x2, y2)} of each segment using the Hough line detection method, and record the results in a set B;
S307: randomly sample T line segments from the set B and compute the slope of each from its two endpoint coordinates:
slope = (y2 - y1) / (x2 - x1)
if more than T/2 of the sampled segments have a slope greater than 0, the overall slope is taken to be greater than 0; otherwise it is taken to be less than 0;
S308: obtain the coordinate positions of all players in the sensor frame corresponding to the current video frame, normalize the player coordinates into a plane rectangular coordinate system, and compute the horizontal-coordinate center x_center of all players;
S309: from the sign of the slope and the distribution of the player positions, compute the current viewing-angle decision value location,
where x_medium is the horizontal coordinate of the center of the player position distribution; if location > 0 the current camera position is judged to be the home-team viewing angle, and if location < 0 it is judged to be the away-team viewing angle;
in step S4, the specific steps are as follows:
S401: use an adaptive rotation scheme to correct the angular deviation introduced by camera movement, as follows:
taking the clockwise direction as positive, set the minimum rotation angle to -30°, the maximum rotation angle to 30° and the rotation step to 3°, giving the rotation angle set R = {-30°, -27°, -24°, -21°, -18°, -15°, -12°, -9°, -6°, -3°, 0°, 3°, 6°, 9°, 12°, 15°, 18°, 21°, 24°, 27°, 30°};
each time an angle θ is selected from R, the center-point coordinates (x_d, y_d) of each detected helmet are rotated by θ degrees with the rotation matrix to obtain the new center-point coordinates (x'_d, y'_d):
x'_d = x_d cos θ + y_d sin θ,
y'_d = -x_d sin θ + y_d cos θ;
S402: normalize the player coordinates from the sensor information and the corrected helmet center-point coordinates of each group from step S401, and compute the Euclidean distance between the two coordinate sets of each group;
S403: select the group with the smallest Euclidean distance as the best match best_match, and assign and record a number for every player:
best_match = argmin_{θ in R} Σ_{i in A} L(G(T_θ(det_i)), G(track_i)),
where L(.) denotes the Euclidean distance, T_θ denotes the rotation of the coordinates, G denotes the normalization of the coordinates to the interval [0,1], A is the set of helmet coordinates of all players in the current frame, track_i denotes the sensor coordinates of each player in the current frame, and det_i denotes the helmet detection result of each player in the current frame.
2. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S1, the specific steps are as follows:
S101: extract frames from the video to obtain images, then detect the football players' helmets with the YOLOv7 method;
S102: when the number m of detected player helmets exceeds the number n of players actually on the field, apply a progressive frame-cropping post-processing step to reduce interference from substitute players outside the field; the cropping proceeds as follows:
let the upper-left and lower-right corner coordinates of the image frame obtained by frame extraction be (X_min, Y_min) and (X_max, Y_max), let the cropping step be s, and let the upper limit on the number of crops be t:
the length and width of the original image are each reduced by 2s, and the new upper-left and lower-right corner coordinates are:
X'_min = X_min + s
Y'_min = Y_min + s
X'_max = X_max - s
Y'_max = Y_max - s
then remove the detection boxes whose center points lie outside the cropped frame and update the number m of currently detected player helmets;
S103: repeat the cropping step of S102 until m <= n or the number of crops reaches the upper limit t.
3. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S2, the specific steps are as follows:
S201: extract the information recorded by each player's helmet sensor, including the player's number, the player's position at each moment, whether the player is in a collision at the current moment, and the timestamp of the current moment;
S202: let the sampling frequency of the game video be f1 Hz and the sampling frequency of the sensors be f2 Hz, obtain the sensor frame number snap_track and the video frame number snap_det at the start of the game, and record the timestamp t_begin of the game start;
S203: compute the offset track_est_i of each sensor frame:
track_est_i = (track_t_i - t_begin) / 1000 * f2 + snap_track,
and likewise the offset det_est_i of each video frame:
det_est_i = (det_t_i - t_begin) / 1000 * f1 + snap_det,
where track_t_i and det_t_i are the current timestamps of the sensor frame and the video frame, respectively;
S204: for the offset track_est_i of each sensor frame, select the video frame number det_i* with the smallest Euclidean distance L between offsets as the matching frame, completing the alignment of the two data dimensions:
det_i* = argmin_{j in B} L(track_est_i, det_est_j),
where A and B are the sets of all sensor frame numbers and all video frame numbers respectively, det_est_j denotes the offset of each video frame, and L(.) is the Euclidean distance.
4. The football player tracking method based on the fusion of video and sensor perception of claim 1, wherein in step S5, the specific steps are as follows:
S501: input the matched player bounding-box information and the original video frame images into the DeepSORT network, and use the feature extraction network inside DeepSORT to extract the feature information within each detection box, obtaining for each player helmet the upper-left corner coordinates (left, top) and the height and width of the bounding box;
S502: expand the original predicted bounding box from a box tightly enclosing the helmet to a box containing the player's upper body, so that the expanded bounding box contains the features that distinguish different players,
where x, y are the center-point coordinates of the detection box and (new_left, new_top) and (new_right, new_bottom) are the upper-left and lower-right corners of the expanded bounding box,
and scale_w and scale_h are the width and height expansion coefficients respectively;
S503: crop the original video frame to the expanded bounding box to obtain an upper-body image of each player, and resize the image to 128 × 64 pixels as the input to the feature extractor of the DeepSORT network;
S504: associate the helmet targets of adjacent frames using appearance features and motion features; the two cues are fused, and a match is accepted if and only if the fused result is below a threshold;
S505: use Kalman filtering to complete the state prediction and update at each moment;
S506: apply the Hungarian algorithm to obtain the set of successfully matched pairs of player helmet targets and their corresponding numbered tracks, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S507: perform a secondary verification of the matched pairs obtained by the Hungarian algorithm: if the computed target similarity between a matched pair exceeds a given threshold, the pair is reset to the unmatched state, finally yielding the set of successfully matched targets, the set of unmatched helmet targets and the set of unmatched numbered tracks;
S508: integrate the tracking results and complete the visualization on the original video.
CN202310685914.5A 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion Active CN116469040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310685914.5A CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310685914.5A CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Publications (2)

Publication Number Publication Date
CN116469040A (en) 2023-07-21
CN116469040B (en) 2023-08-29

Family

ID=87181052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310685914.5A Active CN116469040B (en) 2023-06-12 2023-06-12 Football player tracking method based on video and sensor perception fusion

Country Status (1)

Country Link
CN (1) CN116469040B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000064088A (en) * 2000-08-21 2000-11-06 주진용 Analysis Broadcasting System And Method Of Sports Image
WO2015081303A1 (en) * 2013-11-26 2015-06-04 Double Blue Sports Analytics, Inc. Automated video tagging with aggregated performance metrics
CN109903312A (en) * 2019-01-25 2019-06-18 北京工业大学 A kind of football sportsman based on video multi-target tracking runs distance statistics method
CN111093781A (en) * 2017-09-29 2020-05-01 英特尔公司 Aligning sensor data with video
CN111104851A (en) * 2019-11-05 2020-05-05 新华智云科技有限公司 Method and system for automatically generating defense area at basketball goal moment
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN113688740A (en) * 2021-08-26 2021-11-23 燕山大学 Indoor posture detection method based on multi-sensor fusion vision
CN113780181A (en) * 2021-09-13 2021-12-10 浙江大学 Football match offside judgment method and device based on unmanned aerial vehicle and electronic equipment
CN114120168A (en) * 2021-10-15 2022-03-01 上海洛塔信息技术有限公司 Target running distance measuring and calculating method, system, equipment and storage medium
DE202022101862U1 (en) * 2022-04-07 2022-05-17 Aziz Makandar System for identifying players and tracking multiple targets using an extended Gaussian mixture model
CN115131821A (en) * 2022-06-29 2022-09-30 大连理工大学 Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN115731268A (en) * 2022-11-17 2023-03-03 东南大学 Unmanned aerial vehicle multi-target tracking method based on visual/millimeter wave radar information fusion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180193694A1 (en) * 2017-01-06 2018-07-12 Rick C. Bergman Rfid-based location identification in athletic equipment and athletic playing fields

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000064088A (en) * 2000-08-21 2000-11-06 주진용 Analysis Broadcasting System And Method Of Sports Image
WO2015081303A1 (en) * 2013-11-26 2015-06-04 Double Blue Sports Analytics, Inc. Automated video tagging with aggregated performance metrics
CN111093781A (en) * 2017-09-29 2020-05-01 英特尔公司 Aligning sensor data with video
CN109903312A (en) * 2019-01-25 2019-06-18 北京工业大学 A kind of football sportsman based on video multi-target tracking runs distance statistics method
CN111104851A (en) * 2019-11-05 2020-05-05 新华智云科技有限公司 Method and system for automatically generating defense area at basketball goal moment
CN113506210A (en) * 2021-08-10 2021-10-15 深圳市前海动竞体育科技有限公司 Method for automatically generating point maps of athletes in basketball game and video shooting device
CN113688740A (en) * 2021-08-26 2021-11-23 燕山大学 Indoor posture detection method based on multi-sensor fusion vision
CN113780181A (en) * 2021-09-13 2021-12-10 浙江大学 Football match offside judgment method and device based on unmanned aerial vehicle and electronic equipment
CN114120168A (en) * 2021-10-15 2022-03-01 上海洛塔信息技术有限公司 Target running distance measuring and calculating method, system, equipment and storage medium
DE202022101862U1 (en) * 2022-04-07 2022-05-17 Aziz Makandar System for identifying players and tracking multiple targets using an extended Gaussian mixture model
CN115131821A (en) * 2022-06-29 2022-09-30 大连理工大学 Improved YOLOv5+ Deepsort-based campus personnel crossing warning line detection method
CN115731268A (en) * 2022-11-17 2023-03-03 东南大学 Unmanned aerial vehicle multi-target tracking method based on visual/millimeter wave radar information fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A player tracking algorithm with multi-feature adaptive fusion; 张晓伟; 刘弘; 孙玉灵; Computer Engineering (Issue 17); full text *

Also Published As

Publication number Publication date
CN116469040A (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN110232311B (en) Method and device for segmenting hand image and computer equipment
CA2949844C (en) System and method for identifying, analyzing, and reporting on players in a game from video
CN109145708B (en) Pedestrian flow statistical method based on RGB and D information fusion
TWI679612B (en) Image tracking method
CN109145803B (en) Gesture recognition method and device, electronic equipment and computer readable storage medium
CN104091175B (en) A kind of insect automatic distinguishing method for image based on Kinect depth information acquiring technology
JPWO2014156733A1 (en) Number counting device and number counting method
CN107247942B (en) Tennis video event detection method integrating multi-mode features
CN104517101A (en) Game poker card recognition method based on pixel square difference matching
TW201541407A (en) Method for generating three-dimensional information from identifying two-dimensional images
CN111209820B (en) Face living body detection method, system, equipment and readable storage medium
WO2019172172A1 (en) Object tracker, object tracking method, and computer program
CN112287867A (en) Multi-camera human body action recognition method and device
CN116469040B (en) Football player tracking method based on video and sensor perception fusion
CN111914913A (en) Novel stereo matching optimization method
CN111160107A (en) Dynamic region detection method based on feature matching
CN106550229A (en) A kind of parallel panorama camera array multi-view image bearing calibration
CN111709954A (en) Calibration method of go robot vision system
CN107368826A (en) Method and apparatus for text detection
CN111275021A (en) Automatic football offside line scribing method based on computer vision
CN110322476B (en) Target tracking method for improving STC and SURF feature joint optimization
CN112016565A (en) Segmentation method for fuzzy numbers at account number of financial bill
CN116309780A (en) Water gauge water level identification method based on target detection
Lee et al. A study on sports player tracking based on video using deep learning
CN108076365B (en) Human body posture recognition device

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant