CN113033308A - Team sports video game lens extraction method based on color features - Google Patents
- Publication number
- CN113033308A (application CN202110204176.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- shot
- clip
- video clip
- clips
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
Abstract
A team sports video match shot extraction method based on color features belongs to the field of video data processing. First, the video is preprocessed and automatically segmented into shots to structure it. The arithmetic means of the r, g and b channels are then computed for the video frames and for the shot video clips, the clips are clustered with K-means, a reference shot video clip is selected according to the range of the per-frame channel means within each clip, and the team sports video is divided into match shot (far and medium shot) video clips and other shot (close-up and off-site shot) video clips by comparing the dominant color of each clip with that of the reference clip. The invention extracts the far- and medium-shot video clips that make up most of the content of team sports videos by exploiting the dominant color differences between video frames, which helps reduce the adverse effect of redundant content on video analysis and processing and lays a good structural foundation for further research.
Description
Technical Field
The invention belongs to the field of video data processing.
Background
Team sports such as football and basketball are popular and have a broad audience, so research on team sports videos has wide application prospects. Video data is unstructured; converting it into a sequence of labeled segments, with shot type as the key attribute, is the basis for event detection and summary generation. The common approach first preprocesses the video and automatically segments it into shots to structure it; on this basis, the shot type is recognized automatically with methods such as pattern recognition. In team sports videos, shots can be divided into four types according to shooting angle and distance: far shots, medium shots, close-up shots and off-site shots. The far shot shows the position of every player on the court and the progress of the game from a bird's-eye view, while the medium shot shows the play of most players on the court more clearly; close-ups are often replays of far and medium shots; off-site shots such as spectators, advertising and LOGO transitions are redundant content relative to the game itself. In summary, the far and medium shots, which typically occupy the vast majority of the whole game, are the most important match shots. Extracting the match shots of a sports video can reduce the adverse impact of redundant content on video analysis and processing.
Eldib et al. determined an RGB range for field pixels through experiments, treated every pixel whose components fall within that range as a field-color pixel, and then set thresholds on the proportion of field pixels to distinguish the shot types. Junqing et al. used the field-color proportion of a sub-window region of the video frame to infer the shot type, with some success.
Shot classification first requires video segmentation; accurately detecting shot boundaries to complete the segmentation is a prerequisite for effectively classifying shot types. In team sports video, shot boundaries can be divided into abrupt and gradual transitions. In existing work, shot segmentation still produces false detections, especially for gradual transitions, because team sports videos contain many slow-motion replays, and the LOGO wipes at the beginning and end of a replay easily mislead the algorithm, which is very unfavorable for subsequent research. Existing shot classification algorithms cannot compensate for these boundary false detections, and their classification accuracy suffers as a result.
Second, the algorithm proposed by Eldib et al. is based on the field color, but the playing field changes with venue, weather, lighting and other factors. The algorithm proposed by Junqing et al. depends on the chosen size and position of the sub-window, so its robustness is poor.
In summary, the recognition accuracy of each shot type in current shot classification methods is clearly affected by insufficient shot-boundary detection accuracy, so shot recognition accuracy is not ideal, and the universality of existing shot classification algorithms is weak.
Disclosure of Invention
The invention aims to provide a team sports video match shot extraction method based on color features, which extracts the far and medium shots that occupy most of the content of team sports videos for subsequent video analysis and research. Based on the fact that the dominant color of the whole video frame differs greatly between shot types, the team sports video is divided into match shot (far shot, medium shot) video clips and other shot (close-up, off-site) video clips. Experiments show that the method is adaptable, accurate, practical and efficient.
Input: original team sports video
Output: a set of match shot video clips
Definitions:
S = {S_1, S_2, …, S_s}: the set of shot video clips after video segmentation;
f: the set of video frames of a shot video clip;
AM_R: the arithmetic mean of the R channel values of all pixels in a video frame;
AM_G: the arithmetic mean of the G channel values of all pixels in a video frame;
AM_B: the arithmetic mean of the B channel values of all pixels in a video frame;
R_R: the range (maximum minus minimum) of AM_R within a shot video clip;
R_G: the range of AM_G within a shot video clip;
R_B: the range of AM_B within a shot video clip;
C_max: the cluster containing the most shot video clips after K-means clustering;
TH_BS: the threshold distinguishing candidate reference shots from all other shots;
CBS: the set of candidate reference shot video clips;
SFM: the shot video clip in CBS with the largest number of video frames;
TH_GS: the threshold distinguishing match shots from non-match shots;
S': the set of match shot video clips.
Step one: segment the video into shot video clips and store it frame by frame as pictures, obtaining S.
Step two: compute AM_R, AM_G, AM_B for all video frames.
Step three: for each shot video clip, compute the ranges R_R, R_G, R_B of AM_R, AM_G, AM_B over its frames, and its dominant color A_RGB, the arithmetic mean of (AM_R, AM_G, AM_B) over all frames of the clip.
Step four: perform K-means clustering on the A_RGB of all shot video clips and find C_max.
Step five: for every shot video clip in C_max, if at least one of R_R, R_G, R_B is less than or equal to TH_BS, store the clip in CBS.
Step six: select SFM, the shot video clip in CBS with the largest number of video frames.
Step seven: compute the relative difference between the A_RGB of every shot video clip and the A_RGB of SFM; if all three channel differences are less than or equal to TH_GS, store the clip in S'.
Values: TH_BS = 100, TH_GS = 15%.
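The clustering-and-filtering pipeline of steps four through seven can be sketched as follows. This is a minimal illustration reconstructed from the definitions above, not the patented implementation: the `Shot` record, the relative-difference form of the step-seven test, and the idea of passing in precomputed K-means labels (in practice produced by, e.g., scikit-learn's KMeans on the per-shot dominant colors) are assumptions.

```python
# Sketch of steps four-seven, assuming per-shot features (step two/three)
# and K-means cluster labels (step four) have already been computed.
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

TH_BS = 100.0   # range threshold for candidate reference shots (step five)
TH_GS = 0.15    # 15% dominant-color difference threshold (step seven)

@dataclass
class Shot:
    index: int
    n_frames: int
    mean_rgb: Tuple[float, float, float]   # A_RGB: per-shot mean of AM_R, AM_G, AM_B
    range_rgb: Tuple[float, float, float]  # R_R, R_G, R_B: per-channel ranges

def extract_game_shots(shots: List[Shot], cluster_labels: List[int]) -> List[int]:
    # Step four: keep the largest K-means cluster, C_max.
    biggest = Counter(cluster_labels).most_common(1)[0][0]
    c_max = [s for s, lbl in zip(shots, cluster_labels) if lbl == biggest]
    # Step five: CBS = shots in C_max with at least one channel range <= TH_BS.
    cbs = [s for s in c_max if min(s.range_rgb) <= TH_BS]
    # Step six: reference shot SFM = the CBS shot with the most frames.
    sfm = max(cbs, key=lambda s: s.n_frames)
    # Step seven: a shot is a match shot if every channel of its dominant
    # color differs from SFM's by at most TH_GS (relative difference).
    game = []
    for s in shots:
        diffs = [abs(a - b) / b for a, b in zip(s.mean_rgb, sfm.mean_rgb)]
        if all(d <= TH_GS for d in diffs):
            game.append(s.index)
    return game

# Illustrative values loosely modelled on the worked example below.
demo = [Shot(17, 500, (110.0, 90.0, 79.0), (37.0, 35.0, 40.0)),
        Shot(20, 300, (200.0, 200.0, 190.0), (212.0, 216.0, 196.0)),
        Shot(31, 1895, (112.0, 92.0, 80.0), (50.0, 40.0, 30.0))]
print(extract_game_shots(demo, [0, 0, 0]))  # [17, 31]
```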
Using the mean attribute of the ImageStat module in the Python image processing library PIL, the arithmetic means AM_R, AM_G, AM_B of the three channel values of each frame in a shot video clip can be computed.
The formulas are as follows:
AM_R = (1/n) Σ r_i,  AM_G = (1/n) Σ g_i,  AM_B = (1/n) Σ b_i,
where r_i, g_i, b_i are the RGB pixel values of pixel i in the video frame and n is the number of pixels.
A_RGB = ( (1/N) Σ AM_R, (1/N) Σ AM_G, (1/N) Σ AM_B ),
where N is the number of video frames included in a single shot video clip.
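As a concrete illustration, the per-frame channel-mean formula can be written directly in Python; in practice the text computes the same values with PIL's ImageStat (`Stat(img).mean`). Here a frame is modelled as a plain list of (r, g, b) tuples.

```python
# Direct implementation of the AM_R/AM_G/AM_B formula: the arithmetic mean
# of each channel over all n pixels of a frame.
def channel_means(pixels):
    n = len(pixels)
    am_r = sum(p[0] for p in pixels) / n
    am_g = sum(p[1] for p in pixels) / n
    am_b = sum(p[2] for p in pixels) / n
    return am_r, am_g, am_b

frame = [(100, 80, 70), (102, 82, 72), (98, 78, 68)]
print(channel_means(frame))  # (100.0, 80.0, 70.0)
```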
The invention extracts the match shots of team sports videos using color features: the dominant color differences between whole video frames are used to eliminate close-up and off-site shots, keeping the main match shots. Existing work on shot classification usually concentrates on the playing field; this method is not bound by that convention and offers a new idea. It is not limited by field factors and applies to multiple team sports, and it can also compensate for some false detections of the shot boundary detection step: if a non-match shot is wrongly segmented, the whole shot is still identified and removed without affecting subsequent research and analysis. The method therefore extracts the main match shot portion of a whole team sports video and lays a good structural foundation for subsequent semantic event detection and summary generation.
To test the method's effectiveness, 25 videos totalling 112 minutes were extracted from the public YouTube-8M dataset to form the SportKF dataset shown in FIG. 2, covering four sports (basketball, football, American football and hockey) and containing 197,878 frames and 572 shots.
The method was applied to this dataset, and on 5 football videos its recall and precision were compared with the method proposed by Junqing et al. Recall and precision are computed as follows:
recall = (number of detections − number of false detections) / number of shots to be detected
precision = (number of detections − number of false detections) / number of detections
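The two formulas above in code form. The shot counts used in the example call are illustrative only, not the paper's actual counts.

```python
# recall    = (detections - false detections) / shots to be detected
# precision = (detections - false detections) / detections
def recall(n_detected, n_false, n_to_detect):
    return (n_detected - n_false) / n_to_detect

def precision(n_detected, n_false):
    return (n_detected - n_false) / n_detected

# Hypothetical counts: 30 detections, 4 false, 30 shots to detect.
print(recall(30, 0, 30))           # 1.0
print(round(precision(30, 4), 4))  # 0.8667
```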
For the extracted match shots, recall across the four team sports is 100% and precision is 86.67%. For football videos, the method of Junqing et al. reaches a recall of 93.2% while this method reaches 100%; the precision of the method of Junqing et al. is 88.7% while this method reaches 93.4%.
The experimental results show that the method detects and identifies match shots well, improving both recall and precision over the representative method proposed by Junqing et al.
Drawings
FIG. 1 is a flow chart
FIG. 2 SportKF dataset video covers
FIG. 3 far shot
FIG. 4 close-up shot
FIG. 5 LOGO shot
Detailed Description
The invention provides a team sports video match shot extraction method based on color features. The specific implementation steps are as follows:
Take an NBA game video with a duration of 7 minutes, a frame width of 640, a frame height of 360 and a frame rate of 30 frames/second as an example. The far shot is shown in FIG. 3, the close-up shot in FIG. 4 and the LOGO shot in FIG. 5 (the numerical values below are rounded to 2 decimal places).
Step one: segment the video into shot video clips, obtaining S = {S_1, S_2, …, S_33} (33 shots in total).
Step two: computing AM of all video framesR,AMG,AMB. AM corresponding to FIG. 3R=100.50,AMG=80.34,AMB74.88; AM corresponding to FIG. 4R=93.04,AMG=76.01,AMB71.23; AM corresponding to FIG. 5R=130.07,AMG=112.26,AMB=79.29.
Step three: compute each clip's channel ranges and dominant color. The shot in FIG. 3 (serial number 17) has R_R = 37.22, R_G = 35.56, R_B = 40.65 and A_RGB = (110.05, 90.39, 79.10); the shot in FIG. 4 (serial number 3) has R_R = 40.13, R_G = 28.46, R_B = 19.81 and A_RGB = (96.27, 77.20, 65.88); the shot in FIG. 5 (serial number 12) has R_R = 13.84, R_G = 12.63, R_B = 1.80 and A_RGB = (105.70, 90.43, 74.45).
Step four: perform K-means clustering on the A_RGB of all shot video clips and find C_max. In this example C_max contains the 9 shots numbered 4, 6, 9, 15, 17, 20, 21, 27 and 31.
Step five: for every shot video clip in C_max, if at least one of R_R, R_G, R_B is less than or equal to TH_BS, store the clip in CBS. TH_BS is set to 100 in this example. The shot numbered 17 in C_max, with R_R = 40.13, R_G = 28.46, R_B = 19.81, satisfies the condition "at least one of R_R, R_G, R_B is less than or equal to TH_BS" and is stored in CBS; by contrast, the shot numbered 20, with R_R = 212.30, R_G = 216.18, R_B = 196.36, does not satisfy the condition and is not stored. The resulting CBS of step five in this example contains the 8 shots numbered 4, 6, 9, 15, 17, 21, 27 and 31.
Step six: select SFM, the clip in CBS with the most video frames. In this example, among the 8 shots in CBS, the shot numbered 31 has the largest frame count (1895 frames), so it is taken as SFM.
Step seven: compute the relative difference between the A_RGB of every shot video clip and the A_RGB of SFM; if all three channel differences are less than or equal to TH_GS, store the clip in S'. TH_GS is set to 15% in this example. The shot in FIG. 3 (number 17) satisfies the condition "all differences are less than or equal to TH_GS" and is stored in S'; by contrast, the shot in FIG. 4 (number 3) does not satisfy the condition and is not stored. The final match shot set S' in this example contains the 11 shots numbered 2, 4, 6, 9, 13, 15, 17, 21, 25, 27 and 31, all of which are match shots, with no missed detections.
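The step-five membership test can be checked directly against the numbers quoted in the worked example (function name is illustrative, not from the patent):

```python
# Step-five check on two shots from the worked example, against TH_BS = 100.
TH_BS = 100.0

def is_candidate(range_rgb):
    # A shot enters CBS if at least one of R_R, R_G, R_B is <= TH_BS.
    return any(r <= TH_BS for r in range_rgb)

print(is_candidate((40.13, 28.46, 19.81)))     # True  -> shot 17 stored in CBS
print(is_candidate((212.30, 216.18, 196.36)))  # False -> shot 20 excluded
```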
Claims (1)
1. A team sports video match shot extraction method based on color features, characterized by the following definitions and steps:
S = {S_1, S_2, …, S_s}: the set of shot video clips after video segmentation;
f: the set of video frames of a shot video clip;
AM_R: the arithmetic mean of the R channel values of all pixels in a video frame;
AM_G: the arithmetic mean of the G channel values of all pixels in a video frame;
AM_B: the arithmetic mean of the B channel values of all pixels in a video frame;
R_R: the range of AM_R within a shot video clip;
R_G: the range of AM_G within a shot video clip;
R_B: the range of AM_B within a shot video clip;
C_max: the cluster containing the most shot video clips after K-means clustering;
TH_BS: the threshold distinguishing candidate reference shots from all other shots;
CBS: the set of candidate reference shot video clips;
SFM: the shot video clip in CBS with the largest number of video frames;
TH_GS: the threshold distinguishing match shots from non-match shots;
S': the set of match shot video clips;
the method comprises the following steps:
Step one: segment the video into shot video clips and store it frame by frame as pictures, obtaining S.
Step two: compute AM_R, AM_G, AM_B for all video frames.
Step three: for each shot video clip, compute the ranges R_R, R_G, R_B of AM_R, AM_G, AM_B over its frames, and its dominant color A_RGB, the arithmetic mean of (AM_R, AM_G, AM_B) over all frames of the clip.
Step four: perform K-means clustering on the A_RGB of all shot video clips and find C_max.
Step five: for every shot video clip in C_max, if at least one of R_R, R_G, R_B is less than or equal to TH_BS, store the clip in CBS.
Step six: select SFM, the shot video clip in CBS with the largest number of video frames.
Step seven: compute the relative difference between the A_RGB of every shot video clip and the A_RGB of SFM; if all three channel differences are less than or equal to TH_GS, store the clip in S'.
Values: TH_BS = 100, TH_GS = 15%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204176.9A CN113033308A (en) | 2021-02-24 | 2021-02-24 | Team sports video game lens extraction method based on color features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113033308A true CN113033308A (en) | 2021-06-25 |
Family
ID=76461212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110204176.9A Pending CN113033308A (en) | 2021-02-24 | 2021-02-24 | Team sports video game lens extraction method based on color features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033308A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040130567A1 (en) * | 2002-08-02 | 2004-07-08 | Ahmet Ekin | Automatic soccer video analysis and summarization |
CN101072305A (en) * | 2007-06-08 | 2007-11-14 | 华为技术有限公司 | Lens classifying method, situation extracting method, abstract generating method and device |
CN101604325A (en) * | 2009-07-17 | 2009-12-16 | 北京邮电大学 | Method for classifying sports video based on key frame of main scene lens |
CN110826491A (en) * | 2019-11-07 | 2020-02-21 | 北京工业大学 | Video key frame detection method based on cascading manual features and depth features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||