CN111553165A

CN111553165A - Football player competition performance evaluation method based on emotion calculation

Info

Publication number: CN111553165A
Application number: CN202010178749.0A
Authority: CN
Inventors: 谢湘; 刘伟
Original assignee: Guizhou Hi Ball Big Data Industry Development Co ltd; Beijing Institute of Technology BIT
Current assignee: Guizhou Hi Ball Data Technology Co ltd; Beijing Institute of Technology BIT
Priority date: 2020-03-15
Filing date: 2020-03-15
Publication date: 2020-08-18
Anticipated expiration: 2040-03-15
Also published as: CN111553165B

Abstract

The invention relates to a method for evaluating the match performance of football players based on emotion calculation, and belongs to the technical field of artificial intelligence and machine learning. Using emotion calculation and text information extraction techniques, the method structures and quantifies the expressions in football match reports that relate to player performance, combines them with statistical data, and outputs player performance scores using a linear regression algorithm. In addition to the players' technical statistics, the method introduces the match report text and uses emotion calculation to quantify the quality of events, so that both the quantity and the quality of each technical item are taken into account and the match performance of players can be judged more scientifically and reasonably.

Description

Football player competition performance evaluation method based on emotion calculation

Technical Field

The invention relates to a football player competition performance evaluation method based on emotion calculation, and belongs to the technical field of machine learning.

Background

Analyzing players' match performance is valuable to the entire football industry. For team management, observing players' match performance over a period of time helps in completing player transfers; for the coaching staff, players' recent match performance can inform decisions on formation and match strategy; for the players themselves, a match performance score can also motivate harder training and play.

Currently, player performance is assessed mainly in two ways: subjective evaluation by experts, and automatic computer scoring based on match statistics. However, the former has a high labor cost and is subject to subjective deviation among different experts; the latter relies entirely on data and loses information that technical statistics cannot represent (e.g., a brilliant shot and a poor shot are both counted as one shot). After each match, a large number of match reports are published; these reports are highly professional and contain a great deal of information about player performance. Therefore, the text related to player performance can be obtained from the match reports, extracted and quantified using emotion calculation, and combined with the technical statistics, so that more scientific and reasonable player performance scores are finally output.

Disclosure of Invention

The invention aims to solve the technical problem of how to analyze and evaluate the match performance of football players by machine learning, and provides a method for evaluating the match performance of football players based on emotion calculation.

The innovation of the method is that the expressions related to player performance in football match reports are structured and quantified using emotion calculation and text information extraction techniques, then combined with statistical data, and player performance scores are output using a linear regression algorithm. By combining the players' statistical information with the match report text, the method can evaluate player performance more scientifically and reasonably. The overall flow is shown in Fig. 1.

To achieve this purpose, the invention adopts the following technical solution:

A method for evaluating the match performance of football players based on emotion calculation, comprising the following steps:

Step 1: Train the emotion calculation model. As shown in Fig. 2, this comprises the following steps:

Step 1.1: Collect sports news articles as a corpus and, using CBOW as the network, train a word2vec model that outputs 100-dimensional word vectors, i.e., a model that converts each Chinese word into a 100-dimensional vector.

Step 1.2: Obtain the post-match reports written by football commentators for a number of football matches (e.g., all matches of the 2018 Chinese Super League season).

Step 1.3: Split each match report into sentences.

Step 1.4: Segment each sentence into words, convert each word into a 100-dimensional word vector using the word2vec model trained in Step 1.1, and thereby convert the sentence into a vector sequence.

Step 1.5: Have annotators classify the emotional tendency of each sentence into four categories: positive, neutral, negative, and unrelated to player performance. Then discard the sentences that are neutral or unrelated to player performance.

Step 1.6: To make the text emotion calculation model more robust and to address the small data volume and the imbalance between positive and negative samples, augment the positive and negative sentences retained in Step 1.5 by shuffling: each sentence is first segmented into words, and the order of the words is then randomly changed.
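The shuffling augmentation of Step 1.6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `shuffle_augment`, the fixed seed, and the number of copies per sentence are assumptions, and the tokens are a hypothetical pre-segmented sentence.

```python
import random

def shuffle_augment(tokens, n_copies, seed=0):
    """Return n_copies randomly reordered variants of a segmented sentence."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    variants = []
    for _ in range(n_copies):
        shuffled = tokens[:]           # copy so the original order is preserved
        rng.shuffle(shuffled)
        variants.append(shuffled)
    return variants

# Hypothetical segmented sentence with negative emotional tendency.
tokens = ["striker", "missed", "an", "open", "goal"]
augmented = shuffle_augment(tokens, n_copies=4)
```

Because the minority class needs more synthetic samples, one plausible design is to choose a larger `n_copies` for negative sentences than for positive ones until the two classes are roughly balanced.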

Step 1.7: Taking the word vector sequence of each sentence as input and the emotion score of the sentence as the output label, train a long short-term memory (LSTM) network to obtain the emotion calculation model.

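Cross entropy is used as the loss function, parameters are updated with the Adam optimizer, and training preferably runs for 128 epochs. The LSTM is computed as follows:

Forget gate:

$$\Gamma_f^{\langle t \rangle} = \sigma\left(W_f\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right)$$

Input gate:

$$\Gamma_u^{\langle t \rangle} = \sigma\left(W_u\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right)$$

Information added to the current state:

$$\tilde{c}^{\langle t \rangle} = \tanh\left(W_c\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right)$$

Updated information of the current state:

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$$

Output gate:

$$\Gamma_o^{\langle t \rangle} = \sigma\left(W_o\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right)$$

The output information of the current state is:

$$h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh\left(c^{\langle t \rangle}\right)$$

where $c^{\langle t \rangle}$ is the cell state at time t, which records the information stored by the network up to time t; $\tilde{c}^{\langle t \rangle}$ is the information added to the state at time t; $x^{\langle t \rangle}$ is the network input at time t; h denotes the output value of the LSTM network; $h^{\langle t \rangle}$ is the output value of the LSTM network at time t; $h^{\langle t-1 \rangle}$ is the output value of the LSTM network at time t-1; σ is an activation function, typically sigmoid or tanh; and $\odot$ denotes element-wise multiplication. $W_f$, $W_u$, $W_o$, $W_c$ are parameter matrices and $b_f$, $b_u$, $b_o$, $b_c$ are parameter vectors, all obtained by training with gradient descent.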
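The gate computations listed above can be sketched as a single LSTM time step. The sketch assumes the conventional LSTM formulation (sigmoid for the three gates, tanh for the added-state information and the output); the helper names `affine` and `lstm_step` and the toy dimensions are illustrative assumptions, and plain Python lists stand in for the 100-dimensional vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W·v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'u', 'c', 'o' to (W, b) pairs."""
    concat = h_prev + x_t                                  # [h^<t-1>, x^<t>]
    pre = {k: affine(W, concat, b) for k, (W, b) in params.items()}
    gate_f = [sigmoid(z) for z in pre["f"]]                # forget gate
    gate_u = [sigmoid(z) for z in pre["u"]]                # input gate
    c_tilde = [math.tanh(z) for z in pre["c"]]             # information added to the state
    gate_o = [sigmoid(z) for z in pre["o"]]                # output gate
    c_t = [f * c + u * ct
           for f, c, u, ct in zip(gate_f, c_prev, gate_u, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(gate_o, c_t)]
    return h_t, c_t

# Toy check: hidden size 2, input size 3, all parameters zero (hypothetical values).
hidden, inputs = 2, 3
zero_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {k: (zero_W, [0.0] * hidden) for k in "fuco"}
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
```

With all parameters at zero every gate evaluates to 0.5 and the added-state information to 0, so the new cell state is simply half of the previous one, which makes the update rule easy to verify by hand.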

FIG. 3 shows the specific structure of LSTM, and FIG. 4 shows the complete structure of the emotion calculation model based on LSTM neural network.

Step 2: Extract the text information. As shown in Fig. 5, this comprises the following steps:

Step 2.1: Split the current match report into sentences.

Step 2.2: After Chinese word segmentation and part-of-speech tagging of a sentence, extract and match the player names and event names (e.g., goal) in the sentence using a rule-based text extraction technique, obtaining a two-tuple consisting of player name and event name.

Step 2.3: Segment and vectorize the sentence, converting it into a word vector sequence using the word2vec model trained in Step 1.1.

Step 2.4: Input the word vector sequence into the emotion calculation model trained in Step 1.7 and output the corresponding emotion score, a value between -1 and 1, where -1 is extremely negative and 1 is extremely positive. Combine the score with the two-tuple obtained in Step 2.2 to obtain a triple consisting of player name, event, and emotion score (e.g., Messi-shot-0.87).

Step 3: Train the player performance evaluation model and output the player performance score. As shown in Fig. 6, this comprises the following steps:

Step 3.1: Obtain the players' technical statistics for a number of football matches (e.g., all matches in 3 to 4 seasons of a league) and third-party ratings of the players (e.g., ratings from an authoritative football data website).

Step 3.2: Divide the players into goalkeepers and non-goalkeepers. The goalkeeper technical statistics include the following necessary technical dimensions: playing time, errors leading to goals, saves, yellow cards, red cards, sorties (coming off the line), passes, pass success rate, and long-pass success rate. Table 1 gives a set of goalkeeper technical statistics items comprising 22 technical dimensions.

TABLE 1 Goalkeeper technical statistics items

The non-goalkeeper technical statistics include the following necessary technical dimensions: playing time, goals, own goals, assists, yellow cards, red cards, shots, shots on target, passes, pass success rate, key passes, aerial duel success rate, dribbles, fouls, tackles, tackle success rate, clearances, cross success rate, number of foul passes, and long-pass success rate. Table 2 gives a set of non-goalkeeper technical statistics items comprising 36 technical dimensions.

TABLE 2 Non-goalkeeper technical statistics items

A player rating is a value between 0 and 10 and may be accurate to one decimal place (e.g., 7.1).

Step 3.3: Split the technical statistics and ratings of the players across all matches into a training set and a test set, with the data from part of the matches (e.g., the matches of one season) held out as the test set.

Step 3.4: training linear regression models of goalkeeper and non-goalkeeper.

The specific formula is as follows:

linear regression equation: f (x) w₀+w₁x₁+w₂x₂+…+w_mx_m

Data set: d { (x)₁,y₁),(x₂,y₂),…，(x_i,y_i)，…，(x_n,y_n)}；

The cost function is:

wherein x is₁，x₂，…，x_mCounting the numerical values of all dimensions for player technology, wherein m is the dimension of the feature; w is a₁，w₂，…，w_mA weight value corresponding to each feature; w is a₀Is the intercept; n is the number of data sets; (x)_i,y_i) Scoring the technical statistics and performance corresponding to player i; j (W) is a cost function and represents the difference between the fitting result f (X) obtained by the linear regression equation and the true value Y. Obtaining each weight and intercept by minimizing the cost function, and finally obtainingAnd (5) linear regression model.
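A minimal sketch of fitting the linear regression of Step 3.4 follows. It assumes a mean-squared-error cost with a 1/(2n) factor and plain batch gradient descent; the learning rate, step count, function names, and the toy data (ratings generated from a known linear rule over two hypothetical features) are all illustrative assumptions rather than values from the patent.

```python
def predict(w, x):
    """f(x) = w0 + w1*x1 + ... + wm*xm, with w[0] as the intercept."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def cost(w, data):
    """Assumed squared-error cost: sum of (f(x_i) - y_i)^2 divided by 2n."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / (2 * len(data))

def fit(data, lr=0.05, steps=5000):
    """Minimize the cost by batch gradient descent, starting from zero weights."""
    n, m = len(data), len(data[0][0])
    w = [0.0] * (m + 1)
    for _ in range(steps):
        errors = [predict(w, x) - y for x, y in data]
        grad = [sum(errors) / n]                              # d cost / d w0
        grad += [sum(e * x[j] for e, (x, _) in zip(errors, data)) / n
                 for j in range(m)]                           # d cost / d wj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy data generated from rating = 6 + 0.5*shots + 0.2*key_passes (hypothetical).
data = [((0, 0), 6.0), ((2, 1), 7.2), ((1, 3), 7.1), ((3, 2), 7.9), ((4, 0), 8.0)]
w = fit(data)
```

In practice one such model would be fitted on the goalkeeper samples and another on the non-goalkeeper samples, since the two groups have different feature sets.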

Step 3.5: Match the player name and event name in each triple obtained in Step 2.4 against the player name and statistical item name in the technical statistics, and add the emotion score of the player's event to the value of the corresponding technical statistical item, obtaining the player's technical statistics combined with the text information (for example, if Messi took 3 shots in the match and the triple obtained from the match report for Messi's shot is Messi-shot-0.87, the final shot count for Messi is 3.87).
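The correction of Step 3.5 amounts to a keyed addition, sketched below. The dictionary layout and the function name `adjust_stats` are assumptions, and the sketch presumes that event names from the text have already been normalized to the names of the statistical items.

```python
def adjust_stats(stats, triples):
    """Add each triple's emotion score to the matching player's statistic."""
    adjusted = {player: dict(items) for player, items in stats.items()}
    for player, event, score in triples:
        # Triples with no matching player or statistical item are ignored.
        if player in adjusted and event in adjusted[player]:
            adjusted[player][event] += score
    return adjusted

# Hypothetical match statistics and one extracted triple.
stats = {"Messi": {"shots": 3, "passes": 40}}
triples = [("Messi", "shots", 0.87)]
adjusted = adjust_stats(stats, triples)   # shots becomes 3 + 0.87
```

Since scores range from -1 to 1, a poorly executed event lowers the effective count while a well executed one raises it, which is how the quality of an event enters the regression input.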

Step 3.6: Feed each player's statistics into the model trained in Step 3.4 for the corresponding category (goalkeeper or non-goalkeeper) to obtain the final player performance score.

Advantageous effects

Compared with the prior art, the method of the invention has the following advantages:

1. Existing methods rely only on technical statistics, which reflect the number of certain events a player is involved in during a match but not their quality (for example, a brilliant pass and a poor pass are both recorded as one pass). In addition to the players' technical statistics, the present method introduces the match report text and uses emotion calculation to quantify the quality of events, so that both the quantity and the quality of each technical item are taken into account and the match performance of players can be judged more scientifically and reasonably.

2. With this method, players' match performance can be monitored over a long period and compared both across players and over time, which helps team management, coaches, and players achieve better results.

Drawings

FIG. 1 is a schematic diagram of a player performance assessment method based on emotion calculation.

FIG. 2 is a diagram of the emotion calculation model.

FIG. 3 is a structural diagram of the long short-term memory (LSTM) network.

FIG. 4 is a complete block diagram of an emotion calculation model based on the LSTM neural network.

Fig. 5 is a schematic diagram of text information extraction.

Fig. 6 is a diagram of a player performance evaluation model.

Detailed Description

The invention is further illustrated and described in detail below with reference to the figures and examples.

Examples

The invention consists of three parts: the emotion calculation model, whose structure is shown in Fig. 2; text information extraction, whose flowchart is shown in Fig. 5; and the player performance evaluation model, whose flowchart is shown in Fig. 6.

Step 1: Train the emotion calculation model.

Step 1.1: Train the word vector model. The sports texts in THUCNews, 131,604 texts in total, are used as the corpus. Using CBOW as the network, a word2vec model that outputs 100-dimensional word vectors is trained, i.e., a model that converts each Chinese word into a 100-dimensional vector.

Step 1.2: Obtain the football match report texts.

A total of 480 post-match reports are crawled from Sina Sports and Sohu Sports, covering every match of the 2018 Chinese Super League season. Each match report is split into sentences, using commas, periods, exclamation marks, and question marks as delimiters.
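Splitting a report on the four punctuation marks named above can be sketched with a regular expression. The sketch assumes the reports use full-width Chinese punctuation; the function name and the sample report text are hypothetical.

```python
import re

# Full-width comma, period, exclamation mark and question mark (assumed delimiters).
DELIMITERS = "[，。！？]"

def split_report(report):
    """Split a match report into non-empty sentence fragments."""
    parts = re.split(DELIMITERS, report)
    return [p.strip() for p in parts if p.strip()]

# Hypothetical report text.
report = "主队开场占据主动，前锋第30分钟破门！全场表现出色。"
sentences = split_report(report)
```

Restricting the pattern to full-width marks avoids breaking decimal numbers such as ratings; if the crawled text mixes in ASCII punctuation, the character class would need to be extended with care.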

Step 1.3: Annotate the text emotion.

People familiar with football are invited to annotate the emotion of the split sentences. The emotional tendency of each sentence is classified into four categories: positive, neutral, negative, and unrelated to player performance. Sentences that are neutral or unrelated to player performance are then discarded. This finally yields 6,182 sentences with positive emotional tendency and 1,078 sentences with negative emotional tendency.

Step 1.4: Perform data augmentation.

To address the small data volume and the imbalance between positive and negative samples, the retained positive and negative sentences are augmented by shuffling: each sentence is first segmented into words, and the order of the words is then randomly changed. This yields 12,146 samples, of which 11,146 are used as the training set (positive: 5,663; negative: 5,483) and 1,000 as the test set (positive: 500; negative: 500). The test set contains only original, unshuffled sentences.

Step 1.5: Vectorize the text.

Player names and football terminology are registered in advance to ensure that these preset words are not split incorrectly during word segmentation. The jieba word segmentation component is then called to segment the sentences, each word is converted into a 100-dimensional word vector using the trained word2vec model, and each sentence is thereby converted into a vector sequence.
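The conversion from a segmented sentence to a vector sequence is a per-word lookup, sketched below without jieba or word2vec so that it stays self-contained. The embedding table is a hypothetical 3-dimensional stand-in for the 100-dimensional model, and mapping unknown words to a zero vector is an assumption that the patent does not specify.

```python
def to_vector_sequence(tokens, embeddings, dim):
    """Look up each word's vector; unknown words map to a zero vector (assumption)."""
    zero = [0.0] * dim
    return [list(embeddings.get(tok, zero)) for tok in tokens]

# Hypothetical 3-dimensional embeddings standing in for the trained word2vec model.
embeddings = {"前锋": [0.1, 0.2, 0.3], "破门": [0.4, 0.5, 0.6]}
tokens = ["前锋", "精彩", "破门"]        # assumed output of word segmentation
sequence = to_vector_sequence(tokens, embeddings, dim=3)
```

The resulting list of vectors, one per word in sentence order, is the form of input the LSTM consumes one time step at a time.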

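Step 1.6: Train the emotion calculation model.

Taking the word vector sequence of each sentence as input and the emotion score of the sentence as the output label, a long short-term memory (LSTM) network is trained to obtain the emotion calculation model. Cross entropy is used as the loss function, parameters are updated with the Adam optimizer, and training runs for 128 epochs. Fig. 4 shows the complete structure of the emotion calculation model based on the LSTM neural network. Fig. 3 shows the structure of the LSTM, which is computed as follows:

Forget gate:

$$\Gamma_f^{\langle t \rangle} = \sigma\left(W_f\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right)$$

Input gate:

$$\Gamma_u^{\langle t \rangle} = \sigma\left(W_u\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right)$$

Information added to the current state:

$$\tilde{c}^{\langle t \rangle} = \tanh\left(W_c\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right)$$

Updated information of the current state:

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$$

Output gate:

$$\Gamma_o^{\langle t \rangle} = \sigma\left(W_o\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right)$$

The output information of the current state is:

$$h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh\left(c^{\langle t \rangle}\right)$$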

Step 2: Extract the text information.

Step 2.1: Obtain and split the text.

The report of the current match is obtained and split into sentences.

Step 2.2: Extract the text relations.

A two-tuple consisting of player name and event name is obtained using a rule-based text information extraction technique. In a specific implementation, the squad lists of both teams in the current match and the preset standard event names (e.g., goal, pass) are loaded first. Word segmentation and part-of-speech tagging are then performed with the jieba word segmentation component. Finally, the player names and event names are extracted and matched using the rule-based text information extraction technique, yielding a two-tuple consisting of player name and event name (e.g., Messi-shot).

Step 2.3: Calculate the emotion score.

After Chinese word segmentation and part-of-speech tagging of a sentence with the jieba word segmentation component, each word is fed into the trained word2vec model and converted into a 100-dimensional word vector. The word vector sequence is input into the emotion calculation model trained in Step 1.6, which outputs the corresponding emotion score (between -1 and 1, where -1 is extremely negative and 1 is extremely positive). The score is combined with the two-tuple obtained by text relation extraction to form a triple consisting of player name, event, and emotion score (e.g., Gao Lin-goal-0.87).

Step 3: Train the player performance evaluation model and output the player performance score.

Step 3.1: Acquire the data.

Using web crawling, the players' technical statistics for every match of the Chinese Super League from the 2016 season to the 2019 season are obtained, together with the player ratings given by the authoritative football data website whoscored.com. Players who appeared in a match are divided into goalkeepers and non-goalkeepers. The goalkeeper technical statistics comprise 22 technical dimensions, such as playing time, sorties, successful sorties, and passes; the non-goalkeeper technical statistics comprise 36 technical dimensions, such as playing time, shots, shots on target, key passes, and tackles. A player rating is a value between 0 and 10, accurate to one decimal place, e.g., 7.1.

Step 3.2: The players' technical statistics and ratings from the 2016 to 2018 seasons form the training set, and those from the 2019 season form the test set.
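The season-based split of Step 3.2 can be sketched as a simple filter. The record layout (a dictionary with a `season` key) and the sample values are hypothetical.

```python
def split_by_season(records, test_seasons):
    """Hold out whole seasons as the test set; everything else is training data."""
    train = [r for r in records if r["season"] not in test_seasons]
    test = [r for r in records if r["season"] in test_seasons]
    return train, test

# Hypothetical per-player, per-match records.
records = [
    {"season": 2016, "player": "A", "rating": 6.8},
    {"season": 2017, "player": "A", "rating": 7.1},
    {"season": 2018, "player": "B", "rating": 6.5},
    {"season": 2019, "player": "B", "rating": 7.4},
]
train, test = split_by_season(records, test_seasons={2019})
```

Holding out an entire later season, rather than sampling matches at random, keeps the evaluation closer to the intended use, in which the model scores matches it has never seen.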

Step 3.3: goalkeeper and non-goalkeeper linear regression models were trained.

The specific formula is as follows:

data set: d { (x)₁,y₁),(x₂,y₂),…，(x_i,y_i)，…，(x_n,y_n)}；

The cost function is:

wherein x is₁，x₂，…，x_mCounting the numerical values of all dimensions for player technology, wherein m is the dimension of the feature; w is a₁，w₂，…，w_mA weight value corresponding to each feature; w is a₀Is the intercept; n is the number of data sets; (x)_i,y_i) Scoring the technical statistics and performance corresponding to player i; j (W) is a cost function and represents a fitting result f (X) obtained by a linear regression equation) The difference from the true value Y. And obtaining each weight and intercept by minimizing the cost function, and finally obtaining a linear regression model.

Step 3.4: Correct the technical statistics.

The player name and event name in each triple obtained in Step 2.3 are matched against the player name and statistical item name in the technical statistics, and the emotion score of the player's event is added to the value of the corresponding technical statistical item, giving the player's technical statistics combined with the text information (for example, if Gao Lin scored 3 goals in the match and the triple obtained from the match report for Gao Lin's goal is Gao Lin-goal-0.87, the final goal count for Gao Lin is 3.87).

Step 3.5: Output the player performance score.

Players are classified as goalkeepers or non-goalkeepers. The corrected technical statistics are fed into the corresponding model trained in Step 3.3 to obtain the final player performance score.
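Routing a player to the model of the matching category can be sketched as follows. The weight vectors, feature values, and dictionary keys are hypothetical placeholders, not trained parameters; each model is represented only by its intercept followed by its feature weights.

```python
def score_player(record, models):
    """Select the model for the player's category and evaluate w0 + sum(wj * xj)."""
    key = "goalkeeper" if record["is_goalkeeper"] else "non_goalkeeper"
    w = models[key]
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], record["features"]))

# Hypothetical models: [intercept, weight_1, weight_2, ...].
models = {"goalkeeper": [6.0, 0.25], "non_goalkeeper": [6.0, 0.5, 0.25]}
keeper = {"is_goalkeeper": True, "features": [4.0]}
forward = {"is_goalkeeper": False, "features": [2.0, 1.0]}
keeper_score = score_player(keeper, models)
forward_score = score_player(forward, models)
```

Keeping two separate weight vectors reflects the fact that the goalkeeper and non-goalkeeper feature sets differ in both size and meaning, so a single shared model would not apply.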

Claims

1. A method for evaluating the match performance of football players based on emotion calculation, characterized by comprising the following steps:

first, structuring the expressions related to player performance in football match reports using an emotion calculation technique;

then, quantifying the expressions related to player performance in the football match reports using a text information extraction technique;

and finally, combining the quantified result with statistical data and outputting the player performance score using a linear regression algorithm.

2. The method for evaluating the match performance of football players based on emotion calculation as claimed in claim 1, wherein the expressions related to player performance in the football match reports are structured by training an emotion calculation model.

3. The method for evaluating the match performance of football players based on emotion calculation as claimed in claim 2, wherein training the emotion calculation model comprises the following steps:

step 1.1: collecting sports news articles as a corpus and, using CBOW as the network, training a word2vec model that outputs 100-dimensional word vectors, i.e., converting each Chinese word into a 100-dimensional vector;

step 1.2: obtaining the post-match reports written by football commentators for a number of football matches;

step 1.3: splitting each match report into sentences;

step 1.4: segmenting each sentence into words, converting each word into a 100-dimensional word vector using the word2vec model trained in step 1.1, and thereby converting the sentence into a vector sequence;

step 1.5: having annotators classify the emotional tendency of each sentence into four categories: positive, neutral, negative, and unrelated to player performance; then discarding the sentences that are neutral or unrelated to player performance;

step 1.6: augmenting the positive and negative sentences retained in step 1.5 by shuffling, i.e., first segmenting each sentence into words and then randomly changing the order of the words;

step 1.7: taking the word vector sequence of each sentence as input and the emotion score of the sentence as the output label, and training a long short-term memory (LSTM) network to obtain the emotion calculation model.

4. The method for evaluating the match performance of football players based on emotion calculation as claimed in claim 1, wherein the quantification using the text information extraction technique comprises the following steps:

step 2.1: splitting the current match report into sentences;

step 2.2: after Chinese word segmentation and part-of-speech tagging of a sentence, extracting and matching the player names and event names in the sentence using a rule-based text extraction technique, to obtain a two-tuple consisting of player name and event name;

step 2.3: segmenting and vectorizing the sentence, and converting it into a word vector sequence using the word2vec model trained in step 1.1;

step 2.4: inputting the word vector sequence into the trained emotion calculation model, outputting the corresponding emotion score, and combining the emotion score with the two-tuple obtained in step 2.2 to obtain a triple consisting of player name, event, and emotion score.

5. The method as claimed in claim 4, wherein the emotion score output in step 2.4 is a value between -1 and 1, where -1 is extremely negative and 1 is extremely positive.

6. The method for evaluating the match performance of football players based on emotion calculation as claimed in claim 1, wherein outputting the player performance score using the linear regression algorithm comprises:

training a player performance evaluation model and outputting the player performance score:

step 3.1: obtaining the players' technical statistics for a number of football matches and third-party ratings of the players;

step 3.2: dividing the players into goalkeepers and non-goalkeepers, and collecting the necessary technical statistical items for each; a player rating being a value between 0 and 10;

step 3.3: splitting the technical statistics and ratings of the players across all matches into a training set and a test set, with the data from part of the matches held out as the test set;

step 3.4: training linear regression models for goalkeepers and non-goalkeepers as follows:

linear regression equation: $f(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_m x_m$

data set: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_n, y_n)\}$

the cost function is:

$$J(W) = \frac{1}{2n}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2$$

where $x_1, x_2, \dots, x_m$ are the values of each dimension of the player's technical statistics and m is the number of feature dimensions; $w_1, w_2, \dots, w_m$ are the weights corresponding to each feature; $w_0$ is the intercept; n is the number of samples in the data set; $(x_i, y_i)$ are the technical statistics and performance rating corresponding to player i; $J(W)$ is the cost function, which represents the difference between the fitted result f(X) given by the linear regression equation and the true value Y; the weights and the intercept are obtained by minimizing the cost function, which yields the final linear regression model;

step 3.5: matching the player name and event name in each triple against the player name and statistical item name in the technical statistics, and adding the emotion score of the player's event to the value of the corresponding technical statistical item, to obtain the player's technical statistics combined with the text information.

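7. The method for evaluating the match performance of football players based on emotion calculation as claimed in claim 3, wherein the emotion calculation model is obtained by training the long short-term memory (LSTM) network in step 1.7 as follows:

cross entropy is selected as the loss function and parameters are updated using the Adam optimizer, with the following computation:

forget gate:

$$\Gamma_f^{\langle t \rangle} = \sigma\left(W_f\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right)$$

input gate:

$$\Gamma_u^{\langle t \rangle} = \sigma\left(W_u\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right)$$

information added to the current state:

$$\tilde{c}^{\langle t \rangle} = \tanh\left(W_c\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right)$$

updated information of the current state:

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$$

output gate:

$$\Gamma_o^{\langle t \rangle} = \sigma\left(W_o\,[h^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right)$$

the output information of the current state is:

$$h^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh\left(c^{\langle t \rangle}\right)$$

where $\tilde{c}^{\langle t \rangle}$ is the information added to the state at time t; $x^{\langle t \rangle}$ is the network input at time t; h denotes the output value of the LSTM network; $h^{\langle t \rangle}$ is the output value of the LSTM network at time t; $h^{\langle t-1 \rangle}$ is the output value of the LSTM network at time t-1; σ is an activation function, typically sigmoid or tanh; $W_f$, $W_u$, $W_o$, $W_c$ are parameter matrices and $b_f$, $b_u$, $b_o$, $b_c$ are parameter vectors, all obtained by training with gradient descent.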

8. The method as claimed in claim 6, wherein the goalkeeper technical statistics include the following necessary technical dimensions: playing time, errors leading to goals, saves, yellow cards, red cards, sorties (coming off the line), passes, pass success rate, and long-pass success rate; and the non-goalkeeper technical statistics include the following necessary technical dimensions: playing time, goals, own goals, assists, yellow cards, red cards, shots, shots on target, passes, pass success rate, key passes, aerial duel success rate, dribbles, fouls, tackles, tackle success rate, clearances, cross success rate, number of foul passes, and long-pass success rate.

9. The method as claimed in claim 6, wherein the player rating in step 3.2 is accurate to one decimal place.

10. The method as claimed in claim 7, wherein training runs for 128 epochs when the parameters are updated using the Adam optimizer.