CN112569608B

CN112569608B - Table game hybrid recommendation method based on multi-source heterogeneous data

Info

Publication number: CN112569608B
Application number: CN202011531148.XA
Authority: CN
Inventors: 李绍利; 杨传颖; 石宝; 雷小涵; 李亚龙; 王成龙
Original assignee: Inner Mongolia University of Technology
Current assignee: Inner Mongolia University of Technology
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-03-25
Anticipated expiration: 2040-12-22
Also published as: CN112569608A

Abstract

The invention provides a board game hybrid recommendation algorithm based on multi-source heterogeneous data. It addresses the problem that a sparse rating matrix leaves game features and user features insufficiently expressed. The invention improves on the traditional probabilistic matrix factorization (PMF) algorithm in two ways: 1. the game feature vector is enhanced using the game description text and game attribute information; 2. the user feature vector is enhanced by combining the features of the games the user has rated with an attention mechanism. Experiments show that, compared with the baseline models, the proposed model has a smaller rating prediction error.

Description

Table game hybrid recommendation method based on multi-source heterogeneous data

Technical Field

The invention belongs to the technical field of big data and artificial intelligence, and particularly relates to a table game hybrid recommendation method based on multi-source heterogeneous data.

Background

Interest in board games began to revive at the start of the 21st century; the share of board games in the entertainment market has grown year by year, and the trend is global. Board games are a newer interest than films, and comparatively few researchers have studied board game recommendation. As the numbers of board games and board game players grow, more and more user interaction data and game description data are generated, and a good board game recommendation algorithm is needed to meet users' needs.

At present, recommendation applications abound, but research on board game recommendation is rare. Amazon and Barnes & Noble provide board game recommendations, but their methods rely on users' purchase patterns and item-to-item collaborative filtering; they do not use game features to match users' preferences and therefore do not achieve personalized recommendation. Yiu-Kai Ng et al. proposed the PeGRec model, which considers game characteristics such as type, complexity, and playing time when recommending new games to a user. These features are combined to determine a recommendation list, which personalizes recommendations, alleviates the cold-start problem to some extent, and improves user satisfaction. However, PeGRec requires users to create a profile and specify their favorite games themselves, so the recommendation is not fully automatic. Jan Zalewski et al. clustered board games, computed each user's preference for each cluster, and recommended games with user-based collaborative filtering. This suits settings with few users, but real scenarios have many users, and computing the user similarity matrix then becomes expensive. Michaelon et al. proposed a board game recommendation system in which the user enters a set of games they like and the system makes recommendations; it combines collaborative filtering with content-based recommendation and improves recommendation accuracy, diversity, and novelty. However, requiring users to enter games they like is difficult for novices who do not yet know which games they like, and the approach does not use machine learning to infer user preferences from the auxiliary information of the games a user has rated.

As the numbers of board games and board game users grow, data sparsity becomes increasingly serious. Traditional matrix factorization recommends from rating data alone, so its performance is limited when the rating matrix is sparse. Moreover, traditional matrix factorization does not consider: 1. enhancing the game feature representation with multi-source heterogeneous auxiliary information about games; 2. enhancing the user feature representation with the features of the games a user has rated.

Disclosure of Invention

To overcome the shortcomings of the prior art and further improve board game recommendation, the invention provides a board game hybrid recommendation method based on multi-source heterogeneous data. The method uses multi-source heterogeneous auxiliary information to address the inaccurate game and user feature representations caused by sparse rating data, so that games that better satisfy users can be recommended.

To achieve this purpose, the invention adopts the following technical scheme:

1. Set the conditional distribution p of the rating matrix R as:

$$p(R \mid U, V, \alpha^2) = \prod_{i=1}^{L} \prod_{j=1}^{M} \left[ \mathcal{N}\left(R_{ij} \mid U_i^T V_j, \alpha^2\right) \right]^{I_{ij}}$$

where $R = \{R_{ij}\}$ is the rating matrix and $R_{ij}$ is the rating of user i for game j; i = 1, 2, 3, ..., L is the user index and L is the total number of users; j = 1, 2, 3, ..., M is the game index and M is the total number of games; $U = \{U_i\}$ is the user feature matrix, $U_i$ being the feature vector of user i; $V = \{V_j\}$ is the game feature matrix, $V_j$ being the feature vector of game j; $\alpha^2$ is the variance of $R_{ij}$; $\mathcal{N}(R_{ij} \mid U_i^T V_j, \alpha^2)$ denotes that $R_{ij}$ follows a Gaussian distribution with mean $U_i^T V_j$ and variance $\alpha^2$, T denoting transposition; and $I_{ij}$ is an indicator matrix: $I_{ij} = 1$ if user i has rated game j, and 0 otherwise;
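The exponent $I_{ij}$ in step 1 switches each Gaussian factor off when user i has not rated game j, so only observed ratings enter the likelihood. A minimal pure-Python sketch of the resulting log-likelihood follows, assuming small dense nested lists for R and I (at real sparsity one would store only the observed triples); the function name `pmf_log_likelihood` is hypothetical.

```python
import math


def pmf_log_likelihood(R, I, U, V, alpha2):
    """log p(R | U, V, alpha^2) for the masked Gaussian likelihood of step 1.

    R, I: L x M nested lists of ratings and 0/1 indicators I_ij.
    U: L user vectors U_i; V: M game vectors V_j; alpha2: the variance alpha^2.
    Entries with I_ij = 0 contribute a factor of 1, i.e. 0 to the log.
    """
    total = 0.0
    for i, u in enumerate(U):
        for j, v in enumerate(V):
            if I[i][j]:
                mean = sum(a * b for a, b in zip(u, v))  # U_i^T V_j
                total += -0.5 * math.log(2 * math.pi * alpha2) - (R[i][j] - mean) ** 2 / (2 * alpha2)
    return total
```

Because unrated entries are skipped, an arbitrary placeholder value in R at a position where $I_{ij} = 0$ has no effect on the result.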

2. Use a convolutional neural network to extract a board game feature vector from the board game description text, formally expressed as $\mathrm{CNN}(X_j, W_{cnn})$, where $X_j$ is the description text of game j and $W_{cnn}$ denotes the weight parameters inside the convolutional neural network;

3. Use a multilayer perceptron to extract a board game feature vector from the board game attribute information, formally expressed as $\mathrm{MLP}(Y_j, W_{mlp})$, where $Y_j$ is the attribute information of game j and $W_{mlp}$ denotes the weight parameters inside the multilayer perceptron;

4. Fuse $\mathrm{CNN}(X_j, W_{cnn})$ and $\mathrm{MLP}(Y_j, W_{mlp})$, denoted CM:

$$\mathrm{CM}(X_j, Y_j, W_{cnn}, W_{mlp}, W_{cm}) = \mathrm{relu}\left(W_{cm} \cdot \mathrm{concatenate}\left(\mathrm{CNN}(X_j, W_{cnn}), \mathrm{MLP}(Y_j, W_{mlp})\right)\right)$$

where concatenate is the concatenation function, relu is the activation function, and $W_{cm}$ is the fusion-layer weight parameter;

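5. The feature vector $V_j$ of game j is defined as $V_j = \mathrm{CM}(X_j, Y_j, W^V) + \xi_j$, where $W^V = \{W_{cnn}, W_{mlp}, W_{cm}\}$. The conditional distribution of $W^V$ is set as

$$p(W^V \mid \sigma_{W^V}^2) = \prod_{k} \mathcal{N}\left(w_k \mid 0, \sigma_{W^V}^2\right)$$

i.e. each parameter $w_k$ in $W^V$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^V}^2$, where k = 1, 2, 3, ..., $|w_k|$ is the index of the weight parameters in $W^V$ and $|w_k|$ is the number of weight parameters in $W^V$. $\xi_j$ is the game feature learned from the rating matrix; it follows a spherical Gaussian distribution with mean 0 and variance $\sigma_V^2$, i.e.

$$\xi_j \sim \mathcal{N}\left(0, \sigma_V^2 I\right)$$

where I is the identity matrix;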

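6. The conditional distribution of the game feature matrix is

$$p(V \mid W^V, X, Y, \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid \mathrm{CM}(X_j, Y_j, W^V), \sigma_V^2 I\right)$$

where X is the set of game description texts, Y is the set of game attribute information, $\sigma_V^2$ is the variance of each component of $V_j$, and $\mathcal{N}(V_j \mid \mathrm{CM}(X_j, Y_j, W^V), \sigma_V^2 I)$ denotes that $V_j$ follows a spherical Gaussian distribution with mean $\mathrm{CM}(X_j, Y_j, W^V)$ and variance $\sigma_V^2$;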

7. The features of the games a user has rated are combined with an attention mechanism to enhance the user feature representation, denoted UA:

$$\mathrm{UA}(V_i^U, W^U) = a V_i^U$$

where

$$a = \mathrm{softmax}\left(W_{a2} \tanh\left(W_{a1} (V_i^U)^T\right)\right)$$

a is the attention vector obtained by using a self-attention mechanism to summarize the information in $V_i^U$, $V_i^U$ is the matrix of feature vectors of the games rated by user i, $W_{a1}$ and $W_{a2}$ are weight parameters, and softmax and tanh are activation functions;

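8. The feature vector of user i is defined as

$$U_i = \mathrm{UA}(V_i^U, W^U) + \delta_i$$

where $W^U = \{W_{a1}, W_{a2}\}$ are the weight parameters used in the previous step to extract user features. The conditional distribution of $W^U$ is set as

$$p(W^U \mid \sigma_{W^U}^2) = \prod_{t} \mathcal{N}\left(w_t \mid 0, \sigma_{W^U}^2\right)$$

i.e. each parameter $w_t$ in $W^U$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^U}^2$, where t = 1, 2, 3, ..., $|w_t|$ is the index of the weight parameters in $W^U$ and $|w_t|$ is the number of weight parameters in $W^U$. $\delta_i$ represents the user features learned from the rating matrix; it follows a spherical Gaussian distribution with mean 0 and variance $\sigma_U^2$, i.e.

$$\delta_i \sim \mathcal{N}\left(0, \sigma_U^2 I\right)$$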

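9. The conditional distribution of the user feature matrix is

$$p(U \mid W^U, V^U, \sigma_U^2) = \prod_{i=1}^{L} \mathcal{N}\left(U_i \mid \mathrm{UA}(V_i^U, W^U), \sigma_U^2 I\right)$$

where $V^U$ is the set of matrices formed by the feature vectors of the games rated by each user, $\sigma_U^2$ is the variance of each component of $U_i$, and $\mathcal{N}(U_i \mid \mathrm{UA}(V_i^U, W^U), \sigma_U^2 I)$ denotes that $U_i$ follows a spherical Gaussian distribution with mean $\mathrm{UA}(V_i^U, W^U)$ and variance $\sigma_U^2$;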

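10. The variables and parameters of the model are estimated by maximum a posteriori (MAP) estimation, i.e. by maximizing

$$p(U, V, W^U, W^V \mid R, X, Y, V^U) \propto p(R \mid U, V, \alpha^2)\, p(U \mid W^U, V^U, \sigma_U^2)\, p(W^U \mid \sigma_{W^U}^2)\, p(V \mid W^V, X, Y, \sigma_V^2)\, p(W^V \mid \sigma_{W^V}^2)$$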

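11. Taking the negative logarithm (and multiplying by $\alpha^2$, dropping constants), maximizing the posterior is equivalent to minimizing

$$\varepsilon(U, V, W^U, W^V) = \sum_{i=1}^{L} \sum_{j=1}^{M} \frac{I_{ij}}{2} \left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2$$

where

$$\lambda_U = \frac{\alpha^2}{\sigma_U^2}, \quad \lambda_V = \frac{\alpha^2}{\sigma_V^2}, \quad \lambda_{W^U} = \frac{\alpha^2}{\sigma_{W^U}^2}, \quad \lambda_{W^V} = \frac{\alpha^2}{\sigma_{W^V}^2}$$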
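To make the objective concrete, the sketch below evaluates $\varepsilon$ for given U and V, with the network outputs $\mathrm{UA}(V_i^U, W^U)$ and $\mathrm{CM}(X_j, Y_j, W^V)$ passed in as precomputed vectors and the network weights passed as flat lists of scalars. It is a minimal illustration under those assumptions; the function and argument names are hypothetical, and the networks themselves are not modeled.

```python
def objective(R, I, U, V, UA, CM, wU, wV, lam_U, lam_V, lam_WU, lam_WV):
    """Negative log-posterior of step 11 (constants dropped).

    UA[i], CM[j]: precomputed UA(V_i^U, W^U) and CM(X_j, Y_j, W^V).
    wU, wV: flat lists of the scalar weights in W^U and W^V.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Squared error over observed ratings only (I_ij = 1).
    fit = sum(0.5 * (R[i][j] - dot(U[i], V[j])) ** 2
              for i in range(len(U)) for j in range(len(V)) if I[i][j])
    # Pull each U_i toward UA_i and each V_j toward CM_j.
    user_prior = 0.5 * lam_U * sum(sqdist(U[i], UA[i]) for i in range(len(U)))
    game_prior = 0.5 * lam_V * sum(sqdist(V[j], CM[j]) for j in range(len(V)))
    # L2 penalty on the network weights (zero-mean Gaussian priors).
    weight_prior = 0.5 * lam_WU * sum(w * w for w in wU) + 0.5 * lam_WV * sum(w * w for w in wV)
    return fit + user_prior + game_prior + weight_prior
```

The four λ coefficients enter only as relative weights, which is why they can be tuned directly as hyperparameters instead of setting the individual variances.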

12. Update $V_j$ by coordinate descent:

$$V_j \leftarrow \left(U I_j U^T + \lambda_V I\right)^{-1} \left(U R_j + \lambda_V \mathrm{CM}(X_j, Y_j, W^V)\right)$$

where $I_j$ is a diagonal matrix whose diagonal elements are the j-th column of $I_{ij}$, and $R_j$ is the vector of the j-th column of $R_{ij}$;

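13. Update $U_i$ by coordinate descent:

$$U_i \leftarrow \left(V I_i V^T + \lambda_U I\right)^{-1} \left(V R_i + \lambda_U \mathrm{UA}(V_i^U, W^U)\right)$$

where $I_i$ is a diagonal matrix whose diagonal elements are the i-th row of $I_{ij}$, and $R_i$ is the vector of the i-th row of $R_{ij}$;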
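Each coordinate-descent step is a ridge-regression solve that pulls the vector toward its side-information prior, with a single coefficient λ on both the identity term and the prior term, as setting the derivative of $\varepsilon$ to zero gives. The sketch below is a minimal pure-Python version, assuming small dense vectors and a hand-written Gaussian-elimination solver (all names hypothetical). It accumulates $\sum I \cdot f f^T + \lambda I$ and $\sum I \cdot r f + \lambda \cdot \mathrm{prior}$ from the observed entries only, which is what multiplying by the diagonal indicator matrix achieves.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_update(factors, observed, prior_mean, lam):
    """Closed-form update shared by steps 12 and 13.

    factors: vectors of the other side (U_i when updating V_j, V_j when updating U_i).
    observed: (index, rating) pairs for which the indicator is 1.
    prior_mean: CM(X_j, Y_j, W^V) for a game, UA(V_i^U, W^U) for a user.
    Returns (sum f f^T + lam*I)^-1 (sum r f + lam * prior_mean).
    """
    K = len(prior_mean)
    A = [[lam if a == b else 0.0 for b in range(K)] for a in range(K)]
    rhs = [lam * m for m in prior_mean]
    for idx, r in observed:
        f = factors[idx]
        for a in range(K):
            rhs[a] += r * f[a]
            for b in range(K):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)
```

For step 12 one passes the user vectors, game j's observed ratings, CM for game j, and $\lambda_V$; for step 13, the game vectors, user i's observed ratings, UA for user i, and $\lambda_U$. With λ > 0 the system matrix is positive definite, so the solve is always well posed.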

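14. Update $W^V$ by back-propagation according to the part of the objective that depends on $W^V$, where $\lambda_{W^V}$ is the regularization coefficient of $W^V$:

$$\varepsilon(W^V) = \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2 + \text{constant}$$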

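15. Update $W^U$ by back-propagation according to the part of the objective that depends on $W^U$, where $\lambda_{W^U}$ is the regularization coefficient of $W^U$:

$$\varepsilon(W^U) = \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \text{constant}$$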

16. Repeat steps 12-15 until the variables and parameters of the model converge;

17. Use the optimized U, V, $W^U$, and $W^V$ to predict a user's unknown rating of a game:

$$\hat{R}_{ij} \approx U_i^T V_j$$

18. To evaluate the performance of the model, the root mean square error (RMSE) is used as the evaluation criterion; a lower RMSE indicates a better result. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{(i,j) \in S} \left(R_{ij} - \hat{R}_{ij}\right)^2}$$

where S is the set of ratings in the test set.

The board game attribute information includes the minimum number of players, the maximum number of players, the average playing time, the game category, and the game mechanism. The attribute information is divided into categorical information (game category and game mechanism) and numerical information (minimum number of players, maximum number of players, and average playing time). The categorical information is reduced in dimension by embedding, the numerical information is Min-Max normalized, and the processed attributes are concatenated before a multilayer perceptron extracts the board game feature vector from them.

Compared with the prior art, the invention has the following beneficial effects:

1. It addresses the inaccurate expression of game features caused by a sparse rating matrix.

2. It addresses the inaccurate expression of user features caused by a sparse rating matrix.

Drawings

FIG. 1 is a schematic diagram of a model of the present invention.

Detailed Description

The embodiments of the present invention will be described in detail below with reference to the drawings and examples.

The invention integrates multi-source heterogeneous auxiliary information into the traditional PMF algorithm: the feature vectors extracted from the board game description text and the board game attribute information are integrated into the feature representation on the game side. Because the rating matrix is sparse, the user feature representation is also inaccurate, so the invention enhances it by combining the features of the games the user has rated with an attention mechanism. The overall framework is shown in FIG. 1.

The specific steps of one embodiment of the invention are as follows:

1. The data set used in this embodiment comes from https://boardgamegeek.com, the largest board game website in the world. A crawler was used to collect the ratings of 9021 games by 142308 users, as summarized in Table 1. The auxiliary information for each game comprises the game description text and the game attribute information; the attribute information consists of the minimum number of players, the maximum number of players, the average playing time, the game category, and the game mechanism.

TABLE 1

Number of users	Number of board games	Number of ratings	Sparsity
142308	9021	854848	99.93%
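Reading sparsity as the fraction of user-game pairs that carry no rating, the Table 1 figure follows directly from the three counts:

```python
users, games, ratings = 142308, 9021, 854848   # counts from Table 1
density = ratings / (users * games)            # fraction of user-game pairs that are rated
sparsity = 1 - density
print(f"{sparsity:.2%}")  # 99.93%
```

Only about 0.067% of the matrix is observed, which is the regime in which the auxiliary information is meant to help.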

2. The text data is preprocessed as follows:

1) set the maximum length of the game description text to 200;

2) remove stop words;

3) build the vocabulary from words whose frequency is more than 5;

4) represent each game description as a vector of word indices.
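The four preprocessing items can be sketched as below. Lower-cased whitespace tokenization, a caller-supplied stop-word list, index 0 reserved for padding, dropping out-of-vocabulary words, and truncating after stop-word removal are all assumptions; the embodiment fixes only the length limit (200) and the frequency threshold (more than 5).

```python
from collections import Counter

MAX_LEN = 200   # maximum description length (item 1)
MIN_FREQ = 5    # keep words whose frequency is more than 5 (item 3)
PAD = 0         # hypothetical padding index; vocabulary indices start at 1


def tokenize(text, stop_words):
    """Lower-case whitespace tokenization with stop-word removal (item 2)."""
    return [w for w in text.lower().split() if w not in stop_words]


def build_vocab(docs, stop_words):
    """Map each sufficiently frequent word to an index >= 1."""
    counts = Counter(w for doc in docs for w in tokenize(doc, stop_words))
    kept = sorted(w for w, c in counts.items() if c > MIN_FREQ)
    return {w: idx for idx, w in enumerate(kept, start=1)}


def to_indices(doc, vocab, stop_words):
    """Fixed-length word-index vector (item 4); unknown words are dropped."""
    ids = [vocab[w] for w in tokenize(doc, stop_words) if w in vocab][:MAX_LEN]
    return ids + [PAD] * (MAX_LEN - len(ids))
```

Padding every description to the same length lets the CNN of the next step process descriptions as fixed-size inputs.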

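3. The conditional distribution p of the rating matrix R is set as:

$$p(R \mid U, V, \alpha^2) = \prod_{i=1}^{L} \prod_{j=1}^{M} \left[ \mathcal{N}\left(R_{ij} \mid U_i^T V_j, \alpha^2\right) \right]^{I_{ij}}$$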

4. A convolutional neural network (CNN) is used to extract the board game feature vector from the board game description text, formally expressed as $\mathrm{CNN}(X_j, W_{cnn})$, where $X_j$ is the description text of game j and $W_{cnn}$ denotes the weight parameters inside the CNN. The CNN consists of four layers: an embedding layer, a convolutional layer, a pooling layer, and a fully connected layer. The experimental settings are:

1) the word vector dimension is 300, and the word vectors are trained during optimization;

2) the convolutional layer uses 64 filters, with window sizes set to 3, 4, and 5;

3) the output dimension is set to 200.
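The four-layer text encoder can be sketched in pure Python as below. ReLU after convolution, max-over-time pooling, and a tanh activation on the fully connected output are assumptions borrowed from common text-CNN designs such as ConvMF; the embodiment names only the four layers. The helper `random_params` is hypothetical, and the usage line uses small dimensions so the loops run quickly, whereas the embodiment uses 300-dimensional word vectors, 64 filters, windows 3/4/5, and a 200-dimensional output.

```python
import math
import random


def conv_max_pool(emb, filters, window):
    """Slide each (weights, bias) filter over `window` consecutive word vectors
    and keep max(0, largest response), i.e. ReLU followed by max-over-time pooling."""
    d = len(emb[0])
    pooled = []
    for W, b in filters:  # W has shape window x d
        best = 0.0  # starting at 0 applies the ReLU floor to the maximum
        for start in range(len(emb) - window + 1):
            s = b + sum(W[k][c] * emb[start + k][c] for k in range(window) for c in range(d))
            best = max(best, s)
        pooled.append(best)
    return pooled


def cnn_encode(word_ids, embedding, conv_banks, W_out, b_out):
    """CNN(X_j, W_cnn): embedding -> convolution -> pooling -> fully connected."""
    emb = [embedding[w] for w in word_ids]          # embedding layer (lookup)
    features = []
    for window, filters in conv_banks.items():      # one filter bank per window size
        features += conv_max_pool(emb, filters, window)
    return [math.tanh(b + sum(w * f for w, f in zip(row, features)))  # tanh output: assumption
            for row, b in zip(W_out, b_out)]


def random_params(vocab_size, d, n_filters, windows, out_dim, seed=0):
    """Hypothetical small random initialization for trying the encoder."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 0.1)

    embedding = [[g() for _ in range(d)] for _ in range(vocab_size)]
    conv_banks = {
        w: [([[g() for _ in range(d)] for _ in range(w)], 0.0) for _ in range(n_filters)]
        for w in windows
    }
    in_dim = n_filters * len(windows)
    W_out = [[g() for _ in range(in_dim)] for _ in range(out_dim)]
    return embedding, conv_banks, W_out, [0.0] * out_dim


emb, banks, W_out, b_out = random_params(vocab_size=20, d=8, n_filters=4, windows=(3, 4, 5), out_dim=6)
print(cnn_encode([1, 5, 7, 2, 0, 0, 0], emb, banks, W_out, b_out))
```

Max-over-time pooling makes the pooled feature length depend only on the number of filters, not on the description length. In training, all of these weights would be updated by back-propagation as in the update step for $W^V$.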

5. The game attribute information falls into two types: categorical information (game category and game mechanism) and numerical information (minimum number of players, maximum number of players, and average playing time). The categorical information is reduced in dimension by embedding, because high-dimensional sparse vectors degrade model performance. The numerical information is Min-Max normalized, the processed attributes are concatenated, and a multilayer perceptron (MLP) extracts the board game feature vector from the attribute information, formally expressed as $\mathrm{MLP}(Y_j, W_{mlp})$, where $Y_j$ is the attribute information of game j and $W_{mlp}$ denotes the weight parameters inside the MLP. In this embodiment, the MLP has two hidden layers and an output dimension of 50.
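The attribute pipeline of step 5 can be sketched as follows. Because a game may belong to several categories and use several mechanisms, averaging their embedding rows is an assumption about how the embedding reduces the multi-label input; ReLU on every layer, including the output, is also an assumption. All function names are hypothetical.

```python
def min_max(values):
    """Min-Max normalization of one numerical column (over all games) to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [(v - lo) / span for v in values]


def mean_embedding(ids, table):
    """Dense vector for a game's categories (or mechanisms): mean of their embedding rows."""
    d = len(table[0])
    if not ids:
        return [0.0] * d
    return [sum(table[i][k] for i in ids) / len(ids) for k in range(d)]


def build_attribute_vector(cat_ids, mech_ids, numeric_scaled, cat_table, mech_table):
    """Y_j: concatenation of embedded categorical and normalized numerical attributes."""
    return (mean_embedding(cat_ids, cat_table)
            + mean_embedding(mech_ids, mech_table)
            + list(numeric_scaled))


def relu(v):
    return max(0.0, v)


def dense(x, W, b, act):
    """Fully connected layer: act(W x + b)."""
    return [act(bi + sum(w * xi for w, xi in zip(row, x))) for row, bi in zip(W, b)]


def mlp_encode(y, layers):
    """MLP(Y_j, W_mlp): a list of (W, b) layers, e.g. two hidden layers plus a 50-d output."""
    h = y
    for W, b in layers:
        h = dense(h, W, b, relu)  # ReLU on the output layer too: an assumption
    return h
```

Min-Max scaling is computed per column across all games, so a game's normalized player counts and playing time are comparable in magnitude with the embedding entries.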

6. The feature vectors extracted from the game description text and from the attribute information are fused, denoted CM:

$$\mathrm{CM}(X_j, Y_j, W_{cnn}, W_{mlp}, W_{cm}) = \mathrm{relu}\left(W_{cm} \cdot \mathrm{concatenate}\left(\mathrm{CNN}(X_j, W_{cnn}), \mathrm{MLP}(Y_j, W_{mlp})\right)\right)$$

where concatenate is the concatenation function, relu is the activation function, and $W_{cm}$ is the fusion-layer weight parameter. The output dimension is set to 50.
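The fusion formula is a single bias-free dense layer with ReLU applied to the concatenated text and attribute vectors. A direct sketch, with the hypothetical function name `fuse`:

```python
def relu(v):
    return max(0.0, v)


def fuse(cnn_vec, mlp_vec, W_cm):
    """CM = relu(W_cm * concatenate(CNN(X_j, W_cnn), MLP(Y_j, W_mlp))).

    In this embodiment the concatenation has 200 + 50 = 250 entries and
    W_cm has 50 rows, so the fused game vector is 50-dimensional.
    """
    z = list(cnn_vec) + list(mlp_vec)  # concatenate
    return [relu(sum(w * x for w, x in zip(row, z))) for row in W_cm]
```

Since the output has the same dimension (50) as $\xi_j$, CM can serve directly as the prior mean of $V_j$.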

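7. The feature vector $V_j$ of game j is defined as $V_j = \mathrm{CM}(X_j, Y_j, W^V) + \xi_j$. For convenience, this embodiment uses $W^V$ to denote the weight parameters used above to extract the game feature vector from the multiple auxiliary data sources:

$W^V = \{W_{cnn}, W_{mlp}, W_{cm}\}$. The conditional distribution of $W^V$ is set as

$$p(W^V \mid \sigma_{W^V}^2) = \prod_{k} \mathcal{N}\left(w_k \mid 0, \sigma_{W^V}^2\right)$$

i.e. each parameter $w_k$ in $W^V$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^V}^2$. $\xi_j$ follows a spherical Gaussian distribution with mean 0 and variance $\sigma_V^2$, i.e.

$$\xi_j \sim \mathcal{N}\left(0, \sigma_V^2 I\right)$$

where I is the identity matrix, and $\xi_j$ has dimension 50.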

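8. The conditional distribution of the game feature matrix is therefore

$$p(V \mid W^V, X, Y, \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid \mathrm{CM}(X_j, Y_j, W^V), \sigma_V^2 I\right)$$

where $\sigma_V^2$ is the variance of each component of $V_j$, i.e. $V_j$ follows a spherical Gaussian distribution with mean $\mathrm{CM}(X_j, Y_j, W^V)$ and variance $\sigma_V^2$.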

9. Because the rating data are sparse, the user feature representation is inaccurate. The games a user has played reflect the user's interests, but different games contribute differently to the user's features, so this embodiment combines the features of the games the user has rated with an attention mechanism to enhance the user features, denoted UA:

$$\mathrm{UA}(V_i^U, W^U) = a V_i^U$$

where

$$a = \mathrm{softmax}\left(W_{a2} \tanh\left(W_{a1} (V_i^U)^T\right)\right)$$

a is the attention vector obtained by using a self-attention mechanism to summarize the information in $V_i^U$, the matrix of feature vectors of the games rated by user i; $W_{a1}$ and $W_{a2}$ are weight parameters; softmax and tanh are activation functions; and the output dimension of UA is set to 50.
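The attention pooling can be sketched as follows: every rated game receives a scalar score from a small tanh layer, softmax turns the scores into weights that sum to 1, and UA is the weighted sum of the rated games' feature vectors. Treating $W_{a2}$ as a single row (one attention hop), so that UA has the same dimension as the game vectors, is an assumption, as is the function name `user_attention`.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def user_attention(V_rated, W_a1, w_a2):
    """UA(V_i^U, W^U) = a V_i^U with a = softmax(w_a2 tanh(W_a1 (V_i^U)^T)).

    V_rated: one feature vector per game rated by user i (rows of V_i^U).
    W_a1: d_a x K matrix; w_a2: length-d_a vector (single attention hop).
    Returns the pooled user vector and the attention weights a.
    """
    scores = []
    for v in V_rated:  # each row of V_i^U is a column of (V_i^U)^T
        hidden = [math.tanh(sum(w * x for w, x in zip(row, v))) for row in W_a1]
        scores.append(sum(w * h for w, h in zip(w_a2, hidden)))
    a = softmax(scores)  # one weight per rated game
    K = len(V_rated[0])
    ua = [sum(a[n] * V_rated[n][k] for n in range(len(V_rated))) for k in range(K)]
    return ua, a
```

With all-zero attention weights the scores tie and UA reduces to the plain mean of the rated games' vectors; learning $W_{a1}$ and $W_{a2}$ lets games that better reflect the user's taste contribute more.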

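10. The feature vector of user i is defined as

$$U_i = \mathrm{UA}(V_i^U, W^U) + \delta_i$$

where $W^U = \{W_{a1}, W_{a2}\}$. The conditional distribution of $W^U$ is set as

$$p(W^U \mid \sigma_{W^U}^2) = \prod_{t} \mathcal{N}\left(w_t \mid 0, \sigma_{W^U}^2\right)$$

i.e. each parameter $w_t$ in $W^U$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^U}^2$. $\delta_i$ follows a spherical Gaussian distribution with mean 0 and variance $\sigma_U^2$, i.e.

$$\delta_i \sim \mathcal{N}\left(0, \sigma_U^2 I\right)$$

and $\delta_i$ has dimension 50.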

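11. The conditional distribution of the user feature matrix is

$$p(U \mid W^U, V^U, \sigma_U^2) = \prod_{i=1}^{L} \mathcal{N}\left(U_i \mid \mathrm{UA}(V_i^U, W^U), \sigma_U^2 I\right)$$

where $\sigma_U^2$ is the variance of each component of $U_i$, i.e. $U_i$ follows a spherical Gaussian distribution with mean $\mathrm{UA}(V_i^U, W^U)$ and variance $\sigma_U^2$.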

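12. To optimize the variables and parameters of the model, maximum a posteriori (MAP) estimation is used, maximizing

$$p(U, V, W^U, W^V \mid R, X, Y, V^U) \propto p(R \mid U, V, \alpha^2)\, p(U \mid W^U, V^U, \sigma_U^2)\, p(W^U \mid \sigma_{W^U}^2)\, p(V \mid W^V, X, Y, \sigma_V^2)\, p(W^V \mid \sigma_{W^V}^2)$$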

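13. Taking the negative logarithm (and multiplying by $\alpha^2$, dropping constants), the problem becomes minimizing

$$\varepsilon(U, V, W^U, W^V) = \sum_{i=1}^{L} \sum_{j=1}^{M} \frac{I_{ij}}{2} \left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2$$

where

$$\lambda_U = \frac{\alpha^2}{\sigma_U^2}, \quad \lambda_V = \frac{\alpha^2}{\sigma_V^2}, \quad \lambda_{W^U} = \frac{\alpha^2}{\sigma_{W^U}^2}, \quad \lambda_{W^V} = \frac{\alpha^2}{\sigma_{W^V}^2}$$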

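14. Update $V_j$ by coordinate descent:

$$V_j \leftarrow \left(U I_j U^T + \lambda_V I\right)^{-1} \left(U R_j + \lambda_V \mathrm{CM}(X_j, Y_j, W^V)\right)$$

where $I_j$ is a diagonal matrix whose diagonal elements are the j-th column of $I_{ij}$, and $R_j$ is the vector of the j-th column of $R_{ij}$.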

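15. Update $U_i$ by coordinate descent:

$$U_i \leftarrow \left(V I_i V^T + \lambda_U I\right)^{-1} \left(V R_i + \lambda_U \mathrm{UA}(V_i^U, W^U)\right)$$

where $I_i$ is a diagonal matrix whose diagonal elements are the i-th row of $I_{ij}$, and $R_i$ is the vector of the i-th row of $R_{ij}$.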

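16. Update $W^V$ by back-propagation according to the part of the objective that depends on $W^V$, where $\lambda_{W^V}$ is the regularization coefficient of $W^V$:

$$\varepsilon(W^V) = \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2 + \text{constant}$$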

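17. Update $W^U$ by back-propagation according to the part of the objective that depends on $W^U$, where $\lambda_{W^U}$ is the regularization coefficient of $W^U$:

$$\varepsilon(W^U) = \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \text{constant}$$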

18. Repeat steps 14-17 until the variables and parameters of the model converge.

19. Use the optimized U, V, $W^U$, and $W^V$ to predict a user's unknown rating of a game:

$$\hat{R}_{ij} \approx U_i^T V_j$$

20. To measure the rating prediction performance of the model of this embodiment, the root mean square error (RMSE) between predicted and actual ratings is computed for this model and for the baseline models. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{(i,j) \in S} \left(R_{ij} - \hat{R}_{ij}\right)^2}$$

where S is the set of ratings in the test set.
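Evaluation can be sketched as below, assuming, as in PMF, that a rating is predicted by the inner product $U_i^T V_j$ and that the test set is a list of held-out (user, game, rating) triples; the function names are hypothetical.

```python
import math


def predict(U_i, V_j):
    """Predicted rating: the inner product U_i^T V_j."""
    return sum(u * v for u, v in zip(U_i, V_j))


def rmse(test_set, U, V):
    """Root mean square error over held-out (i, j, rating) triples."""
    sq = [(r - predict(U[i], V[j])) ** 2 for i, j, r in test_set]
    return math.sqrt(sum(sq) / len(sq))
```

For example, with one user vector (1, 0), game vectors (2, 0) and (3, 0), and held-out ratings 2 and 5, the errors are 0 and 2, so RMSE = √2.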

The experimental results are shown in Table 2.

TABLE 2

Algorithm	RMSE
PMF	1.617
CDL	1.605
ConvMF	1.590
Model of the invention	1.381

Table 2 shows that the model of the invention outperforms the PMF, CDL, and ConvMF models. PMF uses only the rating matrix for rating prediction and performs worst. CDL processes the text information with an SDAE and fuses it into PMF, which gives better rating prediction than PMF. ConvMF processes the text information with a CNN and fuses it into PMF, further reducing the rating prediction error. However, none of these baselines adds auxiliary information on the user side, which leaves room to reduce the error further. Following ConvMF, the model of the invention processes the text information with a CNN, adds an MLP to process the attribute information, fuses the two, and integrates the result into PMF to enhance the game feature representation; it also enhances the user feature representation by combining the features of the games a user has rated with an attention mechanism, which further reduces the rating prediction error.

21. To examine the influence of the parameters $\lambda_U$ and $\lambda_V$ on rating prediction, experiments were run with different values of $\lambda_U$ and $\lambda_V$; the results are shown in Table 3.

TABLE 3

λ_U	λ_V	RMSE
90	10	1.43616
50	50	1.40019
10	90	1.38192
10	100	1.38281
20	90	1.38094

The experiments show that rating prediction is sensitive to $\lambda_U$ and $\lambda_V$, and suitable values further reduce the prediction error. As Table 3 shows, the error is smallest (RMSE 1.38094) when $\lambda_U$ and $\lambda_V$ are 20 and 90, respectively.

Claims

1. A board game hybrid recommendation method based on multi-source heterogeneous data is characterized by comprising the following steps:

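step 1, setting the conditional distribution p of the rating matrix R as:

$$p(R \mid U, V, \alpha^2) = \prod_{i=1}^{L} \prod_{j=1}^{M} \left[ \mathcal{N}\left(R_{ij} \mid U_i^T V_j, \alpha^2\right) \right]^{I_{ij}}$$

where $R_{ij}$ is the rating of user i for game j, $U_i$ and $V_j$ are the feature vectors of user i and game j, $\alpha^2$ is the variance of $R_{ij}$, and $I_{ij}$ is 1 if user i has rated game j and 0 otherwise;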

step 2, extracting a board game feature vector from the board game description text with a convolutional neural network, formally expressed as $\mathrm{CNN}(X_j, W_{cnn})$, where $X_j$ is the description text of game j and $W_{cnn}$ denotes the weight parameters inside the convolutional neural network;

step 3, extracting a board game feature vector from the board game attribute information with a multilayer perceptron, formally expressed as $\mathrm{MLP}(Y_j, W_{mlp})$, where $Y_j$ is the attribute information of game j and $W_{mlp}$ denotes the weight parameters inside the multilayer perceptron;

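step 4, fusing $\mathrm{CNN}(X_j, W_{cnn})$ and $\mathrm{MLP}(Y_j, W_{mlp})$, denoted CM:

$$\mathrm{CM}(X_j, Y_j, W_{cnn}, W_{mlp}, W_{cm}) = \mathrm{relu}\left(W_{cm} \cdot \mathrm{concatenate}\left(\mathrm{CNN}(X_j, W_{cnn}), \mathrm{MLP}(Y_j, W_{mlp})\right)\right)$$

where concatenate is the concatenation function, relu is the activation function, and $W_{cm}$ is the fusion-layer weight parameter;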

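step 5, defining the feature vector $V_j$ of game j as $V_j = \mathrm{CM}(X_j, Y_j, W^V) + \xi_j$, with $W^V = \{W_{cnn}, W_{mlp}, W_{cm}\}$, and setting the conditional distribution of $W^V$ as

$$p(W^V \mid \sigma_{W^V}^2) = \prod_{k} \mathcal{N}\left(w_k \mid 0, \sigma_{W^V}^2\right)$$

i.e. each parameter $w_k$ in $W^V$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^V}^2$; $\xi_j$ follows a spherical Gaussian distribution with mean 0 and variance $\sigma_V^2$, i.e.

$$\xi_j \sim \mathcal{N}\left(0, \sigma_V^2 I\right)$$

where I is the identity matrix;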

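step 6, the conditional distribution of the game feature matrix being

$$p(V \mid W^V, X, Y, \sigma_V^2) = \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid \mathrm{CM}(X_j, Y_j, W^V), \sigma_V^2 I\right)$$

where $\sigma_V^2$ is the variance of each component of $V_j$, i.e. $V_j$ follows a spherical Gaussian distribution with mean $\mathrm{CM}(X_j, Y_j, W^V)$ and variance $\sigma_V^2$;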

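step 7, combining the features of the games a user has rated with an attention mechanism to enhance the user feature representation, denoted UA:

$$\mathrm{UA}(V_i^U, W^U) = a V_i^U$$

where

$$a = \mathrm{softmax}\left(W_{a2} \tanh\left(W_{a1} (V_i^U)^T\right)\right)$$

a is the attention vector obtained by using a self-attention mechanism to summarize the information in $V_i^U$, the matrix of feature vectors of the games rated by user i, $W_{a1}$ and $W_{a2}$ are weight parameters, and softmax and tanh are activation functions;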

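step 8, defining the feature vector of user i as

$$U_i = \mathrm{UA}(V_i^U, W^U) + \delta_i$$

with $W^U = \{W_{a1}, W_{a2}\}$, and setting the conditional distribution of $W^U$ as

$$p(W^U \mid \sigma_{W^U}^2) = \prod_{t} \mathcal{N}\left(w_t \mid 0, \sigma_{W^U}^2\right)$$

i.e. each parameter $w_t$ in $W^U$ follows a Gaussian distribution with mean 0 and variance $\sigma_{W^U}^2$; $\delta_i$ follows a spherical Gaussian distribution with mean 0 and variance $\sigma_U^2$, i.e.

$$\delta_i \sim \mathcal{N}\left(0, \sigma_U^2 I\right)$$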

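step 9, the conditional distribution of the user feature matrix being

$$p(U \mid W^U, V^U, \sigma_U^2) = \prod_{i=1}^{L} \mathcal{N}\left(U_i \mid \mathrm{UA}(V_i^U, W^U), \sigma_U^2 I\right)$$

where $\sigma_U^2$ is the variance of each component of $U_i$, i.e. $U_i$ follows a spherical Gaussian distribution with mean $\mathrm{UA}(V_i^U, W^U)$ and variance $\sigma_U^2$;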

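step 10, estimating the variables and parameters of the model by maximum a posteriori estimation, i.e. maximizing

$$p(U, V, W^U, W^V \mid R, X, Y, V^U) \propto p(R \mid U, V, \alpha^2)\, p(U \mid W^U, V^U, \sigma_U^2)\, p(W^U \mid \sigma_{W^U}^2)\, p(V \mid W^V, X, Y, \sigma_V^2)\, p(W^V \mid \sigma_{W^V}^2)$$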

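step 11, taking the negative logarithm (multiplying by $\alpha^2$ and dropping constants) to obtain the objective

$$\varepsilon(U, V, W^U, W^V) = \sum_{i=1}^{L} \sum_{j=1}^{M} \frac{I_{ij}}{2} \left(R_{ij} - U_i^T V_j\right)^2 + \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2$$

where

$$\lambda_U = \frac{\alpha^2}{\sigma_U^2}, \quad \lambda_V = \frac{\alpha^2}{\sigma_V^2}, \quad \lambda_{W^U} = \frac{\alpha^2}{\sigma_{W^U}^2}, \quad \lambda_{W^V} = \frac{\alpha^2}{\sigma_{W^V}^2}$$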

step 12, updating $V_j$ by coordinate descent:

$$V_j \leftarrow \left(U I_j U^T + \lambda_V I\right)^{-1} \left(U R_j + \lambda_V \mathrm{CM}(X_j, Y_j, W^V)\right)$$

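step 13, updating $U_i$ by coordinate descent:

$$U_i \leftarrow \left(V I_i V^T + \lambda_U I\right)^{-1} \left(V R_i + \lambda_U \mathrm{UA}(V_i^U, W^U)\right)$$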

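step 14, updating $W^V$ by back-propagation according to

$$\varepsilon(W^V) = \frac{\lambda_V}{2} \sum_{j=1}^{M} \left\lVert V_j - \mathrm{CM}(X_j, Y_j, W^V) \right\rVert^2 + \frac{\lambda_{W^V}}{2} \sum_{k} w_k^2 + \text{constant}$$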

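step 15, updating $W^U$ by back-propagation according to

$$\varepsilon(W^U) = \frac{\lambda_U}{2} \sum_{i=1}^{L} \left\lVert U_i - \mathrm{UA}(V_i^U, W^U) \right\rVert^2 + \frac{\lambda_{W^U}}{2} \sum_{t} w_t^2 + \text{constant}$$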

step 16, repeating steps 12-15 until the variables and parameters of the model converge;

step 17, using the optimized U, V, $W^U$, and $W^V$ to predict a user's unknown rating of a game:

$$\hat{R}_{ij} \approx U_i^T V_j$$

step 18, evaluating the performance of the model with the root mean square error (RMSE) as the criterion, a lower RMSE indicating a better result.

2. The board game hybrid recommendation method based on multi-source heterogeneous data according to claim 1, wherein the board game attribute information comprises: the minimum number of players, the maximum number of players, the average playing time, the game category, and the game mechanism.

3. The board game hybrid recommendation method based on multi-source heterogeneous data according to claim 1, wherein the board game attribute information is divided into categorical information and numerical information, the categorical information comprising the game category and the game mechanism, and the numerical information comprising the minimum number of players, the maximum number of players, and the average playing time; the categorical information is reduced in dimension by embedding, the numerical information is Min-Max normalized, and the processed attribute information is concatenated before a multilayer perceptron extracts the board game feature vector from it.

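4. The board game hybrid recommendation method based on multi-source heterogeneous data according to claim 1, wherein the root mean square error is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|S|} \sum_{(i,j) \in S} \left(R_{ij} - \hat{R}_{ij}\right)^2}$$

where S is the set of ratings in the test set.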