CN113554491A - Mobile application recommendation method based on feature importance and bilinear feature interaction - Google Patents
Mobile application recommendation method based on feature importance and bilinear feature interaction
- Publication number
- CN113554491A (application CN202110853887.9A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- interaction
- features
- bilinear
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000003993 interaction Effects 0.000 title claims abstract description 54
- 238000000034 method Methods 0.000 title claims abstract description 38
- 238000013528 artificial neural network Methods 0.000 claims abstract description 22
- 239000013598 vector Substances 0.000 claims description 45
- 239000011159 matrix material Substances 0.000 claims description 21
- 230000006870 function Effects 0.000 claims description 20
- 238000012549 training Methods 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 9
- 230000005284 excitation Effects 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000012512 characterization method Methods 0.000 claims description 3
- 230000006835 compression Effects 0.000 claims description 3
- 238000007906 compression Methods 0.000 claims description 3
- 238000000280 densification Methods 0.000 claims description 3
- 230000002452 interceptive effect Effects 0.000 claims description 3
- 238000012886 linear function Methods 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000002474 experimental method Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a mobile application recommendation method based on feature importance and bilinear feature interaction. A SENET layer converts the output of the embedding layer into SENET-like embedded features, which improves the discriminability of the features. A bilinear interaction layer then performs second-order feature-interaction modeling on the original embedding and the SENET-like embedding, respectively. A connection layer concatenates the outputs of the bilinear interaction layer. Finally, the cross-combined features are fed into a neural network, and a prediction score is output at the prediction layer. The invention belongs to the technical field of mobile applications, and in particular relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.
Description
Technical Field
The invention belongs to the technical field of mobile application, and particularly relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.
Background
With the rapid growth of mobile applications in major application stores, selecting a desired application has become a challenge for users. There is therefore a need for a high-quality mobile application recommendation mechanism that meets users' expectations. Although existing methods achieve notable results on mobile application recommendation, the recommendation accuracy can still be improved: they focus primarily on how to better model the interactions between the features of mobile applications, while ignoring the importance, or weight, of the features themselves.
Disclosure of Invention
To solve these problems, the invention provides a mobile application recommendation method based on feature importance and bilinear feature interaction, which builds on a squeeze-and-excitation network (SENET) mechanism and a bilinear function combining the inner product and the Hadamard product.
The technical scheme adopted by the invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) embedding layer: the sparse input layer uses a sparse representation of the raw input features; the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors. A linear transformation turns the sparse matrix into a dense matrix (matrix densification), extracting the matrix's implicit features and improving the generalization ability of the model; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
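As an illustration of step 1), the sketch below performs an embedding lookup that maps one sparse (one-hot) index per field to a k-dimensional vector, producing E = [e_1, ..., e_f] as in formula (1). All sizes, table initializations, and index values are illustrative assumptions, not values taken from the invention.

```python
import numpy as np

f, vocab_size, k = 4, 100, 8      # fields, per-field vocabulary, embedding dim
rng = np.random.default_rng(0)

# One embedding table per field; a sparse input is the one-hot position of
# the active feature in each field.
tables = [rng.normal(0.0, 0.01, size=(vocab_size, k)) for _ in range(f)]
field_indices = [3, 17, 42, 9]    # hypothetical active indices

# Embedding lookup: E = [e_1, ..., e_f], each e_i in R^k (formula (1)).
E = np.stack([tables[i][field_indices[i]] for i in range(f)])
```

The lookup replaces a multiplication of the one-hot vector with the table, which is the usual efficient implementation of the linear densifying transformation.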
2) SENET layer: different features have different importance for the target task. For a specific CTR prediction task, the SENET mechanism dynamically increases the weights of important features and decreases the weights of less informative ones. SENET consists of three steps: squeeze, excitation, and re-weighting of the features. First, a Squeeze operation is applied to the embedded features obtained in the previous step to obtain global features; then an Excitation operation is applied to the global features, learning the relationships among features to obtain per-feature weights; finally, these weights are multiplied with the original embedded features to obtain the final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
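The three bilinear interaction types of formulas (5), (6), and (7) can be sketched as follows; the field count f, dimension k, and random parameter matrices are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

f, k = 4, 8
rng = np.random.default_rng(1)
V = rng.normal(size=(f, k))              # embeddings v_1, ..., v_f

pairs = list(combinations(range(f), 2))  # n = f(f-1)/2 field pairs

# "All domains" (formula (5)): one W shared by every pair.
W_all = rng.normal(size=(k, k))
p_all = [(V[i] @ W_all) * V[j] for i, j in pairs]

# "Single domain" (formula (6)): one W_i per field i.
W_each = rng.normal(size=(f, k, k))
p_each = [(V[i] @ W_each[i]) * V[j] for i, j in pairs]

# "Interaction between domains" (formula (7)): one W_ij per field pair.
W_pair = {(i, j): rng.normal(size=(k, k)) for i, j in pairs}
p_pair = [(V[i] @ W_pair[i, j]) * V[j] for i, j in pairs]
```

In each case `V[i] @ W` realizes the inner-product part and the elementwise `*` realizes the Hadamard product, so every p_ij stays a k-dimensional vector.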
4) Connection layer: the connection layer concatenates the interaction vectors p and q and feeds the concatenated vector into the neural network layer; this can be expressed as:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: will be provided withCombining the DNN component of the deep neural network with the shallow model to form a deep model; the input is an output vector C of a connection layer, and the deep network is composed of a plurality of full connection layers and can implicitly capture high-order features; let a(0)=[c1,...,c2n]Representing an initial input, will a(0)And (3) irrigating the deep neural network, wherein the feed-forward process is as follows:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W_{L+1} a^(L) + b_{L+1})   (10)
where L is the number of layers of the deep model.
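A minimal sketch of the feed-forward process of formulas (9) and (10), using the sigmoid activation σ as stated in the text; all layer sizes and weight initializations are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
two_n, hidden, L = 16, 32, 3       # input size 2n and L hidden layers (assumed)

dims = [two_n] + [hidden] * L
Ws = [rng.normal(0, 0.1, size=(dims[l + 1], dims[l])) for l in range(L)]
bs = [np.zeros(dims[l + 1]) for l in range(L)]
W_out, b_out = rng.normal(0, 0.1, size=(1, hidden)), np.zeros(1)

a = rng.normal(size=two_n)         # a^(0) = C, the connection-layer output
for W, b in zip(Ws, bs):           # a^(l) = sigmoid(W_l a^(l-1) + b_l), formula (9)
    a = sigmoid(W @ a + b)
y_d = sigmoid(W_out @ a + b_out)[0]  # CTR head, formula (10)
```

The final sigmoid squashes the dense feature vector to a probability, so y_d always lies in (0, 1).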
6) Prediction layer: the output expression of the model's prediction layer is:
ŷ = σ(w_0 + Σ_{i=1}^{m} w_i x_i + y_d)   (11)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the entire training process aims to minimize the following objective function (cross-entropy loss):
loss = −(1/N) Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) )   (12)
where y_i is the true label of the i-th sample, ŷ_i is the corresponding predicted label, and N is the total number of mobile application samples.
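The cross-entropy training objective can be sketched as a small function; the clipping constant eps is an implementation assumption added only to avoid log(0).

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over the N samples (formula (12))."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # eps guards against log(0); assumption
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
loss = log_loss(y_true, y_pred)
```

A single sample predicted at 0.5 gives a loss of log 2 ≈ 0.693, the usual sanity check for this objective.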
Further, the feature compression in step 2) proceeds as follows: the original embedding E is compressed into a statistical vector Z = [z_1, ..., z_i, ..., z_f] by an average pooling operation, where z_i can be calculated as:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t)   (2)
z_i expresses global information about the i-th feature representation, and k is the embedding dimension.
Further, the excitation in step 2) proceeds as follows: the weight of each field's embedding is learned from the statistical vector Z using two fully connected (FC) layers. The first FC layer is a dimension-reduction layer with parameter W_1 and a nonlinear activation function; the second FC layer restores the original dimension through parameter W_2. Formally, the field-embedding weights can be computed as:
A = σ_2(W_2 σ_1(W_1 Z))   (3)
where A ∈ R^f is the weight vector and σ_1 and σ_2 are activation functions.
Further, the re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding V = [v_1, ..., v_f]; the SENET-like embedding V can be calculated as:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f]   (4)
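The squeeze, excitation, and re-weighting steps of formulas (2), (3), and (4) can be sketched together; the reduction ratio r and all random values are illustrative assumptions (the text does not specify a reduction ratio).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

f, k, r = 4, 8, 2                  # fields, embedding dim, reduction ratio (assumed)
rng = np.random.default_rng(3)
E = rng.normal(size=(f, k))        # original embeddings e_1, ..., e_f

# Squeeze: mean-pool each embedding to one scalar, Z in R^f (formula (2)).
Z = E.mean(axis=1)

# Excitation: two FC layers, reduce f -> f/r, then restore to f (formula (3)).
W1 = rng.normal(size=(f // r, f))
W2 = rng.normal(size=(f, f // r))
A = sigmoid(W2 @ sigmoid(W1 @ Z))

# Re-weight: scale each field's embedding by its learned weight (formula (4)).
V = A[:, None] * E
```

The bottleneck through f/r units is what forces the excitation step to model relationships among the fields rather than pass the statistics through unchanged.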
The structure adopted by the invention yields the following beneficial effects: the method dynamically learns the importance of features through a SENET mechanism and models second-order feature interactions through a bilinear interaction layer; the classical deep neural network component is then combined with the shallow model, the cross-combined features are fed into the deep model, and the user's preference for different mobile applications is predicted. The method was evaluated on a real Kaggle dataset, and the experimental results show that it achieves the best AUC and Logloss in most cases. The method can effectively improve the recommendation accuracy of mobile applications.
Drawings
FIG. 1 is a model architecture diagram of a mobile application recommendation method based on feature importance and bilinear feature interaction in accordance with the present invention;
FIG. 2 is a SENET layer structure diagram of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 3 is a diagram of a bilinear interaction layer structure of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 4 is a distribution detail table of the top 20 categories with the largest number in the experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 5 is a table of performance comparisons of training sets of different proportions in an experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 6 is a comparison graph of AUC of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 7 is a Logloss comparison of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction of the present invention;
fig. 8 is a table comparing the influence of neural networks with different layers on AUC according to the mobile application recommendation method based on feature importance and bilinear feature interaction of the present invention;
fig. 9 is a comparison table of the influence of neural networks with different layer numbers on Logloss in the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to specific embodiments; technical features and connection relationships of the present invention that are not described in detail herein are prior art.
The present invention will be described in further detail with reference to examples.
As shown in fig. 1 to 9, the technical solution adopted by the present invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) embedding layer: the sparse input layer uses a sparse representation of the raw input features; the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors. A linear transformation turns the sparse matrix into a dense matrix (matrix densification), extracting the matrix's implicit features and improving the generalization ability of the model; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
2) SENET layer: different features have different importance for the target task. For a specific CTR prediction task, the SENET mechanism dynamically increases the weights of important features and decreases the weights of less informative ones. SENET consists of three steps: squeeze, excitation, and re-weighting of the features. First, a Squeeze operation is applied to the embedded features obtained in the previous step to obtain global features; then an Excitation operation is applied to the global features, learning the relationships among features to obtain per-feature weights; finally, these weights are multiplied with the original embedded features to obtain the final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
4) Connecting layers: the connecting layer connects the interactive vectors p and q and injects the connected vectors into the neural network layer; the specific process can be expressed as follows:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: the deep neural network (DNN) component is combined with the shallow model to form a deep model. Its input is the output vector C of the connection layer; the deep network consists of several fully connected layers and can implicitly capture high-order features. Let a^(0) = [c_1, ..., c_{2n}] denote the initial input; a^(0) is fed into the deep neural network, whose feed-forward process is:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W_{L+1} a^(L) + b_{L+1})   (10)
where L is the number of layers of the deep model;
6) prediction layer: the output expression of the model's prediction layer is:
ŷ = σ(w_0 + Σ_{i=1}^{m} w_i x_i + y_d)   (11)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the entire training process aims to minimize the following objective function (cross-entropy loss):
loss = −(1/N) Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) )   (12)
where y_i is the true label of the i-th sample, ŷ_i is the corresponding predicted label, and N is the total number of mobile application samples.
The feature compression in step 2) proceeds as follows: the original embedding E is compressed into a statistical vector Z = [z_1, ..., z_i, ..., z_f] by an average pooling operation, where z_i can be calculated as:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t)   (2)
z_i expresses global information about the i-th feature representation, and k is the embedding dimension.
The excitation in step 2) proceeds as follows: the weight of each field's embedding is learned from the statistical vector Z using two fully connected (FC) layers. The first FC layer is a dimension-reduction layer with parameter W_1 and a nonlinear activation function; the second FC layer restores the original dimension through parameter W_2. Formally, the field-embedding weights can be computed as:
A = σ_2(W_2 σ_1(W_1 Z))   (3)
where A ∈ R^f is the weight vector and σ_1 and σ_2 are activation functions.
The re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding V = [v_1, ..., v_f]; the SENET-like embedding V can be calculated as:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f]   (4)
experiment of
Basic setup
Data set: in the experiment, a Kaggle public data set 'App Store' is adopted, and 11 features are selected, wherein the number of the discrete features is 2, and the number of the continuous feature values is 9. (the feature columns are specifically prime _ gene, control _ rating, price, rating _ count _ tot, rating _ count _ ver, user _ ver, sup _ devices.num, ipad Sc _ urls.num, lang.num and size _ MB). since there is no label in the original dataset, we manually set the mobile application label value to be 1 for a score value of greater than 3 and a score number of more than 800, otherwise to be 0. The ratio of the number of positive samples is about 0.424, so that the experimental result is not influenced too much due to uneven distribution of the samples. The details of the distribution of the top 20 categories with the highest number are shown in fig. 3.
In addition, Adam is used as the optimizer and binary cross-entropy as the loss function. The batch size is set to 64 during model training, and the validation-set ratio is 0.2. To better illustrate the experimental results, training sets of different proportions are selected for testing.
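The labeling rule described above (label 1 iff rating greater than 3 and rating count greater than 800) can be sketched as follows; the rows are fabricated examples and the column names user_rating and rating_count_tot are assumptions based on the feature list given in the text.

```python
# Hypothetical rows mimicking the "App Store" dataset; values are invented.
rows = [
    {"user_rating": 4.5, "rating_count_tot": 21292},
    {"user_rating": 4.0, "rating_count_tot": 120},
    {"user_rating": 2.5, "rating_count_tot": 5000},
]

# Label = 1 iff rating > 3 AND rating count > 800, else 0.
labels = [int(r["user_rating"] > 3 and r["rating_count_tot"] > 800) for r in rows]
```

Only the first row satisfies both conditions, so the labels come out [1, 0, 0].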
Evaluation index
AUC and Logloss, which are widely used in click prediction, are selected as evaluation indexes.
In general, for a binary classification problem, a threshold can be set to divide samples into positive and negative classes. Computing the corresponding coordinate point in ROC space for each threshold yields the ROC curve, and AUC is the area under the ROC curve. When 0.5 < AUC < 1, the model outperforms a random classifier; the closer the AUC is to 1.0, the better the classifier, and an AUC of 0.5 is no better than random guessing. The calculation formula is:
AUC = ∫_0^1 t_pr d(f_pr)
In ROC space, a coordinate point (f_pr, t_pr) describes the trade-off between FP (false positives) and TP (true positives).
Logloss measures the accuracy of a classifier by penalizing erroneous classifications; minimizing the log loss is essentially equivalent to maximizing the accuracy of the classifier. To compute it, the classifier must provide a probability for each class to which the input may belong, not just the most likely class. The log loss reflects the average deviation of the samples and is often optimized as the model's loss function. The calculation formula is:
Logloss = −(1/N) Σ_{t=1}^{N} ( y_t log ŷ_t + (1 − y_t) log(1 − ŷ_t) )
where y_t is the true label of the t-th sample, ŷ_t is the corresponding predicted label, and N is the total number of mobile application samples.
Comparison method
Several methods are compared below to better illustrate the experimental results.
MLR: the MLR model is an extension of the linear LR model, and can learn higher-order feature combinations by fitting data in a piecewise linear manner. The basic idea is to adopt a strategy of dividing and treating one another.
AFM: the FM model is improved, and an attention mechanism is introduced into a feature crossing module. An attention network is used to learn the importance of different combined features (second order cross).
FNN: the FNN (neural network based on a factorization machine) uses FM to conduct supervised learning to obtain an embedded layer, so that dimensionality of sparse features can be effectively reduced, and continuous dense features can be obtained.
NFM: the linear intersection of FM on second-order features and the nonlinear intersection of the neural network on high-order features are combined together, and feature combinations which do not appear in a training set can be learned.
Deep FM: combining FM interaction at first and second order features, and at higher order features, while reducing the number of parameters and sharing information by sharing the Embedding (Embedding) of FM and DNN.
Performance comparison
Training-set ratios from 0.2 to 0.9 are selected to compare model performance; the experimental results are shown in Figs. 5, 6, and 7. Overall, MR-FI performs best. In particular, at a training ratio of 0.8, the AUC of MR-FI is 19.2%, 20.99%, 1.22%, 0.27%, and 1.08% higher than that of MLR, AFM, FNN, NFM, and DeepFM, respectively, and its Logloss is correspondingly 32.73%, 34.72%, 2.22%, 3.37%, and 3.28% lower. More specifically, we have the following findings:
MR-FI has the best performance in all models, which shows that the precision of the recommendation model can be effectively improved by considering the original feature weight and learning feature interaction by utilizing a bilinear interaction layer.
Deep models such as FNN and DeepFM consistently outperform shallow models such as MLR, with improvements of 17.98% and 18.21%, respectively, at a training ratio of 0.8. Deep models can model the data better when features are sparse.
In the shallow model, MLR is always superior to AFM model. When the training size is 0.2, the accuracy of the MLR model is improved by 4.73% compared with that of the AFM model, which indicates that learning the feature combinations of higher orders is more important than learning the weights of the feature combinations of lower orders.
Starting from the FNN, the addition of the neural network makes the accuracy of this type of model significantly improved compared to the shallow model. The results show that neural networks can learn more efficiently from the representation of FM, with better results than the shallow model.
The NFM model and the DeepFM model also have better overall performance, which shows that the combination of low-order characteristic and high-order characteristic interaction can improve the model precision to a certain extent.
Hyper-parametric analysis
The hyper-parameter analysis mainly covers the embedding size and the number of neural network layers. In mobile application recommendation feature modeling, however, there are more continuous features and fewer sparse features, so the embedding size has little influence on model accuracy and is not discussed here; the analysis focuses on the influence of the number of neural network layers. As the experimental results in Figs. 8 and 9 show, when the number of layers increases from 1 to 3, the AUC rises from 0.83 to 0.91; when it increases to 4, the AUC drops sharply; and from 4 to 7 layers the AUC recovers somewhat, but it never reaches the layer-3 peak and then trends downward. The effect on Logloss shows the same pattern: Logloss reaches its lowest point at 3 layers. We can therefore conclude that increasing the number of DNN layers increases model complexity: up to a point, adding layers improves performance, but beyond it performance degrades, because an overly complex model easily overfits. For mobile application recommendation in particular, setting the number of layers to 3 is a good choice.
The present invention and its embodiments are described above by way of illustration, not limitation; the drawings show only one embodiment, and the actual structure is not limited thereto. Those skilled in the art can readily design similar structures and embodiments based on the disclosed conception without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (4)
1. The mobile application recommendation method based on feature importance and bilinear feature interaction is characterized by comprising the following steps of:
1) embedding layer: the sparse input layer adopts sparse representation to the original input features, the embedding layer can embed the sparse features into low-dimensional continuous real-valued vectors, the sparse matrix is changed into a dense matrix through linear transformation to perform matrix densification, the implicit features of the matrix are extracted, and meanwhile, the generalization capability of the model is improved; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
2) SENET layer: different characteristics have different importance to the target task, and for a specific CTR prediction task, the weight of the important characteristics is dynamically increased through a SENET mechanism, and the weight of the insufficient information characteristics is reduced; SENET consists of three steps: compressing, exciting and re-weighting the characteristics; firstly, carrying out Squeeze operation on the embedded features obtained in the last step to obtain global features, then carrying out Excitation operation on the global features, learning the relation among the features to obtain the weights of different features, and finally multiplying the weights by the original embedded features to obtain final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
4) Connecting layers: the connecting layer connects the interactive vectors p and q and injects the connected vectors into the neural network layer; the specific process can be expressed as follows:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: the deep neural network (DNN) component is combined with the shallow model to form a deep model. Its input is the output vector C of the connection layer; the deep network consists of several fully connected layers and can implicitly capture high-order features. Let a^(0) = [c_1, ..., c_{2n}] denote the initial input; a^(0) is fed into the deep neural network, whose feed-forward process is:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W^(L+1) a^(L) + b^(L+1)) (10)
where L is the number of layers of the deep model;
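The feed-forward pass of equations (9) and (10) can be sketched in NumPy (a minimal sketch; the layer sizes and random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def deep_network(c, weights, biases):
    """Feed-forward through fully-connected layers (eq. 9):
    a^(l) = sigma(W^l a^(l-1) + b^l), starting from a^(0) = c."""
    a = c
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
dims = [8, 16, 16, 1]      # a^(0) has 2n = 8 entries; the last layer emits y_d
weights = [rng.normal(size=(dims[l + 1], dims[l])) for l in range(3)]
biases = [rng.normal(size=dims[l + 1]) for l in range(3)]
y_d = deep_network(rng.normal(size=8), weights, biases)
assert y_d.shape == (1,) and 0.0 < y_d[0] < 1.0
```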
6) prediction layer: the output of the model prediction layer is:
ŷ = σ(w_0 + Σ_{i=0}^{m} w_i x_i + y_d)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the whole training process aims to minimize the cross-entropy objective function:
loss = -(1/N) Σ_{i=1}^{N} (y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i))
where y_i is the true label of the i-th sample, ŷ_i the corresponding prediction, and N the total number of training samples;
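The prediction layer and the training objective can be sketched together in NumPy (a minimal sketch assuming the standard CTR log-loss objective; the toy inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(w0, w, x, y_d):
    """Prediction layer: sigmoid over the linear part plus the deep output y_d."""
    return sigmoid(w0 + np.dot(w, x) + y_d)

def log_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy objective minimized during training."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_hat = predict(0.1, np.array([0.2, -0.3]), np.array([1.0, 1.0]), 0.5)
assert 0.0 < y_hat < 1.0
loss = log_loss(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
assert loss > 0.0
```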
2. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the feature compression in step 2) specifically comprises: compressing the original embedding E into a statistical vector z = [z_1, ..., z_i, ..., z_f] using an average pooling operation, where z_i can be calculated by the following formula:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t) (2)
where z_i is a scalar representing the global information of the i-th feature representation and k is the embedding dimension.
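The Squeeze step of equation (2) is just average pooling over each embedding, for example (toy values are illustrative):

```python
import numpy as np

def squeeze(E):
    """Average-pool each embedding e_i (length k) to a scalar z_i (eq. 2)."""
    return E.mean(axis=1)

E = np.array([[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]])   # f = 3 fields, k = 2
z = squeeze(E)
assert z.tolist() == [2.0, 3.0, 0.0]
```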
3. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the excitation in step 2) specifically comprises: learning the weight of each field's embedding from the statistical vector z using two fully-connected (FC) layers; the first FC layer is a dimension-reduction layer with parameter matrix W_1, using σ_1 as a nonlinear function; the second FC layer restores the original dimension through the parameter matrix W_2; formally, the field-embedding weights can be computed as follows:
A = σ_2(W_2 σ_1(W_1 z)) (3)
where A ∈ R^f is the weight vector and σ_1, σ_2 are activation functions.
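Equation (3) can be sketched in NumPy as follows (a minimal sketch; `np.tanh` stands in for σ_1 and σ_2, and the reduction ratio r is an illustrative assumption):

```python
import numpy as np

def excitation(z, W1, W2, act=np.tanh):
    """Two FC layers (eq. 3): reduce dimension with W1, restore it with W2."""
    return act(W2 @ act(W1 @ z))

f, r = 4, 2                          # f fields, reduction ratio r
rng = np.random.default_rng(2)
W1 = rng.normal(size=(f // r, f))    # dimension-reduction layer
W2 = rng.normal(size=(f, f // r))    # dimension-restoring layer
A = excitation(rng.normal(size=f), W1, W2)
assert A.shape == (f,)               # one weight a_i per field
```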
4. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the re-weighting in step 2) specifically comprises: multiplying each field of the embedding layer by its corresponding weight to obtain the final embedding V = [v_1, ..., v_f]; the SENET embedding V can be calculated as follows:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f] (4).
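The re-weighting of equation (4) is an elementwise scaling of each field embedding by its learned weight, for example (toy values are illustrative):

```python
import numpy as np

def reweight(E, A):
    """Multiply each field embedding e_i by its learned weight a_i (eq. 4)."""
    return A[:, None] * E

E = np.array([[1.0, 2.0], [3.0, 4.0]])   # f = 2 fields, k = 2
A = np.array([0.5, 2.0])                  # field weights from the Excitation step
V = reweight(E, A)
assert V.tolist() == [[0.5, 1.0], [6.0, 8.0]]
```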
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110853887.9A CN113554491B (en) | 2021-07-28 | 2021-07-28 | Mobile application recommendation method based on feature importance and bilinear feature interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113554491A true CN113554491A (en) | 2021-10-26 |
CN113554491B CN113554491B (en) | 2024-04-16 |
Family
ID=78133015
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113554491B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120150532A1 (en) * | 2010-12-08 | 2012-06-14 | At&T Intellectual Property I, L.P. | System and method for feature-rich continuous space language models |
US20150278441A1 (en) * | 2014-03-25 | 2015-10-01 | Nec Laboratories America, Inc. | High-order semi-Restricted Boltzmann Machines and Deep Models for accurate peptide-MHC binding prediction |
CN110826700A (en) * | 2019-11-13 | 2020-02-21 | 中国科学技术大学 | Method for realizing and classifying bilinear graph neural network model for modeling neighbor interaction |
CN112365297A (en) * | 2020-12-04 | 2021-02-12 | 东华理工大学 | Advertisement click rate estimation method |
Non-Patent Citations (1)
Title |
---|
Cao Buqing; Xiao Qiaoxiang; Zhang Xiangping; Liu Jianxun: "An API service recommendation method fusing SOM functional clustering and DeepFM quality prediction", Chinese Journal of Computers (计算机学报), no. 006 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |