CN113554491A - Mobile application recommendation method based on feature importance and bilinear feature interaction - Google Patents
Mobile application recommendation method based on feature importance and bilinear feature interaction
- Publication number
- CN113554491A (application CN202110853887.9A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- interaction
- features
- bilinear
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000003993 interaction Effects 0.000 title claims abstract description 54
- 238000000034 method Methods 0.000 title claims abstract description 38
- 238000013528 artificial neural network Methods 0.000 claims abstract description 22
- 239000013598 vector Substances 0.000 claims description 45
- 239000011159 matrix material Substances 0.000 claims description 21
- 230000006870 function Effects 0.000 claims description 20
- 238000012549 training Methods 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 9
- 230000005284 excitation Effects 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 6
- 238000004364 calculation method Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000012512 characterization method Methods 0.000 claims description 3
- 230000006835 compression Effects 0.000 claims description 3
- 238000007906 compression Methods 0.000 claims description 3
- 238000000280 densification Methods 0.000 claims description 3
- 230000002452 interceptive effect Effects 0.000 claims description 3
- 238000012886 linear function Methods 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000002474 experimental method Methods 0.000 description 5
- 238000010586 diagram Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a mobile application recommendation method based on feature importance and bilinear feature interaction. A SENET layer converts the output of the embedding layer into SENET-like embedded features, which improves the discriminability of the features. A bilinear interaction layer then performs second-order feature-interaction modeling on the original embedding and the SENET-like embedding, respectively. A connection layer concatenates the outputs of the bilinear interaction layer. Finally, the cross-combined features are fed into a neural network, and a prediction score is output at the prediction layer. The invention belongs to the technical field of mobile applications, and in particular relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.
Description
Technical Field
The invention belongs to the technical field of mobile application, and particularly relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.
Background
With the rapid growth of mobile applications in major application stores, selecting a desired application has become a challenge for users. There is therefore a need for a high-quality mobile application recommendation mechanism that meets users' expectations. Although existing methods achieve notable results on mobile application recommendation, the recommendation accuracy can still be improved: they focus primarily on how to better model the interactions between the features of mobile applications, while ignoring the importance, or weight, of the features themselves.
Disclosure of Invention
To solve these problems, the invention provides a mobile application recommendation method based on feature importance and bilinear feature interaction, which builds on a squeeze-and-excitation network (SENET) mechanism and a bilinear function combining the inner product and the Hadamard product.
The technical scheme adopted by the invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) embedding layer: the sparse input layer uses a sparse representation of the raw input features; the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors. A linear transformation turns the sparse matrix into a dense matrix (matrix densification), extracting the matrix's implicit features and improving the generalization ability of the model; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
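As an illustration of step 1), the sketch below performs an embedding lookup that maps one sparse (one-hot) index per field to a k-dimensional vector, producing E = [e_1, ..., e_f] as in formula (1). All sizes, table initializations, and index values are illustrative assumptions, not values taken from the invention.

```python
import numpy as np

f, vocab_size, k = 4, 100, 8      # fields, per-field vocabulary, embedding dim
rng = np.random.default_rng(0)

# One embedding table per field; a sparse input is the one-hot position of
# the active feature in each field.
tables = [rng.normal(0.0, 0.01, size=(vocab_size, k)) for _ in range(f)]
field_indices = [3, 17, 42, 9]    # hypothetical active indices

# Embedding lookup: E = [e_1, ..., e_f], each e_i in R^k (formula (1)).
E = np.stack([tables[i][field_indices[i]] for i in range(f)])
```

The lookup replaces a multiplication of the one-hot vector with the table, which is the usual efficient implementation of the linear densifying transformation.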
2) SENET layer: different features have different importance for the target task. For a specific CTR prediction task, the SENET mechanism dynamically increases the weights of important features and decreases the weights of less informative ones. SENET consists of three steps: squeeze, excitation, and re-weighting of the features. First, a Squeeze operation is applied to the embedded features obtained in the previous step to obtain global features; then an Excitation operation is applied to the global features, learning the relationships among features to obtain per-feature weights; finally, these weights are multiplied with the original embedded features to obtain the final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
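The three bilinear interaction types of formulas (5), (6), and (7) can be sketched as follows; the field count f, dimension k, and random parameter matrices are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

f, k = 4, 8
rng = np.random.default_rng(1)
V = rng.normal(size=(f, k))              # embeddings v_1, ..., v_f

pairs = list(combinations(range(f), 2))  # n = f(f-1)/2 field pairs

# "All domains" (formula (5)): one W shared by every pair.
W_all = rng.normal(size=(k, k))
p_all = [(V[i] @ W_all) * V[j] for i, j in pairs]

# "Single domain" (formula (6)): one W_i per field i.
W_each = rng.normal(size=(f, k, k))
p_each = [(V[i] @ W_each[i]) * V[j] for i, j in pairs]

# "Interaction between domains" (formula (7)): one W_ij per field pair.
W_pair = {(i, j): rng.normal(size=(k, k)) for i, j in pairs}
p_pair = [(V[i] @ W_pair[i, j]) * V[j] for i, j in pairs]
```

In each case `V[i] @ W` realizes the inner-product part and the elementwise `*` realizes the Hadamard product, so every p_ij stays a k-dimensional vector.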
4) Connection layer: the connection layer concatenates the interaction vectors p and q and feeds the concatenated vector into the neural network layer; this can be expressed as:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: will be provided withCombining the DNN component of the deep neural network with the shallow model to form a deep model; the input is an output vector C of a connection layer, and the deep network is composed of a plurality of full connection layers and can implicitly capture high-order features; let a(0)=[c1,...,c2n]Representing an initial input, will a(0)And (3) irrigating the deep neural network, wherein the feed-forward process is as follows:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W_{L+1} a^(L) + b_{L+1})   (10)
where L is the number of layers of the deep model.
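A minimal sketch of the feed-forward process of formulas (9) and (10), using the sigmoid activation σ as stated in the text; all layer sizes and weight initializations are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
two_n, hidden, L = 16, 32, 3       # input size 2n and L hidden layers (assumed)

dims = [two_n] + [hidden] * L
Ws = [rng.normal(0, 0.1, size=(dims[l + 1], dims[l])) for l in range(L)]
bs = [np.zeros(dims[l + 1]) for l in range(L)]
W_out, b_out = rng.normal(0, 0.1, size=(1, hidden)), np.zeros(1)

a = rng.normal(size=two_n)         # a^(0) = C, the connection-layer output
for W, b in zip(Ws, bs):           # a^(l) = sigmoid(W_l a^(l-1) + b_l), formula (9)
    a = sigmoid(W @ a + b)
y_d = sigmoid(W_out @ a + b_out)[0]  # CTR head, formula (10)
```

The final sigmoid squashes the dense feature vector to a probability, so y_d always lies in (0, 1).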
6) Prediction layer: the output expression of the model's prediction layer is:
ŷ = σ(w_0 + Σ_{i=1}^{m} w_i x_i + y_d)   (11)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the entire training process aims to minimize the following objective function (cross-entropy loss):
loss = −(1/N) Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) )   (12)
where y_i is the true label of the i-th sample, ŷ_i is the corresponding predicted label, and N is the total number of mobile application samples.
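The cross-entropy training objective can be sketched as a small function; the clipping constant eps is an implementation assumption added only to avoid log(0).

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over the N samples (formula (12))."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # eps guards against log(0); assumption
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
loss = log_loss(y_true, y_pred)
```

A single sample predicted at 0.5 gives a loss of log 2 ≈ 0.693, the usual sanity check for this objective.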
Further, the feature compression in step 2) proceeds as follows: the original embedding E is compressed into a statistical vector Z = [z_1, ..., z_i, ..., z_f] by an average pooling operation, where z_i can be calculated as:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t)   (2)
z_i expresses global information about the i-th feature representation, and k is the embedding dimension.
Further, the excitation in step 2) proceeds as follows: the weight of each field's embedding is learned from the statistical vector Z using two fully connected (FC) layers. The first FC layer is a dimension-reduction layer with parameter W_1 and a nonlinear activation function; the second FC layer restores the original dimension through parameter W_2. Formally, the field-embedding weights can be computed as:
A = σ_2(W_2 σ_1(W_1 Z))   (3)
where A ∈ R^f is the weight vector and σ_1 and σ_2 are activation functions.
Further, the re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding V = [v_1, ..., v_f]; the SENET-like embedding V can be calculated as:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f]   (4)
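The squeeze, excitation, and re-weighting steps of formulas (2), (3), and (4) can be sketched together; the reduction ratio r and all random values are illustrative assumptions (the text does not specify a reduction ratio).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

f, k, r = 4, 8, 2                  # fields, embedding dim, reduction ratio (assumed)
rng = np.random.default_rng(3)
E = rng.normal(size=(f, k))        # original embeddings e_1, ..., e_f

# Squeeze: mean-pool each embedding to one scalar, Z in R^f (formula (2)).
Z = E.mean(axis=1)

# Excitation: two FC layers, reduce f -> f/r, then restore to f (formula (3)).
W1 = rng.normal(size=(f // r, f))
W2 = rng.normal(size=(f, f // r))
A = sigmoid(W2 @ sigmoid(W1 @ Z))

# Re-weight: scale each field's embedding by its learned weight (formula (4)).
V = A[:, None] * E
```

The bottleneck through f/r units is what forces the excitation step to model relationships among the fields rather than pass the statistics through unchanged.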
The structure adopted by the invention yields the following beneficial effects: the method dynamically learns the importance of features through a SENET mechanism and models second-order feature interactions through a bilinear interaction layer; the classical deep neural network component is then combined with the shallow model, the cross-combined features are fed into the deep model, and the user's preference for different mobile applications is predicted. The method was evaluated on a real Kaggle dataset, and the experimental results show that it achieves the best AUC and Logloss in most cases. The method can effectively improve the recommendation accuracy of mobile applications.
Drawings
FIG. 1 is a model architecture diagram of a mobile application recommendation method based on feature importance and bilinear feature interaction in accordance with the present invention;
FIG. 2 is a SENET layer structure diagram of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 3 is a diagram of a bilinear interaction layer structure of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 4 is a distribution detail table of the top 20 categories with the largest number in the experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 5 is a table of performance comparisons of training sets of different proportions in an experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 6 is a comparison graph of AUC of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 7 is a Logloss comparison of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction of the present invention;
fig. 8 is a table comparing the influence of neural networks with different layers on AUC according to the mobile application recommendation method based on feature importance and bilinear feature interaction of the present invention;
fig. 9 is a comparison table of the influence of neural networks with different layer numbers on Logloss in the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to specific embodiments; technical features and connection relationships of the present invention that are not described in detail herein are prior art.
The present invention will be described in further detail with reference to examples.
As shown in fig. 1 to 9, the technical solution adopted by the present invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) embedding layer: the sparse input layer uses a sparse representation of the raw input features; the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors. A linear transformation turns the sparse matrix into a dense matrix (matrix densification), extracting the matrix's implicit features and improving the generalization ability of the model; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
2) SENET layer: different features have different importance for the target task. For a specific CTR prediction task, the SENET mechanism dynamically increases the weights of important features and decreases the weights of less informative ones. SENET consists of three steps: squeeze, excitation, and re-weighting of the features. First, a Squeeze operation is applied to the embedded features obtained in the previous step to obtain global features; then an Excitation operation is applied to the global features, learning the relationships among features to obtain per-feature weights; finally, these weights are multiplied with the original embedded features to obtain the final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
4) Connecting layers: the connecting layer connects the interactive vectors p and q and injects the connected vectors into the neural network layer; the specific process can be expressed as follows:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: the deep neural network (DNN) component is combined with the shallow model to form a deep model. Its input is the output vector C of the connection layer; the deep network consists of several fully connected layers and can implicitly capture high-order features. Let a^(0) = [c_1, ..., c_{2n}] denote the initial input; a^(0) is fed into the deep neural network, whose feed-forward process is:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W_{L+1} a^(L) + b_{L+1})   (10)
where L is the number of layers of the deep model;
6) prediction layer: the output expression of the model's prediction layer is:
ŷ = σ(w_0 + Σ_{i=1}^{m} w_i x_i + y_d)   (11)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the entire training process aims to minimize the following objective function (cross-entropy loss):
loss = −(1/N) Σ_{i=1}^{N} ( y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) )   (12)
where y_i is the true label of the i-th sample, ŷ_i is the corresponding predicted label, and N is the total number of mobile application samples.
The feature compression in step 2) proceeds as follows: the original embedding E is compressed into a statistical vector Z = [z_1, ..., z_i, ..., z_f] by an average pooling operation, where z_i can be calculated as:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t)   (2)
z_i expresses global information about the i-th feature representation, and k is the embedding dimension.
The excitation in step 2) proceeds as follows: the weight of each field's embedding is learned from the statistical vector Z using two fully connected (FC) layers. The first FC layer is a dimension-reduction layer with parameter W_1 and a nonlinear activation function; the second FC layer restores the original dimension through parameter W_2. Formally, the field-embedding weights can be computed as:
A = σ_2(W_2 σ_1(W_1 Z))   (3)
where A ∈ R^f is the weight vector and σ_1 and σ_2 are activation functions.
The re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding V = [v_1, ..., v_f]; the SENET-like embedding V can be calculated as:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f]   (4)
experiment of
Basic setup
Data set: in the experiment, a Kaggle public data set 'App Store' is adopted, and 11 features are selected, wherein the number of the discrete features is 2, and the number of the continuous feature values is 9. (the feature columns are specifically prime _ gene, control _ rating, price, rating _ count _ tot, rating _ count _ ver, user _ ver, sup _ devices.num, ipad Sc _ urls.num, lang.num and size _ MB). since there is no label in the original dataset, we manually set the mobile application label value to be 1 for a score value of greater than 3 and a score number of more than 800, otherwise to be 0. The ratio of the number of positive samples is about 0.424, so that the experimental result is not influenced too much due to uneven distribution of the samples. The details of the distribution of the top 20 categories with the highest number are shown in fig. 3.
In addition, Adam is used as the optimizer and binary cross-entropy as the loss function. The batch size is set to 64 during model training, and the validation-set ratio is 0.2. To better illustrate the experimental results, training sets of different proportions are selected for testing.
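The labeling rule described above (label 1 iff rating greater than 3 and rating count greater than 800) can be sketched as follows; the rows are fabricated examples and the column names user_rating and rating_count_tot are assumptions based on the feature list given in the text.

```python
# Hypothetical rows mimicking the "App Store" dataset; values are invented.
rows = [
    {"user_rating": 4.5, "rating_count_tot": 21292},
    {"user_rating": 4.0, "rating_count_tot": 120},
    {"user_rating": 2.5, "rating_count_tot": 5000},
]

# Label = 1 iff rating > 3 AND rating count > 800, else 0.
labels = [int(r["user_rating"] > 3 and r["rating_count_tot"] > 800) for r in rows]
```

Only the first row satisfies both conditions, so the labels come out [1, 0, 0].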
Evaluation index
AUC and Logloss, which are widely used in click prediction, are selected as evaluation indexes.
In general, for a binary classification problem, a threshold can be set to divide samples into positive and negative classes. Computing the corresponding coordinate point in ROC space for each threshold yields the ROC curve, and AUC is the area under the ROC curve. When 0.5 < AUC < 1, the model outperforms a random classifier; the closer the AUC is to 1.0, the better the classifier, and an AUC of 0.5 is no better than random guessing. The calculation formula is:
AUC = ∫_0^1 t_pr d(f_pr)
In ROC space, a coordinate point (f_pr, t_pr) describes the trade-off between FP (false positives) and TP (true positives).
Logloss measures the accuracy of a classifier by penalizing erroneous classifications; minimizing the log loss is essentially equivalent to maximizing the accuracy of the classifier. To compute it, the classifier must provide a probability for each class to which the input may belong, not just the most likely class. The log loss reflects the average deviation of the samples and is often optimized as the model's loss function. The calculation formula is:
Logloss = −(1/N) Σ_{t=1}^{N} ( y_t log ŷ_t + (1 − y_t) log(1 − ŷ_t) )
where y_t is the true label of the t-th sample, ŷ_t is the corresponding predicted label, and N is the total number of mobile application samples.
Comparison method
Several methods are compared below to better illustrate the experimental results.
MLR: the MLR model is an extension of the linear LR model, and can learn higher-order feature combinations by fitting data in a piecewise linear manner. The basic idea is to adopt a strategy of dividing and treating one another.
AFM: the FM model is improved, and an attention mechanism is introduced into a feature crossing module. An attention network is used to learn the importance of different combined features (second order cross).
FNN: the FNN (neural network based on a factorization machine) uses FM to conduct supervised learning to obtain an embedded layer, so that dimensionality of sparse features can be effectively reduced, and continuous dense features can be obtained.
NFM: the linear intersection of FM on second-order features and the nonlinear intersection of the neural network on high-order features are combined together, and feature combinations which do not appear in a training set can be learned.
Deep FM: combining FM interaction at first and second order features, and at higher order features, while reducing the number of parameters and sharing information by sharing the Embedding (Embedding) of FM and DNN.
Performance comparison
Training-set ratios from 0.2 to 0.9 are selected to compare model performance; the experimental results are shown in Figs. 5, 6, and 7. Overall, MR-FI performs best. In particular, at a training ratio of 0.8, the AUC of MR-FI is 19.2%, 20.99%, 1.22%, 0.27%, and 1.08% higher than that of MLR, AFM, FNN, NFM, and DeepFM, respectively, and its Logloss is correspondingly 32.73%, 34.72%, 2.22%, 3.37%, and 3.28% lower. More specifically, we have the following findings:
MR-FI has the best performance in all models, which shows that the precision of the recommendation model can be effectively improved by considering the original feature weight and learning feature interaction by utilizing a bilinear interaction layer.
Deep models such as FNN and DeepFM consistently outperform shallow models such as MLR, with improvements of 17.98% and 18.21%, respectively, at a training ratio of 0.8. Deep models can model the data better when features are sparse.
In the shallow model, MLR is always superior to AFM model. When the training size is 0.2, the accuracy of the MLR model is improved by 4.73% compared with that of the AFM model, which indicates that learning the feature combinations of higher orders is more important than learning the weights of the feature combinations of lower orders.
Starting from the FNN, the addition of the neural network makes the accuracy of this type of model significantly improved compared to the shallow model. The results show that neural networks can learn more efficiently from the representation of FM, with better results than the shallow model.
The NFM model and the DeepFM model also have better overall performance, which shows that the combination of low-order characteristic and high-order characteristic interaction can improve the model precision to a certain extent.
Hyper-parametric analysis
The hyper-parameter analysis mainly covers the embedding size and the number of neural network layers. In mobile application recommendation feature modeling, however, there are more continuous features and fewer sparse features, so the embedding size has little influence on model accuracy and is not discussed here; the analysis focuses on the influence of the number of neural network layers. As the experimental results in Figs. 8 and 9 show, when the number of layers increases from 1 to 3, the AUC rises from 0.83 to 0.91; when it increases to 4, the AUC drops sharply; and from 4 to 7 layers the AUC recovers somewhat, but it never reaches the layer-3 peak and then trends downward. The effect on Logloss shows the same pattern: Logloss reaches its lowest point at 3 layers. We can therefore conclude that increasing the number of DNN layers increases model complexity: up to a point, adding layers improves performance, but beyond it performance degrades, because an overly complex model easily overfits. For mobile application recommendation in particular, setting the number of layers to 3 is a good choice.
The present invention and its embodiments are described above by way of illustration, not limitation; the drawings show only one embodiment, and the actual structure is not limited thereto. Those skilled in the art can readily design similar structures and embodiments based on the disclosed conception without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (4)
1. The mobile application recommendation method based on feature importance and bilinear feature interaction is characterized by comprising the following steps of:
1) embedding layer: the sparse input layer adopts sparse representation to the original input features, the embedding layer can embed the sparse features into low-dimensional continuous real-valued vectors, the sparse matrix is changed into a dense matrix through linear transformation to perform matrix densification, the implicit features of the matrix are extracted, and meanwhile, the generalization capability of the model is improved; the output of the embedding layer is represented as follows:
E = [e_1, e_2, ..., e_i, ..., e_f]   (1)
where f is the number of fields and e_i ∈ R^k is the representation of the i-th field, a vector of size k;
2) SENET layer: different characteristics have different importance to the target task, and for a specific CTR prediction task, the weight of the important characteristics is dynamically increased through a SENET mechanism, and the weight of the insufficient information characteristics is reduced; SENET consists of three steps: compressing, exciting and re-weighting the characteristics; firstly, carrying out Squeeze operation on the embedded features obtained in the last step to obtain global features, then carrying out Excitation operation on the global features, learning the relation among the features to obtain the weights of different features, and finally multiplying the weights by the original embedded features to obtain final features;
3) bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector p_ij can be calculated in three ways:
All domains
p_ij = v_i · W ⊙ v_j   (5)
where W ∈ R^{k×k} is shared by all vector pairs (v_i, v_j), so the bilinear interaction layer adds k × k extra parameters;
Single domain
p_ij = v_i · W_i ⊙ v_j   (6)
where W_i ∈ R^{k×k} is the parameter matrix of the i-th field, so f × k × k parameters must be maintained in this layer;
Interaction between domains
p_ij = v_i · W_ij ⊙ v_j   (7)
where W_ij ∈ R^{k×k} is the weight matrix between the i-th field and the j-th field, adding (f(f−1)/2) × k × k extra parameters;
the bilinear interaction layer outputs an interaction vector p = [p_1, ..., p_i, ..., p_n] from the original embedding E and an interaction vector q = [q_1, ..., q_i, ..., q_n] from the SENET-like embedding V, where p_i ∈ R^k and q_i ∈ R^k;
4) Connecting layers: the connecting layer connects the interactive vectors p and q and injects the connected vectors into the neural network layer; the specific process can be expressed as follows:
C = [p_1, ..., p_n, q_1, ..., q_n] = [c_1, ..., c_{2n}]   (8)
5) deep network: the deep neural network (DNN) component is combined with the shallow model to form a deep model. Its input is the output vector C of the connection layer; the deep network consists of several fully connected layers and can implicitly capture high-order features. Let a^(0) = [c_1, ..., c_{2n}] denote the initial input; a^(0) is fed into the deep neural network, whose feed-forward process is:
a^(l) = σ(W_l a^(l−1) + b_l)   (9)
where a^(l) is the output of the l-th layer of the deep network, σ is the sigmoid function, W_l is the weight matrix of the l-th layer, and b_l is its bias;
after L layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:
y_d = σ(W^(L+1) a^(L) + b^(L+1)) (10)
where L is the number of layers of the deep model;
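The feed-forward pass of equations (9) and (10) can be sketched in NumPy (a minimal sketch; the layer sizes and random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def deep_network(c, weights, biases):
    """Feed-forward through fully-connected layers (eq. 9):
    a^(l) = sigma(W^l a^(l-1) + b^l), starting from a^(0) = c."""
    a = c
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
dims = [8, 16, 16, 1]      # a^(0) has 2n = 8 entries; the last layer emits y_d
weights = [rng.normal(size=(dims[l + 1], dims[l])) for l in range(3)]
biases = [rng.normal(size=dims[l + 1]) for l in range(3)]
y_d = deep_network(rng.normal(size=8), weights, biases)
assert y_d.shape == (1,) and 0.0 < y_d[0] < 1.0
```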
6) prediction layer: the output of the model prediction layer is:
ŷ = σ(w_0 + Σ_{i=0}^{m} w_i x_i + y_d)
where ŷ is the predicted value of the model, σ is the sigmoid function, m is the feature size, x_i is the i-th input feature, and w_i is the i-th weight of the linear part;
the whole training process aims to minimize the cross-entropy objective function:
loss = -(1/N) Σ_{i=1}^{N} (y_i·log(ŷ_i) + (1 - y_i)·log(1 - ŷ_i))
where y_i is the true label of the i-th sample, ŷ_i the corresponding prediction, and N the total number of training samples;
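The prediction layer and the training objective can be sketched together in NumPy (a minimal sketch assuming the standard CTR log-loss objective; the toy inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(w0, w, x, y_d):
    """Prediction layer: sigmoid over the linear part plus the deep output y_d."""
    return sigmoid(w0 + np.dot(w, x) + y_d)

def log_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy objective minimized during training."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_hat = predict(0.1, np.array([0.2, -0.3]), np.array([1.0, 1.0]), 0.5)
assert 0.0 < y_hat < 1.0
loss = log_loss(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
assert loss > 0.0
```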
2. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the feature compression in step 2) specifically comprises: compressing the original embedding E into a statistical vector z = [z_1, ..., z_i, ..., z_f] using an average pooling operation, where z_i can be calculated by the following formula:
z_i = (1/k) Σ_{t=1}^{k} e_i^(t) (2)
where z_i is a scalar representing the global information of the i-th feature representation and k is the embedding dimension.
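The Squeeze step of equation (2) is just average pooling over each embedding, for example (toy values are illustrative):

```python
import numpy as np

def squeeze(E):
    """Average-pool each embedding e_i (length k) to a scalar z_i (eq. 2)."""
    return E.mean(axis=1)

E = np.array([[1.0, 3.0], [2.0, 4.0], [0.0, 0.0]])   # f = 3 fields, k = 2
z = squeeze(E)
assert z.tolist() == [2.0, 3.0, 0.0]
```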
3. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the excitation in step 2) specifically comprises: learning the weight of each field's embedding from the statistical vector z using two fully-connected (FC) layers; the first FC layer is a dimension-reduction layer with parameter matrix W_1, using σ_1 as a nonlinear function; the second FC layer restores the original dimension through the parameter matrix W_2; formally, the field-embedding weights can be computed as follows:
A = σ_2(W_2 σ_1(W_1 z)) (3)
where A ∈ R^f is the weight vector and σ_1, σ_2 are activation functions.
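Equation (3) can be sketched in NumPy as follows (a minimal sketch; `np.tanh` stands in for σ_1 and σ_2, and the reduction ratio r is an illustrative assumption):

```python
import numpy as np

def excitation(z, W1, W2, act=np.tanh):
    """Two FC layers (eq. 3): reduce dimension with W1, restore it with W2."""
    return act(W2 @ act(W1 @ z))

f, r = 4, 2                          # f fields, reduction ratio r
rng = np.random.default_rng(2)
W1 = rng.normal(size=(f // r, f))    # dimension-reduction layer
W2 = rng.normal(size=(f, f // r))    # dimension-restoring layer
A = excitation(rng.normal(size=f), W1, W2)
assert A.shape == (f,)               # one weight a_i per field
```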
4. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, wherein the re-weighting in step 2) specifically comprises: multiplying each field of the embedding layer by its corresponding weight to obtain the final embedding V = [v_1, ..., v_f]; the SENET embedding V can be calculated as follows:
V = [a_1 · e_1, ..., a_f · e_f] = [v_1, ..., v_f] (4).
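The re-weighting of equation (4) is an elementwise scaling of each field embedding by its learned weight, for example (toy values are illustrative):

```python
import numpy as np

def reweight(E, A):
    """Multiply each field embedding e_i by its learned weight a_i (eq. 4)."""
    return A[:, None] * E

E = np.array([[1.0, 2.0], [3.0, 4.0]])   # f = 2 fields, k = 2
A = np.array([0.5, 2.0])                  # field weights from the Excitation step
V = reweight(E, A)
assert V.tolist() == [[0.5, 1.0], [6.0, 8.0]]
```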
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110853887.9A CN113554491B (en) | 2021-07-28 | 2021-07-28 | Mobile application recommendation method based on feature importance and bilinear feature interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113554491A true CN113554491A (en) | 2021-10-26 |
CN113554491B CN113554491B (en) | 2024-04-16 |
Family
ID=78133015
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113554491B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120150532A1 (en) * | 2010-12-08 | 2012-06-14 | At&T Intellectual Property I, L.P. | System and method for feature-rich continuous space language models |
US20150278441A1 (en) * | 2014-03-25 | 2015-10-01 | Nec Laboratories America, Inc. | High-order semi-Restricted Boltzmann Machines and Deep Models for accurate peptide-MHC binding prediction |
CN110826700A (en) * | 2019-11-13 | 2020-02-21 | 中国科学技术大学 | Method for realizing and classifying bilinear graph neural network model for modeling neighbor interaction |
CN112365297A (en) * | 2020-12-04 | 2021-02-12 | 东华理工大学 | Advertisement click rate estimation method |
Non-Patent Citations (1)
Title |
---|
Cao Buqing; Xiao Qiaoxiang; Zhang Xiangping; Liu Jianxun: "An API service recommendation method fusing SOM functional clustering and DeepFM quality prediction", Chinese Journal of Computers (计算机学报), no. 006 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |