CN113554491A - Mobile application recommendation method based on feature importance and bilinear feature interaction - Google Patents

Mobile application recommendation method based on feature importance and bilinear feature interaction

Info

Publication number: CN113554491A (application number CN202110853887.9A)
Authority: CN (China)
Prior art keywords: layer, feature, interaction, features, bilinear
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113554491B (en)
Inventors: Cao Buqing (曹步清), Peng Mi (彭咪), Chen Junjie (陈俊杰)
Current and original assignee: Hunan University of Science and Technology
Priority and filing date: 2021-07-28
Publication date: 2021-10-26; granted 2024-04-16 as CN113554491B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

The invention discloses a mobile application recommendation method based on feature importance and bilinear feature interaction. A SENET layer converts the output of the embedding layer into SENET-like embedded features, improving the discriminability of the features. A bilinear interaction layer then performs second-order feature interaction modeling on the original embedding and on the SENET-like embedding separately. A connection layer concatenates the outputs of the bilinear interaction layer, and the cross-combined features are finally fed into a neural network, with a prediction layer outputting the prediction score. The invention belongs to the technical field of mobile applications, and particularly relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.

Description

Mobile application recommendation method based on feature importance and bilinear feature interaction
Technical Field
The invention belongs to the technical field of mobile applications, and particularly relates to a mobile application recommendation method based on feature importance and bilinear feature interaction.
Background
With the rapid growth of mobile applications in large mobile application stores, selecting the desired application has become a challenge for users, so a high-quality mobile application recommendation mechanism is needed to meet users' expectations. Although existing methods achieve remarkable results on mobile application recommendation, their accuracy can still be improved: they focus primarily on how to better model the interactions between the features of mobile applications, while ignoring the importance, or weight, of the features themselves.
Disclosure of Invention
To solve these problems, the invention provides a mobile application recommendation method based on feature importance and bilinear feature interaction, built on a squeeze-and-excitation network (SENET) mechanism and a bilinear function that combines the inner product and the Hadamard product.
The technical scheme adopted by the invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) Embedding layer: the sparse input layer applies a sparse representation to the raw input features, and the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors; a linear transformation densifies the sparse matrix into a dense matrix, extracting its implicit features while improving the generalization ability of the model. The output of the embedding layer is:

$E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$ (1)

where $f$ is the number of fields and $e_i \in \mathbb{R}^k$ is the representation of the $i$-th field, a vector of size $k$;
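For illustration only (not part of the claimed method), a minimal PyTorch sketch of such an embedding layer might look as follows; the class name and field cardinalities are hypothetical:

```python
import torch
import torch.nn as nn

class FieldEmbedding(nn.Module):
    """Maps f sparse categorical fields to k-dimensional dense vectors, Eq. (1)."""
    def __init__(self, field_cardinalities, k):
        super().__init__()
        # one embedding table per field
        self.tables = nn.ModuleList(
            nn.Embedding(card, k) for card in field_cardinalities)

    def forward(self, x):
        # x: (batch, f) integer ids -> E: (batch, f, k)
        return torch.stack(
            [table(x[:, i]) for i, table in enumerate(self.tables)], dim=1)

# usage: E = FieldEmbedding([1000, 500, 20], k=10)(torch.randint(0, 20, (32, 3)))
```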
2) SENET layer: different features have different importance for the target task; for a given CTR prediction task, a SENET mechanism dynamically increases the weights of important features and decreases the weights of uninformative ones. SENET consists of three steps: feature squeeze, excitation, and re-weighting. First, a Squeeze operation on the embedded features obtained in the previous step yields global features; an Excitation operation on these global features then learns the relationships among features to obtain the weight of each feature; finally, the weights are multiplied with the original embedded features to obtain the final features;
3) Bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product. The interaction vector $p_{ij}$ can be computed in three ways:

All fields:

$p_{ij} = v_i \cdot W \odot v_j$ (5)

where $W \in \mathbb{R}^{k \times k}$ is shared by all vector pairs $(v_i, v_j)$, so the bilinear interaction layer adds $k \times k$ parameters;

Single field:

$p_{ij} = v_i \cdot W_i \odot v_j$ (6)

where $W_i \in \mathbb{R}^{k \times k}$ is the parameter matrix of the $i$-th field, so the layer maintains $f \times k \times k$ parameters;

Field interaction:

$p_{ij} = v_i \cdot W_{ij} \odot v_j$ (7)

where $W_{ij} \in \mathbb{R}^{k \times k}$ is the weight matrix between the $i$-th and $j$-th fields, adding $\frac{f(f-1)}{2} \times k \times k$ parameters;

The bilinear interaction layer outputs the interaction vector $p = [p_1, \ldots, p_i, \ldots, p_n]$ from the original embedding $E$ and the interaction vector $q = [q_1, \ldots, q_i, \ldots, q_n]$ from the SENET-like embedding $V$, where $p_i \in \mathbb{R}^k$ and $q_i \in \mathbb{R}^k$;
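As a non-limiting sketch, the three parameterizations of Eqs. (5) to (7) could be implemented as follows; the class name, mode labels, and initialization scale are hypothetical:

```python
import itertools
import torch
import torch.nn as nn

class BilinearInteraction(nn.Module):
    """p_ij = (v_i W) ⊙ v_j with W shared over all fields ('all'),
    per source field ('each'), or per field pair ('interaction')."""
    def __init__(self, f, k, mode="interaction"):
        super().__init__()
        self.mode = mode
        self.pairs = list(itertools.combinations(range(f), 2))
        n = {"all": 1, "each": f, "interaction": len(self.pairs)}[mode]
        self.W = nn.ParameterList(
            nn.Parameter(torch.randn(k, k) * 0.01) for _ in range(n))

    def forward(self, V):
        # V: (batch, f, k) -> stacked interaction vectors: (batch, f(f-1)/2, k)
        out = []
        for idx, (i, j) in enumerate(self.pairs):
            w = self.W[0 if self.mode == "all" else i if self.mode == "each" else idx]
            out.append((V[:, i] @ w) * V[:, j])  # inner product, then Hadamard
        return torch.stack(out, dim=1)
```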
4) Connection layer: the connection layer concatenates the interaction vectors $p$ and $q$ and feeds the concatenated vector into the neural network layer:

$C = [p_1, \ldots, p_n, q_1, \ldots, q_n] = [c_1, \ldots, c_{2n}]$ (8)
5) deep network: will be provided withCombining the DNN component of the deep neural network with the shallow model to form a deep model; the input is an output vector C of a connection layer, and the deep network is composed of a plurality of full connection layers and can implicitly capture high-order features; let a(0)=[c1,...,c2n]Representing an initial input, will a(0)And (3) irrigating the deep neural network, wherein the feed-forward process is as follows:
a(l)=σ(Wla(l-1)+bl) (9)
a(l)σ is a function of sigmoid, W, for the output of the l-th layer of the deep networklWeight matrix representing the model, blRepresents the offset of the model;
after L layers, generating a dense real-valued feature vector, and inputting the dense real-valued feature vector into a sigmoid function for CTR prediction:
yd=σ(WLa(L+1)+bL+1) (10)
where L is the number of layers of the depth model.
6) Prediction layer: the output of the model prediction layer is:

$\hat{y} = \sigma\Big(w_0 + \sum_{i=1}^{m} w_i x_i + y_d\Big)$ (11)

where $\hat{y}$ is the predicted value of the model, $\sigma$ is the sigmoid function, $m$ is the feature size, $x_i$ is the $i$-th input feature, and $w_i$ is the $i$-th weight of the linear part;

The entire training process aims to minimize the following objective function:

$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big)$ (12)

where $y_i$ is the true label of the $i$-th sample, $\hat{y}_i$ is the corresponding predicted label, and $N$ is the total number of mobile application samples.
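A compact, hedged sketch of the deep part and prediction layer of Eqs. (9) to (12); the hidden widths are hypothetical, and the two sigmoids of Eqs. (10) and (11) are folded into a single output sigmoid:

```python
import torch
import torch.nn as nn

class DeepPrediction(nn.Module):
    """DNN over the connection-layer output C plus a shallow linear part."""
    def __init__(self, input_dim, linear_dim, hidden_dims=(400, 400, 400)):
        super().__init__()
        layers, d = [], input_dim
        for h in hidden_dims:
            layers += [nn.Linear(d, h), nn.Sigmoid()]  # a^(l) = σ(W_l a^(l-1) + b_l), Eq. (9)
            d = h
        self.dnn = nn.Sequential(*layers)
        self.deep_out = nn.Linear(d, 1)          # deep CTR term y_d, Eq. (10)
        self.linear = nn.Linear(linear_dim, 1)   # w_0 + Σ w_i x_i of Eq. (11)

    def forward(self, c, x_linear):
        logit = self.deep_out(self.dnn(c)) + self.linear(x_linear)
        return torch.sigmoid(logit).squeeze(-1)  # ŷ, Eq. (11)

# Eq. (12) is then the standard binary cross-entropy:
# loss = nn.BCELoss()(y_hat, labels.float())
```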
Further, the feature squeeze in step 2) proceeds as follows: the original embedding $E$ is compressed into a statistical vector $Z = [z_1, \ldots, z_i, \ldots, z_f]$ by an average pooling operation, where $z_i$ is computed as:

$z_i = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}$ (2)

Here $z_i$ carries the global information of the $i$-th feature representation and $k$ is the embedding dimension.
Further, the excitation in step 2) proceeds as follows: the weight of each field embedding is learned from the statistical vector $Z$ with two fully connected (FC) layers; the first FC layer reduces the dimension through a parameter matrix $W_1$ followed by a nonlinear activation, and the second FC layer restores the original dimension through a parameter matrix $W_2$. Formally, the field-embedding weights are computed as:

$A = \sigma_2(W_2\,\sigma_1(W_1 Z))$ (3)

where $A \in \mathbb{R}^f$ is the weight vector and $\sigma_1$, $\sigma_2$ are activation functions.
Further, the re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding $V = \{v_1, \ldots, v_f\}$; the SENET-like embedding $V$ is computed as:

$V = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$ (4)
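A minimal sketch of the squeeze, excitation, and re-weighting steps of Eqs. (2) to (4); the reduction ratio r and the ReLU activations are hypothetical choices:

```python
import torch
import torch.nn as nn

class SENETLayer(nn.Module):
    """Per-field squeeze-and-excitation over the embedding E: (batch, f, k)."""
    def __init__(self, f, r=3):
        super().__init__()
        hidden = max(1, f // r)               # r is a hypothetical choice
        self.excite = nn.Sequential(
            nn.Linear(f, hidden), nn.ReLU(),  # W1: dimension reduction, σ1
            nn.Linear(hidden, f), nn.ReLU())  # W2: restore dimension, σ2

    def forward(self, E):
        Z = E.mean(dim=-1)                    # squeeze: mean-pool each e_i, Eq. (2)
        A = self.excite(Z)                    # excitation: A = σ2(W2 σ1(W1 Z)), Eq. (3)
        return E * A.unsqueeze(-1)            # re-weight: v_i = a_i · e_i, Eq. (4)
```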
the invention adopts the structure to obtain the following beneficial effects: the invention provides a mobile application recommendation method based on feature importance and bilinear feature interaction. Then, combining the classical deep neural network component with the shallow model, injecting the cross combination characteristics into the deep model, and predicting the preference of the user on different mobile applications. The method is evaluated by using a Kaggle real data set, and experimental results show that the method can obtain the optimal AUC and Logloss under most conditions. The method can effectively improve the recommendation accuracy of the mobile application.
Drawings
FIG. 1 is a model architecture diagram of a mobile application recommendation method based on feature importance and bilinear feature interaction in accordance with the present invention;
FIG. 2 is a SENET layer structure diagram of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 3 is a diagram of a bilinear interaction layer structure of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 4 is a distribution detail table of the top 20 categories with the largest number in the experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 5 is a table of performance comparisons of training sets of different proportions in an experiment of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 6 is a comparison graph of AUC of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 7 is a Logloss comparison of different models of the mobile application recommendation method based on feature importance and bilinear feature interaction of the present invention;
FIG. 8 is a table comparing the influence of neural networks with different numbers of layers on AUC for the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention;
FIG. 9 is a table comparing the influence of neural networks with different numbers of layers on Logloss for the mobile application recommendation method based on feature importance and bilinear feature interaction according to the present invention.
Detailed Description
The technical solutions of the present invention will now be described in further detail with reference to specific implementations; technical features or connection relationships of the present invention that are not described in detail herein are prior art.
The present invention will be described in further detail with reference to examples.
As shown in fig. 1 to 9, the technical solution adopted by the present invention is as follows: the mobile application recommendation method based on feature importance and bilinear feature interaction comprises the following steps:
1) Embedding layer: the sparse input layer applies a sparse representation to the raw input features, and the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors; a linear transformation densifies the sparse matrix into a dense matrix, extracting its implicit features while improving the generalization ability of the model. The output of the embedding layer is:

$E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$ (1)

where $f$ is the number of fields and $e_i \in \mathbb{R}^k$ is the representation of the $i$-th field, a vector of size $k$;
2) SENET layer: different features have different importance for the target task; for a given CTR prediction task, a SENET mechanism dynamically increases the weights of important features and decreases the weights of uninformative ones. SENET consists of three steps: feature squeeze, excitation, and re-weighting. First, a Squeeze operation on the embedded features obtained in the previous step yields global features; an Excitation operation on these global features then learns the relationships among features to obtain the weight of each feature; finally, the weights are multiplied with the original embedded features to obtain the final features;
3) Bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product. The interaction vector $p_{ij}$ can be computed in three ways:

All fields:

$p_{ij} = v_i \cdot W \odot v_j$ (5)

where $W \in \mathbb{R}^{k \times k}$ is shared by all vector pairs $(v_i, v_j)$, so the bilinear interaction layer adds $k \times k$ parameters;

Single field:

$p_{ij} = v_i \cdot W_i \odot v_j$ (6)

where $W_i \in \mathbb{R}^{k \times k}$ is the parameter matrix of the $i$-th field, so the layer maintains $f \times k \times k$ parameters;

Field interaction:

$p_{ij} = v_i \cdot W_{ij} \odot v_j$ (7)

where $W_{ij} \in \mathbb{R}^{k \times k}$ is the weight matrix between the $i$-th and $j$-th fields, adding $\frac{f(f-1)}{2} \times k \times k$ parameters;

The bilinear interaction layer outputs the interaction vector $p = [p_1, \ldots, p_i, \ldots, p_n]$ from the original embedding $E$ and the interaction vector $q = [q_1, \ldots, q_i, \ldots, q_n]$ from the SENET-like embedding $V$, where $p_i \in \mathbb{R}^k$ and $q_i \in \mathbb{R}^k$;
4) Connection layer: the connection layer concatenates the interaction vectors $p$ and $q$ and feeds the concatenated vector into the neural network layer:

$C = [p_1, \ldots, p_n, q_1, \ldots, q_n] = [c_1, \ldots, c_{2n}]$ (8)
5) Deep network: a deep neural network (DNN) component is combined with the shallow model to form a deep model. Its input is the output vector $C$ of the connection layer; the deep network consists of multiple fully connected layers and can implicitly capture high-order features. Let $a^{(0)} = [c_1, \ldots, c_{2n}]$ denote the initial input; $a^{(0)}$ is fed into the deep neural network, whose feed-forward process is:

$a^{(l)} = \sigma(W_l a^{(l-1)} + b_l)$ (9)

where $a^{(l)}$ is the output of the $l$-th layer of the deep network, $\sigma$ is the sigmoid function, $W_l$ is the weight matrix of the layer, and $b_l$ its bias;

After $L$ layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:

$y_d = \sigma(W_{L+1} a^{(L)} + b_{L+1})$ (10)

where $L$ is the number of layers of the deep model;
6) Prediction layer: the output of the model prediction layer is:

$\hat{y} = \sigma\Big(w_0 + \sum_{i=1}^{m} w_i x_i + y_d\Big)$ (11)

where $\hat{y}$ is the predicted value of the model, $\sigma$ is the sigmoid function, $m$ is the feature size, $x_i$ is the $i$-th input feature, and $w_i$ is the $i$-th weight of the linear part;

The entire training process aims to minimize the following objective function:

$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big)$ (12)

where $y_i$ is the true label of the $i$-th sample, $\hat{y}_i$ is the corresponding predicted label, and $N$ is the total number of mobile application samples.
The feature squeeze in step 2) proceeds as follows: the original embedding $E$ is compressed into a statistical vector $Z = [z_1, \ldots, z_i, \ldots, z_f]$ by an average pooling operation, where $z_i$ is computed as:

$z_i = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}$ (2)

Here $z_i$ carries the global information of the $i$-th feature representation and $k$ is the embedding dimension.
The excitation in step 2) proceeds as follows: the weight of each field embedding is learned from the statistical vector $Z$ with two fully connected (FC) layers; the first FC layer reduces the dimension through a parameter matrix $W_1$ followed by a nonlinear activation, and the second FC layer restores the original dimension through a parameter matrix $W_2$. Formally, the field-embedding weights are computed as:

$A = \sigma_2(W_2\,\sigma_1(W_1 Z))$ (3)

where $A \in \mathbb{R}^f$ is the weight vector and $\sigma_1$, $\sigma_2$ are activation functions.
The re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding $V = \{v_1, \ldots, v_f\}$; the SENET-like embedding $V$ is computed as:

$V = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$ (4)
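Putting the pieces together, a hedged end-to-end forward pass, reusing the FieldEmbedding, SENETLayer, BilinearInteraction, and DeepPrediction sketches given above (all sizes and the dense linear-part stand-in are hypothetical):

```python
import torch

f, k = 11, 10                                  # 11 features as in the experiments below
n = f * (f - 1) // 2
emb = FieldEmbedding([100] * f, k)             # hypothetical field cardinalities
senet = SENETLayer(f)
bilinear_p = BilinearInteraction(f, k)         # on the original embedding E
bilinear_q = BilinearInteraction(f, k)         # on the SENET-like embedding V
head = DeepPrediction(input_dim=2 * n * k, linear_dim=f)

x = torch.randint(0, 100, (32, f))             # a batch of 32 samples
E = emb(x)                                     # 1) embedding layer
V = senet(E)                                   # 2) SENET layer
p, q = bilinear_p(E), bilinear_q(V)            # 3) bilinear interaction layer
C = torch.cat([p, q], dim=1).flatten(1)        # 4) connection layer, Eq. (8)
y_hat = head(C, torch.rand(32, f))             # 5-6) deep network and prediction
```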
experiment of
Basic setup
Data set: the experiments use the Kaggle public data set "App Store", from which 11 features are selected: 2 discrete features and 9 continuous features (the feature columns include prime_genre, cont_rating, price, rating_count_tot, rating_count_ver, user_ver, sup_devices.num, ipadSc_urls.num, lang.num, and size_MB). Since the original data set has no label, we manually set the label of a mobile application to 1 if its rating is greater than 3 and its rating count exceeds 800, and to 0 otherwise. The proportion of positive samples is about 0.424, so uneven sample distribution does not unduly affect the experimental results. The distribution of the 20 most frequent categories is detailed in FIG. 4.
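A hedged pandas sketch of this labeling rule, assuming the Kaggle CSV is named AppleStore.csv and carries user_rating and rating_count_tot columns (file and column names are assumptions):

```python
import pandas as pd

df = pd.read_csv("AppleStore.csv")             # file name is an assumption
df["label"] = ((df["user_rating"] > 3) &
               (df["rating_count_tot"] > 800)).astype(int)
print(df["label"].mean())                      # should be close to the 0.424 reported
```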
In addition, Adam is used as the optimizer and binary cross-entropy as the loss function. The batch size is set to 64 during model training, and the validation-set ratio is 0.2. To better illustrate the experimental results, training sets of different proportions are selected for testing.
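The stated training setup (Adam, binary cross-entropy, batch size 64, validation ratio 0.2) corresponds to a loop of roughly the following shape; the tiny stand-in model and dummy data are only for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

X = torch.rand(1000, 11)                       # 11 features, dummy data
y = torch.randint(0, 2, (1000,)).float()
train, val = random_split(TensorDataset(X, y), [800, 200])  # 0.2 validation ratio
loader = DataLoader(train, batch_size=64, shuffle=True)     # batch size 64

model = nn.Sequential(nn.Linear(11, 1), nn.Sigmoid())       # stand-in for the full model
opt = torch.optim.Adam(model.parameters())                  # Adam optimizer
loss_fn = nn.BCELoss()                                      # binary cross-entropy

for xb, yb in loader:
    opt.zero_grad()
    loss_fn(model(xb).squeeze(-1), yb).backward()
    opt.step()
```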
Evaluation index
AUC and Logloss, which are widely used in click prediction, are selected as evaluation indexes.
In general, for a binary classification problem, a threshold can be set to divide samples into positive and negative classes. Computing the corresponding coordinate points under different thresholds yields the ROC curve, and AUC is the area under that curve. When 0.5 < AUC < 1, the model outperforms a random classifier; the closer the AUC is to 1.0, the better the model separates the classes, and at 0.5 it is no better than random. The AUC can be written as the area integral

$\mathrm{AUC} = \int_0^1 t_{pr}\, d f_{pr}$

where, in ROC space, a coordinate point $(f_{pr}, t_{pr})$ describes the trade-off between the false positive rate (FP) and the true positive rate (TP).
Logloss measures the accuracy of a classifier by penalizing wrong classifications; minimizing the log loss is essentially equivalent to maximizing the accuracy of the classifier. To compute the log loss, the classifier must provide a probability for each class rather than only the most likely class. The logarithmic loss reflects the average deviation of the samples and is often used as the loss function to be optimized. It is computed as:

$\mathrm{Logloss} = -\frac{1}{N}\sum_{t=1}^{N}\big(y_t \log \hat{y}_t + (1 - y_t)\log(1 - \hat{y}_t)\big)$

where $y_t$ is the true label of the $t$-th sample, $\hat{y}_t$ is the corresponding predicted label, and $N$ is the total number of mobile application samples.
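Both metrics are available off the shelf, for example in scikit-learn; the labels and probabilities below are dummy values for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

y_true = np.array([1, 0, 1, 1, 0])             # dummy labels
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4])   # dummy predicted probabilities
print("AUC:", roc_auc_score(y_true, y_prob))   # area under the ROC curve
print("Logloss:", log_loss(y_true, y_prob))    # average binary cross-entropy
```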
Comparison method
Several methods are compared below to better illustrate the experimental results.
MLR: the MLR model is an extension of the linear LR model, and can learn higher-order feature combinations by fitting data in a piecewise linear manner. The basic idea is to adopt a strategy of dividing and treating one another.
AFM: the FM model is improved, and an attention mechanism is introduced into a feature crossing module. An attention network is used to learn the importance of different combined features (second order cross).
FNN: the FNN (neural network based on a factorization machine) uses FM to conduct supervised learning to obtain an embedded layer, so that dimensionality of sparse features can be effectively reduced, and continuous dense features can be obtained.
NFM: the linear intersection of FM on second-order features and the nonlinear intersection of the neural network on high-order features are combined together, and feature combinations which do not appear in a training set can be learned.
Deep FM: combining FM interaction at first and second order features, and at higher order features, while reducing the number of parameters and sharing information by sharing the Embedding (Embedding) of FM and DNN.
Performance comparison
Training-set ratios from 0.2 to 0.9 are selected to compare model performance; the experimental results are shown in FIGS. 5, 6, and 7. Overall, MR-FI performs best. In particular, at a training-set ratio of 0.8, the AUC of MR-FI is 19.2%, 20.99%, 1.22%, 0.27%, and 1.08% higher than that of MLR, AFM, FNN, NFM, and DeepFM, respectively, while its Logloss is 32.73%, 34.72%, 2.22%, 3.37%, and 3.28% lower. More specifically, we have the following findings:
MR-FI performs best among all models, which shows that considering the original feature weights and learning feature interactions with a bilinear interaction layer can effectively improve the precision of the recommendation model.
Deep models such as FNN and DeepFM consistently outperform shallow models such as MLR, improving by 17.98% and 18.21% respectively at a training size of 0.8; deep models handle sparse features better.
Among the shallow models, MLR is consistently superior to AFM. At a training size of 0.2, the accuracy of MLR is 4.73% higher than that of AFM, which indicates that learning higher-order feature combinations matters more than learning the weights of lower-order feature combinations.
Starting from FNN, adding a neural network significantly improves the accuracy of this family of models over the shallow models; neural networks learn more efficiently from the FM representation.
The NFM and DeepFM models also perform well overall, which shows that combining low-order and high-order feature interactions improves model precision to a certain extent.
Hyper-parameter analysis
The hyper-parameter analysis mainly covers the embedding size and the number of neural network layers. In mobile application recommendation feature modeling, however, continuous features dominate and sparse features are few, so the embedding size has little influence on model accuracy and is not discussed here; the analysis focuses on the number of neural network layers. As the experimental results in FIGS. 8 and 9 show, when the number of layers grows from 1 to 3, the AUC improves from 0.83 to 0.91; when it grows to 4, the AUC drops sharply; and from 4 to 7 layers the AUC partially recovers without regaining the 3-layer peak, then trends downward again. The effect on Logloss mirrors this: the lowest Logloss is reached at 3 layers. We therefore conclude that increasing the number of DNN layers increases the complexity of the model: up to a point, more layers improve performance, but beyond it performance degrades, because an overly complex model easily overfits. For mobile application recommendation in particular, setting the number of layers to 3 is a good choice.
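The layer-count sweep described above amounts to rebuilding the DNN tower at depths 1 through 7 and comparing validation AUC/Logloss per depth; a sketch, where the hidden width of 400 and the input dimension are hypothetical choices:

```python
import torch.nn as nn

def make_dnn(input_dim, n_layers, width=400):
    """DNN tower with n_layers hidden layers, as varied in FIGS. 8 and 9."""
    layers, d = [], input_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, width), nn.Sigmoid()]  # σ activation per Eq. (9)
        d = width
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

# 1100 = 2·n·k for f = 11 fields and embedding size k = 10
towers = {n: make_dnn(input_dim=1100, n_layers=n) for n in range(1, 8)}
# train and evaluate each tower; per the experiments, depth 3 performed best
```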
The present invention and its embodiments have been described above; the description is not intended to be limiting, what is shown in the drawings is only one embodiment of the present invention, and the actual structure is not limited thereto. In summary, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A mobile application recommendation method based on feature importance and bilinear feature interaction, characterized by comprising the following steps:
1) Embedding layer: the sparse input layer applies a sparse representation to the raw input features, and the embedding layer embeds these sparse features into low-dimensional continuous real-valued vectors; a linear transformation densifies the sparse matrix into a dense matrix, extracting its implicit features while improving the generalization ability of the model; the output of the embedding layer is:

$E = [e_1, e_2, \ldots, e_i, \ldots, e_f]$ (1)

where $f$ is the number of fields and $e_i \in \mathbb{R}^k$ is the representation of the $i$-th field, a vector of size $k$;
2) SENET layer: different features have different importance for the target task; for a given CTR prediction task, a SENET mechanism dynamically increases the weights of important features and decreases the weights of uninformative ones; SENET consists of three steps: feature squeeze, excitation, and re-weighting; first, a Squeeze operation on the embedded features obtained in the previous step yields global features; an Excitation operation on these global features then learns the relationships among features to obtain the weight of each feature; finally, the weights are multiplied with the original embedded features to obtain the final features;
3) Bilinear interaction layer: feature interactions are learned with additional parameters by combining the inner product and the Hadamard product; the interaction vector $p_{ij}$ can be computed in three ways:

All fields:

$p_{ij} = v_i \cdot W \odot v_j$ (5)

where $W \in \mathbb{R}^{k \times k}$ is shared by all vector pairs $(v_i, v_j)$, so the bilinear interaction layer adds $k \times k$ parameters;

Single field:

$p_{ij} = v_i \cdot W_i \odot v_j$ (6)

where $W_i \in \mathbb{R}^{k \times k}$ is the parameter matrix of the $i$-th field, so the layer maintains $f \times k \times k$ parameters;

Field interaction:

$p_{ij} = v_i \cdot W_{ij} \odot v_j$ (7)

where $W_{ij} \in \mathbb{R}^{k \times k}$ is the weight matrix between the $i$-th and $j$-th fields, adding $\frac{f(f-1)}{2} \times k \times k$ parameters;

The bilinear interaction layer outputs the interaction vector $p = [p_1, \ldots, p_i, \ldots, p_n]$ from the original embedding $E$ and the interaction vector $q = [q_1, \ldots, q_i, \ldots, q_n]$ from the SENET-like embedding $V$, where $p_i \in \mathbb{R}^k$ and $q_i \in \mathbb{R}^k$;
4) Connection layer: the connection layer concatenates the interaction vectors $p$ and $q$ and feeds the concatenated vector into the neural network layer:

$C = [p_1, \ldots, p_n, q_1, \ldots, q_n] = [c_1, \ldots, c_{2n}]$ (8)
5) Deep network: a deep neural network (DNN) component is combined with the shallow model to form a deep model; its input is the output vector $C$ of the connection layer; the deep network consists of multiple fully connected layers and can implicitly capture high-order features; let $a^{(0)} = [c_1, \ldots, c_{2n}]$ denote the initial input; $a^{(0)}$ is fed into the deep neural network, whose feed-forward process is:

$a^{(l)} = \sigma(W_l a^{(l-1)} + b_l)$ (9)

where $a^{(l)}$ is the output of the $l$-th layer of the deep network, $\sigma$ is the sigmoid function, $W_l$ is the weight matrix of the layer, and $b_l$ its bias;

After $L$ layers, a dense real-valued feature vector is generated and fed into a sigmoid function for CTR prediction:

$y_d = \sigma(W_{L+1} a^{(L)} + b_{L+1})$ (10)

where $L$ is the number of layers of the deep model;
6) Prediction layer: the output of the model prediction layer is:

$\hat{y} = \sigma\Big(w_0 + \sum_{i=1}^{m} w_i x_i + y_d\Big)$ (11)

where $\hat{y}$ is the predicted value of the model, $\sigma$ is the sigmoid function, $m$ is the feature size, $x_i$ is the $i$-th input feature, and $w_i$ is the $i$-th weight of the linear part;

The entire training process aims to minimize the following objective function:

$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big)$ (12)

where $y_i$ is the true label of the $i$-th sample, $\hat{y}_i$ is the corresponding predicted label, and $N$ is the total number of mobile application samples.
2. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, characterized in that the feature squeeze in step 2) proceeds as follows: the original embedding $E$ is compressed into a statistical vector $Z = [z_1, \ldots, z_i, \ldots, z_f]$ by an average pooling operation, where $z_i$ is computed as:

$z_i = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}$ (2)

Here $z_i$ carries the global information of the $i$-th feature representation and $k$ is the embedding dimension.
3. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, characterized in that the excitation in step 2) proceeds as follows: the weight of each field embedding is learned from the statistical vector $Z$ with two fully connected (FC) layers; the first FC layer reduces the dimension through a parameter matrix $W_1$ followed by a nonlinear activation, and the second FC layer restores the original dimension through a parameter matrix $W_2$; formally, the field-embedding weights are computed as:

$A = \sigma_2(W_2\,\sigma_1(W_1 Z))$ (3)

where $A \in \mathbb{R}^f$ is the weight vector and $\sigma_1$, $\sigma_2$ are activation functions.
4. The mobile application recommendation method based on feature importance and bilinear feature interaction according to claim 1, characterized in that the re-weighting in step 2) proceeds as follows: each field of the embedding layer is multiplied by its corresponding weight, giving the final embedding $V = \{v_1, \ldots, v_f\}$; the SENET-like embedding $V$ is computed as:

$V = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$ (4).
CN202110853887.9A (filed 2021-07-28, priority 2021-07-28): Mobile application recommendation method based on feature importance and bilinear feature interaction. Granted as CN113554491B (en); status: Active.

Priority Applications (1)

CN202110853887.9A (priority date 2021-07-28, filing date 2021-07-28): Mobile application recommendation method based on feature importance and bilinear feature interaction


Publications (2)

CN113554491A: published 2021-10-26
CN113554491B (en): published 2024-04-16

Family

ID=78133015

Family Applications (1)

CN202110853887.9A (Active; priority date 2021-07-28, filing date 2021-07-28): Mobile application recommendation method based on feature importance and bilinear feature interaction

Country Status (1)

CN: CN113554491B (en)

Citations (4)

* Cited by examiner, † Cited by third party

US20120150532A1 * (At&T Intellectual Property I, L.P.; priority 2010-12-08, published 2012-06-14): System and method for feature-rich continuous space language models
US20150278441A1 * (NEC Laboratories America, Inc.; priority 2014-03-25, published 2015-10-01): High-order semi-Restricted Boltzmann Machines and Deep Models for accurate peptide-MHC binding prediction
CN110826700A * (University of Science and Technology of China; priority 2019-11-13, published 2020-02-21): Method for realizing and classifying bilinear graph neural network model for modeling neighbor interaction
CN112365297A * (East China University of Technology; priority 2020-12-04, published 2021-02-12): Advertisement click rate estimation method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cao Buqing, Xiao Qiaoxiang, Zhang Xiangping, Liu Jianxun: "API service recommendation method fusing SOM functional clustering and DeepFM quality prediction", Chinese Journal of Computers (计算机学报), no. 006 *

Also Published As

CN113554491B (en): 2024-04-16

Similar Documents

Publication Publication Date Title
Vafeiadis et al. A comparison of machine learning techniques for customer churn prediction
Cavalcanti et al. Combining diversity measures for ensemble pruning
Boughida et al. A novel approach for facial expression recognition based on Gabor filters and genetic algorithm
CN107516129A (en) The depth Web compression method decomposed based on the adaptive Tucker of dimension
CN110083770B (en) Sequence recommendation method based on deeper feature level self-attention network
Duma et al. Sparseness reduction in collaborative filtering using a nearest neighbour artificial immune system with genetic algorithms
CN111352965A (en) Training method of sequence mining model, and processing method and equipment of sequence data
CN112085565A (en) Deep learning-based information recommendation method, device, equipment and storage medium
CN109325875A (en) Implicit group based on the hidden feature of online social user finds method
CN112287166A (en) Movie recommendation method and system based on improved deep belief network
Wu et al. Optimized deep learning framework for water distribution data-driven modeling
CN114154565A (en) Click rate prediction method and device based on multi-level feature interaction
Tiezzi et al. Graph neural networks for graph drawing
CN116976505A (en) Click rate prediction method of decoupling attention network based on information sharing
CN116228368A (en) Advertisement click rate prediction method based on deep multi-behavior network
CN115718826A (en) Method, system, device and medium for classifying target nodes in graph structure data
CN113837492A (en) Method, apparatus, storage medium, and program product for predicting supply amount of article
Luan et al. LRP‐based network pruning and policy distillation of robust and non‐robust DRL agents for embedded systems
Deng et al. Sparsity-control ternary weight networks
CN116976461A (en) Federal learning method, apparatus, device and medium
CN113554491A (en) Mobile application recommendation method based on feature importance and bilinear feature interaction
CN114842247B (en) Characteristic accumulation-based graph convolution network semi-supervised node classification method
Su et al. Sampling-free learning of Bayesian quantized neural networks
CN115689639A (en) Commercial advertisement click rate prediction method based on deep learning
Fonseca et al. A similarity-based surrogate model for enhanced performance in genetic algorithms

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant