CN114780841B - KPHAN-based sequence recommendation method - Google Patents
KPHAN-based sequence recommendation method
- Publication number: CN114780841B
- Application number: CN202210416700A
- Authority: CN (China)
- Prior art keywords: item, user, sequence, attention, embedding
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06F16/288—Entity relationship models
- G06F16/367—Ontology
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
- G06F18/2415—Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253—Fusion techniques of extracted features
- G06F40/30—Semantic analysis
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
The invention relates to the technical field of recommendation, in particular to a KPHAN-based sequence recommendation method. KPHAN consists of two main modules, KGSP and LSPF. In KGSP, the entity corresponding to each item and its entity context are encoded with knowledge-embedding techniques; the item representation is enhanced through bidirectional knowledge fusion, and the user's global short-term preference features are captured by modeling the item sequence. In LSPF, the user's long-term preference is captured with a personalized hierarchical attention mechanism and fused with the short-term preference to complete training and prediction of the user preference. Beneficial effects: semantic associations are provided for the sequence items, improving item discriminability and revealing the similarity between items well. The personalized hierarchical attention network merges the user's long-term preference to obtain a more accurate and comprehensive personalized user preference. The hit rate improves by 10.7% on average, and the normalized discounted cumulative gain (NDCG) improves by 13.5% on average.
Description
Technical Field
The invention relates to the technical field of recommendation, in particular to a KPHAN-based sequence recommendation method.
Background
With the advent of the big-data era, information overload has become a pressing problem, and recommendation systems have developed in response. The main goal of a recommendation system is to search massive online content and services, find the subset of products that match a user's interest preferences, and recommend them to the user. Current recommendation technology is widely deployed on e-commerce platforms such as Taobao, Amazon, and Yelp, plays an important role in driving platform business growth and dynamic decision-making, and greatly improves user satisfaction. Although recommendation systems have achieved great success, conventional recommendation techniques generally assume that the user's preferences are stable and constant and capture only the user's long-term general preferences, so the recommendation results have certain limitations; in fact, over time, both the user's preferences and the popularity of items are dynamic. Sequential recommendation aims to recommend the next item a user will interact with based on the user's historical interaction sequence, and can flexibly capture useful sequential patterns and interest drift to make more accurate and dynamic recommendations. Because of its utility, more and more researchers have begun to study sequential recommendation.
Existing sequential recommendation methods fall mainly into two categories. The first category comprises traditional methods, chiefly sequential pattern mining, Markov chains, and matrix factorization. Markov-chain-based methods infer the user's future interests mainly from the last one or few behaviors; first-order Markov models infer the next item from the last interacted item, and FPMC is a representative model that fuses matrix factorization and Markov chains for personalized next-basket recommendation. FOSSIL and FISM make sequential recommendations by combining a similarity-based approach with a higher-order Markov-chain approach that accounts for dependence on more behaviors. Such methods cannot capture long-range complex dependencies and are susceptible to data sparsity. The second category is sequential recommendation based on deep learning models. Recurrent neural networks are very effective at modeling sequential data: GRU4Rec was the first to use a GRU to capture long-term dependencies within a session, treating the interactions in a session as a sequence history for sequence modeling; however, RNN-style methods are limited in capturing long-term preferences. Convolutional-neural-network-based methods have also emerged, mainly to capture local features in a sequence. Models based on the self-attention mechanism achieve state-of-the-art performance in machine translation, so the SASRec model first applied self-attention to sequential recommendation; it can overcome the limitations above and has become the mainstream framework.
Although self-attention-based methods improve sequential recommendation performance to some extent, two problems remain. First, these methods consider only the sequential transition patterns among item features and ignore the value of auxiliary information for sequential recommendation. In fact, semantic associations between items help improve item discriminability, reveal item similarity, and mine user preferences: for example, the films The Hobbit and The Lord of the Rings are both directed by Jackson, so the two items have a strong semantic association at the director level. Although prior work uses a knowledge-enhanced memory network to capture attribute-level user preferences, it cannot dynamically capture the semantic associations among sequence items. Second, previous self-attention methods use the representation of the last time step of the sequence model to represent the sequential behavior of the entire sequence and serve as the user's final preference representation; however, the last time step represents only the user's current, i.e. short-term, preference, and ignores the user's long-term static preference in the sequence, i.e. the preference that remains unchanged over a long period. More importantly, the invention also devises a sequence-aware loss function to make recommendation more accurate and efficient.
Disclosure of Invention
The invention aims to provide a KPHAN-based sequence recommendation method and device that overcome the defects of the prior art; KPHAN refers to a knowledge-enhanced personalized hierarchical attention network.
The technical scheme of the invention is that a knowledge-enhanced personalized hierarchical attention network (KPHAN) is provided, knowledge information is utilized, and long-term and short-term preferences of users are considered to improve recommendation performance.
The method comprises two main modules: a knowledge-enhanced global short-term preference module (KGSP) and a personalized hierarchical attention-aware long-term preference fusion module (LSPF), comprising the steps of:
Step one: to enrich item representations, enhance the semantic associations between items, and improve item discriminability, knowledge information about the items is integrated into the item representations. The invention therefore proposes to use knowledge-graph embedding to model richer item representations, encoding the entities and relations of the knowledge graph into low-dimensional dense vectors while preserving the graph's structural information and semantic associations;
Specifically, (1.1) first, using the entity-linking method in KB4Rec (see reference [1]), find the entity e ∈ E corresponding to item v in the knowledge graph, where E is the set of all entities of the knowledge graph;
(1.2) then, starting from that entity, extract the first-order subgraph of triples (e, r, e_t) ∈ G, where e and e_t denote the head and tail nodes in the first-order subgraph G, respectively, and r ∈ R denotes a relation in the first-order subgraph;
(1.3) train on the first-order subgraph with the TransE model (see reference [2]) to obtain the low-dimensional dense vector e ∈ R^d of the entity corresponding to the item and the embeddings e_t ∈ R^d of the remaining entities in the first-order subgraph. TransE represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector is close to the tail-node vector, i.e. e + r ≈ e_t. The score function of TransE is:

d(e, r, e_t) = ‖e + r − e_t‖₂²

where ‖·‖₂ denotes the L2 norm;
(1.4) to train the TransE model, the objective uses negative sampling with a maximum-margin strategy, so that positive triples score lower (closer) than negative triples by at least the margin; the loss function is:

L_kg = max(0, d_pos − d_neg + margin)

where d_pos denotes the score of a positive triple, d_neg the score of a negative (corrupted) triple, margin the maximum interval, and max the maximum function;
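As a minimal sketch of the TransE objective in (1.3)-(1.4), the following shows the squared-L2 score and the max-margin loss on toy 3-dimensional embeddings (the vectors and the margin value are illustrative, not from the patent):

```python
def transe_score(head, relation, tail):
    """Squared L2 distance ||e + r - e_t||^2: lower means a more plausible triple."""
    return sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))

def margin_loss(d_pos, d_neg, margin=1.0):
    """Max-margin loss: push positive triples at least `margin` closer than negatives."""
    return max(0.0, d_pos - d_neg + margin)

# toy embeddings: the positive triple nearly satisfies e + r = e_t
e = [0.1, 0.2, 0.3]
r = [0.4, 0.0, -0.1]
et = [0.5, 0.2, 0.2]        # matches e + r, so d_pos is (near) zero
et_neg = [2.0, -1.0, 0.0]   # corrupted tail, so d_neg is large

d_pos = transe_score(e, r, et)
d_neg = transe_score(e, r, et_neg)
loss = margin_loss(d_pos, d_neg)   # margin already satisfied: loss is 0
```

A real training loop would sample one corrupted triple per positive and update the embeddings by gradient descent on this loss.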
Step two: since the first-order neighbors of an entity (called context entities in the present invention) usually provide important semantic-association features, context entities are introduced on top of the entity embedding from step one. The context entities of the entity corresponding to an item are defined as:
context(e)={et|(e,r,et)∈G}
The corresponding context-entity embedding is obtained by averaging the embeddings of the context entities:

e_c = (1 / |context(e)|) Σ_{e_t ∈ context(e)} e_t

where e_c denotes the context-entity embedding of the entity corresponding to the item, |context(e)| is the number of context entities, and Σ denotes summation;
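The context-entity averaging of step two can be sketched as follows (the triples and embedding values are toy examples; in the patent these embeddings come from TransE training rather than being fixed by hand):

```python
def context_embedding(entity, triples, embeddings):
    """Average the embeddings of all tail entities linked to `entity`
    by a one-hop triple (e, r, e_t), i.e. its context entities."""
    tails = [t for (h, r, t) in triples if h == entity]
    if not tails:
        return None
    dim = len(embeddings[tails[0]])
    summed = [0.0] * dim
    for t in tails:
        for i, x in enumerate(embeddings[t]):
            summed[i] += x
    return [x / len(tails) for x in summed]

# hypothetical one-hop subgraph for a film item
triples = [("film", "director", "p_jackson"),
           ("film", "genre", "fantasy")]
embeddings = {"p_jackson": [1.0, 0.0], "fantasy": [0.0, 1.0]}
ctx = context_embedding("film", triples, embeddings)   # mean of the two tails
```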
Step three: the maximum length of the model's interaction sequence is set to n; the user's interaction sequence is cut to this maximum length with a sliding window, and sequences shorter than n are padded with 0 on the left. An item embedding matrix M ∈ R^(|V|×d) is created, where d is the item embedding dimension and |V| is the number of items; for each sequence, the corresponding input embedding matrix E ∈ R^(n×d) is retrieved, and a learnable position embedding P ∈ R^(n×d) is added to each item, giving the final sequence-item embedding Ê = E + P. Previous methods feed Ê directly into the self-attention module; however, using item embeddings alone ignores the items' semantic features. Therefore, the entity embedding from step one and the context embedding from step two are integrated into the item representation to enhance it and improve item discriminability. The bidirectional integration f_Bi-interaction is applied to the final sequence-item embedding Ê to obtain the knowledge-enhanced sequence-item embedding Ê^K:

Ê^K = dropout(LeakyReLU((Ê + Ê_kg)W1) + LeakyReLU((Ê ⊙ Ê_kg)W2))

where the dropout layer (neurons are discarded with a certain probability during training) mitigates overfitting of the deep neural network, LeakyReLU is a nonlinear activation function, W1 and W2 are learnable parameters, and ⊙ denotes element-wise vector multiplication. Here Ê_kg = tanh(W(Ê_ent + Ê_ctx) + b), where Ê_ent is the entity embedding matrix corresponding to the sequence items v and Ê_ctx is the context embedding matrix of the entities corresponding to the items; both are mapped into the item space through the same fully connected network with a tanh nonlinear activation, and W, W1, W2, and b are learnable parameters;
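A plain-Python sketch of the bi-interaction fusion idea: an additive channel and an element-wise-product channel, each passed through its own linear map and a LeakyReLU, then summed. Identity matrices stand in for the learnable W1 and W2, so the numbers are illustrative only:

```python
def leaky_relu(v, slope=0.01):
    return [x if x > 0 else slope * x for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def bi_interaction(item_emb, kg_emb, W1, W2):
    """Bi-interaction fusion: combine an additive (sum) channel with a
    multiplicative (element-wise product) channel, each through its own
    linear map and LeakyReLU, and add the two results."""
    add = [a + b for a, b in zip(item_emb, kg_emb)]
    mul = [a * b for a, b in zip(item_emb, kg_emb)]
    return [s + m for s, m in zip(leaky_relu(matvec(W1, add)),
                                  leaky_relu(matvec(W2, mul)))]

I2 = [[1.0, 0.0], [0.0, 1.0]]   # identity weights stand in for W1, W2
fused = bi_interaction([0.2, -0.4], [0.3, 0.1], I2, I2)
```

The product channel captures feature interactions that the sum channel misses, which is why both are kept.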
Step four: on the basis of step three, the knowledge-enhanced sequence-item embedding Ê^K is fed into the self-attention module to capture the user's short-term interest. Specifically, the self-attention module consists of self-attention and a feed-forward neural network (FFN); the self-attention part uses scaled dot-product attention, defined as:

S = Attention(Q, K, V) = softmax(QK^T / √d)V

where Q denotes the queries, K the keys, and V the values, obtained from Ê^K through learnable projections W^Q, W^K, W^V ∈ R^(d×d), and LayerNorm denotes layer normalization applied around the module. Notably, to preserve the chronological order of the sequence, a mask mechanism is used when computing attention over Q, K, and V: the attention weight between Q_i and K_j, which measures the dependency between item i and item j, is computed only for j ≤ i, and the sequential dynamics are captured through the attention-weighted sum over the preceding items. To add nonlinearity to the model and extract deeper features, a feed-forward network is applied after the attention; a two-layer feed-forward network is used, specifically:
F = FFN(S) = ReLU(dropout(ReLU(dropout(SW^(1) + b^(1)))) W^(2) + b^(2))

where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, W^(1), W^(2), b^(1), and b^(2) are learnable parameters of the neural network, and S is the sequence representation after self-attention. Experiments show that propagating low-level features to higher levels benefits model learning, so residual connections are used in both the self-attention module and the feed-forward part. To learn more complex dependencies, ω self-attention blocks (self-attention plus FFN) are stacked, and the b-th block (b > 1) is defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
When b = 1, S = Attention(Q, K, V) and F = FFN(S). The feature vector of the last item in the last block, F_n^(ω), merges the features of all previous items to obtain a more accurate representation of the current item, which is important for user-preference prediction; F_n^(ω) is therefore taken as the user's short-term preference. Having described the knowledge-enhanced global dynamic short-term preference module, the following steps describe the personalized hierarchical-attention-aware long-short-term preference fusion module;
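The masked scaled dot-product attention of step four can be sketched as below; for brevity the learnable Q/K/V projections, LayerNorm, FFN, and residual connections are omitted (identity maps are assumed), leaving only the causal mask and the weighted sum the text describes:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(X):
    """Scaled dot-product self-attention with a causal mask: position i
    may only attend to positions j <= i, preserving chronological order."""
    d = len(X[0])
    out = []
    for i in range(len(X)):
        scores = [sum(qi * kj for qi, kj in zip(X[i], X[j])) / math.sqrt(d)
                  for j in range(i + 1)]          # mask: only j <= i
        weights = softmax(scores)
        out.append([sum(w * X[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy 3-step, 2-dim sequence
S = causal_self_attention(seq)
```

Because of the mask, the first output position depends only on the first input, while later positions mix all preceding items.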
Step five: in addition to the user's current preferences acting on the user's next item interactions, the user's long-term static preferences, i.e., preferences that the user has not changed for a long time, also play an indispensable role in predicting the user's next item interactions, the user's long-term preferences being present in items that the user interacted with, where it will be And carrying out similarity calculation with the candidate items to obtain long-term static preference of the user, wherein the similarity between the candidate items and the item at each moment is calculated as follows:
where q is the number of candidate items, Representing the first n-1 item representations after stacking ω adaptive attention modules,/>For the attention score of the project at the time t of the user, softmax is a normalization function, and finally the long-term preference/>, of the user is obtainedExpressed as a weighted sum of the first n-1 moments:
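A sketch of the step-five computation: score each of the first n−1 hidden states against a candidate item, normalize with softmax, and take the weighted sum as the long-term preference (2-dimensional toy vectors; real hidden states come from the stacked attention blocks):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def long_term_preference(hidden_states, candidate):
    """Attention of a candidate item over the first n-1 hidden states:
    the long-term preference is the attention-weighted sum of those states."""
    scores = [sum(h * q for h, q in zip(state, candidate))
              for state in hidden_states]
    a = softmax(scores)
    d = len(candidate)
    return [sum(a[t] * hidden_states[t][k] for t in range(len(hidden_states)))
            for k in range(d)]

states = [[1.0, 0.0], [0.0, 1.0]]
pref = long_term_preference(states, [1.0, 1.0])   # equal scores, so the mean
```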
Step six: long-term and short-term preferences play different roles in predicting the user's next interaction, so attention is used again to determine which of the two is more important for predicting the next item:

u_implicit = β_s · F_n^(ω) + β_l · u_long, with (β_s, β_l) = softmax(F_n^(ω) · q, u_long · q)

where u_implicit is the implicit user preference extracted from the items the user has interacted with, composed of the long-term and short-term preferences; β_s is the weight of the short-term preference, β_l the weight of the long-term preference, and q the item to be interacted with next;
Step seven: to further provide personalized sequence recommendations, an explicit user preference matrix is introduced on a step six basis Where d represents the dimension, |u| represents the number of users, and fusing implicit user preferences with explicit user preferences via a weight parameter α to obtain the final user preference representation/>
To predict the next possible item of the user, a dot product calculation is performed on the end user preference and the candidate set of items to obtain the probability of the user interacting with the next item, whereinFor candidate item set/>For the score of the ith user to the candidate jth item, the higher the score is, the greater the probability of the user interacting with the item is, the candidate item sets are ordered in a descending order, and the first K items are selected as recommended items;
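Steps six and seven end with a weighted fusion and a dot-product ranking; the sketch below shows that final stage, with a simple convex combination standing in for the attention-derived weights (alpha, the item names, and the candidate vectors are illustrative):

```python
def final_preference(implicit, explicit, alpha=0.5):
    """Blend the implicit (sequence-derived) and explicit (user-embedding)
    preferences with a weight alpha; the blend form is a simplification."""
    return [alpha * i + (1 - alpha) * e for i, e in zip(implicit, explicit)]

def top_k(pref, candidates, k=2):
    """Dot-product score against every candidate, sort descending, keep top K."""
    scored = [(sum(p * c for p, c in zip(pref, vec)), item)
              for item, vec in candidates.items()]
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]

p = final_preference([1.0, 0.0], [0.0, 1.0], alpha=0.8)
recs = top_k(p, {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]})
```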
This completes the description of the sequence-recommendation flow. The model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

L = − Σ_{u_i} Σ_k [ log σ(r̂_{u_i,j}) + Σ_{j′} log(1 − σ(r̂_{u_i,j′})) ] + λ‖Θ‖²

where k is the maximum number of sequences into which user u_i can be divided, j is the positive sample of each divided sequence, and j′ ranges over the negative samples drawn for user u_i when taking positive sample j; here 100 negative examples are sampled for each positive example. λ‖Θ‖² regularizes all parameters and embeddings (item embeddings) of the model; σ(r̂_{u_i,j}) is the positive-sample interaction probability and σ(r̂_{u_i,j′}) the negative-sample interaction probability.
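Per positive sample, the loss above reduces to binary cross-entropy over the positive and its sampled negatives plus an L2 term; a minimal sketch, assuming the interaction probabilities have already been produced by a sigmoid:

```python
import math

def bce_loss(pos_probs, neg_probs, params=None, lam=0.0):
    """Binary cross-entropy over positive and sampled negative interaction
    probabilities, plus an optional L2 penalty on model parameters."""
    loss = -sum(math.log(p) for p in pos_probs)       # positives should be near 1
    loss -= sum(math.log(1.0 - p) for p in neg_probs) # negatives should be near 0
    if params:
        loss += lam * sum(w * w for w in params)
    return loss

# a confident model (positives near 1, negatives near 0) incurs a small loss
good = bce_loss([0.9], [0.1, 0.1])
bad = bce_loss([0.1], [0.9, 0.9])
```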
The beneficial effects of the invention are: (1) optimal performance is obtained on both evaluation indices; in particular, on the music data set the hit rate HR improves by 32% and the normalized discounted cumulative gain NDCG by 33%, illustrating the effectiveness of the knowledge-enhanced personalized hierarchical attention network presented herein. The knowledge-enhanced global short-term preference module provides semantic associations between sequence items, improving item discriminability and revealing the similarity between items well. The personalized hierarchical attention network merges the user's long-term preference and obtains a more accurate and comprehensive personalized user preference. (2) Compared with the prior models FDSA and KSR, which also add auxiliary information, the hit rate improves by 16.3% on average and NDCG by 18.4% on average. Unlike prior methods, the method captures both the sequential transitions of items and the semantic associations between items, considers long- and short-term preferences, and improves recommendation performance. (3) To verify the validity of the knowledge-enhanced item-representation module and the personalized hierarchical attention module, ablation experiments were performed on three variants of KPHAN. KPHAN-K&A denotes the basic model without the knowledge-enhancement module and the personalized hierarchical attention module. KPHAN-A omits the personalized hierarchical attention and examines the usefulness of knowledge: after knowledge content is added, the basic model improves markedly, with the hit rate improving by 7.8% on average over three different data sets and NDCG by 7.7% on average.
KPHAN-K omits the knowledge-enhancement module and examines the usefulness of hierarchical attention: the hit rate improves by 1.4% on average over the three data sets and NDCG by 2.4% on average. KPHAN is the full model presented herein, with significantly better results than the three variants, indicating that the knowledge-enhancement module and the personalized hierarchical attention module complement each other and that the full model achieves the best performance.
Drawings
FIG. 1 is a flow chart of the proposed sequence recommendation model.
FIG. 2 shows the comparison result of the present invention and the reference model.
FIG. 3 shows the results of an ablation experiment according to the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention; in addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other. The invention will be described in more detail below with reference to the accompanying drawings. Like elements are denoted by like reference numerals throughout the various figures. For purposes of clarity, the various parts of the drawings are not drawn to scale; a KPHAN-based sequence recommendation method according to an embodiment of the present invention is described below with reference to fig. 1 to 3:
S1, obtain the first-order subgraph of the knowledge graph corresponding to each item and perform embedding encoding;
S2, obtain the entity and context-entity encodings corresponding to the items;
S3, enhance the item features by fusing item, entity, and context-entity features;
S4, obtain the user's global short-term preference features;
S5, obtain the user's long-term static preference features;
S6, fuse the long- and short-term preference features to obtain the implicit user preference features;
S7, merge the explicit and implicit user preference features to obtain the final user preference for training and prediction;
where S1-S4 constitute the knowledge-enhanced global short-term preference module (KGSP) and S5-S7 the personalized hierarchical-attention-aware long-short-term preference fusion module (LSPF).
The invention comprises the following specific steps:
Step one: a knowledge graph provides semantic associations between items, enabling more accurate recommendations, so the representation of items is enhanced by modeling with knowledge-graph embedding. The aim of knowledge-graph embedding is to encode the entities and relations of the knowledge graph into low-dimensional dense vectors while retaining the graph's structural information and semantic associations;
Specifically, (1.1) first, using the entity-linking method of KB4Rec (see reference [1]: Zhao, Wayne Xin, et al. "KB4Rec: A data set for linking knowledge bases with recommender systems." Data Intelligence 1.2 (2019): 121-136.), find the entity e ∈ E corresponding to item v in the knowledge graph, where E is the set of all entities in the knowledge graph;
(1.2) then find the first-order subgraph (e, r, e_t) ∈ G starting from that entity, where e and e_t denote the head and tail nodes in the first-order subgraph G, respectively, and r ∈ R ranges over all relations in the first-order subgraph;
(1.3) finally, train the first-order subgraph with the TransE model (see reference [2]: Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in Neural Information Processing Systems 26 (2013).) to obtain the low-dimensional dense vector e of the entity corresponding to the item and the embeddings e_t of the remaining entities in the first-order subgraph. TransE represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector lies close to the tail-node vector, i.e., e + r ≈ e_t; the score function of TransE is therefore:

d(e, r, e_t) = ‖e + r − e_t‖₂

where ‖·‖₂ denotes the L2 norm;
(1.4) to train the TransE model, the optimization objective uses a loss function with negative sampling and a maximum-margin strategy, so that positive triplets score better than negative triplets; the loss function is:

L_KG = max(0, d_pos + margin − d_neg)

where d_pos is the score of the positive triplet, d_neg the score of the negative triplet, margin the maximum margin, and max the maximum function;
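As a minimal numpy sketch of the TransE score and the max-margin loss described in (1.3)-(1.4); the embedding dimension, margin value and random triplets are illustrative assumptions, not the patent's training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # embedding dimension (illustrative)

def transe_score(e_h, r, e_t):
    """TransE score d(e, r, e_t) = ||e_h + r - e_t||_2 (lower = more plausible)."""
    return float(np.linalg.norm(e_h + r - e_t))

def margin_loss(d_pos, d_neg, margin=1.0):
    """Max-margin loss: push positive triplets at least `margin` below negatives."""
    return max(0.0, d_pos + margin - d_neg)

# A "positive" triplet whose tail equals head + relation scores 0 (e + r ≈ e_t).
e_h, r = rng.normal(size=d), rng.normal(size=d)
e_t_pos = e_h + r                 # perfectly satisfied triplet
e_t_neg = rng.normal(size=d)      # corrupted (negatively sampled) tail
d_pos = transe_score(e_h, r, e_t_pos)
d_neg = transe_score(e_h, r, e_t_neg)
loss = margin_loss(d_pos, d_neg)
```

In a real training loop the embeddings would be updated by gradient descent on this loss; here only the forward computation is shown.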
Step two: since the first-order neighbors of an entity (called context entities in the present invention) usually carry important features for semantic association, context entities are introduced on top of the entity embedding of step one; the context entities of the entity corresponding to an item are defined as:
context(e)={et|(e,r,et)∈G}
the corresponding context-entity embedding is obtained by averaging the embeddings of the context entities:

e_context = (1 / |context(e)|) Σ_{e_t ∈ context(e)} e_t

where e_context denotes the context-entity embedding of the entity corresponding to the item, |context(e)| is the number of context entities, and Σ denotes summation;
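The averaging in step two can be sketched as follows; the entity names and 2-dimensional embeddings are hypothetical, chosen only to make the mean easy to check:

```python
import numpy as np

# First-order subgraph of an entity e: triplets (e, r, e_t).
# The context embedding is the mean of the tail-entity embeddings.
entity_emb = {
    "film_A": np.array([1.0, 0.0]),      # hypothetical entities
    "director_X": np.array([0.0, 2.0]),
    "genre_Y": np.array([2.0, 2.0]),
}
context = ["director_X", "genre_Y"]      # context(e) = {e_t | (e, r, e_t) ∈ G}

e_context = np.mean([entity_emb[e_t] for e_t in context], axis=0)
# e_context is the average of [0, 2] and [2, 2], i.e. [1.0, 2.0]
```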
Step three: the maximum length of the model interaction sequence is set to n, and each user interaction sequence is windowed to at most n items with a sliding window; if an interaction sequence is shorter than n, it is left-padded with 0. An item embedding matrix M ∈ R^{|V|×d} is created, where d is the item embedding dimension and |V| is the number of items; for each sequence, the corresponding input embedding matrix E is retrieved and a learnable position embedding P is added to each item, giving the final sequence item embedding Ê = E + P. Previous methods feed Ê directly into the adaptive attention module; however, item embedding alone ignores the semantic features of the item. Therefore the entity embedding of step one and the context embedding of step two are fused into the item to enhance its representation and improve item discrimination: bidirectional integration f_Bi-interaction is applied to the final sequence item embedding Ê to obtain the knowledge-enhanced sequence item embedding

E_I = dropout( LeakyReLU( (Ê + ē + ē_c) W₁ ) + LeakyReLU( (Ê ⊙ ē ⊙ ē_c) W₂ ) )

where the dropout layer (discarding neurons with a certain probability during training) alleviates over-fitting of the deep neural network, LeakyReLU is a nonlinear activation function, W₁ and W₂ are learnable parameters, and ⊙ denotes element-wise multiplication of vectors. Here ē is the entity embedding matrix corresponding to the sequence items and ē_c is the context embedding matrix of the entities corresponding to the items; ē and ē_c are projected into the item space through the same fully connected network with a tanh nonlinear activation, e.g., ē = tanh(e W + b), where W and b are learnable parameters;
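The bi-interaction fusion of step three (a sum branch plus an element-wise-product branch) can be sketched as below; the dimensions and random matrices are illustrative, the tanh projection is assumed already applied, and dropout is omitted so the sketch is deterministic:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5, 8                      # sequence length, embedding dim (illustrative)

E_hat = rng.normal(size=(n, d))  # item + position embeddings (Ê)
e_ent = rng.normal(size=(n, d))  # entity embeddings (assumed tanh-projected)
e_ctx = rng.normal(size=(n, d))  # context-entity embeddings

W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

# Bi-interaction: LeakyReLU over the summed features plus LeakyReLU over the
# element-wise product of the features, each with its own projection.
E_I = leaky_relu((E_hat + e_ent + e_ctx) @ W1) \
    + leaky_relu((E_hat * e_ent * e_ctx) @ W2)
```

The sum branch carries additive feature information, while the product branch captures feature interactions, mirroring the two terms of f_Bi-interaction.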
Step four: on the basis of step three, the knowledge-enhanced sequence item embedding E_I is fed into the adaptive attention module to obtain the user's short-term interest. Specifically, the adaptive attention module consists of adaptive self-attention and a feed-forward network (FFN); the self-attention part adopts scaled dot-product attention, defined as:

S = Attention(Q, K, V) = softmax( Q Kᵀ / √d ) V

where Q denotes the queries, K the keys, and V the values, obtained by projecting LayerNorm(E_I) with learnable parameters W^Q, W^K, W^V ∈ R^{d×d}; LayerNorm denotes layer normalization. Notably, to preserve the chronological order of the sequence, a mask mechanism is adopted in the computation of Q, K and V: the attention weight between Q_i and K_j is computed only for j ≤ i, so the dependency between item i and an earlier item j is captured, and the sequential change of the sequence is modeled by the attention-weighted sum over previous items. To add nonlinearity to the model and obtain deeper features, a feed-forward network is applied after the attention; a two-layer feed-forward neural network is adopted, specifically:
F = FFN(S) = dropout( ReLU( dropout( S W^{(1)} + b^{(1)} ) ) W^{(2)} + b^{(2)} )
where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} are learnable parameters of the neural network, and S is the sequence representation after self-attention. Experiments show that propagating low-level features to higher levels benefits model learning, so residual connections are used in both the self-attention module and the feed-forward part; to learn more complex dependencies, ω self-attention blocks (each consisting of self-attention and an FFN) are stacked, and the b-th block (b > 1) is defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
when b = 1, S = Attention(Q, K, V) and F = FFN(S). The feature vector F_n^{(ω)} of the last item in the last block merges the features of all previous items, yielding a more accurate representation of the current item, which is important for user-preference prediction; F_n^{(ω)} is therefore taken as the user's short-term preference. Having described the knowledge-enhanced global short-term preference module, the following steps describe the personalized hierarchical attention-aware long- and short-term preference fusion module;
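A single masked self-attention block with its FFN, as used in step four, can be sketched as follows; for a deterministic illustration, LayerNorm, residual connections, dropout and block stacking are omitted, and all matrices are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 8                              # sequence length, embedding dim

E_I = rng.normal(size=(n, d))            # knowledge-enhanced sequence embedding
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = E_I @ Wq, E_I @ Wk, E_I @ Wv
scores = Q @ K.T / np.sqrt(d)            # scaled dot-product
# Causal mask: position i may only attend to positions j <= i.
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores[mask] = -np.inf
S = softmax(scores) @ V                  # attention-weighted sum of past items

# Two-layer feed-forward network with ReLU.
W1, b1 = rng.normal(size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(size=(d, d)), np.zeros(d)
F = np.maximum(S @ W1 + b1, 0) @ W2 + b2

short_term = F[-1]                       # last position = short-term preference
```

Because of the mask, the representation of each item depends only on itself and earlier items, which is what preserves the chronological order of the sequence.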
Step five: besides the user's current preference, the user's long-term static preference, i.e., the preference that does not change over a long time, also plays an indispensable role in predicting the user's next item interaction. The long-term preference resides in the items the user has interacted with; here the first n−1 item representations F_{1:n−1}^{(ω)} are compared with the candidate item to obtain the user's long-term static preference, and the similarity between the candidate item and the item at each time step is computed as:

a_t = softmax( q · F_t^{(ω)} )

where q is the candidate-item embedding, F_{1:n−1}^{(ω)} denotes the first n−1 item representations after stacking ω adaptive attention modules, a_t is the attention score of the item at time t, and softmax is a normalization function; finally, the user's long-term preference p_long is expressed as the weighted sum over the first n−1 time steps:

p_long = Σ_{t=1}^{n−1} a_t F_t^{(ω)}
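The attention pooling of step five reduces to a softmax over candidate-item similarities followed by a weighted sum; a sketch with random illustrative embeddings (the names `F_seq`, `q` and `p_long` are assumptions standing in for the patent's symbols):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 8

F_seq = rng.normal(size=(n - 1, d))   # first n-1 item representations F^(ω)
q = rng.normal(size=d)                # candidate-item embedding

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

a = softmax(F_seq @ q)                # attention score of each past item vs. candidate
p_long = a @ F_seq                    # long-term preference: weighted sum
```

Items similar to the candidate receive larger weights, so `p_long` emphasizes the stable part of the user's history that is relevant to the candidate.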
Step six: long-term and short-term preferences play different roles in predicting the user's next interaction item, so attention is further used to explore which of the two is more important when predicting the next item:

[a_s, a_l] = softmax( [ q · F_n^{(ω)}, q · p_long ] ),  p_implicit = a_s F_n^{(ω)} + a_l p_long

where p_implicit is the implicit user preference extracted from the items the user interacted with, composed of the long-term and short-term preferences; a_s is the weight of the short-term preference, a_l the weight of the long-term preference, and q the next item to be interacted with;
Step seven: to further provide personalized sequence recommendation, an explicit user preference matrix U ∈ R^{|U|×d} is introduced on the basis of step six, where d denotes the dimension and |U| the number of users; the implicit and explicit user preferences are fused through a weight parameter α to obtain the final user preference representation:

p_u = α · p_implicit + (1 − α) · u
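Steps six and seven together form the hierarchical fusion; a sketch under illustrative assumptions (random vectors, α = 0.5, and the (1 − α) weighting for the explicit preference, which the patent leaves to the fusion parameter):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8

short_term = rng.normal(size=d)   # F_n^(ω) from step four
p_long = rng.normal(size=d)       # long-term preference from step five
q = rng.normal(size=d)            # next candidate item
u_explicit = rng.normal(size=d)   # row of the explicit user preference matrix U

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Step six: attention over {short-term, long-term} with respect to the candidate.
w = softmax(np.array([short_term @ q, p_long @ q]))
p_implicit = w[0] * short_term + w[1] * p_long

# Step seven: blend implicit and explicit preferences with weight alpha.
alpha = 0.5                       # fusion hyperparameter (illustrative)
p_user = alpha * p_implicit + (1 - alpha) * u_explicit
```

The first level decides how much the prediction should rely on recent versus long-standing behavior; the second level injects the user-specific embedding for personalization.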
To predict the user's next possible item, the dot product of the final user preference and the candidate item set is computed to obtain the probability of the user interacting with each item: ŷ_{ij} = p_{u_i} · v_j, where V is the candidate item set and ŷ_{ij} is the score of the i-th user for the j-th candidate item; the higher the score, the greater the probability that the user interacts with the item. The candidate items are sorted by score in descending order, and the top K items are selected as recommendations;
This completes the whole sequence recommendation flow. The model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

L = − Σ_{u_i} Σ_{j=1}^{k} [ log σ(ŷ_j) + Σ_{j⁻} log( 1 − σ(ŷ_{j⁻}) ) ] + λ‖Θ‖²

where k is the maximum number of sequences into which user u_i can be divided, j is the positive sample of each divided sequence, j⁻ denotes the negative samples drawn by user u_i for positive sample j (here 100 negative examples are sampled for each positive example), ‖Θ‖² denotes regularization of all parameters and embeddings of the model (i.e., the item embeddings), σ(ŷ_j) is the positive-sample interaction probability, and σ(ŷ_{j⁻}) is the negative-sample interaction probability.
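The scoring, top-K selection, and the binary cross-entropy with sampled negatives can be sketched as below; the item count, the use of 3 negatives instead of the patent's 100, and the omission of the regularization term are simplifications for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_items = 8, 20

p_user = rng.normal(size=d)               # final user preference p_u
item_emb = rng.normal(size=(n_items, d))  # candidate item set

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Dot-product scoring and top-K recommendation (descending order).
scores = item_emb @ p_user
top_k = np.argsort(-scores)[:5]

# Binary cross-entropy on one positive item and sampled negatives
# (3 negatives here for brevity; the patent samples 100 per positive).
pos = scores[top_k[0]]
neg = scores[rng.choice(n_items, size=3, replace=False)]
loss = -(np.log(sigmoid(pos)) + np.log(1.0 - sigmoid(neg)).sum())
```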
The beneficial effects of the invention are shown in FIG. 2: optimal performance is obtained on both evaluation indexes; in particular, on the music data set the hit rate (HR) improves by 32% and the normalized discounted cumulative gain (NDCG) by 33%, illustrating the effectiveness of the knowledge-enhanced personalized hierarchical attention network presented herein. The knowledge-enhanced global short-term preference module provides semantic associations between sequential items, improving item discrimination and revealing the similarity between items. The personalized hierarchical attention network merges the user's long-term preference and obtains more accurate and comprehensive personalized user preferences.
Compared with the prior models FDSA and KSR, which also add auxiliary information, the hit rate improves by 16.3% on average and NDCG by 18.4% on average. Compared with prior methods, the proposed method captures both the sequential transitions of items and the semantic associations between them, considers long- and short-term preferences, and improves recommendation performance.
Further, to verify the validity of the knowledge-enhanced item representation module and the personalized hierarchical attention module, ablation experiments were performed on three variants of KPHAN, as shown in FIG. 3. KPHAN-K&A is the basic model that omits both the knowledge-enhancement module and the personalized hierarchical attention module. KPHAN-A omits the personalized hierarchical attention of the model, isolating the usefulness of knowledge: after adding the knowledge content, the effect of the basic model improves markedly, with the hit rate improving by 7.8% on average and NDCG by 7.7% on average across three different data sets. KPHAN-K omits the knowledge-enhancement module, isolating the usefulness of hierarchical attention: the hit rate improves by 1.4% on average and NDCG by 2.4% on average across the three data sets. KPHAN is the full model presented herein, whose effect is significantly better than the three variants, indicating that the knowledge-enhancement module and the personalized hierarchical attention module complement each other and that the model achieves the best performance.
It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Claims (2)
1. A KPHAN-based sequence recommendation method, which is characterized in that KPHAN refers to a knowledge-enhanced personalized hierarchical attention network;
The method specifically comprises the following steps:
Step one: a knowledge graph is used for embedding modeling to enhance the representation of items; the entities and relations in the knowledge graph are encoded into low-dimensional dense vectors while retaining the structural information and semantic associations of the knowledge graph;
(1.1) firstly, using the entity-linking method in KB4Rec, find the entity e ∈ E corresponding to item v in the knowledge graph, where E is the set of all entities of the knowledge graph;
(1.2) then find the first-order subgraph (e, r, e_t) ∈ G starting from that entity, where e and e_t denote the head and tail nodes in the first-order subgraph G, respectively, and r ∈ R ranges over all relations in the first-order subgraph;
(1.3) train the first-order subgraph with the TransE model to obtain the low-dimensional dense vector e of the entity corresponding to the item and the embeddings e_t of the remaining entities in the first-order subgraph; TransE represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector lies close to the tail-node vector, i.e., e + r ≈ e_t; the score function of TransE is therefore:

d(e, r, e_t) = ‖e + r − e_t‖₂

where ‖·‖₂ denotes the L2 norm;
(1.4) to train the TransE model, the optimization objective uses a loss function with negative sampling and a maximum-margin strategy, so that positive triplets score better than negative triplets; the loss function is:

L_KG = max(0, d_pos + margin − d_neg)

where d_pos is the score of the positive triplet, d_neg the score of the negative triplet, margin the maximum margin, and max the maximum function;
Step two: since the first-order neighbors of an entity usually carry important features for semantic association, context entities are introduced on top of the entity embedding of step one; the context entities of the entity corresponding to an item are defined as:
context(e)={et|(e,r,et)∈G}
the corresponding context-entity embedding is obtained by averaging the embeddings of the context entities:

e_context = (1 / |context(e)|) Σ_{e_t ∈ context(e)} e_t

where e_context denotes the context-entity embedding of the entity corresponding to the item, |context(e)| is the number of context entities, and Σ denotes summation;
Step three: the maximum length of the model interaction sequence is set to n, and each user interaction sequence is windowed to at most n items with a sliding window; if an interaction sequence is shorter than n, it is left-padded with 0. An item embedding matrix M ∈ R^{|V|×d} is created, where d is the item embedding dimension and |V| is the number of items; for each sequence, the corresponding input embedding matrix E is retrieved and a learnable position embedding P is added to each item, giving the final sequence item embedding Ê = E + P. Previous methods feed Ê directly into the adaptive attention module; however, item embedding alone ignores the semantic features of the item. Therefore the entity embedding of step one and the context embedding of step two are fused into the item to enhance its representation and improve item discrimination: bidirectional integration f_Bi-interaction is applied to the final sequence item embedding Ê to obtain the knowledge-enhanced sequence item embedding

E_I = dropout( LeakyReLU( (Ê + ē + ē_c) W₁ ) + LeakyReLU( (Ê ⊙ ē ⊙ ē_c) W₂ ) )

where the dropout layer (discarding neurons with a certain probability during training) alleviates over-fitting of the deep neural network, LeakyReLU is a nonlinear activation function, W₁ and W₂ are learnable parameters, and ⊙ denotes element-wise multiplication of vectors; ē is the entity embedding matrix corresponding to the sequence items and ē_c is the context embedding matrix of the entities corresponding to the items; ē and ē_c are projected into the item space through the same fully connected network with a tanh nonlinear activation, where W and b are learnable parameters;
Step four: on the basis of step three, the knowledge-enhanced sequence item embedding E_I is fed into the adaptive attention module to obtain the user's short-term interest; the adaptive attention module consists of adaptive self-attention and a feed-forward network (FFN); the self-attention part adopts scaled dot-product attention, defined as:

S = Attention(Q, K, V) = softmax( Q Kᵀ / √d ) V

where Q denotes the queries, K the keys, and V the values, obtained by projecting LayerNorm(E_I) with learnable parameters W^Q, W^K, W^V ∈ R^{d×d}; LayerNorm denotes layer normalization; to preserve the chronological order of the sequence, a mask is adopted in the computation of Q, K and V: the attention weight between Q_i and K_j is computed only for j ≤ i, so the dependency between item i and an earlier item j is captured, and the sequential change is modeled by the attention-weighted sum over previous items;
to add nonlinearity to the model and obtain deeper features, a feed-forward network is applied after the attention; a two-layer feed-forward neural network is adopted, specifically:
F = FFN(S) = dropout( ReLU( dropout( S W^{(1)} + b^{(1)} ) ) W^{(2)} + b^{(2)} )
where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)} are learnable parameters of the neural network, and S is the sequence representation after self-attention; residual connections are used in both the self-attention module and the feed-forward part, and to learn more complex dependencies, ω self-attention blocks (self-attention and FFN) are stacked; the b-th block (b > 1) is defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
when b = 1, S = Attention(Q, K, V) and F = FFN(S); the feature vector F_n^{(ω)} of the last item in the last block merges the features of all previous items, yielding a more accurate representation of the current item, which is important for user-preference prediction; F_n^{(ω)} is therefore taken as the user's short-term preference;
Step five: besides the user's current preference, the user's long-term static preference also acts on predicting the user's next item interaction; the long-term preference resides in the items the user has interacted with; the first n−1 item representations F_{1:n−1}^{(ω)} are compared with the candidate item to obtain the user's long-term static preference, and the similarity between the candidate item and the item at each time step is computed as:

a_t = softmax( q · F_t^{(ω)} )

where q is the candidate-item embedding, F_{1:n−1}^{(ω)} denotes the first n−1 item representations after stacking ω adaptive attention modules, a_t is the attention score of the item at time t, and softmax is a normalization function; finally, the user's long-term preference p_long is expressed as the weighted sum over the first n−1 time steps:

p_long = Σ_{t=1}^{n−1} a_t F_t^{(ω)}
Step six: attention is used to explore which of the long-term and short-term preferences is more important in predicting the next item:

[a_s, a_l] = softmax( [ q · F_n^{(ω)}, q · p_long ] ),  p_implicit = a_s F_n^{(ω)} + a_l p_long

where p_implicit is the implicit user preference extracted from the items the user interacted with, composed of the long-term and short-term preferences; a_s is the weight of the short-term preference, a_l the weight of the long-term preference, q the next item to be interacted with, and softmax the normalization function;
Step seven: to further provide personalized sequence recommendation, an explicit user preference matrix U ∈ R^{|U|×d} is introduced on the basis of step six, where d denotes the dimension and |U| the number of users; the implicit and explicit user preferences are fused through a weight parameter α to obtain the final user preference representation p_u = α · p_implicit + (1 − α) · u;
to predict the user's next possible item, the dot product of the final user preference and the candidate item set is computed to obtain the probability of the user interacting with each item: ŷ_{ij} = p_{u_i} · v_j, where V is the candidate item set and ŷ_{ij} is the score of the i-th user for the j-th candidate item; the higher the score, the greater the probability that the user interacts with the item; the candidate items are sorted by score in descending order, and the top K items are selected as recommendations;
this completes the whole sequence recommendation flow; the model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

L = − Σ_{u_i} Σ_{j=1}^{k} [ log σ(ŷ_j) + Σ_{j⁻} log( 1 − σ(ŷ_{j⁻}) ) ] + λ‖Θ‖²

where k is the maximum number of sequences into which user u_i can be divided, j is the positive sample of each divided sequence, j⁻ denotes the negative samples drawn by user u_i for positive sample j (here 100 negative examples are sampled for each positive example), ‖Θ‖² denotes regularization of all parameters and embeddings of the model (i.e., the item embeddings), σ(ŷ_j) is the positive-sample interaction probability, and σ(ŷ_{j⁻}) is the negative-sample interaction probability.
2. The KPHAN-based sequence recommendation method as claimed in claim 1, comprising two modules: a global short-term preference module with enhanced knowledge, KGSP for short, and a long-term preference fusion module with personalized level attention perception, LSPF for short; using knowledge information to enhance the representation of the items in KGSP, modeling semantic association among the items, obtaining more accurate short-term preference characteristics of the user, and realizing the steps one to four; in the LSPF, capturing the long-term preference characteristics of the user through an attention mechanism and fusing the short-term preference characteristics of the user to obtain the final preference characteristics of the user, so as to realize the steps five to seven.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210416700.3A CN114780841B (en) | 2022-04-20 | 2022-04-20 | KPHAN-based sequence recommendation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114780841A CN114780841A (en) | 2022-07-22 |
CN114780841B true CN114780841B (en) | 2024-04-30 |
Family
ID=82431136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210416700.3A Active CN114780841B (en) | 2022-04-20 | 2022-04-20 | KPHAN-based sequence recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114780841B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116304279B (en) * | 2023-03-22 | 2024-01-26 | 烟台大学 | Active perception method and system for evolution of user preference based on graph neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110516160A (en) * | 2019-08-30 | 2019-11-29 | 中国科学院自动化研究所 | User modeling method, the sequence of recommendation method of knowledge based map |
CN113590900A (en) * | 2021-07-29 | 2021-11-02 | 南京工业大学 | Sequence recommendation method fusing dynamic knowledge maps |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11443346B2 (en) * | 2019-10-14 | 2022-09-13 | Visa International Service Association | Group item recommendations for ephemeral groups based on mutual information maximization |
- 2022-04-20 CN CN202210416700.3A patent/CN114780841B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN114780841A (en) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113254803B (en) | Social recommendation method based on multi-feature heterogeneous graph neural network | |
CN111797321B (en) | Personalized knowledge recommendation method and system for different scenes | |
CN112115377B (en) | Graph neural network link prediction recommendation method based on social relationship | |
Xi et al. | Towards open-world recommendation with knowledge augmentation from large language models | |
CN113590900A (en) | Sequence recommendation method fusing dynamic knowledge maps | |
CN110659411B (en) | Personalized recommendation method based on neural attention self-encoder | |
CN113705811B (en) | Model training method, device, computer program product and equipment | |
CN113918832B (en) | Graph convolution collaborative filtering recommendation system based on social relationship | |
CN113918833B (en) | Product recommendation method realized through graph convolution collaborative filtering of social network relationship | |
CN111581520A (en) | Item recommendation method and system based on item importance in session | |
CN114780841B (en) | KPHAN-based sequence recommendation method | |
CN113918834A (en) | Graph convolution collaborative filtering recommendation method fusing social relations | |
CN112734104A (en) | Cross-domain recommendation method for generating countermeasure network and self-encoder by fusing double generators and double discriminators | |
CN114817508A (en) | Sparse graph and multi-hop attention fused session recommendation system | |
CN115618101A (en) | Streaming media content recommendation method and device based on negative feedback and electronic equipment | |
CN111259264A (en) | Time sequence scoring prediction method based on generation countermeasure network | |
CN114282077A (en) | Session recommendation method and system based on session data | |
CN117171440A (en) | News recommendation method and system based on news event and news style joint modeling | |
CN115293812A (en) | E-commerce platform session perception recommendation prediction method based on long-term and short-term interests | |
CN113268657B (en) | Deep learning recommendation method and system based on comments and item descriptions | |
Wang et al. | Joint knowledge graph and user preference for explainable recommendation | |
CN114547276A (en) | Three-channel diagram neural network-based session recommendation method | |
CN114996566A (en) | Intelligent recommendation system and method for industrial internet platform | |
CN114022233A (en) | Novel commodity recommendation method | |
Wang et al. | A Tri‐Attention Neural Network Model‐BasedRecommendation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||