CN114780841B - KPHAN-based sequence recommendation method - Google Patents

KPHAN-based sequence recommendation method

Info

Publication number
CN114780841B
Authority
CN
China
Prior art keywords: item, user, sequence, attention, embedding
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210416700.3A
Other languages
Chinese (zh)
Other versions
CN114780841A (en)
Inventor
杨超 (Yang Chao)
阮书琪 (Ruan Shuqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202210416700.3A
Publication of CN114780841A
Application granted
Publication of CN114780841B
Legal status: Active


Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/288 Entity relationship models
    • G06F16/367 Ontology
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/253 Fusion techniques of extracted features
    • G06F40/30 Semantic analysis
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • G06N5/022 Knowledge engineering; Knowledge acquisition


Abstract

The invention relates to the technical field of recommendation, and in particular to a KPHAN-based sequence recommendation method. KPHAN consists mainly of the KGSP and LSPF modules. In KGSP, the entity corresponding to each item and its entity context are encoded with knowledge-embedding techniques; the item representation is enhanced through bidirectional knowledge fusion, and the user's global short-term preference features are captured by modeling the item sequence. In LSPF, the user's long-term preference is captured with a personalized hierarchical attention mechanism and fused with the short-term preference to complete the training and prediction of user preferences. Beneficial effects: semantic associations are provided for the sequence items, improving item recognition and revealing inter-item similarity well; the personalized hierarchical attention network fuses the user's long-term preference and obtains a more accurate and comprehensive personalized user preference. The hit rate improves by 10.7% on average and the normalized discounted cumulative gain by 13.5% on average.

Description

KPHAN-based sequence recommendation method
Technical Field
The invention relates to the technical field of recommendation, in particular to a KPHAN-based sequence recommendation method.
Background
With the advent of the big-data age, information overload urgently needs to be addressed, and recommendation systems have developed in response. The main objective of a recommendation system is to search massive online content and services, find the products that match a user's interest preferences, and recommend them to the user. Recommendation technology is now widely applied on e-commerce platforms such as Taobao, Amazon and Yelp, plays an important role in promoting platform business growth and dynamic decision processes, and greatly improves user satisfaction. Although recommendation systems have achieved great success, conventional recommendation techniques generally assume that user preferences are stable and constant and capture only the user's long-term general preferences, which limits the recommendation results; in fact, both user preferences and item popularity are dynamic over time. Sequential recommendation aims to recommend the next item a user will interact with according to the historical sequence of the user's interactions, and can flexibly capture useful sequential patterns and interest drift to make more accurate and dynamic recommendations. Owing to its practicality, more and more researchers have begun to study sequential recommendation.
Existing sequential recommendation methods fall into two main categories. The first comprises traditional methods, mainly sequential pattern mining, Markov chains and matrix factorization. Markov-chain-based methods infer the user's future interests from the last one or few behaviors; first-order Markov models infer the next item from the last interacted item, and FPMC is a representative model that fuses matrix factorization with Markov chains for personalized next-basket recommendation. FOSSIL makes sequential recommendations by combining the similarity-based method FISM with a higher-order Markov chain that accounts for the dependence on more behaviors. Such methods cannot capture long-term complex dependencies and are susceptible to data sparsity. The second category is sequential recommendation based on deep learning models. Recurrent neural networks are very effective for modeling sequential data; GRU4Rec first used the GRU to capture long-term dependencies within a session, treating the in-session interactions as a sequence history for sequence modeling, but RNN-style methods have limited capacity for long-term preferences. Convolutional-neural-network-based methods have also emerged, mainly to capture local features in a sequence. Models based on the self-attention mechanism achieve state-of-the-art performance in machine translation, so the SASRec model first applied self-attention to sequential recommendation, which resolves the limitations above and has become the mainstream framework.
Although self-attention-based methods improve sequential recommendation performance to some extent, two problems remain. First, they consider only the sequential transition patterns among item features and ignore the value of auxiliary information for improving sequential recommendation. In fact, semantic associations between items help improve item recognition, reveal item similarity and mine user preferences: for example, the films King Kong and The Lord of the Rings were both directed by Peter Jackson, so the two items share a strong semantic association at the director level. Although prior work uses a knowledge-enhanced memory network to capture attribute-level user preferences, it cannot dynamically capture the semantic associations among sequence items. Second, previous self-attention methods all use the representation of the last time step of the sequence model to stand for the sequential behavior of the whole sequence and take it as the user's final preference representation; yet the last time step expresses only the user's current, i.e. short-term, preference and ignores the long-term static preference present in the sequence, i.e. the preference the user keeps unchanged over a long time. In addition, the invention devises a sequence-aware loss function to make recommendation more accurate and efficient.
Disclosure of Invention
The invention aims to provide a KPHAN-based sequence recommendation method and a KPHAN-based sequence recommendation device, so that the defects in the prior art are overcome, and KPHAN refers to a knowledge-enhanced personalized hierarchical attention network.
The technical scheme of the invention is that a knowledge-enhanced personalized hierarchical attention network (KPHAN) is provided, knowledge information is utilized, and long-term and short-term preferences of users are considered to improve recommendation performance.
The method comprises two main modules, a knowledge-enhanced global short-term preference module (KGSP) and a personalized hierarchical attention-aware long- and short-term preference fusion module (LSPF), and comprises the following steps:
Step one: to enrich the item representation, strengthen the semantic associations between items and improve item recognition, the knowledge information of the items is integrated into them. The invention therefore proposes to use knowledge-graph embedding to model and enrich the item representation, encoding the entities and relations of the knowledge graph into low-dimensional dense vectors while retaining the graph's structural information and semantic associations;
Specifically, (1.1) first, the entity-linking method in KB4Rec (implementation follows reference [1]) is used to find the entity $e \in E$ corresponding to item v in the knowledge graph, where E is the set of all entities of the knowledge graph;
(1.2) A first-order subgraph $(e, r, e_t) \in G$ is then found starting from this entity, where e and $e_t$ denote the head and tail nodes in the first-order subgraph G, respectively, and $r \in R$ ranges over all relations in the subgraph;
(1.3) The first-order subgraph is trained with the TransE model (implementation follows reference [2]) to obtain the low-dimensional dense vector $\mathbf{e}$ of the entity corresponding to the item, together with the embeddings $\mathbf{e}_t$ and $\mathbf{r}$ of the remaining entities and relations in the subgraph. TransE represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector lies close to the tail-node vector, i.e. $\mathbf{e} + \mathbf{r} \approx \mathbf{e}_t$; the score function of TransE is therefore

$$d(e, r, e_t) = \left\|\mathbf{e} + \mathbf{r} - \mathbf{e}_t\right\|_2$$

where $\|\cdot\|_2$ denotes the L2 norm;
(1.4) To train the TransE model, the optimized objective uses negative sampling with a max-margin strategy, so that positive triplets are scored as more plausible than negative ones, separated by at least the margin:

$$L_{KG} = \max\big(0,\; d_{pos} - d_{neg} + margin\big)$$

where $d_{pos}$ is the score of the positive triplet, $d_{neg}$ the score of the negative triplet, margin the maximum interval, and max the maximum function;
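As a concrete illustration of steps (1.3)-(1.4), here is a minimal PyTorch sketch of the TransE score and max-margin loss. It is a sketch under assumptions: the class name, dimensions and negative-sampling interface are hypothetical, not the patented implementation.

```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    """Minimal TransE: entities and relations embedded in one d-dimensional space."""
    def __init__(self, n_entities: int, n_relations: int, d: int = 64, margin: float = 1.0):
        super().__init__()
        self.ent = nn.Embedding(n_entities, d)
        self.rel = nn.Embedding(n_relations, d)
        self.margin = margin

    def score(self, h, r, t):
        # d(e, r, e_t) = ||e + r - e_t||_2 ; smaller means more plausible
        return torch.norm(self.ent(h) + self.rel(r) - self.ent(t), p=2, dim=-1)

    def loss(self, pos, neg):
        # pos / neg: (h, r, t) index triples; the max-margin loss pushes the
        # positive-triplet score below the negative one by at least `margin`
        d_pos, d_neg = self.score(*pos), self.score(*neg)
        return torch.clamp(d_pos - d_neg + self.margin, min=0).mean()
```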
Step two: since the first-order neighbors of an entity (the first-order neighbor entity is called a context entity in the present invention) are usually important features for providing semantic association, the context entity is introduced on the basis of embedding the entity corresponding to the item in the step one, and the context entity of the entity corresponding to the item is defined as:
$$context(e) = \{\, e_t \mid (e, r, e_t) \in G \,\}$$
The corresponding context entity embedding is obtained by averaging the embeddings of the context entities:

$$\bar{\mathbf{e}}_c = \frac{1}{|context(e)|}\sum_{e_t \in context(e)} \mathbf{e}_t$$

where $\bar{\mathbf{e}}_c$ denotes the context entity embedding of the entity corresponding to the item, $|context(e)|$ is the number of context entities, and $\Sigma$ denotes summation;
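For illustration, the context-entity averaging of step two can be sketched as follows, assuming a trained TransE entity table (all names hypothetical):

```python
import torch

def context_embedding(ent_emb: torch.Tensor, context_ids: list) -> torch.Tensor:
    """Average the embeddings of an entity's first-order (context) neighbours.

    ent_emb:     (n_entities, d) trained TransE entity embedding table
    context_ids: indices of the tail entities e_t with (e, r, e_t) in subgraph G
    """
    if not context_ids:                        # entity without neighbours: zero context
        return torch.zeros(ent_emb.size(1))
    return ent_emb[torch.tensor(context_ids)].mean(dim=0)
```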
Step three: the maximum length of the model interaction sequence is set to n, and user interaction sequences are taken up to this maximum length with a sliding window; if an interaction sequence is shorter than n, it is left-padded with 0. An item embedding matrix $M \in \mathbb{R}^{|V| \times d}$ is created, where d is the item embedding dimension and $|V|$ the number of items; each sequence-modeling step retrieves the corresponding input embedding matrix $E \in \mathbb{R}^{n \times d}$ and adds a learnable position embedding $P \in \mathbb{R}^{n \times d}$ to each item, so the final sequence item embedding is $\hat{E} = E + P$. Previous methods feed $\hat{E}$ directly into the self-attention module; item embedding alone, however, ignores the semantic features of the items. The entity embedding of step one and the context embedding of step two are therefore fused into the items to enhance their representation and improve item recognition: bi-directional fusion $f_{Bi\text{-}interaction}$ is applied to the final sequence item embedding $\hat{E}$, giving the knowledge-enhanced sequence item embedding

$$E^{k} = \mathrm{dropout}\Big(\mathrm{LeakyReLU}\big((\hat{E} + E_e + E_c)W_1\big) + \mathrm{LeakyReLU}\big((\hat{E} \odot E_e \odot E_c)W_2\big)\Big)$$

where the dropout layer (discarding neurons with a certain probability during training) alleviates overfitting of the deep neural network, LeakyReLU is a nonlinear activation function, $W_1$ and $W_2$ are learnable parameters, and $\odot$ denotes element-wise multiplication; $E_e$ is the entity embedding matrix corresponding to the sequence items v and $E_c$ the context embedding matrix of the corresponding entities, both mapped into the item space by the same fully connected network with a tanh nonlinear activation, $E_e = \tanh(E_{ent}W + b)$ and $E_c = \tanh(E_{ctx}W + b)$, with W and b learnable parameters;
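The bi-directional fusion just described could be sketched as below, under the assumption (from the formula above) of a shared tanh projection and two LeakyReLU branches; module and parameter names are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiInteractionFusion(nn.Module):
    """Sketch of f_Bi-interaction: fuse sequence-item, entity and context embeddings."""
    def __init__(self, d: int, p_drop: float = 0.2):
        super().__init__()
        self.proj = nn.Linear(d, d)   # shared tanh projection into the item space
        self.w1 = nn.Linear(d, d)     # additive (sum) branch, W_1
        self.w2 = nn.Linear(d, d)     # multiplicative (element-wise) branch, W_2
        self.drop = nn.Dropout(p_drop)

    def forward(self, item, ent, ctx):            # each: (n, d)
        e = torch.tanh(self.proj(ent))            # E_e mapped into the item space
        c = torch.tanh(self.proj(ctx))            # E_c mapped through the same network
        add = F.leaky_relu(self.w1(item + e + c))  # sum interaction
        mul = F.leaky_relu(self.w2(item * e * c))  # element-wise product interaction
        return self.drop(add + mul)               # knowledge-enhanced embedding E^k
```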
Step four: on the basis of step three, the knowledge-enhanced sequence item embedding $E^k$ is taken as the input of the self-attention module to capture the user's short-term interest. Specifically, the module consists of self-attention and a feed-forward neural network (FFN), and the self-attention part uses scaled dot-product attention:

$$S = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V, \qquad Q = \mathrm{LayerNorm}(E^k)W^{Q}, \; K = \mathrm{LayerNorm}(E^k)W^{K}, \; V = \mathrm{LayerNorm}(E^k)W^{V}$$

where Q denotes the queries, K the keys, V the values, $W^{Q}, W^{K}, W^{V}$ are learnable parameters, and LayerNorm denotes layer normalization. Notably, to preserve the chronological order of the sequence, a mask mechanism is applied when computing attention over Q, K and V: the attention weight between $Q_i$ and $K_j$, which models the dependency between item i and item j, is computed only for positions $j \le i$, so the sequential evolution is captured as an attention-weighted sum over the preceding items. To add nonlinearity to the model and obtain deeper features, a two-layer feed-forward network is applied after the attention, specifically:
$$F = \mathrm{FFN}(S) = \mathrm{ReLU}\Big(\mathrm{dropout}\big(\mathrm{ReLU}\big(\mathrm{dropout}(SW^{(1)} + b^{(1)})\big)W^{(2)} + b^{(2)}\big)\Big)$$
where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ are learnable parameters of the network, and S is the sequence representation after self-attention; experiments show that propagating low-level features to higher levels benefits model learning, so residual connections are used in both the self-attention module and the feed-forward part. To learn more complex dependencies, ω adaptive-attention blocks (self-attention plus FFN) are stacked, with the b-th block (b > 1) defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
When b = 1, S = Attention(Q, K, V) and F = FFN(S). The feature vector $F_n^{(\omega)}$ of the last item in the last block fuses the features of all preceding items, yielding a more accurate representation of the current interest, which is important for user preference prediction; $F_n^{(\omega)}$ is therefore taken as the user's short-term preference. Having described the knowledge-enhanced global dynamic short-term preference module, the following steps describe the personalized hierarchical attention-aware long- and short-term preference fusion module;
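Before moving on, a minimal sketch of step four's adaptive-attention block (causal self-attention plus the two-layer FFN, each with a residual connection) and of stacking ω of them; dimensions and names are assumptions:

```python
import torch
import torch.nn as nn

class SelfAttentionBlock(nn.Module):
    """One adaptive-attention block: masked self-attention + two-layer FFN."""
    def __init__(self, d: int, p_drop: float = 0.2):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.wq = nn.Linear(d, d, bias=False)   # W^Q
        self.wk = nn.Linear(d, d, bias=False)   # W^K
        self.wv = nn.Linear(d, d, bias=False)   # W^V
        self.ffn = nn.Sequential(               # ReLU(drop(ReLU(drop(S W1 + b)) W2 + b))
            nn.Linear(d, d), nn.Dropout(p_drop), nn.ReLU(),
            nn.Linear(d, d), nn.Dropout(p_drop), nn.ReLU(),
        )

    def forward(self, x):                       # x: (n, d), one user's sequence
        h = self.norm(x)
        q, k, v = self.wq(h), self.wk(h), self.wv(h)
        scores = q @ k.t() / (x.size(-1) ** 0.5)
        causal = torch.tril(torch.ones_like(scores)).bool()
        scores = scores.masked_fill(~causal, float("-inf"))  # item i attends to j <= i only
        s = torch.softmax(scores, dim=-1) @ v + x            # residual around attention
        return self.ffn(s) + s                               # residual around FFN

blocks = nn.Sequential(*[SelfAttentionBlock(d=64) for _ in range(2)])  # omega = 2
# blocks(E_k)[-1] would be F_n, the short-term preference vector
```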
Step five: besides the user's current preference, which drives the next item interaction, the user's long-term static preference, i.e. the preference that stays unchanged over a long time, also plays an indispensable role in predicting the next interaction. This long-term preference resides in the items the user has interacted with, so the representations of the first n-1 items are compared with the candidate item for similarity. With $F_t^{(\omega)}$ denoting the representation of the item at time step t after stacking ω adaptive attention modules and $\mathbf{q}$ the embedding of the candidate item, the similarity between the candidate item and the item at each time step is computed as

$$\alpha_t = \mathrm{softmax}\big(\mathbf{q} \cdot F_t^{(\omega)}\big)$$

where $\alpha_t$ is the attention score of the item at the user's time step t and softmax is the normalization function; the user's long-term preference $p^{long}$ is finally expressed as the weighted sum over the first n-1 time steps:

$$p^{long} = \sum_{t=1}^{n-1} \alpha_t F_t^{(\omega)}$$
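A sketch of this step-five attention, assuming the last-block features of the first n-1 items and a candidate item embedding as inputs:

```python
import torch

def long_term_preference(F_seq: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Attention-weighted long-term preference over the first n-1 item features.

    F_seq: (n-1, d) outputs of the last adaptive-attention block
    q:     (d,)     embedding of the candidate next item
    """
    alpha = torch.softmax(F_seq @ q, dim=0)   # attention score per time step
    return alpha @ F_seq                      # weighted sum = long-term preference
```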
Step six: long-term and short-term preferences play different roles in predicting the user's next interaction, so attention is further used to explore which of the two is more important when predicting the next item:

$$[\beta^{s}, \beta^{l}] = \mathrm{softmax}\big([\mathbf{q} \cdot F_n^{(\omega)},\; \mathbf{q} \cdot p^{long}]\big), \qquad p^{imp} = \beta^{s} F_n^{(\omega)} + \beta^{l} p^{long}$$

where $p^{imp}$ is the implicit user preference extracted from the items the user interacted with, composed of the long-term and the short-term preference; $\beta^{s}$ is the weight of the short-term preference, $\beta^{l}$ the weight of the long-term preference, and $\mathbf{q}$ the item to be interacted with next;
Step seven: to further provide personalized sequence recommendation, an explicit user preference matrix $U \in \mathbb{R}^{|u| \times d}$ is introduced on the basis of step six, where d is the dimension and $|u|$ the number of users; the implicit and explicit user preferences are fused through a weight parameter α to obtain the final user preference representation

$$P_u = \alpha\, p^{imp} + (1 - \alpha)\, U_u$$
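Steps six and seven together can be sketched as follows; the softmax weighting of the two preferences is a reconstruction consistent with the formulas above, and every name is an assumption:

```python
import torch

def final_user_preference(f_short, p_long, q, u_explicit, alpha: float = 0.5):
    """Fuse short- and long-term preferences (step six), then blend in the
    explicit per-user preference row U_u (step seven).

    f_short:    (d,) F_n, last-step feature of the final attention block
    p_long:     (d,) long-term preference from the hierarchical attention
    q:          (d,) candidate next-item embedding
    u_explicit: (d,) row of the explicit user preference matrix U
    """
    beta = torch.softmax(torch.stack([q @ f_short, q @ p_long]), dim=0)
    p_implicit = beta[0] * f_short + beta[1] * p_long     # implicit preference p^imp
    return alpha * p_implicit + (1 - alpha) * u_explicit  # final preference P_u
```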
To predict the user's likely next item, the dot product of the final user preference with the candidate item set $C \in \mathbb{R}^{q \times d}$ (q candidate items) gives the probability of the user interacting with each item, $\hat{y}_{ij} = P_{u_i} \cdot C_j$, where $\hat{y}_{ij}$ is the score of the i-th user for the j-th candidate item; the higher the score, the greater the probability that the user interacts with that item. The candidate items are sorted by score in descending order and the first K are selected as recommendations;
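A sketch of the dot-product scoring and top-K selection just described (names hypothetical):

```python
import torch

def recommend_top_k(p_user: torch.Tensor, candidates: torch.Tensor, k: int = 10):
    """Score the final user preference against the candidate set C (q x d),
    sort in descending order, and return the first K items."""
    scores = candidates @ p_user          # higher score = higher interaction probability
    top = torch.topk(scores, k)
    return top.indices, top.values        # recommended item ids and their scores
```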
This completes the whole sequence recommendation flow. The model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

$$L = -\sum_{u_i}\sum_{j=1}^{k}\Big[\log \sigma(\hat{y}_{j}) + \sum_{j^{-} \in S_j^{-}}\log\big(1 - \sigma(\hat{y}_{j^{-}})\big)\Big] + \lambda\|\Theta\|_2^2$$

where k is the maximum number of sub-sequences into which user $u_i$'s interaction sequence can be divided, j is the positive sample of each divided sub-sequence, and $S_j^{-}$ is the set of negative samples drawn for positive sample j (here 100 negatives are sampled per positive); $\lambda\|\Theta\|_2^2$ regularizes all parameters and the item embedding of the model; $\sigma(\hat{y}_{j})$ is the positive-sample interaction probability and $\sigma(\hat{y}_{j^{-}})$ the negative-sample interaction probability.
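Finally, a hedged sketch of the sequence-aware binary cross-entropy with 100 sampled negatives per positive and L2 regularization; the score tensors are assumed to come from the dot-product prediction above:

```python
import torch
import torch.nn.functional as F

def sequence_bce_loss(pos_scores, neg_scores, params, lam: float = 1e-4):
    """pos_scores: (k,)      scores y_j of the positive next item per sub-sequence
       neg_scores: (k, 100)  scores of the negatives sampled for each positive
       params:     tensors to regularize (model weights and item embeddings)"""
    pos = F.logsigmoid(pos_scores).sum()        # sum_j log sigma(y_j)
    neg = F.logsigmoid(-neg_scores).sum()       # sum log(1 - sigma(y_j-))
    reg = lam * sum(p.pow(2).sum() for p in params)
    return -(pos + neg) + reg
```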
The invention has the following beneficial effects. (1) The method achieves the best performance on both evaluation metrics; on the music dataset in particular, the hit rate (HR) improves by 32% and the normalized discounted cumulative gain (NDCG) by 33%, demonstrating the effectiveness of the knowledge-enhanced personalized hierarchical attention network presented herein. The knowledge-enhanced global short-term preference module provides semantic associations between sequence items, improving item recognition and revealing inter-item similarity well; the personalized hierarchical attention network fuses the user's long-term preference and obtains a more accurate and comprehensive personalized user preference. (2) Compared with the prior models FDSA and KSR, which also add auxiliary information, the hit rate improves by 16.3% on average and NDCG by 18.4% on average; unlike previous methods, the method captures both the sequential transitions of items and the semantic associations between items while considering long- and short-term preferences, improving recommendation performance. (3) To verify the effectiveness of the knowledge-enhanced item representation module and the personalized hierarchical attention module, ablation experiments were performed on three variants of KPHAN. KPHAN-K&A denotes the basic model that omits both the knowledge-enhancement module and the personalized hierarchical attention module. KPHAN-A omits the personalized hierarchical attention, isolating the usefulness of knowledge: adding knowledge content clearly improves the basic model, raising the hit rate by 7.8% and NDCG by 7.7% on average across three different datasets. KPHAN-K omits the knowledge-enhancement module, isolating the usefulness of hierarchical attention: it raises the hit rate by 1.4% and NDCG by 2.4% on average across the three datasets. KPHAN, the full model presented herein, improves significantly over the three variants, indicating that the knowledge-enhancement module and the personalized hierarchical attention module complement each other and that the model achieves the best performance.
Drawings
FIG. 1 is a flow chart of the proposed sequence recommendation model.
FIG. 2 shows the comparison result of the present invention and the reference model.
FIG. 3 shows the results of an ablation experiment according to the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention; in addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other. The invention will be described in more detail below with reference to the accompanying drawings. Like elements are denoted by like reference numerals throughout the various figures. For purposes of clarity, the various parts of the drawings are not drawn to scale; a KPHAN-based sequence recommendation method according to an embodiment of the present invention is described below with reference to fig. 1 to 3:
S1, acquiring the first-order knowledge-graph subgraph corresponding to an item and performing embedding encoding;
s2, acquiring entity and context entity codes corresponding to the items;
s3, project features are enhanced by fusing project, entity and context entity features;
s4, acquiring global short-term preference characteristics of a user;
S5, acquiring long-term static preference characteristics of a user;
s6, fusing the long-period preference characteristics to obtain implicit user preference characteristics;
s7, merging the explicit user preference characteristics and the implicit user preference characteristics to obtain final user preference for training and prediction;
where S1-S4 are knowledge-enhanced global short-term preference modules (KGSP) and S5-S7 are personalized hierarchical attention-aware long-short-term preference fusion modules (LSPF).
The invention comprises the following specific steps:
step one: knowledge-graph can produce semantic associations between items, providing more accurate recommendations, and enhancing representations of items by modeling using knowledge-graph embedding. The aim of the knowledge graph embedding is to encode the entities and the relations in the knowledge graph into low-dimensional dense vectors on the premise of retaining the structural information and semantic association of the knowledge graph;
Specifically, (1.1) first, the entity-linking method in KB4Rec is used (see reference [1]: Zhao, Wayne Xin, et al. "KB4Rec: A data set for linking knowledge bases with recommender systems." Data Intelligence 1.2 (2019): 121-136) to find the entity $e \in E$ corresponding to item v in the knowledge graph, where E is the set of all entities of the knowledge graph;
(1.2) A first-order subgraph $(e, r, e_t) \in G$ is then found starting from this entity, where e and $e_t$ denote the head and tail nodes in the first-order subgraph G, respectively, and $r \in R$ ranges over all relations in the subgraph;
(1.3) Finally, the first-order subgraph is trained with the TransE model (see reference [2]: Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in Neural Information Processing Systems 26 (2013)) to obtain the low-dimensional dense vector $\mathbf{e}$ of the entity corresponding to the item, together with the embeddings $\mathbf{e}_t$ and $\mathbf{r}$ of the remaining entities and relations in the subgraph. TransE represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector lies close to the tail-node vector, i.e. $\mathbf{e} + \mathbf{r} \approx \mathbf{e}_t$; the score function of TransE is therefore

$$d(e, r, e_t) = \left\|\mathbf{e} + \mathbf{r} - \mathbf{e}_t\right\|_2$$

where $\|\cdot\|_2$ denotes the L2 norm;
(1.4) To train the TransE model, the optimized objective uses negative sampling with a max-margin strategy, so that positive triplets are scored as more plausible than negative ones, separated by at least the margin:

$$L_{KG} = \max\big(0,\; d_{pos} - d_{neg} + margin\big)$$

where $d_{pos}$ is the score of the positive triplet, $d_{neg}$ the score of the negative triplet, margin the maximum interval, and max the maximum function;
Step two: since the first-order neighbors of an entity (the first-order neighbor entity is called a context entity in the present invention) are usually important features for providing semantic association, the context entity is introduced on the basis of embedding the entity corresponding to the item in the step one, and the context entity of the entity corresponding to the item is defined as:
$$context(e) = \{\, e_t \mid (e, r, e_t) \in G \,\}$$
The corresponding context entity embedding is obtained by averaging the embeddings of the context entities:

$$\bar{\mathbf{e}}_c = \frac{1}{|context(e)|}\sum_{e_t \in context(e)} \mathbf{e}_t$$

where $\bar{\mathbf{e}}_c$ denotes the context entity embedding of the entity corresponding to the item, $|context(e)|$ is the number of context entities, and $\Sigma$ denotes summation;
Step three: the maximum length of the model interaction sequence is set to n, and user interaction sequences are taken up to this maximum length with a sliding window; if an interaction sequence is shorter than n, it is left-padded with 0. An item embedding matrix $M \in \mathbb{R}^{|V| \times d}$ is created, where d is the item embedding dimension and $|V|$ the number of items; each sequence-modeling step retrieves the corresponding input embedding matrix $E \in \mathbb{R}^{n \times d}$ and adds a learnable position embedding $P \in \mathbb{R}^{n \times d}$ to each item, so the final sequence item embedding is $\hat{E} = E + P$. Previous methods feed $\hat{E}$ directly into the self-attention module; item embedding alone, however, ignores the semantic features of the items. The entity embedding of step one and the context embedding of step two are therefore fused into the items to enhance their representation and improve item recognition: bi-directional fusion $f_{Bi\text{-}interaction}$ is applied to the final sequence item embedding $\hat{E}$, giving the knowledge-enhanced sequence item embedding

$$E^{k} = \mathrm{dropout}\Big(\mathrm{LeakyReLU}\big((\hat{E} + E_e + E_c)W_1\big) + \mathrm{LeakyReLU}\big((\hat{E} \odot E_e \odot E_c)W_2\big)\Big)$$

where the dropout layer (discarding neurons with a certain probability during training) alleviates overfitting of the deep neural network, LeakyReLU is a nonlinear activation function, $W_1$ and $W_2$ are learnable parameters, and $\odot$ denotes element-wise multiplication; $E_e$ is the entity embedding matrix corresponding to the sequence items v and $E_c$ the context embedding matrix of the corresponding entities, both mapped into the item space by the same fully connected network with a tanh nonlinear activation, $E_e = \tanh(E_{ent}W + b)$ and $E_c = \tanh(E_{ctx}W + b)$, with W and b learnable parameters;
Step four: on the basis of step three, the knowledge-enhanced sequence item embedding $E^k$ is taken as the input of the self-attention module to capture the user's short-term interest. Specifically, the module consists of self-attention and a feed-forward neural network (FFN), and the self-attention part uses scaled dot-product attention:

$$S = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V, \qquad Q = \mathrm{LayerNorm}(E^k)W^{Q}, \; K = \mathrm{LayerNorm}(E^k)W^{K}, \; V = \mathrm{LayerNorm}(E^k)W^{V}$$

where Q denotes the queries, K the keys, V the values, $W^{Q}, W^{K}, W^{V}$ are learnable parameters, and LayerNorm denotes layer normalization. Notably, to preserve the chronological order of the sequence, a mask mechanism is applied when computing attention over Q, K and V: the attention weight between $Q_i$ and $K_j$, which models the dependency between item i and item j, is computed only for positions $j \le i$, so the sequential evolution is captured as an attention-weighted sum over the preceding items. To add nonlinearity to the model and obtain deeper features, a two-layer feed-forward network is applied after the attention, specifically:
$$F = \mathrm{FFN}(S) = \mathrm{ReLU}\Big(\mathrm{dropout}\big(\mathrm{ReLU}\big(\mathrm{dropout}(SW^{(1)} + b^{(1)})\big)W^{(2)} + b^{(2)}\big)\Big)$$
where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ are learnable parameters of the network, and S is the sequence representation after self-attention; experiments show that propagating low-level features to higher levels benefits model learning, so residual connections are used in both the self-attention module and the feed-forward part. To learn more complex dependencies, ω self-attention blocks (self-attention plus FFN) are stacked, with the b-th block (b > 1) defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
When b = 1, S = Attention(Q, K, V) and F = FFN(S). The feature vector $F_n^{(\omega)}$ of the last item in the last block fuses the features of all preceding items, yielding a more accurate representation of the current interest, which is important for user preference prediction; $F_n^{(\omega)}$ is therefore taken as the user's short-term preference. Having described the knowledge-enhanced global short-term preference module, the following steps describe the personalized hierarchical attention-aware long-term preference fusion module;
Step five: besides the user's current preference, which drives the next item interaction, the user's long-term static preference, i.e. the preference that stays unchanged over a long time, also plays an indispensable role in predicting the next interaction. This long-term preference resides in the items the user has interacted with, so the representations of the first n-1 items are compared with the candidate item for similarity. With $F_t^{(\omega)}$ denoting the representation of the item at time step t after stacking ω adaptive attention modules and $\mathbf{q}$ the embedding of the candidate item, the similarity between the candidate item and the item at each time step is computed as

$$\alpha_t = \mathrm{softmax}\big(\mathbf{q} \cdot F_t^{(\omega)}\big)$$

where $\alpha_t$ is the attention score of the item at the user's time step t and softmax is the normalization function; the user's long-term preference $p^{long}$ is finally expressed as the weighted sum over the first n-1 time steps:

$$p^{long} = \sum_{t=1}^{n-1} \alpha_t F_t^{(\omega)}$$
Step six: long-term and short-term preferences play different roles in predicting the user's next interaction, so attention is further used to explore which of the two is more important when predicting the next item:

$$[\beta^{s}, \beta^{l}] = \mathrm{softmax}\big([\mathbf{q} \cdot F_n^{(\omega)},\; \mathbf{q} \cdot p^{long}]\big), \qquad p^{imp} = \beta^{s} F_n^{(\omega)} + \beta^{l} p^{long}$$

where $p^{imp}$ is the implicit user preference extracted from the items the user interacted with, composed of the long-term and the short-term preference; $\beta^{s}$ is the weight of the short-term preference, $\beta^{l}$ the weight of the long-term preference, and $\mathbf{q}$ the item to be interacted with next;
Step seven: to further provide personalized sequence recommendation, an explicit user preference matrix $U \in \mathbb{R}^{|u| \times d}$ is introduced on the basis of step six, where d is the dimension and $|u|$ the number of users; the implicit and explicit user preferences are fused through a weight parameter α to obtain the final user preference representation

$$P_u = \alpha\, p^{imp} + (1 - \alpha)\, U_u$$
To predict the user's likely next item, the dot product of the final user preference with the candidate item set $C \in \mathbb{R}^{q \times d}$ (q candidate items) gives the probability of the user interacting with each item, $\hat{y}_{ij} = P_{u_i} \cdot C_j$, where $\hat{y}_{ij}$ is the score of the i-th user for the j-th candidate item; the higher the score, the greater the probability that the user interacts with that item. The candidate items are sorted by score in descending order and the first K are selected as recommendations;
This completes the whole sequence recommendation flow. The model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

$$L = -\sum_{u_i}\sum_{j=1}^{k}\Big[\log \sigma(\hat{y}_{j}) + \sum_{j^{-} \in S_j^{-}}\log\big(1 - \sigma(\hat{y}_{j^{-}})\big)\Big] + \lambda\|\Theta\|_2^2$$

where k is the maximum number of sub-sequences into which user $u_i$'s interaction sequence can be divided, j is the positive sample of each divided sub-sequence, and $S_j^{-}$ is the set of negative samples drawn for positive sample j (here 100 negatives are sampled per positive); $\lambda\|\Theta\|_2^2$ regularizes all parameters and the item embedding of the model; $\sigma(\hat{y}_{j})$ is the positive-sample interaction probability and $\sigma(\hat{y}_{j^{-}})$ the negative-sample interaction probability.
The invention has the beneficial effect that, as shown in fig. 2, the best performance is obtained on both evaluation metrics; on the music dataset in particular, the hit rate (HR) improves by 32% and the normalized discounted cumulative gain (NDCG) by 33%, demonstrating the effectiveness of the knowledge-enhanced personalized hierarchical attention network presented herein. The knowledge-enhanced global short-term preference module provides semantic associations between sequence items, improving item recognition and revealing inter-item similarity well; the personalized hierarchical attention network fuses the user's long-term preference and obtains a more accurate and comprehensive personalized user preference.
Compared with the prior models FDSA and KSR, which also add auxiliary information, the hit rate improves by 16.3% on average and NDCG by 18.4% on average. Unlike previous methods, the proposed method captures both the sequential transitions of items and the semantic associations between items while considering long- and short-term preferences, improving recommendation performance.
Further, to verify the effectiveness of the knowledge-enhanced item representation module and the personalized hierarchical attention module, ablation experiments were performed on three variants of KPHAN, as shown in fig. 3. KPHAN-K&A denotes the basic model that omits both the knowledge-enhancement module and the personalized hierarchical attention module. KPHAN-A omits the personalized hierarchical attention, isolating the usefulness of knowledge: the results show that adding knowledge content clearly improves the basic model, raising the hit rate by 7.8% and the normalized discounted cumulative gain by 7.7% on average across three different datasets. KPHAN-K omits the knowledge-enhancement module, isolating the usefulness of hierarchical attention: it raises the hit rate by 1.4% and the normalized discounted cumulative gain by 2.4% on average across the three datasets. KPHAN, the full model presented herein, improves significantly over the three variants, indicating that the knowledge-enhancement module and the personalized hierarchical attention module complement each other and that the model achieves the best performance.
It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Claims (2)

1. A KPHAN-based sequence recommendation method, which is characterized in that KPHAN refers to a knowledge-enhanced personalized hierarchical attention network;
The method specifically comprises the following steps:
Step one: knowledge-graph embedding is used to model and enhance the item representation, encoding the entities and relations of the knowledge graph into low-dimensional dense vectors while retaining the graph's structural information and semantic associations;
(1.1) First, the entity-linking method in KB4Rec is used to find the entity $e \in E$ corresponding to item v in the knowledge graph, where E is the set of all entities of the knowledge graph;
(1.2) A first-order subgraph $(e, r, e_t) \in G$ is then found starting from this entity, where e and $e_t$ denote the head and tail nodes in the first-order subgraph G, respectively, and $r \in R$ ranges over all relations in the subgraph;
(1.3) The first-order subgraph is trained with the TransE model to obtain the low-dimensional dense vector $\mathbf{e}$ of the entity corresponding to the item, together with the embeddings $\mathbf{e}_t$ and $\mathbf{r}$ of the remaining entities and relations in the subgraph; the TransE model represents entities and relations in the same space, so that the sum of the head-node vector and the relation vector lies close to the tail-node vector, i.e. $\mathbf{e} + \mathbf{r} \approx \mathbf{e}_t$, and the score function of TransE is therefore

$$d(e, r, e_t) = \left\|\mathbf{e} + \mathbf{r} - \mathbf{e}_t\right\|_2$$

where $\|\cdot\|_2$ denotes the L2 norm;
(1.4) To train the TransE model, the optimized objective uses negative sampling with a max-margin strategy, so that positive triplets are scored as more plausible than negative ones, separated by at least the margin:

$$L_{KG} = \max\big(0,\; d_{pos} - d_{neg} + margin\big)$$

where $d_{pos}$ is the score of the positive triplet, $d_{neg}$ the score of the negative triplet, margin the maximum interval, and max the maximum function;
Step two: since the first-order neighbors of the entity are usually important features for providing semantic association, a context entity is introduced on the basis of embedding the entity corresponding to the item in the step one, and the context entity of the entity corresponding to the item is defined as:
$$context(e) = \{\, e_t \mid (e, r, e_t) \in G \,\}$$
The corresponding context entity embedding is obtained by averaging the embeddings of the context entities:

$$\bar{\mathbf{e}}_c = \frac{1}{|context(e)|}\sum_{e_t \in context(e)} \mathbf{e}_t$$

where $\bar{\mathbf{e}}_c$ denotes the context entity embedding of the entity corresponding to the item, $|context(e)|$ is the number of context entities, and $\Sigma$ denotes summation;
Step three: the maximum length of the model interaction sequence is set to n, and user interaction sequences are taken up to this maximum length with a sliding window; if an interaction sequence is shorter than n, it is left-padded with 0. An item embedding matrix $M \in \mathbb{R}^{|V| \times d}$ is created, where d is the item embedding dimension and $|V|$ the number of items; each sequence-modeling step retrieves the corresponding input embedding matrix $E \in \mathbb{R}^{n \times d}$ and adds a learnable position embedding $P \in \mathbb{R}^{n \times d}$ to each item, so the final sequence item embedding is $\hat{E} = E + P$. Previous methods feed $\hat{E}$ directly into the self-attention module; item embedding alone, however, ignores the semantic features of the items, so the entity embedding of step one and the context embedding of step two are fused into the items to enhance their representation and improve item recognition: bi-directional fusion $f_{Bi\text{-}interaction}$ is applied to the final sequence item embedding $\hat{E}$, giving the knowledge-enhanced sequence item embedding

$$E^{k} = \mathrm{dropout}\Big(\mathrm{LeakyReLU}\big((\hat{E} + E_e + E_c)W_1\big) + \mathrm{LeakyReLU}\big((\hat{E} \odot E_e \odot E_c)W_2\big)\Big)$$

where the dropout layer (discarding neurons with a certain probability during training) alleviates overfitting of the deep neural network, LeakyReLU is a nonlinear activation function, $W_1$ and $W_2$ are learnable parameters, and $\odot$ denotes element-wise multiplication; $E_e$ is the entity embedding matrix corresponding to the sequence items v and $E_c$ the context embedding matrix of the corresponding entities, both mapped into the item space by the same fully connected network with a tanh nonlinear activation, $E_e = \tanh(E_{ent}W + b)$ and $E_c = \tanh(E_{ctx}W + b)$, with W and b learnable parameters;
Step four: on the basis of step three, the knowledge-enhanced sequence item embedding $E^k$ is taken as the input of the self-attention module to capture the user's short-term interest; the module consists of self-attention and a feed-forward neural network (FFN), and the self-attention part uses scaled dot-product attention:

$$S = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V, \qquad Q = \mathrm{LayerNorm}(E^k)W^{Q}, \; K = \mathrm{LayerNorm}(E^k)W^{K}, \; V = \mathrm{LayerNorm}(E^k)W^{V}$$

where Q denotes the queries, K the keys, V the values, $W^{Q}, W^{K}, W^{V}$ are learnable parameters, and LayerNorm denotes layer normalization; to ensure the chronological order of the sequence, a mask is applied when computing attention over Q, K and V, so that the attention weight between $Q_i$ and $K_j$, i.e. the dependency between item i and item j, is computed only for positions $j \le i$, and the sequential change is captured as an attention-weighted sum over the preceding items;
In order to add nonlinearity to the model, the feedforward neural network is performed after the attention to obtain deeper features, and a two-layer feedforward neural network is adopted, specifically:
$$F = \mathrm{FFN}(S) = \mathrm{ReLU}\Big(\mathrm{dropout}\big(\mathrm{ReLU}\big(\mathrm{dropout}(SW^{(1)} + b^{(1)})\big)W^{(2)} + b^{(2)}\big)\Big)$$
where ReLU is a nonlinear activation function, dropout discards neurons with a certain probability during training, $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ are learnable parameters of the network, and S is the sequence representation after self-attention; residual connections are used in both the self-attention module and the feed-forward part, and to learn more complex dependencies, ω self-attention blocks (self-attention plus FFN) are stacked, with the b-th block (b > 1) defined as:
S(b)=Attention(F(b-1))
F(b)=FFN(S(b))
When b = 1, S = Attention(Q, K, V) and F = FFN(S); the feature vector $F_n^{(\omega)}$ of the last item in the last block fuses the features of all preceding items to give a more accurate current-interest representation, which is important for user preference prediction, so $F_n^{(\omega)}$ is taken as the user's short-term preference;
Step five: besides the user's current preference, which drives the next item interaction, the user's long-term static preference also acts on predicting the user's next item interaction; the long-term preference resides in the items the user has interacted with, so the representations of the first n-1 items are compared with the candidate item for similarity. With $F_t^{(\omega)}$ denoting the representation of the item at time step t after stacking ω adaptive attention modules and $\mathbf{q}$ the embedding of the candidate item, the similarity between the candidate item and the item at each time step is

$$\alpha_t = \mathrm{softmax}\big(\mathbf{q} \cdot F_t^{(\omega)}\big)$$

where $\alpha_t$ is the attention score of the item at the user's time step t and softmax is the normalization function; the user's long-term preference $p^{long}$ is finally expressed as the weighted sum over the first n-1 time steps:

$$p^{long} = \sum_{t=1}^{n-1} \alpha_t F_t^{(\omega)}$$
Step six: attention is used to explore which of the long-term and the short-term preference is more important when predicting the next item:

$$[\beta^{s}, \beta^{l}] = \mathrm{softmax}\big([\mathbf{q} \cdot F_n^{(\omega)},\; \mathbf{q} \cdot p^{long}]\big), \qquad p^{imp} = \beta^{s} F_n^{(\omega)} + \beta^{l} p^{long}$$

where $p^{imp}$ is the implicit user preference extracted from the items the user interacted with, composed of the long-term and the short-term preference; $\beta^{s}$ is the weight of the short-term preference, $\beta^{l}$ the weight of the long-term preference, $\mathbf{q}$ the item to be interacted with next, and softmax the normalization function;
Step seven: to further provide personalized sequence recommendation, an explicit user preference matrix $U \in \mathbb{R}^{|u| \times d}$ is introduced on the basis of step six, where d is the dimension and $|u|$ the number of users; the implicit and explicit user preferences are fused through a weight parameter α to obtain the final user preference representation

$$P_u = \alpha\, p^{imp} + (1 - \alpha)\, U_u$$
To predict the user's likely next item, the dot product of the final user preference with the candidate item set $C \in \mathbb{R}^{q \times d}$ (q candidate items) gives the probability of the user interacting with each item, $\hat{y}_{ij} = P_{u_i} \cdot C_j$, where $\hat{y}_{ij}$ is the score of the i-th user for the j-th candidate item; the higher the score, the greater the probability that the user interacts with that item. The candidate items are sorted by score in descending order and the first K are selected as recommendations;
This completes the whole sequence recommendation flow. The model is trained end-to-end, and a sequence-aware binary cross-entropy loss function is designed:

$$L = -\sum_{u_i}\sum_{j=1}^{k}\Big[\log \sigma(\hat{y}_{j}) + \sum_{j^{-} \in S_j^{-}}\log\big(1 - \sigma(\hat{y}_{j^{-}})\big)\Big] + \lambda\|\Theta\|_2^2$$

where k is the maximum number of sub-sequences into which user $u_i$'s interaction sequence can be divided, j is the positive sample of each divided sub-sequence, and $S_j^{-}$ is the set of negative samples drawn for positive sample j (here 100 negatives are sampled per positive); $\lambda\|\Theta\|_2^2$ regularizes all parameters and the item embedding of the model; $\sigma(\hat{y}_{j})$ is the positive-sample interaction probability and $\sigma(\hat{y}_{j^{-}})$ the negative-sample interaction probability.
2. The KPHAN-based sequence recommendation method as claimed in claim 1, comprising two modules: a global short-term preference module with enhanced knowledge, KGSP for short, and a long-term preference fusion module with personalized level attention perception, LSPF for short; using knowledge information to enhance the representation of the items in KGSP, modeling semantic association among the items, obtaining more accurate short-term preference characteristics of the user, and realizing the steps one to four; in the LSPF, capturing the long-term preference characteristics of the user through an attention mechanism and fusing the short-term preference characteristics of the user to obtain the final preference characteristics of the user, so as to realize the steps five to seven.
CN202210416700.3A 2022-04-20 2022-04-20 KPHAN-based sequence recommendation method Active CN114780841B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210416700.3A CN114780841B (en) 2022-04-20 2022-04-20 KPHAN-based sequence recommendation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210416700.3A CN114780841B (en) 2022-04-20 2022-04-20 KPHAN-based sequence recommendation method

Publications (2)

Publication Number Publication Date
CN114780841A CN114780841A (en) 2022-07-22
CN114780841B (en) 2024-04-30

Family

ID=82431136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210416700.3A Active CN114780841B (en) 2022-04-20 2022-04-20 KPHAN-based sequence recommendation method

Country Status (1)

Country Link
CN (1) CN114780841B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304279B * 2023-03-22 2024-01-26 烟台大学 (Yantai University) Active perception method and system for evolution of user preference based on graph neural network


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11443346B2 (en) * 2019-10-14 2022-09-13 Visa International Service Association Group item recommendations for ephemeral groups based on mutual information maximization

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516160A * 2019-08-30 2019-11-29 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences) User modeling method and knowledge-graph-based sequence recommendation method
CN113590900A * 2021-07-29 2021-11-02 南京工业大学 (Nanjing Tech University) Sequence recommendation method fusing dynamic knowledge graphs

Also Published As

Publication number Publication date
CN114780841A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN113254803B (en) Social recommendation method based on multi-feature heterogeneous graph neural network
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112115377B (en) Graph neural network link prediction recommendation method based on social relationship
Xi et al. Towards open-world recommendation with knowledge augmentation from large language models
CN113590900A (en) Sequence recommendation method fusing dynamic knowledge graphs
CN110659411B (en) Personalized recommendation method based on neural attention self-encoder
CN113705811B (en) Model training method, device, computer program product and equipment
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN113918833B (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
CN111581520A (en) Item recommendation method and system based on item importance in session
CN114780841B (en) KPHAN-based sequence recommendation method
CN113918834A (en) Graph convolution collaborative filtering recommendation method fusing social relations
CN112734104A (en) Cross-domain recommendation method for generating countermeasure network and self-encoder by fusing double generators and double discriminators
CN114817508A (en) Sparse graph and multi-hop attention fused session recommendation system
CN115618101A (en) Streaming media content recommendation method and device based on negative feedback and electronic equipment
CN111259264A (en) Time sequence scoring prediction method based on generation countermeasure network
CN114282077A (en) Session recommendation method and system based on session data
CN117171440A (en) News recommendation method and system based on news event and news style joint modeling
CN115293812A (en) E-commerce platform session perception recommendation prediction method based on long-term and short-term interests
CN113268657B (en) Deep learning recommendation method and system based on comments and item descriptions
Wang et al. Joint knowledge graph and user preference for explainable recommendation
CN114547276A (en) Three-channel diagram neural network-based session recommendation method
CN114996566A (en) Intelligent recommendation system and method for industrial internet platform
CN114022233A (en) Novel commodity recommendation method
Wang et al. A Tri-Attention Neural Network Model-Based Recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant