CN111506814B - Sequence recommendation method based on variational self-attention network

Sequence recommendation method based on variational self-attention network

Info

Publication number
CN111506814B
Authority
CN
China
Prior art keywords
self
attention
attention network
sequence
item
Prior art date
Legal status
Active
Application number
CN202010273754.XA
Other languages
Chinese (zh)
Other versions
CN111506814A (en)
Inventor
赵朋朋 (Zhao Pengpeng)
赵静 (Zhao Jing)
周晓方 (Zhou Xiaofang)
崔志明 (Cui Zhiming)
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202010273754.XA priority Critical patent/CN111506814B/en
Publication of CN111506814A publication Critical patent/CN111506814A/en
Application granted granted Critical
Publication of CN111506814B publication Critical patent/CN111506814B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The application discloses a sequence recommendation method based on a variational self-attention network, which introduces a variational autoencoder into a self-attention network to capture the potential preferences of a user. On the one hand, the obtained self-attention vector is represented as a density through variational inference, so that its variance can well characterize the uncertainty of the user's preferences; on the other hand, self-attention networks are used to learn the inference process and the generation process of the variational autoencoder, so that the model captures both long-term and short-term dependencies. The method therefore better captures the uncertainty and dynamics of user preferences and improves the accuracy of recommendation results. In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method.

Description

Sequence recommendation method based on variational self-attention network
Technical Field
The present application relates to the field of computer technologies, and in particular, to a sequence recommendation method, apparatus, and device based on a variational self-attention network.
Background
In the age of information explosion, recommendation systems play an increasingly important role. The key to a recommendation system is the ability to describe a user's interests and preferences accurately; however, these interests and preferences change naturally over time and are filled with uncertainty. Sequential recommendation attempts to capture the dynamic preferences of users and has become a very attractive topic in both academia and industry.
In the relevant literature, researchers have proposed various methods to predict the next item that a user might like based on the user's historical interaction sequence. FPMC is a classical approach that linearly combines Markov chains and matrix factorization models to capture user preferences. However, because the weights of the different components in this approach are fixed and linear, it is insufficient for modeling high-order interactions. Inspired by deep learning, many researchers have applied Recurrent Neural Networks (RNNs) to sequence recommendation with considerable success. However, these RNN-based models, even with advanced memory cell structures such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), have difficulty preserving long-term dependencies due to the vanishing gradient problem. For example, Khandelwal et al. showed that an LSTM language model uses about 200 context tokens on average but can sharply distinguish only the nearest 50 tokens, indicating that even LSTM struggles to capture long-term dependencies. Furthermore, the sequential nature of RNNs, which must pass useful information forward step by step, makes parallelization challenging.
In recent years, self-attention networks (SANs) have proven highly effective in many Natural Language Processing (NLP) tasks, such as machine translation, sentiment analysis, and question answering. SANs also show good performance and efficiency in sequence recommendation compared to traditional RNNs and Convolutional Neural Networks (CNNs). For example, Kang et al. proposed the self-attentive sequential recommendation (SASRec) model to capture the long-term and local dependencies among items in a sequence, which were previously typically modeled by RNNs and CNNs. Unlike RNN-based models, SASRec can capture long-term dependencies well because it can access any part of the history regardless of distance. However, all of the above models describe the sequential behavior of users with deterministic methods that treat a user's preference as a fixed-point vector, and thus cannot characterize the uncertainty of the user's preferences.
FIG. 1 is a schematic diagram of a deterministic recommendation method. In FIG. 1, u is a user representation and i_1, i_2, i_3, i_4 are all item representations; the dashed ellipse represents the potential preferences of user u, where i_1, i_2, i_3 belong to mutually different categories, while i_1 and i_4 belong to the same category. As shown in FIG. 1, assume that user u has interacted with items i_1 and i_2. When the user's preference is learned with a deterministic method, u may be located between i_1 and i_2 in the latent feature space (2D map). If a recommendation is made based on the distance between u and the candidate items, item i_3 may be recommended to user u instead of the true item i_4 (of the same category as i_1), because the distance between u and i_3 is smaller. Thus, a fixed-point representation cannot capture uncertainty and is prone to incorrect recommendations.
In summary, current sequential recommendation schemes represent potential user preferences as fixed points in the latent feature space. Fixed-point vectors lack the ability to capture the uncertainty and dynamics of user preferences that are common in recommendation systems, and are therefore severely limited in capturing the user's potential preferences, resulting in inaccurate recommendation results.
Disclosure of Invention
The application aims to provide a sequence recommendation method, apparatus, and device based on a variational self-attention network, so as to solve the problem that existing sequence recommendation schemes cannot capture the uncertainty and dynamics of user preferences, which makes recommendation results inaccurate. The specific scheme is as follows:
In a first aspect, the present application provides a sequence recommendation method based on a variational self-attention network, including:
generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining a latent variable according to the variational parameters by using a re-parameterization method;
generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network;
and determining candidate items with the highest interaction probability according to the user preference representation by using a prediction layer to serve as recommendation results.
Preferably, the generating the input embedding matrix according to the historical interaction sequence includes:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is: I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
Preferably, the step of inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector includes:
determining an inferred projection result from the input embedding matrix and the inferred projection matrix using a projection layer in the inferred self-attention network;
generating a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length.
Preferably, the determining variational parameters according to the self-attention vector includes:
determining, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable.
Preferably, the determining a latent variable according to the variational parameters by using a re-parameterization method includes:
determining the latent variable according to the variational parameters by using a re-parameterization method, wherein the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable.
Preferably, the generating, by using a generated self-attention network, a representation of the historical interaction sequence from the latent variable includes:
determining a generated projection result according to the latent variable and the generated projection matrix by utilizing a projection layer in the generated self-attention network;
generating a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
Preferably, the determining, by the prediction layer, a candidate item with the largest interaction probability according to the user preference representation, as a recommendation result, includes:
according to a target formula, determining, by the prediction layer and based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
Preferably, the method further comprises:
optimizing the real parameters and the approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is:

L(θ, λ) = −Σ_t log p_θ(y^(u,t) | z) + ½ Σ_j (σ_λj² + μ_λj² − 1 − log σ_λj²),

where the first term is the reconstruction term of the ELBO and the second term is the KL divergence between the approximate posterior and the standard Gaussian prior; y^(u,t) denotes the actual interaction item of user u at time t, ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, S^u denotes the historical interaction sequence, θ and λ denote the real and approximate parameters of the variational self-attention network respectively, σ_λj denotes the j-th row of σ_λ, and μ_λj denotes the j-th row of μ_λ.
In a second aspect, the present application provides a sequence recommendation apparatus based on a variational self-attention network, including:
and (3) an embedding module: the input embedding matrix is used for generating an input embedding matrix according to the historical interaction sequence, and the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine a latent variable according to the variational parameters by using a re-parameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the largest interaction probability according to the user preference representation, as the recommendation result.
In a third aspect, the present application provides a sequence recommendation device based on a variational self-attention network, comprising:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the steps of the sequence recommendation method based on a variational self-attention network as described above.
The application provides a sequence recommendation method based on a variational self-attention network, comprising the following steps: generating, according to the historical interaction sequence, an input embedding matrix that includes item embedding information and position embedding information; inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector; determining a latent variable according to the variational parameters by using a re-parameterization method; generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network; and determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result.
In summary, the method introduces a variational autoencoder into the self-attention network to capture the potential preferences of the user. On the one hand, the obtained self-attention vector is represented as a density through variational inference, and its variance can well characterize the uncertainty of the user's preferences; on the other hand, self-attention networks are used to learn the inference process and the generation process of the variational autoencoder, so that long-term and short-term dependencies are captured well. The method can therefore better capture the uncertainty and dynamics of user preferences and improves the accuracy of recommendation results.
In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method and are not repeated here.
Drawings
For a clearer description of embodiments of the application or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only some embodiments of the application, and that other drawings can be obtained from them without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of uncertainty provided by the present application, used to explain that deterministic recommendation methods cannot capture the uncertainty of user preferences well;
FIG. 2 is a flowchart of a first embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application;
FIG. 3 is a flowchart of a second embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application;
FIG. 4 is a schematic diagram of the variational self-attention network in the second embodiment of the sequence recommendation method provided by the present application;
FIG. 5 is a functional block diagram of an embodiment of the sequence recommendation apparatus based on a variational self-attention network provided by the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the sequence recommendation device based on a variational self-attention network provided by the present application.
Detailed Description
In order to better understand the aspects of the present application, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Currently, sequence recommendation has become an attractive topic in recommendation systems. Existing sequence recommendation methods, including the most advanced self-attention-based methods, typically employ deterministic neural networks to represent potential user preferences as fixed points in a latent feature space. However, fixed-point vectors lack the ability to capture the uncertainty and dynamics of user preferences that are common in recommendation systems, resulting in inaccurate recommendation results.
To address this problem, the present application provides a sequence recommendation method, apparatus, and device based on a variational self-attention network, which introduce a variational autoencoder into the self-attention network to capture the user's potential preferences, can better capture the uncertainty and dynamics of those preferences, and improve the accuracy of recommendation results.
First, the problem addressed by the present application is formalized. In this embodiment, the set of users is denoted U = {u_1, u_2, ..., u_M} and the set of items is denoted X = {x_1, x_2, ..., x_N}, where M and N denote the number of users and items, respectively. For each user u ∈ U, the interaction records of user u are ordered chronologically to obtain the interaction sequence S^u = (S_1^u, S_2^u, ..., S_{|N_u|}^u), where |N_u| denotes the number of items accessed by user u. The aim of the application is, given S^u, to predict the next item the user may like by modeling S^u.
In response to the above problems, and inspired by the variational autoencoder (Variational Auto-Encoder, VAE), the present application uses a variational self-attention network (VSAN) to implement sequence recommendation, aiming to maximize the probability of the next item conditioned on the user's historical interaction sequence, p(S_t^u | S_1^u, ..., S_{t−1}^u), where S_t^u denotes the item that user u interacted with at time t. Extending this target to the entire training set, the conditional probabilities of all interaction items in all sequences are as follows:

p(S) = Π_{u∈U} Π_t p(S_t^u | S_1^u, ..., S_{t−1}^u).
The emphasis of the model then becomes how to model the probability p(S^u). Following the VAE, the present application first assumes a continuous latent variable z sampled from a standard normal distribution, i.e., z ~ N(0, I). Then, the historical interaction sequence S^u is modeled by the conditional distribution p_θ(S^u|z), which is parameterized by θ. Thus, the marginal probability p_θ(S^u) can be specified as follows:

p_θ(S^u) = ∫ p_θ(S^u|z) p_θ(z) dz.
In order to optimize the parameter θ, the best approach is to maximize the above equation. However, the true posterior distribution p_θ(z|S^u) is often complex and intractable. Therefore, the present application introduces a relatively simple posterior distribution q_λ(z|S^u) to approximate the true posterior by means of variational inference, where λ denotes another set of parameters. For convenience of description, θ is referred to as the real parameters and λ as the approximate parameters; correspondingly, p_θ(z|S^u) is called the true posterior distribution and q_λ(z|S^u) the approximate posterior distribution.
Through derivation and rearrangement, the relationship between the log-likelihood and the introduced posterior distribution is:

log p_θ(S^u) ≥ E_{q_λ(z|S^u)}[log p_θ(S^u|z)] − KL(q_λ(z|S^u) ‖ p(z)),

where KL denotes the Kullback-Leibler divergence. The goal of the present application is thus transformed into maximizing the two terms on the right-hand side of the above inequality, which together are called the evidence lower bound (ELBO).
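For completeness, the standard derivation behind this bound, which the text does not spell out, can be sketched in LaTeX as follows; it uses only Bayes' rule, p_θ(S^u|z) p(z) = p_θ(z|S^u) p_θ(S^u), and the definitions above:

    \begin{aligned}
    \log p_\theta(S^u)
      &= \mathbb{E}_{q_\lambda(z \mid S^u)}\left[ \log p_\theta(S^u \mid z) + \log p(z) - \log p_\theta(z \mid S^u) \right] \\
      &= \mathbb{E}_{q_\lambda}\left[ \log p_\theta(S^u \mid z) \right]
         - \mathrm{KL}\left( q_\lambda(z \mid S^u) \,\|\, p(z) \right)
         + \mathrm{KL}\left( q_\lambda(z \mid S^u) \,\|\, p_\theta(z \mid S^u) \right).
    \end{aligned}

Since the last KL term is non-negative, dropping it leaves exactly the ELBO, which is therefore a lower bound on the log-likelihood.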
Finally, the application models the VAE with the aid of two neural networks, namely the inferred self-attention network and the generated self-attention network: the former infers the latent vector z from S^u through q_λ(z|S^u), and the latter generates the corresponding user representation from the latent vector z through p_θ(S^u|z), with the learning process governed by the ELBO above.
The following describes embodiments of the sequence recommendation method based on a variational self-attention network. Referring to FIG. 2, the first embodiment includes:
S201, generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix includes item embedding information and position embedding information of each item in the historical interaction sequence;
the history interaction sequence refers to the project access records of the user in the past period, and the projects in the sequence are arranged according to the access sequence. In the embedding layer, the input embedding matrix includes item embedding information, and thus the input embedding matrix of the present embodiment further includes location embedding information in consideration of the fact that the self-attention network ignores location information of the history interaction sequence.
S202, inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
Inputting the input embedding matrix into the inferred self-attention network yields the variational parameters μ_λ and σ_λ of the approximate posterior distribution q_λ(z|S^u). Specifically, a self-attention vector is first obtained through the projection layer and a number of self-attention blocks; then, the mean and variance of the approximate posterior distribution q_λ(z|S^u) are estimated from the self-attention vector as the variational parameters. Unlike a conventional self-attention network, the inferred self-attention network in this embodiment is single-head, and the number of stacked self-attention blocks is a first preset number.
S203, determining a latent variable according to the variational parameters by using a re-parameterization method;
Specifically, the latent variable z is sampled according to q_λ(z|S^u). However, this sampling depends on μ_λ and σ_λ through a stochastic, non-differentiable operation. Therefore, this embodiment uses the re-parameterization trick to re-express the latent variable z as a deterministic function of μ_λ and σ_λ.
S204, generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
It should be noted that the above generation process focuses on the next item of the historical interaction sequence; as a preferred embodiment, it may instead focus on a certain number of next items. Unlike a conventional self-attention network, the generated self-attention network in this embodiment is single-head, and the number of stacked self-attention blocks is a second preset number.
S205, determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result.
This embodiment provides a sequence recommendation method based on a variational self-attention network. First, the input embedding is fed into the inferred self-attention network, and the obtained self-attention vector is represented as a density by applying a Gaussian distribution, which is used to handle the uncertainty of user preferences. Next, the corresponding latent variable is obtained from the variational parameters output by the inferred self-attention network. Then, to capture the long-term and local dependencies of the user, another self-attention network is employed to model the generation process and produce the final user preference representation from the latent variable. Finally, the generated user preference representation is used to predict the next item the user may interact with.
The second embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application is implemented on the basis of the first embodiment and extends it to a certain extent.
Referring to fig. 3, the second embodiment specifically includes:
S301, at the embedding layer, calculating the item embedding and position embedding of each item according to the historical interaction sequence to obtain an input embedding matrix;
In this embodiment, the input includes item embedding and position embedding. First, the user's historical sequence S^u is converted into a sequence of fixed length n, where n denotes the maximum sequence length that the variational self-attention network can model. From the resulting sequence of n interaction records and a continuous item embedding matrix, the input embedding matrix A ∈ ℝ^{n×d} is obtained, where d denotes the embedding dimension. Furthermore, a learnable position matrix P ∈ ℝ^{n×d} is added to the input matrix A as the final input embedding.
In summary, the input embedding matrix of the present embodiment includes item embedding information and position embedding information of each item in the history interaction sequence. Specifically, the input embedding matrix is:
I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
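As an illustration only, the embedding layer described above can be sketched in PyTorch as follows; the class name, the padding convention, and the use of nn.Embedding for the learnable position matrix are assumptions of this sketch rather than details fixed by the patent:

    import torch
    import torch.nn as nn

    class VSANEmbedding(nn.Module):
        """Embedding layer: item embedding plus learnable position embedding, I = A + P."""
        def __init__(self, num_items: int, d: int, max_len: int):
            super().__init__()
            # index 0 is reserved as padding for sequences shorter than max_len (an assumption)
            self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)
            self.pos_emb = nn.Embedding(max_len, d)  # learnable position matrix P

        def forward(self, seq: torch.Tensor) -> torch.Tensor:
            # seq: (batch, n) item ids, already truncated/padded to the fixed length n
            n = seq.size(1)
            positions = torch.arange(n, device=seq.device).unsqueeze(0)  # (1, n)
            return self.item_emb(seq) + self.pos_emb(positions)          # (batch, n, d) = A + P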
S302, inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determining the variational parameters of the approximate posterior distribution according to the self-attention vector, wherein the inferred self-attention network includes a first preset number of self-attention blocks;

After the final input embedding matrix is obtained, it is input into the inferred self-attention network to output the variational parameters corresponding to the approximate posterior distribution q_λ(z|S^u). The left side of FIG. 4 shows the specific structure of the inferred self-attention network. The self-attention operation is defined as follows:

D = SelfAttention(I W^Q, I W^K, I W^V) = softmax( (I W^Q)(I W^K)^T / √d ) (I W^V),

where W^Q, W^K, W^V ∈ ℝ^{d×d} denote the projection matrices (to distinguish them in the description, the projection matrices here are referred to as the inferred projection matrices, and the projection matrices of the generated self-attention network below are referred to as the generated projection matrices). In order to propagate low-level features to higher levels, this embodiment applies residual connections in the network; then, to make training of the neural network fast and stable, layer normalization is adopted; in addition, a two-layer fully connected network with a ReLU activation function is used to account for interactions between different latent dimensions and to endow the network with nonlinear capability. Finally, the whole inference process is as follows:

E = LayerNorm(D + I),
F = ReLU(E W_1 + b_1) W_2 + b_2,
G_i = LayerNorm(F + E),
where W_1, W_2, b_1, b_2 are network parameters. For convenience and simplicity, the entire self-attention network described above is defined as:

G_i = SAN(I).

Through the above process, G_i essentially integrates the embeddings of all previous items. In order to capture more complex item transitions, a first preset number h_1 of self-attention blocks can be stacked:

G_i^{h_1} = SAN(G_i^{h_1 − 1}), with G_i^{1} = SAN(I).

Then, the mean and variance of the approximate posterior distribution q_λ(z|S^u) are estimated from the final self-attention vector as follows:

μ_λ = l_1(G_i^{h_1}), σ_λ = l_2(G_i^{h_1}),

where l_1(·) denotes a linear transformation and l_2(·) denotes another linear transformation. In this manner, the deterministic self-attention vector corresponds to a Gaussian distribution rather than a conventional fixed point, and the variance of the Gaussian distribution can well capture the uncertainty of the user's preferences.
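As a concrete illustration of the inferred self-attention block described above, a minimal single-head sketch in PyTorch follows; the class name, the feed-forward width d_ff, and the omission of dropout and of the causality mask common in sequential recommenders are assumptions of this example, not details from the patent:

    import torch
    import torch.nn as nn

    class SelfAttentionBlock(nn.Module):
        """Single-head block: scaled dot-product attention, then residual + LayerNorm,
        then a two-layer ReLU feed-forward network, then residual + LayerNorm."""
        def __init__(self, d: int, d_ff: int):
            super().__init__()
            self.wq = nn.Linear(d, d, bias=False)  # projection matrices W^Q, W^K, W^V
            self.wk = nn.Linear(d, d, bias=False)
            self.wv = nn.Linear(d, d, bias=False)
            self.norm1 = nn.LayerNorm(d)
            self.norm2 = nn.LayerNorm(d)
            self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))
            self.scale = d ** 0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n, d) input embedding I, or the output of the previous block
            q, k, v = self.wq(x), self.wk(x), self.wv(x)
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
            d_out = attn @ v            # D = softmax(Q K^T / sqrt(d)) V
            e = self.norm1(d_out + x)   # E = LayerNorm(D + I)
            f = self.ffn(e)             # F = ReLU(E W1 + b1) W2 + b2
            return self.norm2(f + e)    # G = LayerNorm(F + E)

Stacking h_1 such blocks, for example in an nn.ModuleList, yields the self-attention vector G_i^{h_1}, from which μ_λ and σ_λ are then estimated by two linear layers.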
S303, determining the latent variable according to the variational parameters by using a re-parameterization method;
Specifically, the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable whose role is to introduce noise.
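A minimal sketch of this inference head and re-parameterization step is shown below; having l_2 predict the log-variance rather than the variance itself is an assumption made here for numerical stability, not a detail stated in the patent:

    import torch
    import torch.nn as nn

    class InferenceHead(nn.Module):
        """Maps the self-attention vector to variational parameters and samples z."""
        def __init__(self, d: int):
            super().__init__()
            self.l1 = nn.Linear(d, d)  # l1(.): mean mu_lambda of q_lambda(z | S^u)
            self.l2 = nn.Linear(d, d)  # l2(.): log-variance (assumed form)

        def forward(self, g: torch.Tensor):
            mu = self.l1(g)
            log_var = self.l2(g)
            sigma = torch.exp(0.5 * log_var)
            eps = torch.randn_like(sigma)  # standard Gaussian noise epsilon
            z = mu + sigma * eps           # re-parameterization: z = mu + sigma (.) eps
            return z, mu, log_var

Because z is now a deterministic function of μ_λ and σ_λ plus independent noise, gradients can flow through the sampling step during training.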
S304, generating a representation of the historical interaction sequence according to the latent variable and the conditional distribution as a user preference representation by using the generated self-attention network;
To define the generation process, this embodiment also uses a generated self-attention network, which generates the corresponding S^u based on p_θ(S^u|z). Based on the latent variable z, p_θ(S^u|z) is parameterized by the generated self-attention network, where G_g^{h_2}, the final output of its attention layers, is the generated representation of the user preferences. The structure of the generated self-attention network is shown on the right side of FIG. 4, and its projections are computed from the latent variable z and the generated projection matrices. Since the generated self-attention network and the inferred self-attention network differ only in the input part, a detailed description is omitted here.
To sum up, the process of S304 is as follows: determining the generated projection result according to the latent variable and the generated projection matrices by using the projection layer in the generated self-attention network; then generating, by using a second preset number of self-attention blocks in the generated self-attention network and based on the conditional distribution p_θ(S^u|z), the representation G_g^{h_2} of the historical interaction sequence from the generated projection result, where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
It should be noted that the generation process described above focuses only on the next item in the user's historical sequence. Preferably, it can instead address the next k items; the most straightforward approach is to treat them as a time-ordered multiset.
For distinction, in this embodiment the output of the inferred self-attention network is written as G_i^{h_1} and the output of the generated self-attention network as G_g^{h_2}. The subscript i has no numerical meaning and is taken from the initial of "inferred"; the superscript of G_i^{h_1} indicates the number of self-attention blocks in the inferred self-attention network. Likewise, the subscript g is taken from the initial of "generated", and the superscript of G_g^{h_2} indicates the number of self-attention blocks in the generated self-attention network.
S305, determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result;
According to a target formula, the prediction layer determines, based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
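A sketch of this prediction step follows; reusing the item embedding table as the candidate matrix M is an assumption borrowed from common practice in self-attention recommenders such as SASRec, not something the patent fixes:

    import torch

    def predict_next_item(g_t: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
        """g_t: (batch, d), the t-th row of the user preference representation G_g^{h_2};
        item_emb: (N, d), the candidate item embedding matrix M.
        Returns the index of the candidate with the highest interaction score."""
        scores = g_t @ item_emb.T     # (batch, N) interaction scores
        return scores.argmax(dim=-1)  # the predicted item \hat{y}^{(u,t)}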
S306, optimizing the real parameters and the approximate parameters of the variational self-attention network according to the loss function.
During the evaluation phase, the mean of the variational distribution (i.e., μ_λ) is used as the latent variable z of S^u. Following the evidence lower bound (ELBO) described earlier, the loss function of the variational self-attention network in this embodiment is:

L(θ, λ) = −Σ_t log p_θ(y^(u,t) | z) + ½ Σ_j (σ_λj² + μ_λj² − 1 − log σ_λj²),

where the first term is the reconstruction term and the second term is the KL divergence between the approximate posterior and the standard Gaussian prior; y^(u,t) denotes the actual interaction item of user u at time t, ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, S^u denotes the historical interaction sequence, θ and λ denote the real and approximate parameters of the variational self-attention network respectively, σ_λj denotes the j-th row of σ_λ, and μ_λj denotes the j-th row of μ_λ. By minimizing this loss function, the real parameters θ and the approximate parameters λ can be jointly optimized.
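As an illustrative sketch of this objective, the following assumes (consistently with the sketches above, but not fixed by the patent) that the network outputs a log-variance and that the reconstruction term is a cross-entropy over the candidate items:

    import torch
    import torch.nn.functional as F

    def vsan_loss(logits: torch.Tensor, targets: torch.Tensor,
                  mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        """logits: (batch, n, N) scores over N items; targets: (batch, n) true item ids;
        mu, log_var: (batch, n, d) variational parameters of q_lambda(z | S^u)."""
        # reconstruction term: -log p_theta(y^(u,t) | z), averaged over positions
        recon = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        # KL(q_lambda || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
        kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var, dim=-1).mean()
        return recon + kl  # minimized jointly over theta and lambda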
The sequence recommendation apparatus based on a variational self-attention network provided by the embodiments of the present application is introduced below; the apparatus described below and the sequence recommendation method described above may be mutually referred to.

The sequence recommendation apparatus based on a variational self-attention network in this embodiment, as shown in FIG. 5, includes:
the embedding module 501: the input embedding matrix is used for generating an input embedding matrix according to the historical interaction sequence, and the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
the inference module 502: the input embedding matrix is used for inputting and deducing a self-attention network to obtain a self-attention vector, and a variation parameter is determined according to the self-attention vector;
parameterization module 503: for determining potential variables from the variation parameters using a re-parameterization method;
generating module 504: for generating a representation of the historical interaction sequence from the potential variables as a user preference representation using a generated self-attention network;
prediction module 505: and the candidate item with the largest interaction probability is determined by utilizing a prediction layer according to the user preference representation to serve as a recommendation result.
The sequence recommendation apparatus based on a variational self-attention network of this embodiment is used to implement the aforementioned sequence recommendation method based on a variational self-attention network, so the specific implementations in the apparatus can be found in the method embodiment parts above; for example, the embedding module 501, the inference module 502, the parameterization module 503, the generation module 504, and the prediction module 505 are respectively used to implement steps S201, S202, S203, S204, and S205 of the method. Therefore, their detailed descriptions are omitted here with reference to the corresponding method embodiments.
In addition, since the sequence recommendation apparatus of this embodiment is used to implement the aforementioned sequence recommendation method, its functions correspond to those of the method described above and are not repeated here.
In addition, the present application also provides a sequence recommendation device based on a variational self-attention network, as shown in FIG. 6, comprising:
memory 100: for storing a computer program;
a processor 200: for executing the computer program to implement the steps of the sequence recommendation method based on a variational self-attention network as described above.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principles and embodiments of the present application have been described herein with specific examples to help understand the method and its core idea. For those skilled in the art, there will be variations in the specific implementation and application scope according to the ideas of the present application; in view of this, the content of this description should not be construed as limiting the present application.

Claims (6)

1. A method for sequence recommendation based on a variational self-attention network, comprising:
generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining a latent variable according to the variational parameters by using a re-parameterization method;
generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network;
determining candidate items with the largest interaction probability according to the user preference representation by using a prediction layer to serve as recommendation results;
wherein the inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector comprises:
determining an inferred projection result from the input embedding matrix and the inferred projection matrix using a projection layer in the inferred self-attention network;
generating a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length;
wherein the determining variational parameters according to the self-attention vector comprises:
determining, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable;
wherein the determining a latent variable according to the variational parameters by using the re-parameterization method comprises:
determining the latent variable according to the variational parameters by using the re-parameterization method, wherein the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable;
wherein the generating, by using the generated self-attention network, a representation of the historical interaction sequence from the latent variable comprises:
determining a generated projection result according to the latent variable and the generated projection matrix by utilizing a projection layer in the generated self-attention network;
generating a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
2. The method of claim 1, wherein generating the input embedding matrix from the historical interaction sequence comprises:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is: I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
3. The method of claim 1, wherein determining candidate items with the greatest interaction probability as recommendation results according to the user preference representation using a prediction layer comprises:
according to a target formula, determining, by the prediction layer and based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
4. A method as claimed in any one of claims 1 to 3, further comprising:
optimizing real parameters and approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is as follows:
the actual interaction item of user u at time t,representing predictedCandidate item with maximum interaction probability of user u at t time, S u Representing the historical interaction sequence, θ and λ representing the real and approximate parameters, σ, respectively, of the variational self-attention network λj Representation sigma λ J-th row, mu λj Represents u λ Is the j-th row of (2).
5. A sequence recommendation apparatus based on a variational self-attention network, comprising:
an embedding module: configured to generate an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix includes item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine a latent variable according to the variational parameters by using a re-parameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the largest interaction probability according to the user preference representation, as the recommendation result;
wherein the inference module is specifically configured to determine an inferred projection result according to the input embedding matrix and the inferred projection matrix by using a projection layer in the inferred self-attention network, and to generate a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length;
wherein the inference module is further specifically configured to determine, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable;
the parameterization module is specifically configured to determine the latent variable according to the variational parameters by using a re-parameterization method, the latent variable being: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable;
the generation module is specifically configured to determine a generated projection result according to the latent variable and the generated projection matrix by using a projection layer in the generated self-attention network, and to generate a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
6. A sequence recommendation device based on a variational self-attention network, comprising:
a memory: for storing a computer program;
a processor: steps for executing the computer program for implementing the variant self-attention network based sequence recommendation method according to any of the claims 1-4.
CN202010273754.XA 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network Active CN111506814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Publications (2)

Publication Number Publication Date
CN111506814A CN111506814A (en) 2020-08-07
CN111506814B true CN111506814B (en) 2023-11-28

Family

ID=71864057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273754.XA Active CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Country Status (1)

Country Link
CN (1) CN111506814B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446489B (en) * 2020-11-25 2023-05-05 天津大学 Dynamic network embedded link prediction method based on variation self-encoder
CN113160898B (en) * 2021-05-18 2023-09-08 北京信息科技大学 Iron-based alloy Gibbs free energy prediction method and system
CN113688315B (en) * 2021-08-19 2023-04-18 电子科技大学 Sequence recommendation method based on no-information-loss graph coding
CN114154071B (en) * 2021-12-09 2023-05-09 电子科技大学 Emotion time sequence recommendation method based on attention mechanism
CN117236198B (en) * 2023-11-14 2024-02-27 中国石油大学(华东) Machine learning solving method of flame propagation model of blasting under sparse barrier
CN117251295B (en) * 2023-11-15 2024-02-02 成方金融科技有限公司 Training method, device, equipment and medium of resource prediction model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359140A (en) * 2018-11-30 2019-02-19 苏州大学 A kind of sequence of recommendation method and device based on adaptive attention
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110245299A (en) * 2019-06-19 2019-09-17 中国人民解放军国防科技大学 Sequence recommendation method and system based on dynamic interaction attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190354839A1 (en) * 2018-05-18 2019-11-21 Google Llc Systems and Methods for Slate Optimization with Recurrent Neural Networks


Also Published As

Publication number Publication date
CN111506814A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN111506814B (en) Sequence recommendation method based on variational self-attention network
KR20180091850A (en) Augmenting neural networks with external memory
CN114297036B (en) Data processing method, device, electronic equipment and readable storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN112418482A (en) Cloud computing energy consumption prediction method based on time series clustering
CN110377707B (en) Cognitive diagnosis method based on depth item reaction theory
CN111369299A (en) Method, device and equipment for identification and computer readable storage medium
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
CN115983497A (en) Time sequence data prediction method and device, computer equipment and storage medium
CN111046655A (en) Data processing method and device and computer readable storage medium
CN116610218A (en) AI digital person interaction method, device and system
CN113065321B (en) User behavior prediction method and system based on LSTM model and hypergraph
Skalse et al. STARC: A General Framework For Quantifying Differences Between Reward Functions
CN115907000A (en) Small sample learning method for optimal power flow prediction of power system
CN112905166B (en) Artificial intelligence programming system, computer device, and computer-readable storage medium
CN115168722A (en) Content interaction prediction method and related equipment
WO2023155301A1 (en) Answer sequence prediction method based on improved irt structure, and controller and storage medium
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
KR20190129422A (en) Method and device for variational interference using neural network
CN111626472B (en) Scene trend judgment index computing system and method based on depth hybrid cloud model
CN112528015A (en) Method and device for judging rumor in message interactive transmission
CN114925808B (en) Anomaly detection method based on incomplete time sequence in cloud network end resource
CN115130669A (en) Model training method and device, electronic equipment and computer readable storage medium
CN117076930A (en) Training sample processing method, abnormal transaction detection method, device and equipment
Paniagua et al. Nonlinear system identification using modified variational autoencoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant