CN111506814B - Sequence recommendation method based on variational self-attention network

Sequence recommendation method based on variational self-attention network

Info

Publication number
CN111506814B
Authority
CN
China
Prior art keywords
self
attention
attention network
sequence
item
Prior art date
Legal status
Active
Application number
CN202010273754.XA
Other languages
Chinese (zh)
Other versions
CN111506814A (en)
Inventor
赵朋朋 (Zhao Pengpeng)
赵静 (Zhao Jing)
周晓方 (Zhou Xiaofang)
崔志明 (Cui Zhiming)
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202010273754.XA priority Critical patent/CN111506814B/en
Publication of CN111506814A publication Critical patent/CN111506814A/en
Application granted granted Critical
Publication of CN111506814B publication Critical patent/CN111506814B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The application discloses a sequence recommendation method based on a variational self-attention network, which introduces a variational autoencoder into a self-attention network to capture the potential preferences of a user. On the one hand, the obtained self-attention vector is represented as a density through variational inference, so that its variance can well characterize the uncertainty of the user's preferences; on the other hand, self-attention networks are used to learn the inference process and the generation process of the variational autoencoder, so that the model captures both long-term and short-term dependencies. The method therefore better captures the uncertainty and dynamics of user preferences and improves the accuracy of recommendation results. In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method.

Description

Sequence recommendation method based on variational self-attention network
Technical Field
The present application relates to the field of computer technologies, and in particular, to a sequence recommendation method, apparatus, and device based on a variational self-attention network.
Background
In the age of information explosion, recommendation systems play an increasingly important role. The key to a recommendation system is the ability to describe a user's interests and preferences accurately; however, these interests and preferences change naturally over time and are filled with uncertainty. Sequential recommendation attempts to capture the dynamic preferences of users and has become a very attractive topic in both academia and industry.
In the relevant literature, researchers have proposed various methods to predict the next item that a user might like based on the user's historical interaction sequence. FPMC is a classical approach that linearly combines Markov chains and matrix factorization models to capture user preferences. However, because the weights of the different components in this approach are fixed and linear, it is insufficient for modeling high-order interactions. Inspired by deep learning, many researchers have applied Recurrent Neural Networks (RNNs) to sequence recommendation with considerable success. However, these RNN-based models, even with advanced memory cell structures such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), have difficulty preserving long-term dependencies due to the vanishing gradient problem. For example, Khandelwal et al. showed that an LSTM language model uses about 200 context tokens on average but can sharply distinguish only the nearest 50 tokens, indicating that even LSTM struggles to capture long-term dependencies. Furthermore, the sequential nature of RNNs, which must pass useful information forward step by step, makes parallelization challenging.
In recent years, self-attention networks (SANs) have proven highly effective in many Natural Language Processing (NLP) tasks, such as machine translation, sentiment analysis, and question answering. SANs also show good performance and efficiency in sequence recommendation compared to traditional RNNs and Convolutional Neural Networks (CNNs). For example, Kang et al. proposed the self-attentive sequential recommendation (SASRec) model to capture the long-term and local dependencies among items in a sequence, which were previously typically modeled by RNNs and CNNs. Unlike RNN-based models, SASRec can capture long-term dependencies well because it can access any part of the history regardless of distance. However, all of the above models describe the sequential behavior of users with deterministic methods that treat a user's preference as a fixed-point vector, and thus cannot characterize the uncertainty of the user's preferences.
FIG. 1 is a schematic diagram of a deterministic recommendation method. In FIG. 1, u is a user representation and i_1, i_2, i_3, i_4 are all item representations; the dashed ellipse represents the potential preferences of user u, where i_1, i_2, i_3 belong to mutually different categories, while i_1 and i_4 belong to the same category. As shown in FIG. 1, assume that user u has interacted with items i_1 and i_2. When the user's preference is learned with a deterministic method, u may be located between i_1 and i_2 in the latent feature space (2D map). If a recommendation is made based on the distance between u and the candidate items, item i_3 may be recommended to user u instead of the true item i_4 (of the same category as i_1), because the distance between u and i_3 is smaller. Thus, a fixed-point representation cannot capture uncertainty and is prone to incorrect recommendations.
In summary, current sequential recommendation schemes represent potential user preferences as fixed points in the latent feature space. Fixed-point vectors lack the ability to capture the uncertainty and dynamics of user preferences that are common in recommendation systems, and are therefore severely limited in capturing the user's potential preferences, resulting in inaccurate recommendation results.
Disclosure of Invention
The application aims to provide a sequence recommendation method, apparatus, and device based on a variational self-attention network, so as to solve the problem that existing sequence recommendation schemes cannot capture the uncertainty and dynamics of user preferences, which makes recommendation results inaccurate. The specific scheme is as follows:
In a first aspect, the present application provides a sequence recommendation method based on a variational self-attention network, including:
generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining a latent variable according to the variational parameters by using a re-parameterization method;
generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network;
and determining candidate items with the highest interaction probability according to the user preference representation by using a prediction layer to serve as recommendation results.
Preferably, the generating the input embedding matrix according to the historical interaction sequence includes:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is: I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
Preferably, the step of inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector includes:
determining an inferred projection result from the input embedding matrix and the inferred projection matrix using a projection layer in the inferred self-attention network;
generating a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length.
Preferably, the determining variational parameters according to the self-attention vector includes:
determining, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable.
Preferably, the determining a latent variable according to the variational parameters by using a re-parameterization method includes:
determining the latent variable according to the variational parameters by using a re-parameterization method, wherein the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable.
Preferably, the generating, by using a generated self-attention network, a representation of the historical interaction sequence from the latent variable includes:
determining a generated projection result according to the latent variable and the generated projection matrix by utilizing a projection layer in the generated self-attention network;
generating a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
Preferably, the determining, by the prediction layer, a candidate item with the largest interaction probability according to the user preference representation, as a recommendation result, includes:
according to a target formula, determining, by the prediction layer and based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
Preferably, the method further comprises:
optimizing the real parameters and the approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is:

L(θ, λ) = −Σ_t log p_θ(y^(u,t) | z) + ½ Σ_j (σ_λj² + μ_λj² − 1 − log σ_λj²),

where the first term is the reconstruction term of the ELBO and the second term is the KL divergence between the approximate posterior and the standard Gaussian prior; y^(u,t) denotes the actual interaction item of user u at time t, ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, S^u denotes the historical interaction sequence, θ and λ denote the real and approximate parameters of the variational self-attention network respectively, σ_λj denotes the j-th row of σ_λ, and μ_λj denotes the j-th row of μ_λ.
In a second aspect, the present application provides a sequence recommendation apparatus based on a variational self-attention network, including:
and (3) an embedding module: the input embedding matrix is used for generating an input embedding matrix according to the historical interaction sequence, and the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine a latent variable according to the variational parameters by using a re-parameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the largest interaction probability according to the user preference representation, as the recommendation result.
In a third aspect, the present application provides a sequence recommendation device based on a variational self-attention network, comprising:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the steps of the sequence recommendation method based on a variational self-attention network as described above.
The application provides a sequence recommendation method based on a variational self-attention network, comprising the following steps: generating, according to the historical interaction sequence, an input embedding matrix that includes item embedding information and position embedding information; inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector; determining a latent variable according to the variational parameters by using a re-parameterization method; generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network; and determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result.
In summary, the method introduces a variational autoencoder into the self-attention network to capture the potential preferences of the user. On the one hand, the obtained self-attention vector is represented as a density through variational inference, and its variance can well characterize the uncertainty of the user's preferences; on the other hand, self-attention networks are used to learn the inference process and the generation process of the variational autoencoder, so that long-term and short-term dependencies are captured well. The method can therefore better capture the uncertainty and dynamics of user preferences and improves the accuracy of recommendation results.
In addition, the application also provides a sequence recommendation apparatus and device based on the variational self-attention network, whose technical effects correspond to those of the method and are not repeated here.
Drawings
For a clearer description of embodiments of the application or of the prior art, the drawings that are used in the description of the embodiments or of the prior art will be briefly described, it being apparent that the drawings in the description below are only some embodiments of the application, and that other drawings can be obtained from them without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of uncertainty provided by the present application, used to explain that deterministic recommendation methods cannot capture the uncertainty of user preferences well;
FIG. 2 is a flowchart of a first embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application;
FIG. 3 is a flowchart of a second embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application;
FIG. 4 is a schematic diagram of the variational self-attention network in the second embodiment of the sequence recommendation method provided by the present application;
FIG. 5 is a functional block diagram of an embodiment of the sequence recommendation apparatus based on a variational self-attention network provided by the present application;
FIG. 6 is a schematic structural diagram of an embodiment of the sequence recommendation device based on a variational self-attention network provided by the present application.
Detailed Description
In order to better understand the aspects of the present application, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Currently, sequence recommendation has become an attractive topic in recommendation systems. Existing sequence recommendation methods, including the most advanced self-attention-based methods, typically employ deterministic neural networks to represent potential user preferences as fixed points in a latent feature space. However, fixed-point vectors lack the ability to capture the uncertainty and dynamics of user preferences that are common in recommendation systems, resulting in inaccurate recommendation results.
To address this problem, the present application provides a sequence recommendation method, apparatus, and device based on a variational self-attention network, which introduce a variational autoencoder into the self-attention network to capture the user's potential preferences, can better capture the uncertainty and dynamics of those preferences, and improve the accuracy of recommendation results.
First, the problem addressed by the present application is formalized. In this embodiment, the set of users is denoted U = {u_1, u_2, ..., u_M} and the set of items is denoted X = {x_1, x_2, ..., x_N}, where M and N denote the number of users and items, respectively. For each user u ∈ U, the interaction records of user u are ordered chronologically to obtain the interaction sequence S^u = (S_1^u, S_2^u, ..., S_{|N_u|}^u), where |N_u| denotes the number of items accessed by user u. The aim of the application is, given S^u, to predict the next item the user may like by modeling S^u.
In response to the above problems, and inspired by the variational autoencoder (Variational Auto-Encoder, VAE), the present application uses a variational self-attention network (VSAN) to implement sequence recommendation, aiming to maximize the probability of the next item conditioned on the user's historical interaction sequence, p(S_t^u | S_1^u, ..., S_{t−1}^u), where S_t^u denotes the item that user u interacted with at time t. Extending this target to the entire training set, the conditional probabilities of all interaction items in all sequences are as follows:

p(S) = Π_{u∈U} Π_t p(S_t^u | S_1^u, ..., S_{t−1}^u).
The emphasis of the model then becomes how to model the probability p(S^u). Following the VAE, the present application first assumes a continuous latent variable z sampled from a standard normal distribution, i.e., z ~ N(0, I). Then, the historical interaction sequence S^u is modeled by the conditional distribution p_θ(S^u|z), which is parameterized by θ. Thus, the marginal probability p_θ(S^u) can be specified as follows:

p_θ(S^u) = ∫ p_θ(S^u|z) p_θ(z) dz.
In order to optimize the parameter θ, the best approach is to maximize the above equation. However, the true posterior distribution p_θ(z|S^u) is often complex and intractable. Therefore, the present application introduces a relatively simple posterior distribution q_λ(z|S^u) to approximate the true posterior by means of variational inference, where λ denotes another set of parameters. For convenience of description, θ is referred to as the real parameters and λ as the approximate parameters; correspondingly, p_θ(z|S^u) is called the true posterior distribution and q_λ(z|S^u) the approximate posterior distribution.
Through derivation and rearrangement, the relationship between the log-likelihood and the introduced posterior distribution is:

log p_θ(S^u) ≥ E_{q_λ(z|S^u)}[log p_θ(S^u|z)] − KL(q_λ(z|S^u) ‖ p(z)),

where KL denotes the Kullback-Leibler divergence. The goal of the present application is thus transformed into maximizing the two terms on the right-hand side of the above inequality, which together are called the evidence lower bound (ELBO).
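For completeness, the standard derivation behind this bound, which the text does not spell out, can be sketched in LaTeX as follows; it uses only Bayes' rule, p_θ(S^u|z) p(z) = p_θ(z|S^u) p_θ(S^u), and the definitions above:

    \begin{aligned}
    \log p_\theta(S^u)
      &= \mathbb{E}_{q_\lambda(z \mid S^u)}\left[ \log p_\theta(S^u \mid z) + \log p(z) - \log p_\theta(z \mid S^u) \right] \\
      &= \mathbb{E}_{q_\lambda}\left[ \log p_\theta(S^u \mid z) \right]
         - \mathrm{KL}\left( q_\lambda(z \mid S^u) \,\|\, p(z) \right)
         + \mathrm{KL}\left( q_\lambda(z \mid S^u) \,\|\, p_\theta(z \mid S^u) \right).
    \end{aligned}

Since the last KL term is non-negative, dropping it leaves exactly the ELBO, which is therefore a lower bound on the log-likelihood.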
Finally, the application models the VAE with the aid of two neural networks, namely the inferred self-attention network and the generated self-attention network: the former infers the latent vector z from S^u through q_λ(z|S^u), and the latter generates the corresponding user representation from the latent vector z through p_θ(S^u|z), with the learning process governed by the ELBO above.
The following describes embodiments of the sequence recommendation method based on a variational self-attention network. Referring to FIG. 2, the first embodiment includes:
S201, generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix includes item embedding information and position embedding information of each item in the historical interaction sequence;
the history interaction sequence refers to the project access records of the user in the past period, and the projects in the sequence are arranged according to the access sequence. In the embedding layer, the input embedding matrix includes item embedding information, and thus the input embedding matrix of the present embodiment further includes location embedding information in consideration of the fact that the self-attention network ignores location information of the history interaction sequence.
S202, inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
Inputting the input embedding matrix into the inferred self-attention network yields the variational parameters μ_λ and σ_λ of the approximate posterior distribution q_λ(z|S^u). Specifically, a self-attention vector is first obtained through the projection layer and a number of self-attention blocks; then, the mean and variance of the approximate posterior distribution q_λ(z|S^u) are estimated from the self-attention vector as the variational parameters. Unlike a conventional self-attention network, the inferred self-attention network in this embodiment is single-head, and the number of stacked self-attention blocks is a first preset number.
S203, determining a latent variable according to the variational parameters by using a re-parameterization method;
Specifically, the latent variable z is sampled according to q_λ(z|S^u). However, this sampling depends on μ_λ and σ_λ through a stochastic, non-differentiable operation. Therefore, this embodiment uses the re-parameterization trick to re-express the latent variable z as a deterministic function of μ_λ and σ_λ.
S204, generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
It should be noted that the above generation process focuses on the next item of the historical interaction sequence; as a preferred embodiment, it may instead focus on a certain number of next items. Unlike a conventional self-attention network, the generated self-attention network in this embodiment is single-head, and the number of stacked self-attention blocks is a second preset number.
S205, determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result.
This embodiment provides a sequence recommendation method based on a variational self-attention network. First, the input embedding is fed into the inferred self-attention network, and the obtained self-attention vector is represented as a density by applying a Gaussian distribution, which is used to handle the uncertainty of user preferences. Next, the corresponding latent variable is obtained from the variational parameters output by the inferred self-attention network. Then, to capture the long-term and local dependencies of the user, another self-attention network is employed to model the generation process and produce the final user preference representation from the latent variable. Finally, the generated user preference representation is used to predict the next item the user may interact with.
The second embodiment of the sequence recommendation method based on a variational self-attention network provided by the present application is implemented on the basis of the first embodiment and extends it to a certain extent.
Referring to fig. 3, the second embodiment specifically includes:
S301, at the embedding layer, calculating the item embedding and position embedding of each item according to the historical interaction sequence to obtain an input embedding matrix;
In this embodiment, the input includes item embedding and position embedding. First, the user's historical sequence S^u is converted into a sequence of fixed length n, where n denotes the maximum sequence length that the variational self-attention network can model. From the resulting sequence of n interaction records and a continuous item embedding matrix, the input embedding matrix A ∈ ℝ^{n×d} is obtained, where d denotes the embedding dimension. Furthermore, a learnable position matrix P ∈ ℝ^{n×d} is added to the input matrix A as the final input embedding.
In summary, the input embedding matrix of the present embodiment includes item embedding information and position embedding information of each item in the history interaction sequence. Specifically, the input embedding matrix is:
I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
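As an illustration only, the embedding layer described above can be sketched in PyTorch as follows; the class name, the padding convention, and the use of nn.Embedding for the learnable position matrix are assumptions of this sketch rather than details fixed by the patent:

    import torch
    import torch.nn as nn

    class VSANEmbedding(nn.Module):
        """Embedding layer: item embedding plus learnable position embedding, I = A + P."""
        def __init__(self, num_items: int, d: int, max_len: int):
            super().__init__()
            # index 0 is reserved as padding for sequences shorter than max_len (an assumption)
            self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)
            self.pos_emb = nn.Embedding(max_len, d)  # learnable position matrix P

        def forward(self, seq: torch.Tensor) -> torch.Tensor:
            # seq: (batch, n) item ids, already truncated/padded to the fixed length n
            n = seq.size(1)
            positions = torch.arange(n, device=seq.device).unsqueeze(0)  # (1, n)
            return self.item_emb(seq) + self.pos_emb(positions)          # (batch, n, d) = A + P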
S302, inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determining the variational parameters of the approximate posterior distribution according to the self-attention vector, wherein the inferred self-attention network includes a first preset number of self-attention blocks;

After the final input embedding matrix is obtained, it is input into the inferred self-attention network to output the variational parameters corresponding to the approximate posterior distribution q_λ(z|S^u). The left side of FIG. 4 shows the specific structure of the inferred self-attention network. The self-attention operation is defined as follows:

D = SelfAttention(I W^Q, I W^K, I W^V) = softmax( (I W^Q)(I W^K)^T / √d ) (I W^V),

where W^Q, W^K, W^V ∈ ℝ^{d×d} denote the projection matrices (to distinguish them in the description, the projection matrices here are referred to as the inferred projection matrices, and the projection matrices of the generated self-attention network below are referred to as the generated projection matrices). In order to propagate low-level features to higher levels, this embodiment applies residual connections in the network; then, to make training of the neural network fast and stable, layer normalization is adopted; in addition, a two-layer fully connected network with a ReLU activation function is used to account for interactions between different latent dimensions and to endow the network with nonlinear capability. Finally, the whole inference process is as follows:

E = LayerNorm(D + I),
F = ReLU(E W_1 + b_1) W_2 + b_2,
G_i = LayerNorm(F + E),
where W_1, W_2, b_1, b_2 are network parameters. For convenience and simplicity, the entire self-attention network described above is defined as:

G_i = SAN(I).

Through the above process, G_i essentially integrates the embeddings of all previous items. In order to capture more complex item transitions, a first preset number h_1 of self-attention blocks can be stacked:

G_i^{h_1} = SAN(G_i^{h_1 − 1}), with G_i^{1} = SAN(I).

Then, the mean and variance of the approximate posterior distribution q_λ(z|S^u) are estimated from the final self-attention vector as follows:

μ_λ = l_1(G_i^{h_1}), σ_λ = l_2(G_i^{h_1}),

where l_1(·) denotes a linear transformation and l_2(·) denotes another linear transformation. In this manner, the deterministic self-attention vector corresponds to a Gaussian distribution rather than a conventional fixed point, and the variance of the Gaussian distribution can well capture the uncertainty of the user's preferences.
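As a concrete illustration of the inferred self-attention block described above, a minimal single-head sketch in PyTorch follows; the class name, the feed-forward width d_ff, and the omission of dropout and of the causality mask common in sequential recommenders are assumptions of this example, not details from the patent:

    import torch
    import torch.nn as nn

    class SelfAttentionBlock(nn.Module):
        """Single-head block: scaled dot-product attention, then residual + LayerNorm,
        then a two-layer ReLU feed-forward network, then residual + LayerNorm."""
        def __init__(self, d: int, d_ff: int):
            super().__init__()
            self.wq = nn.Linear(d, d, bias=False)  # projection matrices W^Q, W^K, W^V
            self.wk = nn.Linear(d, d, bias=False)
            self.wv = nn.Linear(d, d, bias=False)
            self.norm1 = nn.LayerNorm(d)
            self.norm2 = nn.LayerNorm(d)
            self.ffn = nn.Sequential(nn.Linear(d, d_ff), nn.ReLU(), nn.Linear(d_ff, d))
            self.scale = d ** 0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n, d) input embedding I, or the output of the previous block
            q, k, v = self.wq(x), self.wk(x), self.wv(x)
            attn = torch.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
            d_out = attn @ v            # D = softmax(Q K^T / sqrt(d)) V
            e = self.norm1(d_out + x)   # E = LayerNorm(D + I)
            f = self.ffn(e)             # F = ReLU(E W1 + b1) W2 + b2
            return self.norm2(f + e)    # G = LayerNorm(F + E)

Stacking h_1 such blocks, for example in an nn.ModuleList, yields the self-attention vector G_i^{h_1}, from which μ_λ and σ_λ are then estimated by two linear layers.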
S303, determining the latent variable according to the variational parameters by using a re-parameterization method;
Specifically, the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable whose role is to introduce noise.
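A minimal sketch of this inference head and re-parameterization step is shown below; having l_2 predict the log-variance rather than the variance itself is an assumption made here for numerical stability, not a detail stated in the patent:

    import torch
    import torch.nn as nn

    class InferenceHead(nn.Module):
        """Maps the self-attention vector to variational parameters and samples z."""
        def __init__(self, d: int):
            super().__init__()
            self.l1 = nn.Linear(d, d)  # l1(.): mean mu_lambda of q_lambda(z | S^u)
            self.l2 = nn.Linear(d, d)  # l2(.): log-variance (assumed form)

        def forward(self, g: torch.Tensor):
            mu = self.l1(g)
            log_var = self.l2(g)
            sigma = torch.exp(0.5 * log_var)
            eps = torch.randn_like(sigma)  # standard Gaussian noise epsilon
            z = mu + sigma * eps           # re-parameterization: z = mu + sigma (.) eps
            return z, mu, log_var

Because z is now a deterministic function of μ_λ and σ_λ plus independent noise, gradients can flow through the sampling step during training.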
S304, generating a representation of the historical interaction sequence according to the latent variable and the conditional distribution as a user preference representation by using the generated self-attention network;
To define the generation process, this embodiment also uses a generated self-attention network, which generates the corresponding S^u based on p_θ(S^u|z). Based on the latent variable z, p_θ(S^u|z) is parameterized by the generated self-attention network, where G_g^{h_2}, the final output of its attention layers, is the generated representation of the user preferences. The structure of the generated self-attention network is shown on the right side of FIG. 4, and its projections are computed from the latent variable z and the generated projection matrices. Since the generated self-attention network and the inferred self-attention network differ only in the input part, a detailed description is omitted here.
To sum up, the process of S304 is as follows: determining the generated projection result according to the latent variable and the generated projection matrices by using the projection layer in the generated self-attention network; then generating, by using a second preset number of self-attention blocks in the generated self-attention network and based on the conditional distribution p_θ(S^u|z), the representation G_g^{h_2} of the historical interaction sequence from the generated projection result, where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
It should be noted that the generation process described above focuses only on the next item in the user's historical sequence. Preferably, it can instead address the next k items; the most straightforward approach is to treat them as a time-ordered multiset.
For distinction, in this embodiment the output of the inferred self-attention network is written as G_i^{h_1} and the output of the generated self-attention network as G_g^{h_2}. The subscript i has no numerical meaning and is taken from the initial of "inferred"; the superscript of G_i^{h_1} indicates the number of self-attention blocks in the inferred self-attention network. Likewise, the subscript g is taken from the initial of "generated", and the superscript of G_g^{h_2} indicates the number of self-attention blocks in the generated self-attention network.
S305, determining the candidate item with the largest interaction probability according to the user preference representation by using a prediction layer, as the recommendation result;
According to a target formula, the prediction layer determines, based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
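A sketch of this prediction step follows; reusing the item embedding table as the candidate matrix M is an assumption borrowed from common practice in self-attention recommenders such as SASRec, not something the patent fixes:

    import torch

    def predict_next_item(g_t: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
        """g_t: (batch, d), the t-th row of the user preference representation G_g^{h_2};
        item_emb: (N, d), the candidate item embedding matrix M.
        Returns the index of the candidate with the highest interaction score."""
        scores = g_t @ item_emb.T     # (batch, N) interaction scores
        return scores.argmax(dim=-1)  # the predicted item \hat{y}^{(u,t)}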
S306, optimizing the real parameters and the approximate parameters of the variational self-attention network according to the loss function.
During the evaluation phase, the mean of the variational distribution (i.e., μ_λ) is used as the latent variable z of S^u. Following the evidence lower bound (ELBO) described earlier, the loss function of the variational self-attention network in this embodiment is:

L(θ, λ) = −Σ_t log p_θ(y^(u,t) | z) + ½ Σ_j (σ_λj² + μ_λj² − 1 − log σ_λj²),

where the first term is the reconstruction term and the second term is the KL divergence between the approximate posterior and the standard Gaussian prior; y^(u,t) denotes the actual interaction item of user u at time t, ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, S^u denotes the historical interaction sequence, θ and λ denote the real and approximate parameters of the variational self-attention network respectively, σ_λj denotes the j-th row of σ_λ, and μ_λj denotes the j-th row of μ_λ. By minimizing this loss function, the real parameters θ and the approximate parameters λ can be jointly optimized.
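As an illustrative sketch of this objective, the following assumes (consistently with the sketches above, but not fixed by the patent) that the network outputs a log-variance and that the reconstruction term is a cross-entropy over the candidate items:

    import torch
    import torch.nn.functional as F

    def vsan_loss(logits: torch.Tensor, targets: torch.Tensor,
                  mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
        """logits: (batch, n, N) scores over N items; targets: (batch, n) true item ids;
        mu, log_var: (batch, n, d) variational parameters of q_lambda(z | S^u)."""
        # reconstruction term: -log p_theta(y^(u,t) | z), averaged over positions
        recon = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        # KL(q_lambda || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
        kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var, dim=-1).mean()
        return recon + kl  # minimized jointly over theta and lambda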
The sequence recommendation apparatus based on a variational self-attention network provided by the embodiments of the present application is introduced below; the apparatus described below and the sequence recommendation method described above may be mutually referred to.

The sequence recommendation apparatus based on a variational self-attention network in this embodiment, as shown in FIG. 5, includes:
the embedding module 501: the input embedding matrix is used for generating an input embedding matrix according to the historical interaction sequence, and the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
the inference module 502: the input embedding matrix is used for inputting and deducing a self-attention network to obtain a self-attention vector, and a variation parameter is determined according to the self-attention vector;
parameterization module 503: for determining potential variables from the variation parameters using a re-parameterization method;
generating module 504: for generating a representation of the historical interaction sequence from the potential variables as a user preference representation using a generated self-attention network;
prediction module 505: and the candidate item with the largest interaction probability is determined by utilizing a prediction layer according to the user preference representation to serve as a recommendation result.
The sequence recommendation apparatus based on a variational self-attention network of this embodiment is used to implement the aforementioned sequence recommendation method based on a variational self-attention network, so the specific implementations in the apparatus can be found in the method embodiment parts above; for example, the embedding module 501, the inference module 502, the parameterization module 503, the generation module 504, and the prediction module 505 are respectively used to implement steps S201, S202, S203, S204, and S205 of the method. Therefore, their detailed descriptions are omitted here with reference to the corresponding method embodiments.
In addition, since the sequence recommendation apparatus of this embodiment is used to implement the aforementioned sequence recommendation method, its functions correspond to those of the method described above and are not repeated here.
In addition, the present application also provides a sequence recommendation device based on a variational self-attention network, as shown in FIG. 6, comprising:
memory 100: for storing a computer program;
a processor 200: for executing the computer program to implement the steps of the sequence recommendation method based on a variational self-attention network as described above.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principles and embodiments of the present application have been described herein with specific examples to help understand the method and its core idea. For those skilled in the art, there will be variations in the specific implementation and application scope according to the ideas of the present application; in view of this, the content of this description should not be construed as limiting the present application.

Claims (6)

1. A method for sequence recommendation based on a variational self-attention network, comprising:
generating an input embedding matrix according to a historical interaction sequence, wherein the input embedding matrix comprises item embedding information and position embedding information of each item in the historical interaction sequence;
inputting the input embedding matrix into an inferred self-attention network to obtain a self-attention vector, and determining variational parameters according to the self-attention vector;
determining a latent variable according to the variational parameters by using a re-parameterization method;
generating a representation of the historical interaction sequence from the latent variable as a user preference representation by using a generated self-attention network;
determining candidate items with the largest interaction probability according to the user preference representation by using a prediction layer to serve as recommendation results;
wherein the inputting the input embedding matrix into the inferred self-attention network to obtain a self-attention vector comprises:
determining an inferred projection result from the input embedding matrix and the inferred projection matrix using a projection layer in the inferred self-attention network;
generating a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length;
wherein the determining variational parameters according to the self-attention vector comprises:
determining, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable;
wherein the determining a latent variable according to the variational parameters by using the re-parameterization method comprises:
determining the latent variable according to the variational parameters by using the re-parameterization method, wherein the latent variable is: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable;
wherein the generating, by using the generated self-attention network, a representation of the historical interaction sequence from the latent variable comprises:
determining a generated projection result according to the latent variable and the generated projection matrix by utilizing a projection layer in the generated self-attention network;
generating a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
2. The method of claim 1, wherein generating the input embedding matrix from the historical interaction sequence comprises:
generating an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix is: I_i = A_i + P_i, where i ∈ (1, n), n denotes the sequence length, A_i denotes the item embedding information of the i-th item, and P_i denotes the position embedding information of the i-th item.
3. The method of claim 1, wherein determining candidate items with the greatest interaction probability as recommendation results according to the user preference representation using a prediction layer comprises:
according to a target formula, determining, by the prediction layer and based on the user preference representation, the candidate item with the largest interaction probability in the candidate item set as the recommendation result, wherein the target formula is:

ŷ^(u,t) = argmax_{x_j ∈ X} G_{g,t}^{h_2} · M_{x_j}^T,

where ŷ^(u,t) denotes the predicted candidate item with the highest interaction probability for user u at time t, G_{g,t}^{h_2} denotes the t-th row of the user preference representation G_g^{h_2}, M ∈ ℝ^{N×d} is the candidate item embedding matrix, N denotes the number of candidate items in the candidate item set, and d denotes the vector dimension.
4. A method as claimed in any one of claims 1 to 3, further comprising:
optimizing real parameters and approximate parameters of the variational self-attention network according to a loss function, wherein the loss function is as follows:
the actual interaction item of user u at time t,representing predictedCandidate item with maximum interaction probability of user u at t time, S u Representing the historical interaction sequence, θ and λ representing the real and approximate parameters, σ, respectively, of the variational self-attention network λj Representation sigma λ J-th row, mu λj Represents u λ Is the j-th row of (2).
5. A sequence recommendation apparatus based on a variational self-attention network, comprising:
an embedding module: configured to generate an input embedding matrix according to the historical interaction sequence, wherein the input embedding matrix includes item embedding information and position embedding information of each item in the historical interaction sequence;
an inference module: configured to input the input embedding matrix into the inferred self-attention network to obtain a self-attention vector, and determine variational parameters according to the self-attention vector;
a parameterization module: configured to determine a latent variable according to the variational parameters by using a re-parameterization method;
a generation module: configured to generate a representation of the historical interaction sequence from the latent variable as a user preference representation by using the generated self-attention network;
a prediction module: configured to determine, by using a prediction layer, the candidate item with the largest interaction probability according to the user preference representation, as the recommendation result;
wherein the inference module is specifically configured to determine an inferred projection result according to the input embedding matrix and the inferred projection matrix by using a projection layer in the inferred self-attention network, and to generate a self-attention vector G_i^{h_1} ∈ ℝ^{n×d} from the inferred projection result by using a first preset number of self-attention blocks in the inferred self-attention network, where h_1 is the first preset number and n denotes the sequence length;
wherein the inference module is further specifically configured to determine, from the self-attention vector, a mean and a variance as the variational parameters of the approximate posterior distribution q_λ(z|S^u), wherein the mean is μ_λ = l_1(G_i^{h_1}) and the variance is σ_λ = l_2(G_i^{h_1}); l_1(·) denotes a linear transformation, l_2(·) denotes another linear transformation, λ denotes the approximate parameters of the variational self-attention network, S^u denotes the historical interaction sequence, and z denotes the latent variable;
the parameterization module is specifically configured to determine the latent variable according to the variational parameters by using a re-parameterization method, the latent variable being: z = μ_λ + σ_λ ⊙ ε, where ε denotes a standard Gaussian variable;
the generation module is specifically configured to determine a generated projection result according to the latent variable and the generated projection matrix by using a projection layer in the generated self-attention network, and to generate a representation G_g^{h_2} of the historical interaction sequence from the generated projection result by using a second preset number of self-attention blocks in the generated self-attention network, based on the conditional distribution p_θ(S^u|z), where h_2 is the second preset number and θ denotes the real parameters of the variational self-attention network.
6. A sequence recommendation device based on a variational self-attention network, comprising:
a memory: for storing a computer program;
a processor: steps for executing the computer program for implementing the variant self-attention network based sequence recommendation method according to any of the claims 1-4.
CN202010273754.XA 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network Active CN111506814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010273754.XA CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Publications (2)

Publication Number Publication Date
CN111506814A CN111506814A (en) 2020-08-07
CN111506814B true CN111506814B (en) 2023-11-28

Family

ID=71864057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010273754.XA Active CN111506814B (en) 2020-04-09 2020-04-09 Sequence recommendation method based on variational self-attention network

Country Status (1)

Country Link
CN (1) CN111506814B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446489B (en) * 2020-11-25 2023-05-05 天津大学 Dynamic network embedded link prediction method based on variation self-encoder
CN113160898B (en) * 2021-05-18 2023-09-08 北京信息科技大学 Iron-based alloy Gibbs free energy prediction method and system
CN113688315B (en) * 2021-08-19 2023-04-18 电子科技大学 Sequence recommendation method based on no-information-loss graph coding
CN114154071B (en) * 2021-12-09 2023-05-09 电子科技大学 Emotion time sequence recommendation method based on attention mechanism
CN117236198B (en) * 2023-11-14 2024-02-27 中国石油大学(华东) Machine learning solving method of flame propagation model of blasting under sparse barrier
CN117251295B (en) * 2023-11-15 2024-02-02 成方金融科技有限公司 Training method, device, equipment and medium of resource prediction model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359140A (en) * 2018-11-30 2019-02-19 苏州大学 A kind of sequence of recommendation method and device based on adaptive attention
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism
CN110245299A (en) * 2019-06-19 2019-09-17 中国人民解放军国防科技大学 Sequence recommendation method and system based on dynamic interaction attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190354839A1 (en) * 2018-05-18 2019-11-21 Google Llc Systems and Methods for Slate Optimization with Recurrent Neural Networks


Also Published As

Publication number Publication date
CN111506814A (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN111506814B (en) Sequence recommendation method based on variational self-attention network
KR20180091850A (en) Augmenting neural networks with external memory
CN114297036B (en) Data processing method, device, electronic equipment and readable storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN112418482A (en) Cloud computing energy consumption prediction method based on time series clustering
CN110377707B (en) Cognitive diagnosis method based on depth item reaction theory
CN111369299A (en) Method, device and equipment for identification and computer readable storage medium
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
CN115983497A (en) Time sequence data prediction method and device, computer equipment and storage medium
CN111046655A (en) Data processing method and device and computer readable storage medium
CN116610218A (en) AI digital person interaction method, device and system
CN113065321B (en) User behavior prediction method and system based on LSTM model and hypergraph
Skalse et al. STARC: A General Framework For Quantifying Differences Between Reward Functions
CN115907000A (en) Small sample learning method for optimal power flow prediction of power system
CN112905166B (en) Artificial intelligence programming system, computer device, and computer-readable storage medium
CN115168722A (en) Content interaction prediction method and related equipment
WO2023155301A1 (en) Answer sequence prediction method based on improved irt structure, and controller and storage medium
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
KR20190129422A (en) Method and device for variational interference using neural network
CN111626472B (en) Scene trend judgment index computing system and method based on depth hybrid cloud model
CN112528015A (en) Method and device for judging rumor in message interactive transmission
CN114925808B (en) Anomaly detection method based on incomplete time sequence in cloud network end resource
CN115130669A (en) Model training method and device, electronic equipment and computer readable storage medium
CN117076930A (en) Training sample processing method, abnormal transaction detection method, device and equipment
Paniagua et al. Nonlinear system identification using modified variational autoencoders

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant