CN114741590A

CN114741590A - Multi-interest recommendation method based on self-attention routing and Transformer

Info

Publication number: CN114741590A
Application number: CN202210312759.8A
Authority: CN
Inventors: 陈莉; 殊金鹏; 陈培榕; 高涵; 郝星星; 明亮亮; 李文强
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2022-03-28
Filing date: 2022-03-28
Publication date: 2022-07-12
Anticipated expiration: 2042-03-28
Also published as: CN114741590B

Abstract

The invention discloses a multi-interest recommendation method based on self-attention routing and a Transformer, which comprises the following steps: acquiring a public user historical behavior sequence, preprocessing the public user historical behavior sequence to obtain a vector containing project basic information and behavior occurrence time, and converting the project vector and a corresponding position vector into a fixed-dimension low-dimension vector through an Embedding technology; capturing the sequential dependency relationship in the historical behavior sequence of each user by using a Transformer model, and obtaining a behavior vector containing more effective information according to the item vector and the position vector processed by the Embedding technology; extracting a plurality of interest vectors of a user from a large number of behavior vectors; obtaining the interaction probability of the user and the target object by determining the relationship between each interest vector and the target object; learning parameters of the model by minimizing a loss function; and by calculating the similarity, N items which are most likely to be interested by the user are retrieved from the to-be-selected item pool by utilizing a plurality of interest vectors of the user, and are recommended to the to-be-selected item pool.

Description

Multi-interest recommendation method based on self-attention routing and Transformer

Technical Field

The invention belongs to the technical field of images, relates to a personalized recommendation algorithm, and particularly relates to a multi-interest recommendation method based on self-attention routing and a transform.

Background

With the continuous progress of information technology and the increasing number of netizens, the amount of information carried by the internet is also increasing explosively, thus causing a serious information overload problem. Information overload prevents information consumers from quickly retrieving the really needed information from information that is too expensive, and information producers cannot smoothly present their excellent products in the field of view of the masses. To alleviate this problem, scholars developed epoch-making personalized recommendation systems. The personalized recommendation system analyzes the interest preference of the user according to the basic information, the historical behaviors and the item information of the user, recommends the articles or information which the user is interested in to the user, and can also adjust the recommended items in time when the interest of the user changes. The personalized recommendation system is beneficial to increasing the income of enterprises and promoting the social prosperity and development while saving the time cost of the user and improving the life quality of the user, and has important practical significance for individuals and society.

After an image classification network AlexNet is proposed from the image field, the research level of deep learning is endless, and the research idea of a recommendation system is also influenced by the infinite research level. Zhou et al propose a deep interest network DIN that weights the behavior vectors by a model through a mechanism of attention to distinguish the importance of different items to the user's interest. However, the user's interests are dynamically changing and it is not sufficient to capture only the user's static interests. Zhou et al have also proposed a deep interest evolution network based on DIN, and this model captures the dynamic changes of user interests using GRU, thereby extracting more accurate user interests. Although the history behavior information such as the comments of the user includes rich preference features, which helps to reduce the deviation caused by different popularity of each item, not every piece of history behavior information can play such a role. DIEN can dig deep-level user preference information, but the recommendation result is influenced by the popularity of the item to be recommended. Volkovs et al propose an auto-encoder recommendation model using double-headed attention, which can effectively combine two input sources of user comments and implicit feedback information to further reduce recommendation deviation brought by article popularity.

With the development of society, the interests of users are more and more extensive, the expression capacity of a single vector is very limited, and diversified interest characteristics of the users cannot be accurately expressed. Aiming at the problem, Lichao et al provides a multi-interest network MIND based on a dynamic routing algorithm, wherein the MIND utilizes the dynamic routing algorithm and a capsule network to extract a plurality of interest vectors from a historical behavior sequence of a user to represent different interests of the user, and the problem that the interest of the user cannot be accurately expressed by a single vector is effectively solved. However, the dynamic routing algorithm is not completely suitable for the scenario of extracting the user interest, and the iteration number of the routing algorithm needs to be specified manually, for example, the user interest may not be accurately represented if the iteration number is improperly set. Furthermore, the lack of specific extraction of sequence information in the MIND model may result in inefficient use of the sequence information.

Disclosure of Invention

Aiming at the technical problem that the user interest cannot be accurately expressed because the traditional personalized recommendation algorithm only uses a single vector, the invention aims to provide a multi-interest recommendation method based on self-attention routing and a Transformer.

In order to realize the task, the invention adopts the following technical scheme:

a multi-interest recommendation method based on self-attention routing and a Transformer is characterized in that the method comprises the following steps. The method specifically comprises the following steps:

step 1, obtaining a public user historical behavior sequence, and preprocessing the public user historical behavior sequence to obtain a vector containing project basic information and behavior occurrence time;

step 2, converting the project vectors and the corresponding position vectors into fixed-dimension low-dimension vectors through an Embedding technology;

step 3, capturing the sequential dependency relationship in the historical behavior sequence of each user by using a Transformer model, and obtaining a behavior vector containing more effective information according to the item vector and the position vector processed by the Embedding technology;

step 4, extracting a plurality of interest vectors of the user from a large number of behavior vectors;

step 5, obtaining the interaction probability of the user and the target object by determining the relation between each interest vector and the target object;

step 6, learning the parameters of the model by minimizing a loss function;

and 7, searching N articles which are most likely to be interested by the user from the article pool to be selected by calculating the similarity and utilizing a plurality of interest vectors of the user, and recommending the N articles.

Specifically, the historical behavior sequence in step 1 is formed according to information such as item ID, category, and behavior occurrence time of past behaviors of the user;

the pretreatment process comprises the following steps: arranging all historical behavior data according to the occurrence time sequence, and converting the historical behavior data into a vector form by using an One-hot coding mode, wherein the vector form comprises a project vector and a position vector. The item vector is used for representing items interacted by the user in actions of clicking, purchasing and the like each time, and comprises information such as an ID, a category, an amount and the like of the item; the position vector is used to represent the position of the secondary behavior in the sequence of user behaviors.

Further, the Embedding technique described in step 2 employs an Item2vec model, and the input of the Item2vec model can be various sequences of user's purchase record, viewing record, etc., which can map any Item to a corresponding Embedding.

Further, the step 4 adaptively clusters the behavior vector of the user into a plurality of interest vectors using a modified self-attention routing algorithm in combination with the capsule network, each interest vector representing an aspect of the user's interest.

Further, the obtaining of the probability of the user interacting with the target item in step 5 is calculated using a sample softmax function.

Further, the expression of the loss function described in step 6 is:

in the formula, U represents a user set, I represents an article set, v_uRepresenting the interest vector, e_iAn Embedding vector representing the target item.

Compared with the prior art, the multi-interest recommendation method based on the self-attention routing and the Transformer brings technical innovation that:

1. by using the capsule network and the self-attention routing algorithm, a plurality of interest vectors of the user are extracted from the historical behavior of the user, information loss caused by the fact that a single vector is used for representing various interests of the user is avoided, and diversity and accuracy of recommendation results can be improved.

2. The sequence relation among the user behaviors is mined by using the Transformer model, and the accuracy of the recommendation system can be effectively improved because the Transformer model has natural advantages when processing the sequence data.

Drawings

FIG. 1 is a model diagram of a multi-interest recommendation method based on self-attention routing and a Transformer according to the present invention;

FIG. 2 is a block diagram of the transform model encoder portion;

FIG. 3 is a diagram of a capsule network architecture;

FIG. 4 is a block diagram of a self-attention routing algorithm;

the invention is further explained below with reference to the figures and examples.

Detailed Description

Referring to fig. 1, the embodiment provides a multi-interest recommendation method based on self-attention routing and a Transformer, which excavates a sequence relationship in a user historical behavior record through the Transformer, and then extracts a plurality of interest vectors from the behavior vectors by using an improved self-attention routing algorithm and a capsule network to express a plurality of interests of a user, thereby implementing multi-interest recommendation, and specifically includes the following steps:

according to a general data processing manner, the preprocessing process in this embodiment is: and arranging all historical behavior data according to the sequence of occurrence time, and converting the historical behavior data into a vector form by using a One-hot coding mode, wherein the vector form comprises a project vector and a position vector. The item vector is used for representing items interacted by the user in actions of clicking, purchasing and the like each time, and comprises information such as an ID, a category, an amount and the like of the item; the position vector is used to represent the position of the secondary behavior in the sequence of user behaviors.

The location vector is calculated using the difference between the recommended time and the user's historical behavior time, and the calculation formula is as follows:

in the formula (I), the compound is shown in the specification,

representing the time at which the algorithm makes recommendations for the user,

representing the occurrence time of the user behavior vi.

Step 2, converting the Item vector and the corresponding position vector into a low-dimensional vector with fixed dimensionality by using an Embedding technology, and carrying out Embedding operation by using an Item2vec model;

let the historical sequence of rows of length K be w₁，w₂，…，w_kThen Item2vec modelThe optimization objective of (a) is the following formula:

and 3, capturing the sequence relation among the historical behaviors:

the present embodiment uses the encoder part of the transmomer model to capture the sequential dependency relationship in the historical behavior sequence of each user, and the structure of the transmomer model encoder is shown in fig. 2, and it mainly consists of a multi-head self-attention layer and a feed-forward network layer. The specific process is as follows:

(1) multi-head self-attention layer

Embedding the user historical behaviors into a plurality of subspaces by using a multi-head self-Attention layer to respectively carry out scaling dot product Attention, splicing results obtained by scaling the dot product Attention for multiple times, and finally obtaining a multi-head self-Attention result through linear transformation.

The use of the multi-head self-attention layer enables the model to learn important information in a plurality of different subspaces respectively, and the capability of the model in learning various contents is enhanced. The specific calculation formula is as follows:

S＝Concat(head₁,head₂,...,head_h)W^H

head_i＝Attention(EW^Q,EW^K,EW^V)

wherein, W^Q、W^K、W^VFor the weight matrix, each of the self-attentional heads has a set of corresponding weight matrices, which are initialized randomly. E is a matrix consisting of behavior embedding and position embedding of all historical behaviors, and h represents the number of heads of multi-head self-attention.

The method for calculating the scaling dot product Attention used by the multi-head self-Attention layer is as follows:

wherein Q represents a query matrix, and K and V representAll of the keys and the values are used,

to scale the parameters in the dot product attention, it helps to mitigate the effect of the vector dimensions on the attention weight.

(2) Feedforward network layer

In order to enhance the nonlinear expression capability of the model, a point type feedforward neural network is added behind a multi-head self-attention layer. The specific calculation method is as follows:

F＝FFN(S)

in the formula, S represents the calculation result of the multi-head self-attention layer, and F represents the final result of the feedforward network layer.

The model uses Dropout techniques at both the multi-head self-attention layer and the feed-forward network layer. By temporarily discarding some neurons with a certain probability in each training batch of the neural network, the overfitting phenomenon can be effectively prevented, and the generalization capability of the model is improved. The following is the calculation formula for the final output of the multi-headed self-attention layer and the feedforward network layer:

S'＝LayerNorm(S+Dropout(S))

F＝LayerNorm(S'+Dropout(LeakyReLU(S'W⁽¹⁾+b⁽¹⁾)W⁽²⁾+b⁽²⁾))

in the formula, W⁽¹⁾And W⁽²⁾Weight representing interlayer connection line, b⁽¹⁾And b⁽²⁾Representing the bias of the layers.

LayerNorm represents a normalization operation, which can enhance the stability of feature distribution and accelerate the convergence rate of the model. The LeakyReLU activation function can also find the gradient when the input is less than zero, and the output is also 0 when the input is less than 0 in the ReLU activation function.

Step 4, extracting a plurality of interest vectors of the user from a large amount of behavior vectors:

the present embodiment utilizes a capsule network in conjunction with a self-attention routing algorithm to cluster the historical behavior of the user into a plurality of vectors, each vector representing some aspect of the user's interest.

Fig. 3 is a capsule network model, which is composed of a convolution layer, a primary capsule layer and an advanced capsule layer. The capsule network is made up of a series of capsules, each of which is in turn made up of a set of neurons. The first two layers are only used as tools for the high-level capsule layer to interact with the input pictures, and because the input data of the invention is a user interaction sequence, only the high-level capsule layer is used for forming the capsule network.

The self-attention routing algorithm is responsible for updating data between capsule layers, and the overall structure of the algorithm is shown in fig. 4.

The input vector of the algorithm is primary capsules containing basic features, for each primary capsule, the primary capsule is multiplied by a weight matrix to obtain a prediction vector so as to complete affine transformation between two adjacent capsule layers, and the calculation formula of the prediction vector is as follows:

wherein, V^lNamely an input matrix containing the historical behavior record information of the user. W^lI.e. a weight matrix of l layers with a dimension of (n)^l，n^l+1，d^l，d^l+1) Wherein n is^lIs the number of capsules of the first layer, d^lIs the vector dimension of the l layers of capsules.

And obtaining an output vector of the capsule network by performing weighted summation on all the prediction vectors, wherein the calculation formula is as follows:

in the above formula p^lIs a logarithmic prior matrix with dimension (n)^l，n^l+1) The weight values contained in the weight learning table and other weight values are learned differently at the same time, and certain deviation is given to capsules which are more related to other capsules, so that the overall performance of all capsules is balanced. C^lIs a coupling coefficient matrix obtained by a self-attention algorithm with a dimension of (n)^l，n^l+1) It is responsible for distributing each low-level capsule according to a certain proportionThe capsule corresponding to the entity to which they belong is also the most important function of the self-attention routing algorithm.

C^lThe calculation formula of (a) is as follows:

wherein A is^lThe attention matrix is used for calculating attention scores of prediction vectors of l layers so as to obtain the output of the capsule network, and the attention moment matrix is calculated by adopting the zoom dot product attention, and the calculation formula is as follows:

wherein the content of the first and second substances,

the method is beneficial to the model to deal with the condition with a large attention score, and can balance the performances of the logarithm prior matrix and the coupling coefficient matrix.

The modular length of the output vector from the attention routing algorithm is not necessarily less than 1, resulting in a probability that it is not representative of the presence of the object represented by the capsule. Therefore, the present embodiment introduces a compression function to compress the modular length of the output vector to make it located in the [0, 1] interval, and the specific calculation formula is as follows:

the compression function only compresses the length of the vector, so that the length of the short vector is close to 0, the length of the long vector is close to 1, the direction of the vector is not changed, and a plurality of compressed output vectors can express the interest of a user.

after obtaining a plurality of interest vectors, for the current target project, performing argmax operation to find out the interest vector most relevant to the target project from all interest vectors of the user, wherein the calculation formula is as follows:

wherein e is_iIndicating the embedding of the target item, which is calculated from the sequence information layer. V_uIs an interest matrix, consisting of all the interest vectors of the users.

Step 6, learning the parameters of the model by minimizing the loss function:

for user vector v_uAnd target item e_iThe probability of user u interacting with the target item should be maximized, and for computational cost considerations, a sample softmax function is used to calculate the probability of user interaction with the target item, and the proposed model is trained by minimizing the following objective function:

in the formula, U represents a user set, I represents an article set, v_uRepresenting the interest vector, e_iThe Embedding vector representing the target item.

Step 7, searching N articles which are most likely to be interested by the user from the article pool to be selected by calculating the similarity and utilizing a plurality of interest vectors of the user, and recommending the N articles;

for each interest vector extracted in step 4, Z similar items of the interest can be retrieved from the candidate item library by the nearest neighbor method. If M interests are extracted, a total of M × Z candidates can be retrieved. The present embodiment aims to perform TopN recommendation for a user, that is, to select N items from a candidate item library that are most likely to interact with the user for recommendation, so that the top N items with the highest similarity to the user interest are found from M × Z items as a final recommendation list.

The above description is only for the preferred embodiment of the present invention, but the present invention is not limited to the above embodiment. The technical solutions of the present invention should be added or substituted with other technical features by those skilled in the art on the premise of the technical solutions disclosed in the present invention, and the technical solutions should belong to the protection scope of the present invention.

Claims

1. A multi-interest recommendation method based on self-attention routing and a Transformer is characterized by comprising the following steps:

step 6, learning the parameters of the model by minimizing a loss function;

2. The method of claim 1. The method is characterized in that the historical behavior sequence in the step 1 is formed according to information such as item ID, category, behavior occurrence time and the like of behaviors which are generated by a user in the past;

the pretreatment process comprises the following steps: and arranging all historical behavior data according to the sequence of occurrence time, and converting the historical behavior data into a vector form by using a One-hot coding mode, wherein the vector form comprises a project vector and a position vector. The item vector is used for representing items interacted by the user in actions of clicking, purchasing and the like each time, and comprises information such as an ID, a category, an amount and the like of the item; the position vector is used to represent the position of the sub-action in the sequence of user actions.

3. The method of claim 1, wherein the Embedding technique of step 2 employs an Item2vec model whose inputs are various sequences of user purchase records, viewing records, etc., which can map any Item to a corresponding Embedding.

4. The method of claim 1, wherein step 4 is performed by adaptively clustering the behavior vectors of the user into a plurality of interest vectors, each representing an aspect of the user's interest, using a modified self-attention routing algorithm in conjunction with the capsule network.

5. The method of claim 1, wherein the obtaining the probability of user interaction with the target item in step 5 is calculated using a sample softmax function.

6. The method of claim 1, wherein the minimizing loss function of step 6 is expressed by:

in the formula, U represents a user set, I represents an article set, v_uRepresenting an interest vector, e_iAn Embedding vector representing the target item.